Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-05 Thread Viresh Kumar
On 05-06-18, 08:53, Daniel Lezcano wrote:
> On 05/06/2018 07:53, Viresh Kumar wrote:
> > On 05-06-18, 07:48, Daniel Lezcano wrote:
> >> As soon as we reach complete(), no timer can be set because of the
> >> condition before.
> > 
> > Why not ? We aren't using any locks here and it is possible that 
> > run_duration_ms
> > is set to 0 from idle_injection_stop() only after the first thread has 
> > restarted
> > the hrtimer. Isn't it ?
> 
> If he restarted the timer, complete() won't be called and
> idle_injection_stop() will wait until the idle cycle injection is finished.

Obviously I missed that return statement within the conditional block :)

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-05 Thread Viresh Kumar
On 05-06-18, 08:53, Daniel Lezcano wrote:
> On 05/06/2018 07:53, Viresh Kumar wrote:
> > On 05-06-18, 07:48, Daniel Lezcano wrote:
> >> As soon as we reach complete(), no timer can be set because of the
> >> condition before.
> > 
> > Why not ? We aren't using any locks here and it is possible that 
> > run_duration_ms
> > is set to 0 from idle_injection_stop() only after the first thread has 
> > restarted
> > the hrtimer. Isn't it ?
> 
> If he restarted the timer, complete() won't be called and
> idle_injection_stop() will wait until the idle cycle injection is finished.

Obviously I missed that return statement within the conditional block :)

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-05 Thread Daniel Lezcano
On 05/06/2018 07:53, Viresh Kumar wrote:
> On 05-06-18, 07:48, Daniel Lezcano wrote:
>> As soon as we reach complete(), no timer can be set because of the
>> condition before.
> 
> Why not ? We aren't using any locks here and it is possible that 
> run_duration_ms
> is set to 0 from idle_injection_stop() only after the first thread has 
> restarted
> the hrtimer. Isn't it ?

If he restarted the timer, complete() won't be called and
idle_injection_stop() will wait until the idle cycle injection is finished.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-05 Thread Daniel Lezcano
On 05/06/2018 07:53, Viresh Kumar wrote:
> On 05-06-18, 07:48, Daniel Lezcano wrote:
>> As soon as we reach complete(), no timer can be set because of the
>> condition before.
> 
> Why not ? We aren't using any locks here and it is possible that 
> run_duration_ms
> is set to 0 from idle_injection_stop() only after the first thread has 
> restarted
> the hrtimer. Isn't it ?

If he restarted the timer, complete() won't be called and
idle_injection_stop() will wait until the idle cycle injection is finished.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 05-06-18, 07:48, Daniel Lezcano wrote:
> As soon as we reach complete(), no timer can be set because of the
> condition before.

Why not ? We aren't using any locks here and it is possible that run_duration_ms
is set to 0 from idle_injection_stop() only after the first thread has restarted
the hrtimer. Isn't it ?

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 05-06-18, 07:48, Daniel Lezcano wrote:
> As soon as we reach complete(), no timer can be set because of the
> condition before.

Why not ? We aren't using any locks here and it is possible that run_duration_ms
is set to 0 from idle_injection_stop() only after the first thread has restarted
the hrtimer. Isn't it ?

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Daniel Lezcano
On 05/06/2018 07:14, Viresh Kumar wrote:
> On 31-05-18, 20:25, Daniel Lezcano wrote:
>> On 29/05/2018 11:31, Viresh Kumar wrote:
>>> On 25-05-18, 11:49, Daniel Lezcano wrote:
 +  /* + * The last CPU waking up is in charge of setting the
 timer. If + * the CPU is hotplugged, the timer will move to
 another CPU +   * (which may not belong to the same cluster) but
 that is not a + * problem as the timer will be set again by
 another CPU +   * belonging to the cluster. This mechanism is
 self adaptive. +*/
>>> 
>>> I am afraid that the above comment may not be completely true all
>>> the time. For a quad-core platform, it is possible for 3 CPUs
>>> (0,1,2) to run this function as soon as the kthread is woken up,
>>> but one of the CPUs (3) may be stuck servicing an IRQ, Deadline
>>> or RT activity. Because you do atomic_inc() also in this function
>>> (above) itself, below decrement may return a true value for the
>>> CPU2 and that will restart the hrtimer, while one of the CPUs
>>> never got a chance to increment count in the first place.
>>> 
>>> The fix is simple though, do the increment in
>>> idle_injection_wakeup() and things should be fine then.
>> 
>> Ok.
>> 
 +  if (!atomic_dec_and_test(_dev->count)) + return; + +
 run_duration_ms = atomic_read(_dev->run_duration_ms); + if
 (run_duration_ms) { +  hrtimer_start(_dev->timer,
 ms_to_ktime(run_duration_ms), +
 HRTIMER_MODE_REL_PINNED); +return; +   } + +
 complete(_dev->stop_complete);
>>> 
>>> So you call complete() after hrtimer is potentially restarted.
>>> This can happen if idle_injection_stop() is called right after
>>> the above atomic_read() has finished :)
>>> 
>>> IOW, this doesn't look safe now as well.
>> 
>> It is safe, we just missed a cycle and the stop will block until
>> the next cycle. I did it on purpose and for me it is correct.
> 
> Okay, what about this then:
> 
> Path A Path B
> 
> idle_injection_fn()
> idle_injection_unregister() hrtimer_start()
> idle_injection_stop()
> 
> complete()
> 
> wait_for_completion() kfree(ii_dev);
> 
> Hrtimer is still used here after getting freed.
> 
> Is this not possible ?

As soon as we reach complete(), no timer can be set because of the
condition before.

 +struct idle_injection_device *idle_injection_register(struct
 cpumask *cpumask) +{ + struct idle_injection_device *ii_dev; +
 int cpu, cpu2; + + ii_dev = ii_dev_alloc(); +  if (!ii_dev) +
 return NULL; + +   cpumask_copy(ii_dev->cpumask, cpumask); +
 hrtimer_init(_dev->timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_REL); +   ii_dev->timer.function =
 idle_injection_wakeup_fn; + +  for_each_cpu(cpu,
 ii_dev->cpumask) { + + if (per_cpu(idle_injection_device,
 cpu)) {
>>> 
>>> Maybe add unlikely here ?
>> 
>> For this kind of init function and tight loop, there is no benefit
>> of adding the unlikely / likely. I can add it if you want, but it
>> is useless.
> 
> Okay.
> 
 +  pr_err("cpu%d is already registered\n", cpu); + 
 goto
 out_rollback_per_cpu; +} + +   
 per_cpu(idle_injection_device,
 cpu) = ii_dev; +   } + +   return ii_dev; + +out_rollback_per_cpu: 
 +  for_each_cpu(cpu2, ii_dev->cpumask) { + if (cpu == cpu2)
>>> 
>>> And is this really required? Perhaps this conditional is making
>>> it worse and just setting the per-cpu variable forcefully to NULL
>>> would be much faster :)
>> 
>> We want undo what was done, setting all variables to NULL resets
>> the values from a previous register and we don't want that.
> 
> Why will that happen ? We are only iterating over a particular
> cpumask and not all possible CPUs. Yes I understand the "undo only
> what we did" part, but the conditional is more expensive than that I
> feel.

Ok, I will remove it. In any case, there is no point of having the
function called twice. If that happens, there is something wrong with
the caller.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Daniel Lezcano
On 05/06/2018 07:14, Viresh Kumar wrote:
> On 31-05-18, 20:25, Daniel Lezcano wrote:
>> On 29/05/2018 11:31, Viresh Kumar wrote:
>>> On 25-05-18, 11:49, Daniel Lezcano wrote:
 +  /* + * The last CPU waking up is in charge of setting the
 timer. If + * the CPU is hotplugged, the timer will move to
 another CPU +   * (which may not belong to the same cluster) but
 that is not a + * problem as the timer will be set again by
 another CPU +   * belonging to the cluster. This mechanism is
 self adaptive. +*/
>>> 
>>> I am afraid that the above comment may not be completely true all
>>> the time. For a quad-core platform, it is possible for 3 CPUs
>>> (0,1,2) to run this function as soon as the kthread is woken up,
>>> but one of the CPUs (3) may be stuck servicing an IRQ, Deadline
>>> or RT activity. Because you do atomic_inc() also in this function
>>> (above) itself, below decrement may return a true value for the
>>> CPU2 and that will restart the hrtimer, while one of the CPUs
>>> never got a chance to increment count in the first place.
>>> 
>>> The fix is simple though, do the increment in
>>> idle_injection_wakeup() and things should be fine then.
>> 
>> Ok.
>> 
 +  if (!atomic_dec_and_test(_dev->count)) + return; + +
 run_duration_ms = atomic_read(_dev->run_duration_ms); + if
 (run_duration_ms) { +  hrtimer_start(_dev->timer,
 ms_to_ktime(run_duration_ms), +
 HRTIMER_MODE_REL_PINNED); +return; +   } + +
 complete(_dev->stop_complete);
>>> 
>>> So you call complete() after hrtimer is potentially restarted.
>>> This can happen if idle_injection_stop() is called right after
>>> the above atomic_read() has finished :)
>>> 
>>> IOW, this doesn't look safe now as well.
>> 
>> It is safe, we just missed a cycle and the stop will block until
>> the next cycle. I did it on purpose and for me it is correct.
> 
> Okay, what about this then:
> 
> Path A Path B
> 
> idle_injection_fn()
> idle_injection_unregister() hrtimer_start()
> idle_injection_stop()
> 
> complete()
> 
> wait_for_completion() kfree(ii_dev);
> 
> Hrtimer is still used here after getting freed.
> 
> Is this not possible ?

As soon as we reach complete(), no timer can be set because of the
condition before.

 +struct idle_injection_device *idle_injection_register(struct
 cpumask *cpumask) +{ + struct idle_injection_device *ii_dev; +
 int cpu, cpu2; + + ii_dev = ii_dev_alloc(); +  if (!ii_dev) +
 return NULL; + +   cpumask_copy(ii_dev->cpumask, cpumask); +
 hrtimer_init(_dev->timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_REL); +   ii_dev->timer.function =
 idle_injection_wakeup_fn; + +  for_each_cpu(cpu,
 ii_dev->cpumask) { + + if (per_cpu(idle_injection_device,
 cpu)) {
>>> 
>>> Maybe add unlikely here ?
>> 
>> For this kind of init function and tight loop, there is no benefit
>> of adding the unlikely / likely. I can add it if you want, but it
>> is useless.
> 
> Okay.
> 
 +  pr_err("cpu%d is already registered\n", cpu); + 
 goto
 out_rollback_per_cpu; +} + +   
 per_cpu(idle_injection_device,
 cpu) = ii_dev; +   } + +   return ii_dev; + +out_rollback_per_cpu: 
 +  for_each_cpu(cpu2, ii_dev->cpumask) { + if (cpu == cpu2)
>>> 
>>> And is this really required? Perhaps this conditional is making
>>> it worse and just setting the per-cpu variable forcefully to NULL
>>> would be much faster :)
>> 
>> We want undo what was done, setting all variables to NULL resets
>> the values from a previous register and we don't want that.
> 
> Why will that happen ? We are only iterating over a particular
> cpumask and not all possible CPUs. Yes I understand the "undo only
> what we did" part, but the conditional is more expensive than that I
> feel.

Ok, I will remove it. In any case, there is no point of having the
function called twice. If that happens, there is something wrong with
the caller.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 31-05-18, 20:25, Daniel Lezcano wrote:
> On 29/05/2018 11:31, Viresh Kumar wrote:
> > On 25-05-18, 11:49, Daniel Lezcano wrote:
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the timer. If
> >> +   * the CPU is hotplugged, the timer will move to another CPU
> >> +   * (which may not belong to the same cluster) but that is not a
> >> +   * problem as the timer will be set again by another CPU
> >> +   * belonging to the cluster. This mechanism is self adaptive.
> >> +   */
> > 
> > I am afraid that the above comment may not be completely true all the
> > time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> > run this function as soon as the kthread is woken up, but one of the
> > CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> > Because you do atomic_inc() also in this function (above) itself,
> > below decrement may return a true value for the CPU2 and that will
> > restart the hrtimer, while one of the CPUs never got a chance to
> > increment count in the first place.
> >
> > The fix is simple though, do the increment in idle_injection_wakeup()
> > and things should be fine then.
> 
> Ok.
> 
> >> +  if (!atomic_dec_and_test(_dev->count))
> >> +  return;
> >> +
> >> +  run_duration_ms = atomic_read(_dev->run_duration_ms);
> >> +  if (run_duration_ms) {
> >> +  hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> >> +HRTIMER_MODE_REL_PINNED);
> >> +  return;
> >> +  }
> >> +
> >> +  complete(_dev->stop_complete);
> > 
> > So you call complete() after hrtimer is potentially restarted. This
> > can happen if idle_injection_stop() is called right after the above
> > atomic_read() has finished :)
> > 
> > IOW, this doesn't look safe now as well.
> 
> It is safe, we just missed a cycle and the stop will block until the
> next cycle. I did it on purpose and for me it is correct.

Okay, what about this then:

Path A Path B

idle_injection_fn()idle_injection_unregister()
  hrtimer_start()idle_injection_stop()

  complete()

   wait_for_completion()
 kfree(ii_dev);
  
  Hrtimer is still used here after
  getting freed.


Is this not possible ?

> >> +struct idle_injection_device *idle_injection_register(struct cpumask 
> >> *cpumask)
> >> +{
> >> +  struct idle_injection_device *ii_dev;
> >> +  int cpu, cpu2;
> >> +
> >> +  ii_dev = ii_dev_alloc();
> >> +  if (!ii_dev)
> >> +  return NULL;
> >> +
> >> +  cpumask_copy(ii_dev->cpumask, cpumask);
> >> +  hrtimer_init(_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> >> +  ii_dev->timer.function = idle_injection_wakeup_fn;
> >> +
> >> +  for_each_cpu(cpu, ii_dev->cpumask) {
> >> +
> >> +  if (per_cpu(idle_injection_device, cpu)) {
> > 
> > Maybe add unlikely here ?
> 
> For this kind of init function and tight loop, there is no benefit of
> adding the unlikely / likely. I can add it if you want, but it is useless.

Okay.

> >> +  pr_err("cpu%d is already registered\n", cpu);
> >> +  goto out_rollback_per_cpu;
> >> +  }
> >> +
> >> +  per_cpu(idle_injection_device, cpu) = ii_dev;
> >> +  }
> >> +
> >> +  return ii_dev;
> >> +
> >> +out_rollback_per_cpu:
> >> +  for_each_cpu(cpu2, ii_dev->cpumask) {
> >> +  if (cpu == cpu2)
> > 
> > And is this really required? Perhaps this conditional is making it
> > worse and just setting the per-cpu variable forcefully to NULL would
> > be much faster :)
> 
> We want undo what was done, setting all variables to NULL resets the
> values from a previous register and we don't want that.

Why will that happen ? We are only iterating over a particular cpumask and not
all possible CPUs. Yes I understand the "undo only what we did" part, but the
conditional is more expensive than that I feel.

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-04 Thread Viresh Kumar
On 31-05-18, 20:25, Daniel Lezcano wrote:
> On 29/05/2018 11:31, Viresh Kumar wrote:
> > On 25-05-18, 11:49, Daniel Lezcano wrote:
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the timer. If
> >> +   * the CPU is hotplugged, the timer will move to another CPU
> >> +   * (which may not belong to the same cluster) but that is not a
> >> +   * problem as the timer will be set again by another CPU
> >> +   * belonging to the cluster. This mechanism is self adaptive.
> >> +   */
> > 
> > I am afraid that the above comment may not be completely true all the
> > time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> > run this function as soon as the kthread is woken up, but one of the
> > CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> > Because you do atomic_inc() also in this function (above) itself,
> > below decrement may return a true value for the CPU2 and that will
> > restart the hrtimer, while one of the CPUs never got a chance to
> > increment count in the first place.
> >
> > The fix is simple though, do the increment in idle_injection_wakeup()
> > and things should be fine then.
> 
> Ok.
> 
> >> +  if (!atomic_dec_and_test(_dev->count))
> >> +  return;
> >> +
> >> +  run_duration_ms = atomic_read(_dev->run_duration_ms);
> >> +  if (run_duration_ms) {
> >> +  hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> >> +HRTIMER_MODE_REL_PINNED);
> >> +  return;
> >> +  }
> >> +
> >> +  complete(_dev->stop_complete);
> > 
> > So you call complete() after hrtimer is potentially restarted. This
> > can happen if idle_injection_stop() is called right after the above
> > atomic_read() has finished :)
> > 
> > IOW, this doesn't look safe now as well.
> 
> It is safe, we just missed a cycle and the stop will block until the
> next cycle. I did it on purpose and for me it is correct.

Okay, what about this then:

Path A Path B

idle_injection_fn()idle_injection_unregister()
  hrtimer_start()idle_injection_stop()

  complete()

   wait_for_completion()
 kfree(ii_dev);
  
  Hrtimer is still used here after
  getting freed.


Is this not possible ?

> >> +struct idle_injection_device *idle_injection_register(struct cpumask 
> >> *cpumask)
> >> +{
> >> +  struct idle_injection_device *ii_dev;
> >> +  int cpu, cpu2;
> >> +
> >> +  ii_dev = ii_dev_alloc();
> >> +  if (!ii_dev)
> >> +  return NULL;
> >> +
> >> +  cpumask_copy(ii_dev->cpumask, cpumask);
> >> +  hrtimer_init(_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> >> +  ii_dev->timer.function = idle_injection_wakeup_fn;
> >> +
> >> +  for_each_cpu(cpu, ii_dev->cpumask) {
> >> +
> >> +  if (per_cpu(idle_injection_device, cpu)) {
> > 
> > Maybe add unlikely here ?
> 
> For this kind of init function and tight loop, there is no benefit of
> adding the unlikely / likely. I can add it if you want, but it is useless.

Okay.

> >> +  pr_err("cpu%d is already registered\n", cpu);
> >> +  goto out_rollback_per_cpu;
> >> +  }
> >> +
> >> +  per_cpu(idle_injection_device, cpu) = ii_dev;
> >> +  }
> >> +
> >> +  return ii_dev;
> >> +
> >> +out_rollback_per_cpu:
> >> +  for_each_cpu(cpu2, ii_dev->cpumask) {
> >> +  if (cpu == cpu2)
> > 
> > And is this really required? Perhaps this conditional is making it
> > worse and just setting the per-cpu variable forcefully to NULL would
> > be much faster :)
> 
> We want undo what was done, setting all variables to NULL resets the
> values from a previous register and we don't want that.

Why will that happen ? We are only iterating over a particular cpumask and not
all possible CPUs. Yes I understand the "undo only what we did" part, but the
conditional is more expensive than that I feel.

-- 
viresh


Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-31 Thread Daniel Lezcano
On 29/05/2018 11:31, Viresh Kumar wrote:
> Hi Daniel,
> 
> Thanks for yet another version :)
> 
> On 25-05-18, 11:49, Daniel Lezcano wrote:
>> +++ b/drivers/powercap/idle_injection.c
>> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
>> +{
>> +struct idle_injection_thread *iit;
>> +int cpu;
>> +
>> +for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
>> +iit = per_cpu_ptr(_injection_thread, cpu);
>> +iit->should_run = 1;
>> +wake_up_process(iit->tsk);
>> +}
>> +}
>> +
>> +/**
>> + * idle_injection_wakeup_fn - idle injection timer callback
>> + * @timer: a hrtimer structure
>> + *
>> + * This function is called when the idle injection timer expires which
>> + * will wake up the idle injection tasks and these ones, in turn, play
>> + * idle a specified amount of time.
>> + *
>> + * Return: HRTIMER_NORESTART.
>> + */
>> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
>> +{
>> +struct idle_injection_device *ii_dev =
>> +container_of(timer, struct idle_injection_device, timer);
>> +
>> +idle_injection_wakeup(ii_dev);
>> +
>> +return HRTIMER_NORESTART;
>> +}
>> +
>> +/**
>> + * idle_injection_fn - idle injection routine
>> + * @cpu: the CPU number the tasks belongs to
>> + *
>> + * The idle injection routine will stay idle the specified amount of
>> + * time
>> + */
>> +static void idle_injection_fn(unsigned int cpu)
>> +{
>> +struct idle_injection_device *ii_dev;
>> +struct idle_injection_thread *iit;
>> +int run_duration_ms, idle_duration_ms;
>> +
>> +ii_dev = per_cpu(idle_injection_device, cpu);
>> +
>> +if (WARN_ON_ONCE(!ii_dev))
> 
> Yes, this is marked as "unlikely" and is kind of not that harmful, but
> I would suggest to just drop it and let the kernel crash if that
> serious of a bug is present in this code where ii_dev can be NULL
> here.

Ok.

>> +return;
>> +
>> +iit = per_cpu_ptr(_injection_thread, cpu);
> 
> See, we don't check this one, why check only ii_dev ? :)
> 
>> +
>> +/*
>> + * Boolean used by the smpboot main loop and used as a
>> + * flip-flop in this function
>> + */
>> +iit->should_run = 0;
>> +
>> +atomic_inc(_dev->count);
>> +
>> +idle_duration_ms = atomic_read(_dev->idle_duration_ms);
>> +if (idle_duration_ms)
>> +play_idle(idle_duration_ms);
>> +
>> +/*
>> + * The last CPU waking up is in charge of setting the timer. If
>> + * the CPU is hotplugged, the timer will move to another CPU
>> + * (which may not belong to the same cluster) but that is not a
>> + * problem as the timer will be set again by another CPU
>> + * belonging to the cluster. This mechanism is self adaptive.
>> + */
> 
> I am afraid that the above comment may not be completely true all the
> time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> run this function as soon as the kthread is woken up, but one of the
> CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> Because you do atomic_inc() also in this function (above) itself,
> below decrement may return a true value for the CPU2 and that will
> restart the hrtimer, while one of the CPUs never got a chance to
> increment count in the first place.
>
> The fix is simple though, do the increment in idle_injection_wakeup()
> and things should be fine then.

Ok.

>> +if (!atomic_dec_and_test(_dev->count))
>> +return;
>> +
>> +run_duration_ms = atomic_read(_dev->run_duration_ms);
>> +if (run_duration_ms) {
>> +hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
>> +  HRTIMER_MODE_REL_PINNED);
>> +return;
>> +}
>> +
>> +complete(_dev->stop_complete);
> 
> So you call complete() after hrtimer is potentially restarted. This
> can happen if idle_injection_stop() is called right after the above
> atomic_read() has finished :)
> 
> IOW, this doesn't look safe now as well.

It is safe, we just missed a cycle and the stop will block until the
next cycle. I did it on purpose and for me it is correct.

>> +}
> 
>> +/**
>> + * idle_injection_stop - stops the idle injections
>> + * @ii_dev: a pointer to an idle injection_device structure
>> + *
>> + * The function stops the idle injection by resetting the idle and
>> + * running durations and wait for the threads to complete. If we are
>> + * in the process of injecting an idle cycle, then this will wait the
>> + * end of the cycle.
>> + */
>> +void idle_injection_stop(struct idle_injection_device *ii_dev)
>> +{
>> +pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
>> + cpumask_pr_args(ii_dev->cpumask));
>> +
>> +init_completion(_dev->stop_complete);
> 
> This looks completely Borken (yeah, broken :)). complete() may be
> running in parallel under spinlock and updating x->done while you just
> set it to 0 here without any locking in place. 

Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-31 Thread Daniel Lezcano
On 29/05/2018 11:31, Viresh Kumar wrote:
> Hi Daniel,
> 
> Thanks for yet another version :)
> 
> On 25-05-18, 11:49, Daniel Lezcano wrote:
>> +++ b/drivers/powercap/idle_injection.c
>> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
>> +{
>> +struct idle_injection_thread *iit;
>> +int cpu;
>> +
>> +for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
>> +iit = per_cpu_ptr(_injection_thread, cpu);
>> +iit->should_run = 1;
>> +wake_up_process(iit->tsk);
>> +}
>> +}
>> +
>> +/**
>> + * idle_injection_wakeup_fn - idle injection timer callback
>> + * @timer: a hrtimer structure
>> + *
>> + * This function is called when the idle injection timer expires which
>> + * will wake up the idle injection tasks and these ones, in turn, play
>> + * idle a specified amount of time.
>> + *
>> + * Return: HRTIMER_NORESTART.
>> + */
>> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
>> +{
>> +struct idle_injection_device *ii_dev =
>> +container_of(timer, struct idle_injection_device, timer);
>> +
>> +idle_injection_wakeup(ii_dev);
>> +
>> +return HRTIMER_NORESTART;
>> +}
>> +
>> +/**
>> + * idle_injection_fn - idle injection routine
>> + * @cpu: the CPU number the tasks belongs to
>> + *
>> + * The idle injection routine will stay idle the specified amount of
>> + * time
>> + */
>> +static void idle_injection_fn(unsigned int cpu)
>> +{
>> +struct idle_injection_device *ii_dev;
>> +struct idle_injection_thread *iit;
>> +int run_duration_ms, idle_duration_ms;
>> +
>> +ii_dev = per_cpu(idle_injection_device, cpu);
>> +
>> +if (WARN_ON_ONCE(!ii_dev))
> 
> Yes, this is marked as "unlikely" and is kind of not that harmful, but
> I would suggest to just drop it and let the kernel crash if that
> serious of a bug is present in this code where ii_dev can be NULL
> here.

Ok.

>> +return;
>> +
>> +iit = per_cpu_ptr(_injection_thread, cpu);
> 
> See, we don't check this one, why check only ii_dev ? :)
> 
>> +
>> +/*
>> + * Boolean used by the smpboot main loop and used as a
>> + * flip-flop in this function
>> + */
>> +iit->should_run = 0;
>> +
>> +atomic_inc(_dev->count);
>> +
>> +idle_duration_ms = atomic_read(_dev->idle_duration_ms);
>> +if (idle_duration_ms)
>> +play_idle(idle_duration_ms);
>> +
>> +/*
>> + * The last CPU waking up is in charge of setting the timer. If
>> + * the CPU is hotplugged, the timer will move to another CPU
>> + * (which may not belong to the same cluster) but that is not a
>> + * problem as the timer will be set again by another CPU
>> + * belonging to the cluster. This mechanism is self adaptive.
>> + */
> 
> I am afraid that the above comment may not be completely true all the
> time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
> run this function as soon as the kthread is woken up, but one of the
> CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
> Because you do atomic_inc() also in this function (above) itself,
> below decrement may return a true value for the CPU2 and that will
> restart the hrtimer, while one of the CPUs never got a chance to
> increment count in the first place.
>
> The fix is simple though, do the increment in idle_injection_wakeup()
> and things should be fine then.

Ok.

>> +if (!atomic_dec_and_test(_dev->count))
>> +return;
>> +
>> +run_duration_ms = atomic_read(_dev->run_duration_ms);
>> +if (run_duration_ms) {
>> +hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
>> +  HRTIMER_MODE_REL_PINNED);
>> +return;
>> +}
>> +
>> +complete(_dev->stop_complete);
> 
> So you call complete() after hrtimer is potentially restarted. This
> can happen if idle_injection_stop() is called right after the above
> atomic_read() has finished :)
> 
> IOW, this doesn't look safe now as well.

It is safe, we just missed a cycle and the stop will block until the
next cycle. I did it on purpose and for me it is correct.

>> +}
> 
>> +/**
>> + * idle_injection_stop - stops the idle injections
>> + * @ii_dev: a pointer to an idle injection_device structure
>> + *
>> + * The function stops the idle injection by resetting the idle and
>> + * running durations and wait for the threads to complete. If we are
>> + * in the process of injecting an idle cycle, then this will wait the
>> + * end of the cycle.
>> + */
>> +void idle_injection_stop(struct idle_injection_device *ii_dev)
>> +{
>> +pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
>> + cpumask_pr_args(ii_dev->cpumask));
>> +
>> +init_completion(_dev->stop_complete);
> 
> This looks completely Borken (yeah, broken :)). complete() may be
> running in parallel under spinlock and updating x->done while you just
> set it to 0 here without any locking in place. 

Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-29 Thread Viresh Kumar
Hi Daniel,

Thanks for yet another version :)

On 25-05-18, 11:49, Daniel Lezcano wrote:
> +++ b/drivers/powercap/idle_injection.c
> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
> +{
> + struct idle_injection_thread *iit;
> + int cpu;
> +
> + for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
> + iit = per_cpu_ptr(_injection_thread, cpu);
> + iit->should_run = 1;
> + wake_up_process(iit->tsk);
> + }
> +}
> +
> +/**
> + * idle_injection_wakeup_fn - idle injection timer callback
> + * @timer: a hrtimer structure
> + *
> + * This function is called when the idle injection timer expires which
> + * will wake up the idle injection tasks and these ones, in turn, play
> + * idle a specified amount of time.
> + *
> + * Return: HRTIMER_NORESTART.
> + */
> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
> +{
> + struct idle_injection_device *ii_dev =
> + container_of(timer, struct idle_injection_device, timer);
> +
> + idle_injection_wakeup(ii_dev);
> +
> + return HRTIMER_NORESTART;
> +}
> +
> +/**
> + * idle_injection_fn - idle injection routine
> + * @cpu: the CPU number the tasks belongs to
> + *
> + * The idle injection routine will stay idle the specified amount of
> + * time
> + */
> +static void idle_injection_fn(unsigned int cpu)
> +{
> + struct idle_injection_device *ii_dev;
> + struct idle_injection_thread *iit;
> + int run_duration_ms, idle_duration_ms;
> +
> + ii_dev = per_cpu(idle_injection_device, cpu);
> +
> + if (WARN_ON_ONCE(!ii_dev))

Yes, this is marked as "unlikely" and is kind of not that harmful, but
I would suggest to just drop it and let the kernel crash if that
serious of a bug is present in this code where ii_dev can be NULL
here.

> + return;
> +
> + iit = per_cpu_ptr(_injection_thread, cpu);

See, we don't check this one, why check only ii_dev ? :)

> +
> + /*
> +  * Boolean used by the smpboot main loop and used as a
> +  * flip-flop in this function
> +  */
> + iit->should_run = 0;
> +
> + atomic_inc(_dev->count);
> +
> + idle_duration_ms = atomic_read(_dev->idle_duration_ms);
> + if (idle_duration_ms)
> + play_idle(idle_duration_ms);
> +
> + /*
> +  * The last CPU waking up is in charge of setting the timer. If
> +  * the CPU is hotplugged, the timer will move to another CPU
> +  * (which may not belong to the same cluster) but that is not a
> +  * problem as the timer will be set again by another CPU
> +  * belonging to the cluster. This mechanism is self adaptive.
> +  */

I am afraid that the above comment may not be completely true all the
time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
run this function as soon as the kthread is woken up, but one of the
CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
Because you do atomic_inc() also in this function (above) itself,
below decrement may return a true value for the CPU2 and that will
restart the hrtimer, while one of the CPUs never got a chance to
increment count in the first place.

The fix is simple though, do the increment in idle_injection_wakeup()
and things should be fine then.

> + if (!atomic_dec_and_test(_dev->count))
> + return;
> +
> + run_duration_ms = atomic_read(_dev->run_duration_ms);
> + if (run_duration_ms) {
> + hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> +   HRTIMER_MODE_REL_PINNED);
> + return;
> + }
> +
> + complete(_dev->stop_complete);

So you call complete() after hrtimer is potentially restarted. This
can happen if idle_injection_stop() is called right after the above
atomic_read() has finished :)

IOW, this doesn't look safe now as well.

> +}

> +/**
> + * idle_injection_stop - stops the idle injections
> + * @ii_dev: a pointer to an idle injection_device structure
> + *
> + * The function stops the idle injection by resetting the idle and
> + * running durations and wait for the threads to complete. If we are
> + * in the process of injecting an idle cycle, then this will wait the
> + * end of the cycle.
> + */
> +void idle_injection_stop(struct idle_injection_device *ii_dev)
> +{
> + pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
> +  cpumask_pr_args(ii_dev->cpumask));
> +
> + init_completion(_dev->stop_complete);

This looks completely Borken (yeah, broken :)). complete() may be
running in parallel under spinlock and updating x->done while you just
set it to 0 here without any locking in place. init_completion()
should be used only once after ii_dev is allocated and I don't see
that being done either, so that looks incorrect as well.

> +
> + idle_injection_set_duration(ii_dev, 0, 0);
> +
> + wait_for_completion_interruptible(_dev->stop_complete);
> +}
> +
> +/**
> + * 

Re: [PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-29 Thread Viresh Kumar
Hi Daniel,

Thanks for yet another version :)

On 25-05-18, 11:49, Daniel Lezcano wrote:
> +++ b/drivers/powercap/idle_injection.c
> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
> +{
> + struct idle_injection_thread *iit;
> + int cpu;
> +
> + for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
> + iit = per_cpu_ptr(_injection_thread, cpu);
> + iit->should_run = 1;
> + wake_up_process(iit->tsk);
> + }
> +}
> +
> +/**
> + * idle_injection_wakeup_fn - idle injection timer callback
> + * @timer: a hrtimer structure
> + *
> + * This function is called when the idle injection timer expires which
> + * will wake up the idle injection tasks and these ones, in turn, play
> + * idle a specified amount of time.
> + *
> + * Return: HRTIMER_NORESTART.
> + */
> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
> +{
> + struct idle_injection_device *ii_dev =
> + container_of(timer, struct idle_injection_device, timer);
> +
> + idle_injection_wakeup(ii_dev);
> +
> + return HRTIMER_NORESTART;
> +}
> +
> +/**
> + * idle_injection_fn - idle injection routine
> + * @cpu: the CPU number the tasks belongs to
> + *
> + * The idle injection routine will stay idle the specified amount of
> + * time
> + */
> +static void idle_injection_fn(unsigned int cpu)
> +{
> + struct idle_injection_device *ii_dev;
> + struct idle_injection_thread *iit;
> + int run_duration_ms, idle_duration_ms;
> +
> + ii_dev = per_cpu(idle_injection_device, cpu);
> +
> + if (WARN_ON_ONCE(!ii_dev))

Yes, this is marked as "unlikely" and is kind of not that harmful, but
I would suggest to just drop it and let the kernel crash if that
serious of a bug is present in this code where ii_dev can be NULL
here.

> + return;
> +
> + iit = per_cpu_ptr(_injection_thread, cpu);

See, we don't check this one, why check only ii_dev ? :)

> +
> + /*
> +  * Boolean used by the smpboot main loop and used as a
> +  * flip-flop in this function
> +  */
> + iit->should_run = 0;
> +
> + atomic_inc(_dev->count);
> +
> + idle_duration_ms = atomic_read(_dev->idle_duration_ms);
> + if (idle_duration_ms)
> + play_idle(idle_duration_ms);
> +
> + /*
> +  * The last CPU waking up is in charge of setting the timer. If
> +  * the CPU is hotplugged, the timer will move to another CPU
> +  * (which may not belong to the same cluster) but that is not a
> +  * problem as the timer will be set again by another CPU
> +  * belonging to the cluster. This mechanism is self adaptive.
> +  */

I am afraid that the above comment may not be completely true all the
time. For a quad-core platform, it is possible for 3 CPUs (0,1,2) to
run this function as soon as the kthread is woken up, but one of the
CPUs (3) may be stuck servicing an IRQ, Deadline or RT activity.
Because you do atomic_inc() also in this function (above) itself,
below decrement may return a true value for the CPU2 and that will
restart the hrtimer, while one of the CPUs never got a chance to
increment count in the first place.

The fix is simple though, do the increment in idle_injection_wakeup()
and things should be fine then.

> + if (!atomic_dec_and_test(_dev->count))
> + return;
> +
> + run_duration_ms = atomic_read(_dev->run_duration_ms);
> + if (run_duration_ms) {
> + hrtimer_start(_dev->timer, ms_to_ktime(run_duration_ms),
> +   HRTIMER_MODE_REL_PINNED);
> + return;
> + }
> +
> + complete(_dev->stop_complete);

So you call complete() after hrtimer is potentially restarted. This
can happen if idle_injection_stop() is called right after the above
atomic_read() has finished :)

IOW, this doesn't look safe now as well.

> +}

> +/**
> + * idle_injection_stop - stops the idle injections
> + * @ii_dev: a pointer to an idle injection_device structure
> + *
> + * The function stops the idle injection by resetting the idle and
> + * running durations and wait for the threads to complete. If we are
> + * in the process of injecting an idle cycle, then this will wait the
> + * end of the cycle.
> + */
> +void idle_injection_stop(struct idle_injection_device *ii_dev)
> +{
> + pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
> +  cpumask_pr_args(ii_dev->cpumask));
> +
> + init_completion(_dev->stop_complete);

This looks completely Borken (yeah, broken :)). complete() may be
running in parallel under spinlock and updating x->done while you just
set it to 0 here without any locking in place. init_completion()
should be used only once after ii_dev is allocated and I don't see
that being done either, so that looks incorrect as well.

> +
> + idle_injection_set_duration(ii_dev, 0, 0);
> +
> + wait_for_completion_interruptible(_dev->stop_complete);
> +}
> +
> +/**
> + * 

[PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-25 Thread Daniel Lezcano
Initially, the cpu_cooling device for ARM was changed by adding a new
policy inserting idle cycles. The intel_powerclamp driver does a
similar action.

Instead of implementing idle injections privately in the cpu_cooling
device, move the idle injection code in a dedicated framework and give
the opportunity to other frameworks to make use of it.

The framework relies on the smpboot kthreads which handles via its
main loop the common code for hotplugging and [un]parking.

This code was previously tested with the cpu cooling device and went
through several iterations. It results now in split code and API
exported in the header file. It was tested with the cpu cooling device
with success.

Signed-off-by: Daniel Lezcano 
---
V4:
   - Wait for completion when stopping (Viresh Kumar)
   - Check init already done and rollback (Viresh Kumar)

V3:
   - Fixed typos (Viresh Kumar)
   - Removed extra blank line (Viresh Kumar)
   - Added full stop (Viresh Kumar)
   - Fixed Return kerneldoc format (Viresh Kumar)
   - Fixed multiple kthreads initialization (Viresh Kumar)
   - Fixed rollbacking the actions in the unregister function (Viresh Kumar)
   - Make idle injection kthreads name hardcoded
   - Kthreads are created in the initcall process

 V2: Fixed checkpatch warnings
---
 drivers/powercap/Kconfig  |  10 ++
 drivers/powercap/Makefile |   1 +
 drivers/powercap/idle_injection.c | 368 ++
 include/linux/idle_injection.h|  29 +++
 4 files changed, 408 insertions(+)
 create mode 100644 drivers/powercap/idle_injection.c
 create mode 100644 include/linux/idle_injection.h

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 85727ef..a767ef2 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -29,4 +29,14 @@ config INTEL_RAPL
  controller, CPU core (Power Plance 0), graphics uncore (Power Plane
  1), etc.
 
+config IDLE_INJECTION
+   bool "Idle injection framework"
+   depends on CPU_IDLE
+   default n
+   help
+ This enables support for the idle injection framework. It
+ provides a way to force idle periods on a set of specified
+ CPUs for power capping. Idle period can be injected
+ synchronously on a set of specified CPUs or alternatively
+ on a per CPU basis.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 0a21ef3..c3bbfee 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o
+obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o
diff --git a/drivers/powercap/idle_injection.c 
b/drivers/powercap/idle_injection.c
new file mode 100644
index 000..ecf290a
--- /dev/null
+++ b/drivers/powercap/idle_injection.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The idle injection framework proposes a way to force a cpu to enter
+ * an idle state during a specified amount of time for a specified
+ * period.
+ *
+ * It relies on the smpboot kthreads which handles, via its main loop,
+ * the common code for hotplugging and [un]parking.
+ *
+ * At init time, all the kthreads are created and parked.
+ *
+ * A cpumask is specified as parameter for the idle injection
+ * registering function. The kthreads will be synchronized regarding
+ * this cpumask.
+ *
+ * The idle + run duration is specified via the helpers and then the
+ * idle injection can be started at this point.
+ *
+ * A kthread will call play_idle() with the specified idle duration
+ * from above and then will schedule itself. The latest CPU belonging
+ * to the group is in charge of setting the timer for the next idle
+ * injection deadline.
+ *
+ * The task handling the timer interrupt will wakeup all the kthreads
+ * belonging to the cpumask.
+ */
+#define pr_fmt(fmt) "ii_dev: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/**
+ * struct idle_injection_thread - task on/off switch structure
+ * @tsk: a pointer to a task_struct injecting the idle cycles
+ * @should_run: a integer used as a boolean by the smpboot kthread API
+ */
+struct idle_injection_thread {
+   struct task_struct *tsk;
+   int should_run;
+};
+
+/**
+ * struct idle_injection_device - data for the idle injection
+ * @cpumask: a cpumask containing the list of CPUs managed by the device
+ * @timer: a hrtimer giving the tempo for the idle injection
+ * @count: an atomic to keep track of the last task exiting the idle cycle
+ * @idle_duration_ms: an atomic specifying the idle duration
+ * @run_duration_ms: an atomic specifying the running duration
+ */
+struct idle_injection_device {
+   cpumask_var_t cpumask;
+   struct hrtimer timer;
+   struct completion stop_complete;
+   atomic_t count;
+

[PATCH V4] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-25 Thread Daniel Lezcano
Initially, the cpu_cooling device for ARM was changed by adding a new
policy inserting idle cycles. The intel_powerclamp driver does a
similar action.

Instead of implementing idle injections privately in the cpu_cooling
device, move the idle injection code in a dedicated framework and give
the opportunity to other frameworks to make use of it.

The framework relies on the smpboot kthreads which handles via its
main loop the common code for hotplugging and [un]parking.

This code was previously tested with the cpu cooling device and went
through several iterations. It results now in split code and API
exported in the header file. It was tested with the cpu cooling device
with success.

Signed-off-by: Daniel Lezcano 
---
V4:
   - Wait for completion when stopping (Viresh Kumar)
   - Check init already done and rollback (Viresh Kumar)

V3:
   - Fixed typos (Viresh Kumar)
   - Removed extra blank line (Viresh Kumar)
   - Added full stop (Viresh Kumar)
   - Fixed Return kerneldoc format (Viresh Kumar)
   - Fixed multiple kthreads initialization (Viresh Kumar)
   - Fixed rollbacking the actions in the unregister function (Viresh Kumar)
   - Make idle injection kthreads name hardcoded
   - Kthreads are created in the initcall process

 V2: Fixed checkpatch warnings
---
 drivers/powercap/Kconfig  |  10 ++
 drivers/powercap/Makefile |   1 +
 drivers/powercap/idle_injection.c | 368 ++
 include/linux/idle_injection.h|  29 +++
 4 files changed, 408 insertions(+)
 create mode 100644 drivers/powercap/idle_injection.c
 create mode 100644 include/linux/idle_injection.h

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 85727ef..a767ef2 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -29,4 +29,14 @@ config INTEL_RAPL
  controller, CPU core (Power Plance 0), graphics uncore (Power Plane
  1), etc.
 
+config IDLE_INJECTION
+   bool "Idle injection framework"
+   depends on CPU_IDLE
+   default n
+   help
+ This enables support for the idle injection framework. It
+ provides a way to force idle periods on a set of specified
+ CPUs for power capping. Idle period can be injected
+ synchronously on a set of specified CPUs or alternatively
+ on a per CPU basis.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 0a21ef3..c3bbfee 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o
+obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o
diff --git a/drivers/powercap/idle_injection.c 
b/drivers/powercap/idle_injection.c
new file mode 100644
index 000..ecf290a
--- /dev/null
+++ b/drivers/powercap/idle_injection.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The idle injection framework proposes a way to force a cpu to enter
+ * an idle state during a specified amount of time for a specified
+ * period.
+ *
+ * It relies on the smpboot kthreads which handles, via its main loop,
+ * the common code for hotplugging and [un]parking.
+ *
+ * At init time, all the kthreads are created and parked.
+ *
+ * A cpumask is specified as parameter for the idle injection
+ * registering function. The kthreads will be synchronized regarding
+ * this cpumask.
+ *
+ * The idle + run duration is specified via the helpers and then the
+ * idle injection can be started at this point.
+ *
+ * A kthread will call play_idle() with the specified idle duration
+ * from above and then will schedule itself. The latest CPU belonging
+ * to the group is in charge of setting the timer for the next idle
+ * injection deadline.
+ *
+ * The task handling the timer interrupt will wakeup all the kthreads
+ * belonging to the cpumask.
+ */
+#define pr_fmt(fmt) "ii_dev: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/**
+ * struct idle_injection_thread - task on/off switch structure
+ * @tsk: a pointer to a task_struct injecting the idle cycles
+ * @should_run: a integer used as a boolean by the smpboot kthread API
+ */
+struct idle_injection_thread {
+   struct task_struct *tsk;
+   int should_run;
+};
+
+/**
+ * struct idle_injection_device - data for the idle injection
+ * @cpumask: a cpumask containing the list of CPUs managed by the device
+ * @timer: a hrtimer giving the tempo for the idle injection
+ * @count: an atomic to keep track of the last task exiting the idle cycle
+ * @idle_duration_ms: an atomic specifying the idle duration
+ * @run_duration_ms: an atomic specifying the running duration
+ */
+struct idle_injection_device {
+   cpumask_var_t cpumask;
+   struct hrtimer timer;
+   struct completion stop_complete;
+   atomic_t count;
+   atomic_t idle_duration_ms;
+   atomic_t