Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-06-01 Thread Steve Muckle
On Sat, May 21, 2016 at 12:46:06PM -0700, Steve Muckle wrote:
> Hi Peter, Ingo,

Hi Peter/Ingo would appreciate any thoughts you may have on the issue
below.

thanks,
Steve

> 
> On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> > On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > > In the case of a remote update the hook has to run (or not) after it is
> > > > known whether preemption will occur so we don't do needless work or
> > > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > > hook will always need to be called along with an indication that it is
> > > > the early hook being called. If it turns out to be a remote update it
> > > > could then be deferred to the later hook, which would only be called
> > > > when a remote update has been deferred and preemption has not occurred.
> > > >
> > > > This means two hook inovcations for a remote non-preempting wakeup
> > > > though instead of one.  Perhaps this is a good middle ground on code
> > > > churn vs. optimization though.
> > > 
> > > I would think so.
> > 
> > Ok, I will pursue this approach.
> 
> I'd like to get your opinion here before proceeding further...
> 
> To catch you up and summarize, I'm trying to implement support for
> calling the scheduler cpufreq callback during remote wakeups.  Currently
> the scheduler cpufreq callback is only called when the target CPU is the
> current CPU. If a remote wakeup does not result in preemption, the CPU
> frequency may currently not be adjusted appropriately for up to a tick,
> when we are guaranteed to call the hook again.
> 
> Invoking schedutil promptly for the target CPU in this situation means
> sending an IPI if the current CPU is not in the target CPU's frequency
> domain. This is because often a cpufreq driver must run on a CPU within
> the frequency domain it is bound to.  But the catch is that we should
> not do this and incur the overhead of an IPI if preemption will occur,
> as in that case the scheduler (and schedutil) will run soon on the
> target CPU anyway, potentially as a result of the scheduler sending its
> own IPI.
> 
> I figured this unnecessary overhead would be unacceptable and so have
> been working on an approach to avoid it. Unfortunately the current hooks
> happen before the preemption decision is made. My current implementation
> sets a flag if schedutil sees a remote wakeup and then bails. There's a
> test to call the hook again at the end of check_preempt_curr() if the flag
> is set.  The flag is cleared by resched_curr() as that means preemption
> will happen on the target CPU. The flag currently lives at the end of
> the rq struct. I could move it into the update_util_data hook structure
> or elsewhere, but that would mean accessing another per-cpu item in
> hot scheduler paths.
> 
> Thoughts? Note the current implementation described above differs a bit
> from the last posting in this thread, per discussion with Rafael.
> 
> thanks,
> Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-06-01 Thread Steve Muckle
On Sat, May 21, 2016 at 12:46:06PM -0700, Steve Muckle wrote:
> Hi Peter, Ingo,

Hi Peter/Ingo would appreciate any thoughts you may have on the issue
below.

thanks,
Steve

> 
> On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> > On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > > In the case of a remote update the hook has to run (or not) after it is
> > > > known whether preemption will occur so we don't do needless work or
> > > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > > hook will always need to be called along with an indication that it is
> > > > the early hook being called. If it turns out to be a remote update it
> > > > could then be deferred to the later hook, which would only be called
> > > > when a remote update has been deferred and preemption has not occurred.
> > > >
> > > > This means two hook inovcations for a remote non-preempting wakeup
> > > > though instead of one.  Perhaps this is a good middle ground on code
> > > > churn vs. optimization though.
> > > 
> > > I would think so.
> > 
> > Ok, I will pursue this approach.
> 
> I'd like to get your opinion here before proceeding further...
> 
> To catch you up and summarize, I'm trying to implement support for
> calling the scheduler cpufreq callback during remote wakeups.  Currently
> the scheduler cpufreq callback is only called when the target CPU is the
> current CPU. If a remote wakeup does not result in preemption, the CPU
> frequency may currently not be adjusted appropriately for up to a tick,
> when we are guaranteed to call the hook again.
> 
> Invoking schedutil promptly for the target CPU in this situation means
> sending an IPI if the current CPU is not in the target CPU's frequency
> domain. This is because often a cpufreq driver must run on a CPU within
> the frequency domain it is bound to.  But the catch is that we should
> not do this and incur the overhead of an IPI if preemption will occur,
> as in that case the scheduler (and schedutil) will run soon on the
> target CPU anyway, potentially as a result of the scheduler sending its
> own IPI.
> 
> I figured this unnecessary overhead would be unacceptable and so have
> been working on an approach to avoid it. Unfortunately the current hooks
> happen before the preemption decision is made. My current implementation
> sets a flag if schedutil sees a remote wakeup and then bails. There's a
> test to call the hook again at the end of check_preempt_curr() if the flag
> is set.  The flag is cleared by resched_curr() as that means preemption
> will happen on the target CPU. The flag currently lives at the end of
> the rq struct. I could move it into the update_util_data hook structure
> or elsewhere, but that would mean accessing another per-cpu item in
> hot scheduler paths.
> 
> Thoughts? Note the current implementation described above differs a bit
> from the last posting in this thread, per discussion with Rafael.
> 
> thanks,
> Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-21 Thread Steve Muckle
Hi Peter, Ingo,

On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > In the case of a remote update the hook has to run (or not) after it is
> > > known whether preemption will occur so we don't do needless work or
> > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > hook will always need to be called along with an indication that it is
> > > the early hook being called. If it turns out to be a remote update it
> > > could then be deferred to the later hook, which would only be called
> > > when a remote update has been deferred and preemption has not occurred.
> > >
> > > This means two hook inovcations for a remote non-preempting wakeup
> > > though instead of one.  Perhaps this is a good middle ground on code
> > > churn vs. optimization though.
> > 
> > I would think so.
> 
> Ok, I will pursue this approach.

I'd like to get your opinion here before proceeding further...

To catch you up and summarize, I'm trying to implement support for
calling the scheduler cpufreq callback during remote wakeups.  Currently
the scheduler cpufreq callback is only called when the target CPU is the
current CPU. If a remote wakeup does not result in preemption, the CPU
frequency may currently not be adjusted appropriately for up to a tick,
when we are guaranteed to call the hook again.

Invoking schedutil promptly for the target CPU in this situation means
sending an IPI if the current CPU is not in the target CPU's frequency
domain. This is because often a cpufreq driver must run on a CPU within
the frequency domain it is bound to.  But the catch is that we should
not do this and incur the overhead of an IPI if preemption will occur,
as in that case the scheduler (and schedutil) will run soon on the
target CPU anyway, potentially as a result of the scheduler sending its
own IPI.

I figured this unnecessary overhead would be unacceptable and so have
been working on an approach to avoid it. Unfortunately the current hooks
happen before the preemption decision is made. My current implementation
sets a flag if schedutil sees a remote wakeup and then bails. There's a
test to call the hook again at the end of check_preempt_curr() if the flag
is set.  The flag is cleared by resched_curr() as that means preemption
will happen on the target CPU. The flag currently lives at the end of
the rq struct. I could move it into the update_util_data hook structure
or elsewhere, but that would mean accessing another per-cpu item in
hot scheduler paths.

Thoughts? Note the current implementation described above differs a bit
from the last posting in this thread, per discussion with Rafael.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-21 Thread Steve Muckle
Hi Peter, Ingo,

On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > In the case of a remote update the hook has to run (or not) after it is
> > > known whether preemption will occur so we don't do needless work or
> > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > hook will always need to be called along with an indication that it is
> > > the early hook being called. If it turns out to be a remote update it
> > > could then be deferred to the later hook, which would only be called
> > > when a remote update has been deferred and preemption has not occurred.
> > >
> > > This means two hook inovcations for a remote non-preempting wakeup
> > > though instead of one.  Perhaps this is a good middle ground on code
> > > churn vs. optimization though.
> > 
> > I would think so.
> 
> Ok, I will pursue this approach.

I'd like to get your opinion here before proceeding further...

To catch you up and summarize, I'm trying to implement support for
calling the scheduler cpufreq callback during remote wakeups.  Currently
the scheduler cpufreq callback is only called when the target CPU is the
current CPU. If a remote wakeup does not result in preemption, the CPU
frequency may currently not be adjusted appropriately for up to a tick,
when we are guaranteed to call the hook again.

Invoking schedutil promptly for the target CPU in this situation means
sending an IPI if the current CPU is not in the target CPU's frequency
domain. This is because often a cpufreq driver must run on a CPU within
the frequency domain it is bound to.  But the catch is that we should
not do this and incur the overhead of an IPI if preemption will occur,
as in that case the scheduler (and schedutil) will run soon on the
target CPU anyway, potentially as a result of the scheduler sending its
own IPI.

I figured this unnecessary overhead would be unacceptable and so have
been working on an approach to avoid it. Unfortunately the current hooks
happen before the preemption decision is made. My current implementation
sets a flag if schedutil sees a remote wakeup and then bails. There's a
test to call the hook again at the end of check_preempt_curr() if the flag
is set.  The flag is cleared by resched_curr() as that means preemption
will happen on the target CPU. The flag currently lives at the end of
the rq struct. I could move it into the update_util_data hook structure
or elsewhere, but that would mean accessing another per-cpu item in
hot scheduler paths.

Thoughts? Note the current implementation described above differs a bit
from the last posting in this thread, per discussion with Rafael.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Steve Muckle
On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > In the case of a remote update the hook has to run (or not) after it is
> > known whether preemption will occur so we don't do needless work or
> > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > hook will always need to be called along with an indication that it is
> > the early hook being called. If it turns out to be a remote update it
> > could then be deferred to the later hook, which would only be called
> > when a remote update has been deferred and preemption has not occurred.
> >
> > This means two hook inovcations for a remote non-preempting wakeup
> > though instead of one.  Perhaps this is a good middle ground on code
> > churn vs. optimization though.
> 
> I would think so.

Ok, I will pursue this approach.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Steve Muckle
On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > In the case of a remote update the hook has to run (or not) after it is
> > known whether preemption will occur so we don't do needless work or
> > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > hook will always need to be called along with an indication that it is
> > the early hook being called. If it turns out to be a remote update it
> > could then be deferred to the later hook, which would only be called
> > when a remote update has been deferred and preemption has not occurred.
> >
> > This means two hook inovcations for a remote non-preempting wakeup
> > though instead of one.  Perhaps this is a good middle ground on code
> > churn vs. optimization though.
> 
> I would think so.

Ok, I will pursue this approach.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Rafael J. Wysocki
On Thu, May 19, 2016 at 9:19 PM, Steve Muckle  wrote:
> On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
>> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
>> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  
>> > wrote:
>> >> Without calling the cpufreq hook for a remote wakeup it is possible
>> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> >> to a full tick. This can occur if the target CPU is running a
>> >> CPU-bound task and preemption does not occur. If preemption does occur
>> >> then the scheduler is expected to run soon on the target CPU anyway so
>> >> invoking the cpufreq hook on the remote wakeup is not required.
>> >>
>> >> In order to avoid unnecessary interprocessor communication in the
>> >> governor for the preemption case, the existing hook (which happens
>> >> before preemption is decided) is only called when the target CPU
>> >> is within the current CPU's cpufreq policy. A new hook is added in
>> >> check_preempt_curr() to handle events outside the current CPU's
>> >> cpufreq policy where preemption does not happen.
>> >>
>> >> Some governors may opt to not receive remote CPU callbacks. This
>> >> behavior is possible by providing NULL as the new policy_cpus
>> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> >> issued in this case when the target CPU and the current CPU are the
>> >> same. Otherwise policy_cpus is used to determine what is a local
>> >> vs. remote callback.
>> >
>> > I don't really like the extra complexity added by this patch.
>> >
>> > It makes the code look fragile at some places
>
> Perhaps I can improve these, can you point them out?
>
>> > and it only really matters for schedutil
>
> True. As we are trying to create an integrated scheduler+CPU frequency
> control solution I think some of this is to be expected, and should be
> worthwhile since this is hopefully the future and will eventually
> replace the need for the other governors.
>
>> > and for the fast switch case in there.
>
> Once there is a high priority context for the slow path I expect it to
> benefit from this as well.

I don't think that saving an occasional IPI would matter for that case
overall, though.

>> >
>> > Overall, it looks like a premature optimization to me.
>
> Are you referring to this new approach of avoiding duplicate IPIs,

Yes.

> or supporting updates on remote wakeups overall?

No.  I already said I would be fine with that if done properly.

> The duplicate IPI thing is admittedly not required for the problem I'm
> looking to solve but I figured at least some people would be concerned
> about it.

Avoiding IPIs that aren't necessary is fine by me in general, but
doing that at the scheduler level doesn't seem to be necessary as I
said.

> If folks can live with it for now then I can go back to the
> simpler approach I had in my first posting.

Please at least avoid introducing internal cpufreq concepts into the
scheduler uncleanly.

>> In particular, the dance with checking the policy CPUs from the
>> scheduler is questionable (the scheduler might be interested in this
>> information for other purposes too and hooking it up in an ad-hoc way
>> just for cpufreq doesn't seem to be appropriate from that perspective)
>> and also doesn't seem to be necessary.
>>
>> You can check if the current CPU is a policy one from the governor and
>> if that is the case, simply run the frequency update on it directly
>> without any IPI (because if both the target CPU and the current CPU
>> belong to the same policy, it doesn't matter which one of them will
>> run the update).
>
> The checking of policy CPUs from the scheduler was done so as to
> minimize the number of calls to the hook, given their expense.

But policy CPUs is entirely an internal cpufreq concept and adding
that to the scheduler kind of via a kitchen door doesn't look good to
me.

> In the case of a remote update the hook has to run (or not) after it is
> known whether preemption will occur so we don't do needless work or
> IPIs. If the policy CPUs aren't known in the scheduler then the early
> hook will always need to be called along with an indication that it is
> the early hook being called. If it turns out to be a remote update it
> could then be deferred to the later hook, which would only be called
> when a remote update has been deferred and preemption has not occurred.
>
> This means two hook inovcations for a remote non-preempting wakeup
> though instead of one.  Perhaps this is a good middle ground on code
> churn vs. optimization though.

I would think so.


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Rafael J. Wysocki
On Thu, May 19, 2016 at 9:19 PM, Steve Muckle  wrote:
> On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
>> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
>> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  
>> > wrote:
>> >> Without calling the cpufreq hook for a remote wakeup it is possible
>> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> >> to a full tick. This can occur if the target CPU is running a
>> >> CPU-bound task and preemption does not occur. If preemption does occur
>> >> then the scheduler is expected to run soon on the target CPU anyway so
>> >> invoking the cpufreq hook on the remote wakeup is not required.
>> >>
>> >> In order to avoid unnecessary interprocessor communication in the
>> >> governor for the preemption case, the existing hook (which happens
>> >> before preemption is decided) is only called when the target CPU
>> >> is within the current CPU's cpufreq policy. A new hook is added in
>> >> check_preempt_curr() to handle events outside the current CPU's
>> >> cpufreq policy where preemption does not happen.
>> >>
>> >> Some governors may opt to not receive remote CPU callbacks. This
>> >> behavior is possible by providing NULL as the new policy_cpus
>> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> >> issued in this case when the target CPU and the current CPU are the
>> >> same. Otherwise policy_cpus is used to determine what is a local
>> >> vs. remote callback.
>> >
>> > I don't really like the extra complexity added by this patch.
>> >
>> > It makes the code look fragile at some places
>
> Perhaps I can improve these, can you point them out?
>
>> > and it only really matters for schedutil
>
> True. As we are trying to create an integrated scheduler+CPU frequency
> control solution I think some of this is to be expected, and should be
> worthwhile since this is hopefully the future and will eventually
> replace the need for the other governors.
>
>> > and for the fast switch case in there.
>
> Once there is a high priority context for the slow path I expect it to
> benefit from this as well.

I don't think that saving an occasional IPI would matter for that case
overall, though.

>> >
>> > Overall, it looks like a premature optimization to me.
>
> Are you referring to this new approach of avoiding duplicate IPIs,

Yes.

> or supporting updates on remote wakeups overall?

No.  I already said I would be fine with that if done properly.

> The duplicate IPI thing is admittedly not required for the problem I'm
> looking to solve but I figured at least some people would be concerned
> about it.

Avoiding IPIs that aren't necessary is fine by me in general, but
doing that at the scheduler level doesn't seem to be necessary as I
said.

> If folks can live with it for now then I can go back to the
> simpler approach I had in my first posting.

Please at least avoid introducing internal cpufreq concepts into the
scheduler uncleanly.

>> In particular, the dance with checking the policy CPUs from the
>> scheduler is questionable (the scheduler might be interested in this
>> information for other purposes too and hooking it up in an ad-hoc way
>> just for cpufreq doesn't seem to be appropriate from that perspective)
>> and also doesn't seem to be necessary.
>>
>> You can check if the current CPU is a policy one from the governor and
>> if that is the case, simply run the frequency update on it directly
>> without any IPI (because if both the target CPU and the current CPU
>> belong to the same policy, it doesn't matter which one of them will
>> run the update).
>
> The checking of policy CPUs from the scheduler was done so as to
> minimize the number of calls to the hook, given their expense.

But policy CPUs is entirely an internal cpufreq concept and adding
that to the scheduler kind of via a kitchen door doesn't look good to
me.

> In the case of a remote update the hook has to run (or not) after it is
> known whether preemption will occur so we don't do needless work or
> IPIs. If the policy CPUs aren't known in the scheduler then the early
> hook will always need to be called along with an indication that it is
> the early hook being called. If it turns out to be a remote update it
> could then be deferred to the later hook, which would only be called
> when a remote update has been deferred and preemption has not occurred.
>
> This means two hook inovcations for a remote non-preempting wakeup
> though instead of one.  Perhaps this is a good middle ground on code
> churn vs. optimization though.

I would think so.


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Steve Muckle
On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  
> > wrote:
> >> Without calling the cpufreq hook for a remote wakeup it is possible
> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> >> to a full tick. This can occur if the target CPU is running a
> >> CPU-bound task and preemption does not occur. If preemption does occur
> >> then the scheduler is expected to run soon on the target CPU anyway so
> >> invoking the cpufreq hook on the remote wakeup is not required.
> >>
> >> In order to avoid unnecessary interprocessor communication in the
> >> governor for the preemption case, the existing hook (which happens
> >> before preemption is decided) is only called when the target CPU
> >> is within the current CPU's cpufreq policy. A new hook is added in
> >> check_preempt_curr() to handle events outside the current CPU's
> >> cpufreq policy where preemption does not happen.
> >>
> >> Some governors may opt to not receive remote CPU callbacks. This
> >> behavior is possible by providing NULL as the new policy_cpus
> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> >> issued in this case when the target CPU and the current CPU are the
> >> same. Otherwise policy_cpus is used to determine what is a local
> >> vs. remote callback.
> >
> > I don't really like the extra complexity added by this patch.
> >
> > It makes the code look fragile at some places 

Perhaps I can improve these, can you point them out?

> > and it only really matters for schedutil 

True. As we are trying to create an integrated scheduler+CPU frequency
control solution I think some of this is to be expected, and should be
worthwhile since this is hopefully the future and will eventually
replace the need for the other governors.

> > and for the fast switch case in there.

Once there is a high priority context for the slow path I expect it to
benefit from this as well.

> >
> > Overall, it looks like a premature optimization to me.

Are you referring to this new approach of avoiding duplicate IPIs, or
supporting updates on remote wakeups overall?

The duplicate IPI thing is admittedly not required for the problem I'm
looking to solve but I figured at least some people would be concerned
about it.  If folks can live with it for now then I can go back to the
simpler approach I had in my first posting.
 
> In particular, the dance with checking the policy CPUs from the
> scheduler is questionable (the scheduler might be interested in this
> information for other purposes too and hooking it up in an ad-hoc way
> just for cpufreq doesn't seem to be appropriate from that perspective)
> and also doesn't seem to be necessary.
> 
> You can check if the current CPU is a policy one from the governor and
> if that is the case, simply run the frequency update on it directly
> without any IPI (because if both the target CPU and the current CPU
> belong to the same policy, it doesn't matter which one of them will
> run the update).

The checking of policy CPUs from the scheduler was done so as to
minimize the number of calls to the hook, given their expense.

In the case of a remote update the hook has to run (or not) after it is
known whether preemption will occur so we don't do needless work or
IPIs. If the policy CPUs aren't known in the scheduler then the early
hook will always need to be called along with an indication that it is
the early hook being called. If it turns out to be a remote update it
could then be deferred to the later hook, which would only be called
when a remote update has been deferred and preemption has not occurred.

This means two hook inovcations for a remote non-preempting wakeup
though instead of one.  Perhaps this is a good middle ground on code
churn vs. optimization though.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Steve Muckle
On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  
> > wrote:
> >> Without calling the cpufreq hook for a remote wakeup it is possible
> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> >> to a full tick. This can occur if the target CPU is running a
> >> CPU-bound task and preemption does not occur. If preemption does occur
> >> then the scheduler is expected to run soon on the target CPU anyway so
> >> invoking the cpufreq hook on the remote wakeup is not required.
> >>
> >> In order to avoid unnecessary interprocessor communication in the
> >> governor for the preemption case, the existing hook (which happens
> >> before preemption is decided) is only called when the target CPU
> >> is within the current CPU's cpufreq policy. A new hook is added in
> >> check_preempt_curr() to handle events outside the current CPU's
> >> cpufreq policy where preemption does not happen.
> >>
> >> Some governors may opt to not receive remote CPU callbacks. This
> >> behavior is possible by providing NULL as the new policy_cpus
> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> >> issued in this case when the target CPU and the current CPU are the
> >> same. Otherwise policy_cpus is used to determine what is a local
> >> vs. remote callback.
> >
> > I don't really like the extra complexity added by this patch.
> >
> > It makes the code look fragile at some places 

Perhaps I can improve these, can you point them out?

> > and it only really matters for schedutil 

True. As we are trying to create an integrated scheduler+CPU frequency
control solution I think some of this is to be expected, and should be
worthwhile since this is hopefully the future and will eventually
replace the need for the other governors.

> > and for the fast switch case in there.

Once there is a high priority context for the slow path I expect it to
benefit from this as well.

> >
> > Overall, it looks like a premature optimization to me.

Are you referring to this new approach of avoiding duplicate IPIs, or
supporting updates on remote wakeups overall?

The duplicate IPI thing is admittedly not required for the problem I'm
looking to solve but I figured at least some people would be concerned
about it.  If folks can live with it for now then I can go back to the
simpler approach I had in my first posting.
 
> In particular, the dance with checking the policy CPUs from the
> scheduler is questionable (the scheduler might be interested in this
> information for other purposes too and hooking it up in an ad-hoc way
> just for cpufreq doesn't seem to be appropriate from that perspective)
> and also doesn't seem to be necessary.
> 
> You can check if the current CPU is a policy one from the governor and
> if that is the case, simply run the frequency update on it directly
> without any IPI (because if both the target CPU and the current CPU
> belong to the same policy, it doesn't matter which one of them will
> run the update).

The checking of policy CPUs from the scheduler was done so as to
minimize the number of calls to the hook, given their expense.

In the case of a remote update the hook has to run (or not) after it is
known whether preemption will occur so we don't do needless work or
IPIs. If the policy CPUs aren't known in the scheduler then the early
hook will always need to be called along with an indication that it is
the early hook being called. If it turns out to be a remote update it
could then be deferred to the later hook, which would only be called
when a remote update has been deferred and preemption has not occurred.

This means two hook inovcations for a remote non-preempting wakeup
though instead of one.  Perhaps this is a good middle ground on code
churn vs. optimization though.

thanks,
Steve


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Rafael J. Wysocki
On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  wrote:
>> Without calling the cpufreq hook for a remote wakeup it is possible
>> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> to a full tick. This can occur if the target CPU is running a
>> CPU-bound task and preemption does not occur. If preemption does occur
>> then the scheduler is expected to run soon on the target CPU anyway so
>> invoking the cpufreq hook on the remote wakeup is not required.
>>
>> In order to avoid unnecessary interprocessor communication in the
>> governor for the preemption case, the existing hook (which happens
>> before preemption is decided) is only called when the target CPU
>> is within the current CPU's cpufreq policy. A new hook is added in
>> check_preempt_curr() to handle events outside the current CPU's
>> cpufreq policy where preemption does not happen.
>>
>> Some governors may opt to not receive remote CPU callbacks. This
>> behavior is possible by providing NULL as the new policy_cpus
>> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> issued in this case when the target CPU and the current CPU are the
>> same. Otherwise policy_cpus is used to determine what is a local
>> vs. remote callback.
>
> I don't really like the extra complexity added by this patch.
>
> It makes the code look fragile at some places and it only really
> matters for schedutil and for the fast switch case in there.
>
> Overall, it looks like a premature optimization to me.

In particular, the dance with checking the policy CPUs from the
scheduler is questionable (the scheduler might be interested in this
information for other purposes too and hooking it up in an ad-hoc way
just for cpufreq doesn't seem to be appropriate from that perspective)
and also doesn't seem to be necessary.

You can check if the current CPU is a policy one from the governor and
if that is the case, simply run the frequency update on it directly
without any IPI (because if both the target CPU and the current CPU
belong to the same policy, it doesn't matter which one of them will
run the update).


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-19 Thread Rafael J. Wysocki
On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki  wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  wrote:
>> Without calling the cpufreq hook for a remote wakeup it is possible
>> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> to a full tick. This can occur if the target CPU is running a
>> CPU-bound task and preemption does not occur. If preemption does occur
>> then the scheduler is expected to run soon on the target CPU anyway so
>> invoking the cpufreq hook on the remote wakeup is not required.
>>
>> In order to avoid unnecessary interprocessor communication in the
>> governor for the preemption case, the existing hook (which happens
>> before preemption is decided) is only called when the target CPU
>> is within the current CPU's cpufreq policy. A new hook is added in
>> check_preempt_curr() to handle events outside the current CPU's
>> cpufreq policy where preemption does not happen.
>>
>> Some governors may opt to not receive remote CPU callbacks. This
>> behavior is possible by providing NULL as the new policy_cpus
>> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> issued in this case when the target CPU and the current CPU are the
>> same. Otherwise policy_cpus is used to determine what is a local
>> vs. remote callback.
>
> I don't really like the extra complexity added by this patch.
>
> It makes the code look fragile at some places and it only really
> matters for schedutil and for the fast switch case in there.
>
> Overall, it looks like a premature optimization to me.

In particular, the dance with checking the policy CPUs from the
scheduler is questionable (the scheduler might be interested in this
information for other purposes too and hooking it up in an ad-hoc way
just for cpufreq doesn't seem to be appropriate from that perspective)
and also doesn't seem to be necessary.

You can check if the current CPU is a policy one from the governor and
if that is the case, simply run the frequency update on it directly
without any IPI (because if both the target CPU and the current CPU
belong to the same policy, it doesn't matter which one of them will
run the update).


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-18 Thread Rafael J. Wysocki
On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  wrote:
> Without calling the cpufreq hook for a remote wakeup it is possible
> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> to a full tick. This can occur if the target CPU is running a
> CPU-bound task and preemption does not occur. If preemption does occur
> then the scheduler is expected to run soon on the target CPU anyway so
> invoking the cpufreq hook on the remote wakeup is not required.
>
> In order to avoid unnecessary interprocessor communication in the
> governor for the preemption case, the existing hook (which happens
> before preemption is decided) is only called when the target CPU
> is within the current CPU's cpufreq policy. A new hook is added in
> check_preempt_curr() to handle events outside the current CPU's
> cpufreq policy where preemption does not happen.
>
> Some governors may opt to not receive remote CPU callbacks. This
> behavior is possible by providing NULL as the new policy_cpus
> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> issued in this case when the target CPU and the current CPU are the
> same. Otherwise policy_cpus is used to determine what is a local
> vs. remote callback.

I don't really like the extra complexity added by this patch.

It makes the code look fragile at some places and it only really
matters for schedutil and for the fast switch case in there.

Overall, it looks like a premature optimization to me.


Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-18 Thread Rafael J. Wysocki
On Mon, May 9, 2016 at 11:20 PM, Steve Muckle  wrote:
> Without calling the cpufreq hook for a remote wakeup it is possible
> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> to a full tick. This can occur if the target CPU is running a
> CPU-bound task and preemption does not occur. If preemption does occur
> then the scheduler is expected to run soon on the target CPU anyway so
> invoking the cpufreq hook on the remote wakeup is not required.
>
> In order to avoid unnecessary interprocessor communication in the
> governor for the preemption case, the existing hook (which happens
> before preemption is decided) is only called when the target CPU
> is within the current CPU's cpufreq policy. A new hook is added in
> check_preempt_curr() to handle events outside the current CPU's
> cpufreq policy where preemption does not happen.
>
> Some governors may opt to not receive remote CPU callbacks. This
> behavior is possible by providing NULL as the new policy_cpus
> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> issued in this case when the target CPU and the current CPU are the
> same. Otherwise policy_cpus is used to determine what is a local
> vs. remote callback.

I don't really like the extra complexity added by this patch.

It makes the code look fragile at some places and it only really
matters for schedutil and for the fast switch case in there.

Overall, it looks like a premature optimization to me.


[PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-09 Thread Steve Muckle
Without calling the cpufreq hook for a remote wakeup it is possible
for such a wakeup to go unnoticed by cpufreq on the target CPU for up
to a full tick. This can occur if the target CPU is running a
CPU-bound task and preemption does not occur. If preemption does occur
then the scheduler is expected to run soon on the target CPU anyway so
invoking the cpufreq hook on the remote wakeup is not required.

In order to avoid unnecessary interprocessor communication in the
governor for the preemption case, the existing hook (which happens
before preemption is decided) is only called when the target CPU
is within the current CPU's cpufreq policy. A new hook is added in
check_preempt_curr() to handle events outside the current CPU's
cpufreq policy where preemption does not happen.

Some governors may opt to not receive remote CPU callbacks. This
behavior is possible by providing NULL as the new policy_cpus
parameter to cpufreq_add_update_util_hook(). Callbacks will only be
issued in this case when the target CPU and the current CPU are the
same. Otherwise policy_cpus is used to determine what is a local
vs. remote callback.

Signed-off-by: Steve Muckle 
---
 drivers/cpufreq/cpufreq_governor.c |   2 +-
 drivers/cpufreq/intel_pstate.c |   2 +-
 include/linux/sched.h  |   4 +-
 kernel/sched/core.c|   4 ++
 kernel/sched/cpufreq.c |  13 -
 kernel/sched/cpufreq_schedutil.c   |   6 ++-
 kernel/sched/fair.c|  40 +++---
 kernel/sched/sched.h   | 106 +
 8 files changed, 127 insertions(+), 50 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 20f0a4e114d1..e127a7a22177 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -311,7 +311,7 @@ static void gov_set_update_util(struct policy_dbs_info 
*policy_dbs,
struct cpu_dbs_info *cdbs = _cpu(cpu_dbs, cpu);
 
cpufreq_add_update_util_hook(cpu, >update_util,
-dbs_update_util_handler);
+dbs_update_util_handler, NULL);
}
 }
 
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6c7cff13f0ed..9cf262ef23af 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1266,7 +1266,7 @@ static void intel_pstate_set_update_util_hook(unsigned 
int cpu_num)
/* Prevent intel_pstate_update_util() from using stale data. */
cpu->sample.time = 0;
cpufreq_add_update_util_hook(cpu_num, >update_util,
-intel_pstate_update_util);
+intel_pstate_update_util, NULL);
 }
 
 static void intel_pstate_clear_update_util_hook(unsigned int cpu)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81aba7dc5966..ce154518119a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3239,11 +3239,13 @@ struct update_util_data {
void (*func)(struct update_util_data *data,
 u64 time, unsigned long util, unsigned long max);
int cpu;
+   cpumask_var_t *policy_cpus;
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
void (*func)(struct update_util_data *data, u64 time,
-unsigned long util, unsigned long max));
+unsigned long util, unsigned long max),
+ cpumask_var_t *policy_cpus);
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fcac37b..fce6c0b43231 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -450,6 +450,8 @@ void resched_curr(struct rq *rq)
 
lockdep_assert_held(>lock);
 
+   cpufreq_set_skip_cb(rq);
+
if (test_tsk_need_resched(curr))
return;
 
@@ -922,6 +924,8 @@ void check_preempt_curr(struct rq *rq, struct task_struct 
*p, int flags)
 */
if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
rq_clock_skip_update(rq, true);
+
+   cpufreq_update_remote(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index d88a78ea805d..2946d2096bf2 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -18,6 +18,7 @@ DEFINE_PER_CPU(struct update_util_data *, 
cpufreq_update_util_data);
  * @cpu: The CPU to set the pointer for.
  * @data: New pointer value.
  * @func: Callback function to set for the CPU.
+ * @policy_cpus: Pointer to cpumask for CPUs which share the same policy.
  *
  * Set and publish the update_util_data pointer for the given CPU.
  *
@@ -28,12 +29,21 @@ DEFINE_PER_CPU(struct update_util_data *, 
cpufreq_update_util_data);
  * passed to it as the 

[PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

2016-05-09 Thread Steve Muckle
Without calling the cpufreq hook for a remote wakeup it is possible
for such a wakeup to go unnoticed by cpufreq on the target CPU for up
to a full tick. This can occur if the target CPU is running a
CPU-bound task and preemption does not occur. If preemption does occur
then the scheduler is expected to run soon on the target CPU anyway so
invoking the cpufreq hook on the remote wakeup is not required.

In order to avoid unnecessary interprocessor communication in the
governor for the preemption case, the existing hook (which happens
before preemption is decided) is only called when the target CPU
is within the current CPU's cpufreq policy. A new hook is added in
check_preempt_curr() to handle events outside the current CPU's
cpufreq policy where preemption does not happen.

Some governors may opt to not receive remote CPU callbacks. This
behavior is possible by providing NULL as the new policy_cpus
parameter to cpufreq_add_update_util_hook(). Callbacks will only be
issued in this case when the target CPU and the current CPU are the
same. Otherwise policy_cpus is used to determine what is a local
vs. remote callback.

Signed-off-by: Steve Muckle 
---
 drivers/cpufreq/cpufreq_governor.c |   2 +-
 drivers/cpufreq/intel_pstate.c |   2 +-
 include/linux/sched.h  |   4 +-
 kernel/sched/core.c|   4 ++
 kernel/sched/cpufreq.c |  13 -
 kernel/sched/cpufreq_schedutil.c   |   6 ++-
 kernel/sched/fair.c|  40 +++---
 kernel/sched/sched.h   | 106 +
 8 files changed, 127 insertions(+), 50 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 20f0a4e114d1..e127a7a22177 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -311,7 +311,7 @@ static void gov_set_update_util(struct policy_dbs_info 
*policy_dbs,
struct cpu_dbs_info *cdbs = _cpu(cpu_dbs, cpu);
 
cpufreq_add_update_util_hook(cpu, >update_util,
-dbs_update_util_handler);
+dbs_update_util_handler, NULL);
}
 }
 
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6c7cff13f0ed..9cf262ef23af 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1266,7 +1266,7 @@ static void intel_pstate_set_update_util_hook(unsigned 
int cpu_num)
/* Prevent intel_pstate_update_util() from using stale data. */
cpu->sample.time = 0;
cpufreq_add_update_util_hook(cpu_num, >update_util,
-intel_pstate_update_util);
+intel_pstate_update_util, NULL);
 }
 
 static void intel_pstate_clear_update_util_hook(unsigned int cpu)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81aba7dc5966..ce154518119a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3239,11 +3239,13 @@ struct update_util_data {
void (*func)(struct update_util_data *data,
 u64 time, unsigned long util, unsigned long max);
int cpu;
+   cpumask_var_t *policy_cpus;
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
void (*func)(struct update_util_data *data, u64 time,
-unsigned long util, unsigned long max));
+unsigned long util, unsigned long max),
+ cpumask_var_t *policy_cpus);
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fcac37b..fce6c0b43231 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -450,6 +450,8 @@ void resched_curr(struct rq *rq)
 
lockdep_assert_held(>lock);
 
+   cpufreq_set_skip_cb(rq);
+
if (test_tsk_need_resched(curr))
return;
 
@@ -922,6 +924,8 @@ void check_preempt_curr(struct rq *rq, struct task_struct 
*p, int flags)
 */
if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
rq_clock_skip_update(rq, true);
+
+   cpufreq_update_remote(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index d88a78ea805d..2946d2096bf2 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -18,6 +18,7 @@ DEFINE_PER_CPU(struct update_util_data *, 
cpufreq_update_util_data);
  * @cpu: The CPU to set the pointer for.
  * @data: New pointer value.
  * @func: Callback function to set for the CPU.
+ * @policy_cpus: Pointer to cpumask for CPUs which share the same policy.
  *
  * Set and publish the update_util_data pointer for the given CPU.
  *
@@ -28,12 +29,21 @@ DEFINE_PER_CPU(struct update_util_data *, 
cpufreq_update_util_data);
  * passed to it as the first argument which