Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-09 Thread Viresh Kumar
On 10 January 2013 00:19, Tejun Heo  wrote:
> On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote:
>> We are talking about a core being idle from schedulers perspective :)
>
> But it's not like cpu doesn't consume power if scheduler considers it
> idle, right?  Can you please explain in detail how this contributes to
> saving power?  Is it primarily about routing work items to lower power
> CPUs?  And please don't point to presentation slides.  They don't seem
> to explain much better and patches and the code should be able to
> describe themselves.  Here, more so, as the desired behavior change
> and the resulting powersave are rather subtle.

I understand your concerns. Firstly, when a cpu is idle only from the
scheduler's perspective, it still consumes a lot of power.

queue_work_on_any_cpu() would queue the work on another cpu only when the
current cpu is idle from the scheduler's perspective, and this can only
happen when the cpu was actually idle (in a low power state), woke up due
to some interrupt/timer, and was asked to queue a work.

The idea is to choose another non-idle cpu at this point, so that the
current cpu can immediately go back into a deeper idle state. With this,
cpus can stay longer in deeper idle states, rather than running works.

And in cases where works are rearmed from the handler, this can cause a
significant power loss, which could easily be saved by pushing the work to
non-idle cpus.

The same approach is already taken for deferred timers; they use such a
routine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-09 Thread Tejun Heo
Hello,

On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote:
> On 7 January 2013 20:34, Tejun Heo  wrote:
> > The latter part "not using idle cpu just for processing work" does
> > apply to homogeneous systems too but as I wrote earlier work items
> > don't spontaneously happen on an idle core.  Something, including
> > timer, needs to kick it.  So, by definition, a CPU already isn't idle
> > when a work item starts execution on it.  What am I missing here?
> 
> We are talking about a core being idle from schedulers perspective :)

But it's not like cpu doesn't consume power if scheduler considers it
idle, right?  Can you please explain in detail how this contributes to
saving power?  Is it primarily about routing work items to lower power
CPUs?  And please don't point to presentation slides.  They don't seem
to explain much better and patches and the code should be able to
describe themselves.  Here, more so, as the desired behavior change
and the resulting powersave are rather subtle.

> > Yeah, this could be a better solution, I think.  Plus, it's not like
> > finding the optimal cpu is free.
> 
> Thanks for the first part (When i shared this idea with Vincent and Amit, i
> wasn't sure at all about the feedback i will get from you and others, but i
> am very happy now :) ).
> 
> I couldn't understand the second part. We still need to search for a free cpu
> for this new routine. And the implementation would almost be same as the
> implementation of queue_work() in my initial patch

I meant that enforcing a lookup for the "optimal" cpu on queue_work() by
default would add overhead to the operation which, in many cases (on most
homogeneous configs), would bring no realized benefit.

> > Let's start simple for now.  If we really need it, we can always add
> > more later.
> 
> :)
> Agreed. But i liked the idea from steven, we can have two routines:
> queue_work_on_any_cpu() and queue_work_on_cpus()

We can talk about specifics later but please try to *justify* each new
interface.  Please note that "putting work items to cpus which the
scheduler considers idle" can't really be justification in itself.

Thanks.

-- 
tejun
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
Hi Steven,

On 8 January 2013 03:59, Steven Rostedt  wrote:
> On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:
>
>> > By default, I would suggest for cache locality,
>> > that we try to keep it on the same CPU. But if there's a better CPU to
>> > run on, it runs there.
>>
>> That would break our intention behind this routine. We should run
>> it on a cpu which we think is the best one for it power/performance wise.
>
> If you are running on a big.Little box sure. But for normal (read x86 ;)
> systems, it should probably stay on the current CPU.

But that's not the purpose of this new call. If the caller wants it to be on
the local cpu, he must not use this call. It is up to the core routine
(sched_select_cpu() in our case) to decide where to launch it. If we need
something special for x86, we can hack this routine.

Even for non-big.LITTLE systems, we may want to run a work on a non-idle cpu,
and so we can't guarantee the local cpu here.

>> So, i would like to call the sched_select_cpu() routine from this routine to
>> get the suggested cpu.
>
> Does the caller need to know? Or if you have a big.LITTLE setup, it
> should just work automagically?

The caller wouldn't know; he just needs to trust sched_select_cpu() to give
the most optimum cpu.

Again, it is not only for big.LITTLE, but for any SMP system where we don't
want an idle cpu to do this work.

>> I don't think we need it :(
>> It would be a new API, with zero existing users. And so, whomsoever uses
>> it, should know what exactly he is doing :)
>
> Heh, you don't know kernel developers well do you? ;-)

I agree with you, but we don't need to cater to careless new code here.

> Once a new API is added to the kernel tree, it quickly becomes
> (mis)used.

It's true of all pieces of code in the kernel, and we really don't need to
take care of such users here :)


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Steven Rostedt
On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:

> > By default, I would suggest for cache locality,
> > that we try to keep it on the same CPU. But if there's a better CPU to
> > run on, it runs there.
> 
> That would break our intention behind this routine. We should run
> it on a cpu which we think is the best one for it power/performance wise.

If you are running on a big.Little box sure. But for normal (read x86 ;)
systems, it should probably stay on the current CPU.

> 
> So, i would like to call the sched_select_cpu() routine from this routine to
> get the suggested cpu.

Does the caller need to know? Or if you have a big.LITTLE setup, it
should just work automagically?

> 
> This routine would however would see more changes later to optimize it
> more.
> 
> > Also, we could still add a debug option that
> > makes it always run on other CPUs to slap developers that don't read.
> 
> I don't think we need it :(
> It would be a new API, with zero existing users. And so, whomsoever uses
> it, should know what exactly he is doing :)

Heh, you don't know kernel developers well do you? ;-)

Once a new API is added to the kernel tree, it quickly becomes
(mis)used.

-- Steve




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
[Removed Suresh and Venki from the discussion; they have probably switched
companies]

On 7 January 2013 20:34, Tejun Heo  wrote:
> The latter part "not using idle cpu just for processing work" does
> apply to homogeneous systems too but as I wrote earlier work items
> don't spontaneously happen on an idle core.  Something, including
> timer, needs to kick it.  So, by definition, a CPU already isn't idle
> when a work item starts execution on it.  What am I missing here?

We are talking about a core being idle from the scheduler's perspective :)

>> I have another idea that we can try:
>>
>> queue_work_on_any_cpu().
>>
>> With this we would not break any existing code and can try to migrate
>> old users to
>> this new infrastructure (atleast the ones which are rearming works from their
>> work_handlers). What do you say?
>
> Yeah, this could be a better solution, I think.  Plus, it's not like
> finding the optimal cpu is free.

Thanks for the first part (when I shared this idea with Vincent and Amit, I
wasn't sure at all about the feedback I would get from you and others, but I
am very happy now :) ).

I couldn't understand the second part. We still need to search for a free cpu
for this new routine, and the implementation would be almost the same as the
implementation of queue_work() in my initial patch.

>> To take care of the cache locality issue, we can pass an argument to
>> this routine,
>> that can provide
>> - the mask of cpus to schedule this work on
>>   OR
>> - Sched Level (SD_LEVEL) of cpus to run it.
>
> Let's start simple for now.  If we really need it, we can always add
> more later.

:)
Agreed. But I liked the idea from Steven; we can have two routines:
queue_work_on_any_cpu() and queue_work_on_cpus()


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
On 7 January 2013 18:58, Steven Rostedt  wrote:
> On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
>> I have another idea that we can try:
>>
>> queue_work_on_any_cpu().
>
> I think this is a good idea.

:) :)

>> - the mask of cpus to schedule this work on
>>   OR
>> - Sched Level (SD_LEVEL) of cpus to run it.
>
> I wouldn't give a mask. If one is needed, we could have a
> queue_work_on_mask_cpus(), or something. I think the "any" in the name
> should be good enough to let developers know that this will not be on
> the CPU that is called.

Fair enough.

> By default, I would suggest for cache locality,
> that we try to keep it on the same CPU. But if there's a better CPU to
> run on, it runs there.

That would defeat the intention behind this routine. We should run
it on the cpu which we think is the best one for it, power/performance wise.

So, I would like to call the sched_select_cpu() routine from this routine to
get the suggested cpu.

This routine would, however, see more changes later to optimize it
further.

> Also, we could still add a debug option that
> makes it always run on other CPUs to slap developers that don't read.

I don't think we need it :(
It would be a new API with zero existing users, and so whosoever uses
it should know exactly what he is doing :)

Thanks for your quick feedback.


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Amit Kucheria
On Mon, Jan 7, 2013 at 8:34 PM, Tejun Heo  wrote:
> Hello, Viresh.
>
> On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
>> Firstly the root cause of this patchset.
>>
>> Myself and some others in Linaro are working on ARM future cores:
>> big.LITTLE systems.
>> Here we have few very powerful, high power consuming cores (big,
>> currently A15's) and
>> few very efficient ones (LITTLE, currently A7's).
>>
>> The ultimate goal is to save as much power as possible without compromising
>> much with performance. For, that we wanted most of the stuff to run on LITTLE
>> cores and some performance-demanding stuff on big Cores. There are
>> multiple things
>> going around in this direction. Now, we thought A15's or big cores
>> shouldn't be used
>> for running small tasks like timers/workqueues and hence this patch is
>> an attempt
>> towards reaching that goal.
>
> I see.  My understanding of big.little is very superficial so there
> are very good chances that I'm utterly wrong, but to me it seems like
> more of a stop-gap solution than something which will have staying
> power in the sense that the complexity doesn't seem to have any
> inherent reason other than the failure to make the big ones power
> efficient enough while idle or under minor load.

The two processors use different manufacturing processes - one
optimised for performance, the other for really low power. So the
reason is physics at this point. Other architectures are known to be
playing with similar schemes. ARM's big.LITTLE is just the first one
to the market.

> Maybe this isn't specific to big.little and heterogeneous cores will
> be the way of future with big.little being just a forefront of the
> trend.  And even if that's not the case, it definitely doesn't mean
> that we can't accomodate big.little, but, at this point, it does make
> me a bit more reluctant to make wholesale changes specifically for
> big.little.

The patches aren't targeted to benefit only b.L systems though it was
definitely the trigger for our investigations. Our hope is that after
presenting more analysis results from our side we'll be able to
convince the community of the general usefulness of this feature.

Here are a few short introductions to big.LITTLE in case you're
interested.[1][2]

[1] http://www.arm.com/files/downloads/big.LITTLE_Final.pdf
[2] http://lwn.net/Articles/481055/


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Tejun Heo
Hello, Viresh.

On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
> Firstly the root cause of this patchset.
> 
> Myself and some others in Linaro are working on ARM future cores:
> big.LITTLE systems.
> Here we have few very powerful, high power consuming cores (big,
> currently A15's) and
> few very efficient ones (LITTLE, currently A7's).
> 
> The ultimate goal is to save as much power as possible without compromising
> much with performance. For, that we wanted most of the stuff to run on LITTLE
> cores and some performance-demanding stuff on big Cores. There are
> multiple things
> going around in this direction. Now, we thought A15's or big cores
> shouldn't be used
> for running small tasks like timers/workqueues and hence this patch is
> an attempt
> towards reaching that goal.

I see.  My understanding of big.little is very superficial so there
are very good chances that I'm utterly wrong, but to me it seems like
more of a stop-gap solution than something which will have staying
power in the sense that the complexity doesn't seem to have any
inherent reason other than the failure to make the big ones power
efficient enough while idle or under minor load.

Maybe this isn't specific to big.little and heterogeneous cores will
be the way of future with big.little being just a forefront of the
trend.  And even if that's not the case, it definitely doesn't mean
that we can't accommodate big.little, but, at this point, it does make
me a bit more reluctant to make wholesale changes specifically for
big.little.

> Over that we can do some load balancing of works within multiple alive
> cpus, so that
> it can get done quickly. Also, we shouldn't start using an idle cpu
> just for processing
> work :)

The latter part "not using idle cpu just for processing work" does
apply to homogeneous systems too but as I wrote earlier work items
don't spontaneously happen on an idle core.  Something, including
timer, needs to kick it.  So, by definition, a CPU already isn't idle
when a work item starts execution on it.  What am I missing here?

> I have another idea that we can try:
> 
> queue_work_on_any_cpu().
>
> With this we would not break any existing code and can try to migrate
> old users to
> this new infrastructure (atleast the ones which are rearming works from their
> work_handlers). What do you say?

Yeah, this could be a better solution, I think.  Plus, it's not like
finding the optimal cpu is free.

> To take care of the cache locality issue, we can pass an argument to
> this routine,
> that can provide
> - the mask of cpus to schedule this work on
>   OR
> - Sched Level (SD_LEVEL) of cpus to run it.

Let's start simple for now.  If we really need it, we can always add
more later.

Thanks.

-- 
tejun


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Steven Rostedt
On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
> Hi Tejun,
> 
> On 4 January 2013 20:39, Tejun Heo  wrote:
> > I don't know either.  Changing behavior subtly like this is hard.  I
> > usually try to spot some problem cases and try to identify patterns
> > there.  Once you identify a few of them, understanding and detecting
> > other problem cases get a lot easier.  In this case, maybe there are
> > too many places to audit and the problems are too subtle, and, if we
> > *have* to do it, the only thing we can do is implementing a debug
> > option to make such problems more visible - say, always schedule to a
> > different cpu on queue_work().
> >
> > That said, at this point, the patchset doesn't seem all that
> > convincing to me and if I'm comprehending responses from others
> > correctly that seems to be the consensus.  It might be a better
> > approach to identify the specific offending workqueue users and make
> > them handle the situation more intelligently than trying to impose the
> > behavior on all workqueue users.  At any rate, we need way more data
> > showing this actually helps and if so why.
> 
> I understand your concerns and believe me, even i feel the same :)
> I had another idea, that i wanted to share.
> 
> Firstly the root cause of this patchset.
> 
> Myself and some others in Linaro are working on ARM future cores:
> big.LITTLE systems.
> Here we have few very powerful, high power consuming cores (big,
> currently A15's) and
> few very efficient ones (LITTLE, currently A7's).
> 
> The ultimate goal is to save as much power as possible without compromising
> much with performance. For, that we wanted most of the stuff to run on LITTLE
> cores and some performance-demanding stuff on big Cores. There are
> multiple things
> going around in this direction. Now, we thought A15's or big cores
> shouldn't be used
> for running small tasks like timers/workqueues and hence this patch is
> an attempt
> towards reaching that goal.
> 
> Over that we can do some load balancing of works within multiple alive
> cpus, so that
> it can get done quickly. Also, we shouldn't start using an idle cpu
> just for processing
> work :)
> 
> I have another idea that we can try:
> 
> queue_work_on_any_cpu().

I think this is a good idea.

> 
> With this we would not break any existing code and can try to migrate
> old users to
> this new infrastructure (atleast the ones which are rearming works from their
> work_handlers). What do you say?
> 
> To take care of the cache locality issue, we can pass an argument to
> this routine,
> that can provide
> - the mask of cpus to schedule this work on
>   OR
> - Sched Level (SD_LEVEL) of cpus to run it.

I wouldn't give a mask. If one is needed, we could have a
queue_work_on_mask_cpus(), or something. I think the "any" in the name
should be good enough to let developers know that this will not necessarily
run on the CPU it is called from. By default, I would suggest, for cache
locality, that we try to keep it on the same CPU. But if there's a better
CPU to run on, it runs there. Also, we could still add a debug option that
makes it always run on other CPUs to slap developers that don't read.

-- Steve

> 
> Waiting for your view on it :)
> 
> --
> viresh




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
Hi Tejun,

On 4 January 2013 20:39, Tejun Heo  wrote:
> I don't know either.  Changing behavior subtly like this is hard.  I
> usually try to spot some problem cases and try to identify patterns
> there.  Once you identify a few of them, understanding and detecting
> other problem cases get a lot easier.  In this case, maybe there are
> too many places to audit and the problems are too subtle, and, if we
> *have* to do it, the only thing we can do is implementing a debug
> option to make such problems more visible - say, always schedule to a
> different cpu on queue_work().
>
> That said, at this point, the patchset doesn't seem all that
> convincing to me and if I'm comprehending responses from others
> correctly that seems to be the consensus.  It might be a better
> approach to identify the specific offending workqueue users and make
> them handle the situation more intelligently than trying to impose the
> behavior on all workqueue users.  At any rate, we need way more data
> showing this actually helps and if so why.

I understand your concerns and believe me, even I feel the same :)
I had another idea that I wanted to share.

Firstly the root cause of this patchset.

Myself and some others in Linaro are working on future ARM cores:
big.LITTLE systems. Here we have a few very powerful, high-power-consuming
cores (big, currently A15's) and a few very efficient ones (LITTLE,
currently A7's).

The ultimate goal is to save as much power as possible without compromising
much on performance. For that, we want most of the stuff to run on LITTLE
cores and some performance-demanding stuff on big cores. There are multiple
things going on in this direction. Now, we thought A15's or big cores
shouldn't be used for running small tasks like timers/workqueues, and hence
this patch is an attempt towards reaching that goal.

On top of that, we can do some load balancing of works within the multiple
alive cpus, so that they get done quickly. Also, we shouldn't start using
an idle cpu just for processing work :)

I have another idea that we can try:

queue_work_on_any_cpu().

With this we would not break any existing code and can try to migrate old
users to this new infrastructure (at least the ones which are rearming works
from their work_handlers). What do you say?

To take care of the cache locality issue, we can pass an argument to this
routine that can provide:
- the mask of cpus to schedule this work on
  OR
- the sched level (SD_LEVEL) of cpus to run it on.

Waiting for your view on it :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
Hi Tejun,

On 4 January 2013 20:39, Tejun Heo t...@kernel.org wrote:
 I don't know either.  Changing behavior subtly like this is hard.  I
 usually try to spot some problem cases and try to identify patterns
 there.  Once you identify a few of them, understanding and detecting
 other problem cases get a lot easier.  In this case, maybe there are
 too many places to audit and the problems are too subtle, and, if we
 *have* to do it, the only thing we can do is implementing a debug
 option to make such problems more visible - say, always schedule to a
 different cpu on queue_work().

 That said, at this point, the patchset doesn't seem all that
 convincing to me and if I'm comprehending responses from others
 correctly that seems to be the consensus.  It might be a better
 approach to identify the specific offending workqueue users and make
 them handle the situation more intelligently than trying to impose the
 behavior on all workqueue users.  At any rate, we need way more data
 showing this actually helps and if so why.

I understand your concerns and believe me, even i feel the same :)
I had another idea, that i wanted to share.

Firstly the root cause of this patchset.

Myself and some others in Linaro are working on ARM future cores:
big.LITTLE systems.
Here we have few very powerful, high power consuming cores (big,
currently A15's) and
few very efficient ones (LITTLE, currently A7's).

The ultimate goal is to save as much power as possible without compromising
much with performance. For, that we wanted most of the stuff to run on LITTLE
cores and some performance-demanding stuff on big Cores. There are
multiple things
going around in this direction. Now, we thought A15's or big cores
shouldn't be used
for running small tasks like timers/workqueues and hence this patch is
an attempt
towards reaching that goal.

Over that we can do some load balancing of works within multiple alive
cpus, so that
it can get done quickly. Also, we shouldn't start using an idle cpu
just for processing
work :)

I have another idea that we can try:

queue_work_on_any_cpu().

With this we would not break any existing code and can try to migrate
old users to
this new infrastructure (atleast the ones which are rearming works from their
work_handlers). What do you say?

To take care of the cache locality issue, we can pass an argument to
this routine,
that can provide
- the mask of cpus to schedule this work on
  OR
- Sched Level (SD_LEVEL) of cpus to run it.

Waiting for your view on it :)

--
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Steven Rostedt
On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
 Hi Tejun,
 
 On 4 January 2013 20:39, Tejun Heo t...@kernel.org wrote:
  I don't know either.  Changing behavior subtly like this is hard.  I
  usually try to spot some problem cases and try to identify patterns
  there.  Once you identify a few of them, understanding and detecting
  other problem cases get a lot easier.  In this case, maybe there are
  too many places to audit and the problems are too subtle, and, if we
  *have* to do it, the only thing we can do is implementing a debug
  option to make such problems more visible - say, always schedule to a
  different cpu on queue_work().
 
  That said, at this point, the patchset doesn't seem all that
  convincing to me and if I'm comprehending responses from others
  correctly that seems to be the consensus.  It might be a better
  approach to identify the specific offending workqueue users and make
  them handle the situation more intelligently than trying to impose the
  behavior on all workqueue users.  At any rate, we need way more data
  showing this actually helps and if so why.
 
 I understand your concerns and believe me, even I feel the same :)
 I had another idea that I wanted to share.
 
 First, the root cause of this patchset.

 Some others in Linaro and I are working on future ARM cores:
 big.LITTLE systems. These have a few very powerful, high power
 consuming cores (big, currently A15s) and a few very efficient ones
 (LITTLE, currently A7s).

 The ultimate goal is to save as much power as possible without
 compromising much on performance. For that, we want most of the work
 to run on the LITTLE cores and only the performance-demanding work on
 the big cores. Multiple things are going on in this direction. Now,
 we thought the A15s or big cores shouldn't be used for running small
 tasks like timers/workqueues, and hence this patch is an attempt
 towards reaching that goal.

 On top of that, we can do some load balancing of works within the
 multiple alive cpus, so that they get done quickly. Also, we
 shouldn't start using an idle cpu just for processing work :)
 
 I have another idea that we can try:
 
 queue_work_on_any_cpu().

I think this is a good idea.

 
 With this we would not break any existing code and can try to migrate
 old users to this new infrastructure (at least the ones which are
 rearming works from their work_handlers). What do you say?

 To take care of the cache locality issue, we can pass an argument to
 this routine that can provide:
 - the mask of cpus to schedule this work on
   OR
 - Sched Level (SD_LEVEL) of cpus to run it.

I wouldn't give a mask. If one is needed, we could have a
queue_work_on_mask_cpus(), or something. I think the "any" in the name
should be good enough to let developers know that this will not be on
the CPU from which it was called. By default, I would suggest, for
cache locality, that we try to keep it on the same CPU. But if there's
a better CPU to run on, it runs there. Also, we could still add a
debug option that makes it always run on other CPUs to slap developers
that don't read.

-- Steve

 
 Waiting for your view on it :)
 
 --
 viresh




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Tejun Heo
Hello, Viresh.

On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
 First, the root cause of this patchset.

 Some others in Linaro and I are working on future ARM cores:
 big.LITTLE systems. These have a few very powerful, high power
 consuming cores (big, currently A15s) and a few very efficient ones
 (LITTLE, currently A7s).

 The ultimate goal is to save as much power as possible without
 compromising much on performance. For that, we want most of the work
 to run on the LITTLE cores and only the performance-demanding work on
 the big cores. Multiple things are going on in this direction. Now,
 we thought the A15s or big cores shouldn't be used for running small
 tasks like timers/workqueues, and hence this patch is an attempt
 towards reaching that goal.

I see.  My understanding of big.little is very superficial so there
are very good chances that I'm utterly wrong, but to me it seems like
more of a stop-gap solution than something which will have staying
power in the sense that the complexity doesn't seem to have any
inherent reason other than the failure to make the big ones power
efficient enough while idle or under minor load.

Maybe this isn't specific to big.little and heterogeneous cores will
be the way of future with big.little being just a forefront of the
trend.  And even if that's not the case, it definitely doesn't mean
that we can't accommodate big.little, but, at this point, it does make
me a bit more reluctant to make wholesale changes specifically for
big.little.

 On top of that, we can do some load balancing of works within the
 multiple alive cpus, so that they get done quickly. Also, we
 shouldn't start using an idle cpu just for processing work :)

The latter part, "not using an idle cpu just for processing work", does
apply to homogeneous systems too, but as I wrote earlier, work items
don't spontaneously happen on an idle core.  Something, including
timer, needs to kick it.  So, by definition, a CPU already isn't idle
when a work item starts execution on it.  What am I missing here?

 I have another idea that we can try:
 
 queue_work_on_any_cpu().

 With this we would not break any existing code and can try to migrate
 old users to this new infrastructure (at least the ones which are
 rearming works from their work_handlers). What do you say?

Yeah, this could be a better solution, I think.  Plus, it's not like
finding the optimal cpu is free.

 To take care of the cache locality issue, we can pass an argument to
 this routine that can provide:
 - the mask of cpus to schedule this work on
   OR
 - Sched Level (SD_LEVEL) of cpus to run it.

Let's start simple for now.  If we really need it, we can always add
more later.

Thanks.

-- 
tejun


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Amit Kucheria
On Mon, Jan 7, 2013 at 8:34 PM, Tejun Heo t...@kernel.org wrote:
 Hello, Viresh.

 On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
 First, the root cause of this patchset.

 Some others in Linaro and I are working on future ARM cores:
 big.LITTLE systems. These have a few very powerful, high power
 consuming cores (big, currently A15s) and a few very efficient ones
 (LITTLE, currently A7s).

 The ultimate goal is to save as much power as possible without
 compromising much on performance. For that, we want most of the work
 to run on the LITTLE cores and only the performance-demanding work on
 the big cores. Multiple things are going on in this direction. Now,
 we thought the A15s or big cores shouldn't be used for running small
 tasks like timers/workqueues, and hence this patch is an attempt
 towards reaching that goal.

 I see.  My understanding of big.little is very superficial so there
 are very good chances that I'm utterly wrong, but to me it seems like
 more of a stop-gap solution than something which will have staying
 power in the sense that the complexity doesn't seem to have any
 inherent reason other than the failure to make the big ones power
 efficient enough while idle or under minor load.

The two processors use different manufacturing processes - one
optimised for performance, the other for really low power. So the
reason is physics at this point. Other architectures are known to be
playing with similar schemes. ARM's big.LITTLE is just the first one
to the market.

 Maybe this isn't specific to big.little and heterogeneous cores will
 be the way of future with big.little being just a forefront of the
 trend.  And even if that's not the case, it definitely doesn't mean
 that we can't accommodate big.little, but, at this point, it does make
 me a bit more reluctant to make wholesale changes specifically for
 big.little.

The patches aren't targeted to benefit only b.L systems, though that
was definitely the trigger for our investigations. Our hope is that,
after presenting more analysis results from our side, we'll be able to
convince the community of the general usefulness of this feature.

Here are a few short introductions to big.LITTLE in case you're
interested.[1][2]

[1] http://www.arm.com/files/downloads/big.LITTLE_Final.pdf
[2] http://lwn.net/Articles/481055/


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
On 7 January 2013 18:58, Steven Rostedt rost...@goodmis.org wrote:
 On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
 I have another idea that we can try:

 queue_work_on_any_cpu().

 I think this is a good idea.

:) :)

 - the mask of cpus to schedule this work on
   OR
 - Sched Level (SD_LEVEL) of cpus to run it.

 I wouldn't give a mask. If one is needed, we could have a
 queue_work_on_mask_cpus(), or something. I think the "any" in the name
 should be good enough to let developers know that this will not be on
 the CPU from which it was called.

Fair Enough.

 By default, I would suggest for cache locality,
 that we try to keep it on the same CPU. But if there's a better CPU to
 run on, it runs there.

That would break our intention behind this routine. We should run it
on the cpu which we think is the best one for it, power/performance
wise.

So, I would like to call the sched_select_cpu() routine from this
routine to get the suggested cpu.

This routine would, however, see more changes later to optimize it
further.

 Also, we could still add a debug option that
 makes it always run on other CPUs to slap developers that don't read.

I don't think we need it :(
It would be a new API, with zero existing users. And so, whoever uses
it should know exactly what he is doing :)

Thanks for your quick feedback.


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
[Removed Suresh and Venki from the discussion; they have probably
switched companies]

On 7 January 2013 20:34, Tejun Heo t...@kernel.org wrote:
 The latter part not using idle cpu just for processing work does
 apply to homogeneous systems too but as I wrote earlier work items
 don't spontaneously happen on an idle core.  Something, including
 timer, needs to kick it.  So, by definition, a CPU already isn't idle
 when a work item starts execution on it.  What am I missing here?

We are talking about a core being idle from the scheduler's perspective :)

 I have another idea that we can try:

 queue_work_on_any_cpu().

 With this we would not break any existing code and can try to migrate
 old users to this new infrastructure (at least the ones which are
 rearming works from their work_handlers). What do you say?

 Yeah, this could be a better solution, I think.  Plus, it's not like
 finding the optimal cpu is free.

Thanks for the first part. (When I shared this idea with Vincent and
Amit, I wasn't sure at all about the feedback I would get from you and
others, but I am very happy now :) )

I couldn't understand the second part. We still need to search for a
free cpu in this new routine, and the implementation would be almost
the same as the implementation of queue_work() in my initial patch.

 To take care of the cache locality issue, we can pass an argument to
 this routine that can provide:
 - the mask of cpus to schedule this work on
   OR
 - Sched Level (SD_LEVEL) of cpus to run it.

 Let's start simple for now.  If we really need it, we can always add
 more later.

:)
Agreed. But I liked the idea from Steven; we can have two routines:
queue_work_on_any_cpu() and queue_work_on_cpus().


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Steven Rostedt
On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:

  By default, I would suggest for cache locality,
  that we try to keep it on the same CPU. But if there's a better CPU to
  run on, it runs there.
 
 That would break our intention behind this routine. We should run
 it on a cpu which we think is the best one for it power/performance wise.

If you are running on a big.LITTLE box, sure. But for normal (read:
x86 ;-) systems, it should probably stay on the current CPU.

 
 So, i would like to call the sched_select_cpu() routine from this routine to
 get the suggested cpu.

Does the caller need to know? Or if you have a big.LITTLE setup, it
should just work automagically?

 
 This routine would however would see more changes later to optimize it
 more.
 
  Also, we could still add a debug option that
  makes it always run on other CPUs to slap developers that don't read.
 
 I don't think we need it :(
 It would be a new API, with zero existing users. And so, whomsoever uses
 it, should know what exactly he is doing :)

Heh, you don't know kernel developers well do you? ;-)

Once a new API is added to the kernel tree, it quickly becomes
(mis)used.

-- Steve




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
Hi Steven,

On 8 January 2013 03:59, Steven Rostedt rost...@goodmis.org wrote:
 On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:

  By default, I would suggest for cache locality,
  that we try to keep it on the same CPU. But if there's a better CPU to
  run on, it runs there.

 That would break our intention behind this routine. We should run
 it on a cpu which we think is the best one for it power/performance wise.

 If you are running on a big.LITTLE box, sure. But for normal (read:
 x86 ;-) systems, it should probably stay on the current CPU.

But that's not the purpose of this new call. If the caller wants it to
be on the local cpu, he must not use this call. It is up to the core
routine (sched_select_cpu() in our case) to decide where to launch it.
If we need something special for x86, we can hack that routine.

Even for non-big.LITTLE systems, we may want to run a work on a
non-idle cpu, and so we can't guarantee the local cpu here.

 So, i would like to call the sched_select_cpu() routine from this routine to
 get the suggested cpu.

 Does the caller need to know? Or if you have a big.LITTLE setup, it
 should just work automagically?

The caller wouldn't know; he just needs to trust sched_select_cpu() to
give the optimal cpu.

Again, it is not only for big.LITTLE, but for any SMP system where we
don't want an idle cpu to do this work.

 I don't think we need it :(
 It would be a new API, with zero existing users. And so, whomsoever uses
 it, should know what exactly he is doing :)

 Heh, you don't know kernel developers well do you? ;-)

I agree with you, but we don't need to cater to careless new code here.

 Once a new API is added to the kernel tree, it quickly becomes
 (mis)used.

It's true of all pieces of code in the kernel, and we really don't need
to take care of such users here :)


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-04 Thread Tejun Heo
Hello, Viresh.

On Fri, Jan 04, 2013 at 04:41:47PM +0530, Viresh Kumar wrote:
> I got a list of files where cpu/processor_id strings are found, which
> may break with
> this patch (still can't guarantee as i don't have knowledge of these 
> drivers)...
...
> I am not sure what to do now :) , can you assist ?

I don't know either.  Changing behavior subtly like this is hard.  I
usually try to spot some problem cases and try to identify patterns
there.  Once you identify a few of them, understanding and detecting
other problem cases get a lot easier.  In this case, maybe there are
too many places to audit and the problems are too subtle, and, if we
*have* to do it, the only thing we can do is implementing a debug
option to make such problems more visible - say, always schedule to a
different cpu on queue_work().

That said, at this point, the patchset doesn't seem all that
convincing to me and if I'm comprehending responses from others
correctly that seems to be the consensus.  It might be a better
approach to identify the specific offending workqueue users and make
them handle the situation more intelligently than trying to impose the
behavior on all workqueue users.  At any rate, we need way more data
showing this actually helps and if so why.

Thanks.

-- 
tejun


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-04 Thread Viresh Kumar
Hi Tejun,

On 27 November 2012 10:49, Viresh Kumar  wrote:
> On 26 November 2012 22:45, Tejun Heo  wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>
>> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU.  I don't think there are too many users depending on
>> that but am not sure at all that there are none.  I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.

And this is the first thing i did this year :)

So there are almost 1200 files which use queue_work, queue_delayed_work,
schedule_work, schedule_delayed_work or schedule_on_each_cpu.

Obviously I can't try to understand all of them :), and so I tried to
search for two strings in them: "cpu" or "processor_id". This would
catch every one of these 1200-odd files which uses some
variable/comment/code with cpu, smp_processor_id, raw_processor_id,
etc.

I got around 650 files with these two searches. Then I went into these
files to see if there is anything noticeable which may break due to
this patch...

I got a list of files where the cpu/processor_id strings are found,
which may break with this patch (still can't guarantee, as I don't
have knowledge of these drivers)...

- arch/powerpc/mm/numa.c
- arch/s390/appldata/appldata_base.c
- arch/s390/kernel/time.c
- arch/s390/kernel/topology.c
- arch/s390/oprofile/hwsampler.c
- arch/x86/kernel/cpu/mcheck/mce.c
- arch/x86/kernel/tsc.c
- arch/x86/kvm/x86.c
- block/blk-core.c
- block/blk-throttle.c
- block/blk-genhd.c
- crypto/cryptd.c
- drivers/base/power/domain.c
- drivers/cpufreq/cpufreq.c
- drivers/hv/vmbus_drv.c
- drivers/infiniband/hw/qib/qib_iba7322.c and
drivers/infiniband/hw/qib/qib_init.c
- drivers/md/dm-crypt.c
- drivers/md/md.c
- drivers/net/ethernet/sfc/efx.c
- drivers/net/ethernet/tile/tilepro.c
- drivers/net/team/team_mode_loadbalance.c
- drivers/oprofile/cpu_buffer.c
- drivers/s390/kvm/kvm_virtio.c
- drivers/scsi/fcoe/fcoe.c
- drivers/tty/sysrq.c
- drivers/xen/pcpu.c
- fs/file.c, file_table.c, fs/fscache/object.c
- include/linux/stop_machine.h, kernel/stop_machine.c
- mm/percpu.c
- mm/slab.c
- mm/vmstat.c
- net/core/flow.c
- net/iucv/iucv.c

I am not sure what to do now :) , can you assist ?




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Vincent Guittot
On 27 November 2012 16:04, Steven Rostedt  wrote:
> On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
>> On 27 November 2012 14:59, Steven Rostedt  wrote:
>> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> >> On 27 November 2012 18:56, Steven Rostedt  wrote:
>> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> >> > searching 1023 CPUs before we come up with our own.
>> >>
>> >> Not sure if you missed the first check sched_select_cpu()
>> >>
>> >> +int sched_select_cpu(unsigned int sd_flags)
>> >> +{
>> >> +   /* If Current cpu isn't idle, don't migrate anything */
>> >> +   if (!idle_cpu(cpu))
>> >> +   return cpu;
>> >>
>> >> We aren't going to search if we aren't idle.
>> >
>> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
>> > heh we are idle we can spin. But then why go through this in the first
>> > place ;-)
>>
>> By migrating it now, it will create its activity and wake up on the
>> right CPU next time.
>>
>> If migrating on any CPUs seems a bit risky, we could restrict the
>> migration on a CPU on the same node. We can pass such constraints on
>> sched_select_cpu
>>
>
> That's assuming that the CPUs stay idle. Now if we move the work to
> another CPU and it goes idle, then it may move that again. It could end
> up being a ping pong approach.
>
> I don't think idle is a strong enough heuristic for the general case. If
> interrupts are constantly going off on a CPU that happens to be idle
> most of the time, it will constantly be moving work onto CPUs that are
> currently doing real work, and by doing so, it will be slowing those
> CPUs down.
>

I agree that idle is probably not enough, but it's the heuristic that
is currently used for selecting a CPU for a timer, and the timers also
use sched_select_cpu in this series. So, in order to go step by step,
a common interface has been introduced for selecting a CPU, and this
function uses the same algorithm that the timers already do.
Once we agree on an interface, the heuristic can be updated.


> -- Steve
>
>


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Steven Rostedt
On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
> On 27 November 2012 14:59, Steven Rostedt  wrote:
> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> >> On 27 November 2012 18:56, Steven Rostedt  wrote:
> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> >> > searching 1023 CPUs before we come up with our own.
> >>
> >> Not sure if you missed the first check sched_select_cpu()
> >>
> >> +int sched_select_cpu(unsigned int sd_flags)
> >> +{
> >> +   /* If Current cpu isn't idle, don't migrate anything */
> >> +   if (!idle_cpu(cpu))
> >> +   return cpu;
> >>
> >> We aren't going to search if we aren't idle.
> >
> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> > heh we are idle we can spin. But then why go through this in the first
> > place ;-)
> 
> By migrating it now, it will create its activity and wake up on the
> right CPU next time.
> 
> If migrating on any CPUs seems a bit risky, we could restrict the
> migration on a CPU on the same node. We can pass such constraints on
> sched_select_cpu
> 

That's assuming that the CPUs stay idle. Now if we move the work to
another CPU and it goes idle, then it may move that again. It could end
up being a ping pong approach.

I don't think idle is a strong enough heuristic for the general case. If
interrupts are constantly going off on a CPU that happens to be idle
most of the time, it will constantly be moving work onto CPUs that are
currently doing real work, and by doing so, it will be slowing those
CPUs down.

-- Steve




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Vincent Guittot
On 27 November 2012 14:59, Steven Rostedt  wrote:
> On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> On 27 November 2012 18:56, Steven Rostedt  wrote:
>> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> > searching 1023 CPUs before we come up with our own.
>>
>> Not sure if you missed the first check sched_select_cpu()
>>
>> +int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +   /* If Current cpu isn't idle, don't migrate anything */
>> +   if (!idle_cpu(cpu))
>> +   return cpu;
>>
>> We aren't going to search if we aren't idle.
>
> OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> heh we are idle we can spin. But then why go through this in the first
> place ;-)

By migrating it now, it will create its activity and wake up on the
right CPU next time.

If migrating to just any CPU seems a bit risky, we could restrict the
migration to a CPU on the same node. We can pass such constraints to
sched_select_cpu().

>
>
>>
>> > Also, I really don't like this as a default behavior. It seems that this
>> > solution is for a very special case, and this can become very intrusive
>> > for the normal case.
>>
>> We tried with a KCONFIG option for it, which Tejun rejected.
>
> Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
> get something working that doesn't add any regressions. If you can get
> this to work without making *any* regressions in the normal case than
> I'm totally fine with that. But if this adds any issues with the normal
> case, then it's a show stopper.
>
>>
>> > To be honest, I'm uncomfortable with this approach. It seems to be
>> > fighting a symptom and not the disease. I'd rather find a way to keep
>> > work from being queued on wrong CPU. If it is a timer, find a way to
>> > move the timer. If it is something else, lets work to fix that. Doing
>> > searches of possibly all CPUs (unlikely, but it is there), just seems
>> > wrong to me.
>>
>> As Vincent pointed out, on big LITTLE systems we just don't want to
>> serve works on big cores. That would be wasting too much of power.
>> Specially if we are going to wake up big cores.
>>
>> It would be difficult to control the source driver (which queues work) to
>> little cores. We thought, if somebody wanted to queue work on current
>> cpu then they must use queue_work_on().
>
> As Tejun has mentioned earlier, is there any assumptions anywhere that
> expects an unbounded work queue to not migrate? Where per cpu variables
> might be used. Tejun had a good idea of forcing this to migrate the work
> *every* time. To not let a work queue run on the same CPU that it was
> queued on. If it can survive that, then it is probably OK. Maybe add a
> config option that forces this? That way, anyone can test that this
> isn't an issue.
>
> -- Steve


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Steven Rostedt
On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> On 27 November 2012 18:56, Steven Rostedt  wrote:
> > A couple of things. The sched_select_cpu() is not cheap. It has a double
> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> > searching 1023 CPUs before we come up with our own.
> 
> Not sure if you missed the first check sched_select_cpu()
> 
> +int sched_select_cpu(unsigned int sd_flags)
> +{
> +   /* If Current cpu isn't idle, don't migrate anything */
> +   if (!idle_cpu(cpu))
> +   return cpu;
> 
> We aren't going to search if we aren't idle.

OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
heh we are idle we can spin. But then why go through this in the first
place ;-)


> 
> > Also, I really don't like this as a default behavior. It seems that this
> > solution is for a very special case, and this can become very intrusive
> > for the normal case.
> 
> We tried with a KCONFIG option for it, which Tejun rejected.

Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
get something working that doesn't add any regressions. If you can get
this to work without making *any* regressions in the normal case then
I'm totally fine with that. But if this adds any issues with the normal
case, then it's a show stopper.

> 
> > To be honest, I'm uncomfortable with this approach. It seems to be
> > fighting a symptom and not the disease. I'd rather find a way to keep
> > work from being queued on wrong CPU. If it is a timer, find a way to
> > move the timer. If it is something else, lets work to fix that. Doing
> > searches of possibly all CPUs (unlikely, but it is there), just seems
> > wrong to me.
> 
> As Vincent pointed out, on big LITTLE systems we just don't want to
> serve works on big cores. That would be wasting too much of power.
> Specially if we are going to wake up big cores.
> 
> It would be difficult to control the source driver (which queues work) to
> little cores. We thought, if somebody wanted to queue work on current
> cpu then they must use queue_work_on().

As Tejun has mentioned earlier, is there any assumptions anywhere that
expects an unbounded work queue to not migrate? Where per cpu variables
might be used. Tejun had a good idea of forcing this to migrate the work
*every* time. To not let a work queue run on the same CPU that it was
queued on. If it can survive that, then it is probably OK. Maybe add a
config option that forces this? That way, anyone can test that this
isn't an issue.

-- Steve
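[Editor's note: the "force migration every time" debug policy suggested above can be modeled in a few lines of plain userspace C. This is purely an illustrative sketch -- no such kernel config option exists in this series, and `pick_always_remote()` is a hypothetical name:

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical debug policy modeled after the suggestion above: never
 * let unbound work run on the CPU it was queued from, so any handler
 * that silently depends on same-CPU execution breaks immediately
 * instead of only when the idle heuristic happens to migrate it. */
static int pick_always_remote(int queueing_cpu)
{
    return (queueing_cpu + 1) % NR_CPUS;  /* always a different CPU */
}
```

The point of the policy is coverage: instead of rare, timing-dependent migrations, every queueing exercises the cross-CPU path.]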




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Viresh Kumar
On 27 November 2012 18:56, Steven Rostedt  wrote:
> A couple of things. The sched_select_cpu() is not cheap. It has a double
> loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> and we are CPU 1023 and all other CPUs happen to be idle, we could be
> searching 1023 CPUs before we come up with our own.

Not sure if you missed the first check in sched_select_cpu():

+int sched_select_cpu(unsigned int sd_flags)
+{
+   /* If Current cpu isn't idle, don't migrate anything */
+   if (!idle_cpu(cpu))
+   return cpu;

We aren't going to search if we aren't idle.
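[Editor's note: the fast-path-then-search shape being argued about can be modeled in userspace C. This is an illustrative sketch, not the actual patch: `cpu_idle[]` and the linear scan stand in for the kernel's idle_cpu() and the double loop over scheduler domains:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Per-CPU idle state -- a stand-in for the kernel's idle_cpu(). */
static bool cpu_idle[NR_CPUS];

/* Model of the logic under discussion: if the queueing CPU is busy,
 * return it immediately and skip the search entirely; only when it is
 * idle do we scan for a non-idle CPU (a linear walk here, standing in
 * for the domain/cpu double loop in the real series). */
static int sched_select_cpu_model(int this_cpu)
{
    if (!cpu_idle[this_cpu])
        return this_cpu;              /* fast path: no migration */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_idle[cpu])
            return cpu;               /* first non-idle CPU wins */

    return this_cpu;                  /* everyone idle: stay local */
}
```

This makes both positions concrete: the common case never enters the loop, but an idle queueing CPU still pays for a potentially full scan.]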

> Also, I really don't like this as a default behavior. It seems that this
> solution is for a very special case, and this can become very intrusive
> for the normal case.

We tried with a KCONFIG option for it, which Tejun rejected.

> To be honest, I'm uncomfortable with this approach. It seems to be
> fighting a symptom and not the disease. I'd rather find a way to keep
> work from being queued on wrong CPU. If it is a timer, find a way to
> move the timer. If it is something else, lets work to fix that. Doing
> searches of possibly all CPUs (unlikely, but it is there), just seems
> wrong to me.

As Vincent pointed out, on big.LITTLE systems we just don't want to
serve works on the big cores. That would waste too much power,
especially if we are going to wake up the big cores.

It would be difficult to make every source driver (which queues the
work) target the little cores. We thought that if somebody wants to
queue work on the current cpu, they must use queue_work_on().

--
viresh


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Steven Rostedt
On Tue, 2012-11-06 at 16:08 +0530, Viresh Kumar wrote:
> Workqueues queues work on current cpu, if the caller haven't passed a preferred
> cpu. This may wake up an idle CPU, which is actually not required.
> 
> This work can be processed by any CPU and so we must select a non-idle CPU here.
> This patch adds in support in workqueue framework to get preferred CPU details
> from the scheduler, instead of using current CPU.
> 
> Most of the time when a work is queued, the current cpu isn't idle and so we
> will choose it only. There are cases when a cpu is idle when it queues some
> work. For example, consider following scenario:
> - A cpu has programmed a timer and is IDLE now.
> - CPU gets into interrupt handler due to timer and queues a work. As the CPU is
>   currently IDLE, we can queue this work to some other CPU.
> 
> Signed-off-by: Viresh Kumar 
> ---
>  kernel/workqueue.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 042d221..d32efa2 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1238,7 +1238,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>   struct global_cwq *last_gcwq;
>  
>   if (cpu == WORK_CPU_UNBOUND)
> - cpu = raw_smp_processor_id();
> + cpu = sched_select_cpu(0);

A couple of things. The sched_select_cpu() is not cheap. It has a double
loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
and we are CPU 1023 and all other CPUs happen to be idle, we could be
searching 1023 CPUs before we come up with our own.

__queue_work() should be fast as there is a reason that it is delaying
the work and not running it itself.

Also, I really don't like this as a default behavior. It seems that this
solution is for a very special case, and this can become very intrusive
for the normal case.

To be honest, I'm uncomfortable with this approach. It seems to be
fighting a symptom and not the disease. I'd rather find a way to keep
work from being queued on wrong CPU. If it is a timer, find a way to
move the timer. If it is something else, lets work to fix that. Doing
searches of possibly all CPUs (unlikely, but it is there), just seems
wrong to me.

-- Steve


>  
>   /*
>* It's multi cpu.  If @work was previously on a different
> @@ -1383,7 +1383,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
>   if (gcwq)
>   lcpu = gcwq->cpu;
>   if (lcpu == WORK_CPU_UNBOUND)
> - lcpu = raw_smp_processor_id();
> + lcpu = sched_select_cpu(0);
>   } else {
>   lcpu = WORK_CPU_UNBOUND;
>   }




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Vincent Guittot
On 27 November 2012 06:19, Viresh Kumar  wrote:
> Hi Tejun,
>
> On 26 November 2012 22:45, Tejun Heo  wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>
>> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU.  I don't think there are too many users depending on
>> that but am not sure at all that there are none.  I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.
>
>> That said, if the obtained benefit is big enough, sure, we can
>> definitely change the behavior, which isn't all that great to begin
>> with, and try to shake out the bugs quicky by e.g. forcing all work
>> items to execute on different CPUs, but you're presenting a few
>> percent of work items being migrated to a different CPU from an
>> already active CPU, which doesn't seem like such a big benefit to me
>> even if the migration target CPU is somewhat more efficient.  How much
>> powersaving are we talking about?
>
> Hmm.. I actually implemented the problem discussed here:
> (I know you have seen this earlier :) )
>
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf
>
> Specifically slides: 12 & 19.
>
> I haven't done much power calculations with it and have tested it more from
> functionality point of view.
>
> @Vincent: Can you add some comments here?

Sorry for this late reply.

We have faced some situations on TC2 (as an example) where the tasks
are running in the LITTLE cluster whereas some periodic works stay on
the big cluster, so we can have one cluster that wakes up for tasks and
another one that wakes up for works. We would like to consolidate the
behaviour of the works with the behaviour of the tasks.

Sorry, I don't have relevant figures as the patches are used with
other ones which also impact the power consumption.

This series introduces the possibility to run a work on another CPU,
which is necessary if we want a better correlation of task and work
scheduling on the system. Most of the time queue_work() is used when a
driver doesn't mind which CPU the work runs on, whereas it looks like
it should be used only when you want to run locally. We would like to
solve this point with the new interface proposed by Viresh.

Vincent

>
> --
> viresh


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Steven Rostedt
On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
> On 27 November 2012 14:59, Steven Rostedt rost...@goodmis.org wrote:
> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> > > On 27 November 2012 18:56, Steven Rostedt rost...@goodmis.org wrote:
> > > > A couple of things. The sched_select_cpu() is not cheap. It has a double
> > > > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> > > > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> > > > searching 1023 CPUs before we come up with our own.
> > >
> > > Not sure if you missed the first check sched_select_cpu()
> > >
> > > +int sched_select_cpu(unsigned int sd_flags)
> > > +{
> > > +   /* If Current cpu isn't idle, don't migrate anything */
> > > +   if (!idle_cpu(cpu))
> > > +   return cpu;
> > >
> > > We aren't going to search if we aren't idle.
> >
> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> > heh we are idle we can spin. But then why go through this in the first
> > place ;-)
>
> By migrating it now, it will create its activity and wake up on the
> right CPU next time.
>
> If migrating on any CPUs seems a bit risky, we could restrict the
> migration on a CPU on the same node. We can pass such contraints on
> sched_select_cpu

That's assuming that the CPUs stay idle. Now if we move the work to
another CPU and it goes idle, then it may move that again. It could end
up being a ping pong approach.

I don't think idle is a strong enough heuristic for the general case. If
interrupts are constantly going off on a CPU that happens to be idle
most of the time, it will constantly be moving work onto CPUs that are
currently doing real work, and by doing so, it will be slowing those
CPUs down.

-- Steve
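[Editor's note: the ping-pong Steve describes can be made concrete with a tiny userspace model. Illustrative only: `pick_cpu()` stands in for the series' idle-based selection, and `migrations()` models a self-rearming work item in the worst case where the CPU that just ran the work immediately goes idle while the other CPU is busy:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

static bool cpu_idle[NR_CPUS];

/* Idle-based pick, modeled after the series' heuristic. */
static int pick_cpu(int this_cpu)
{
    if (!cpu_idle[this_cpu])
        return this_cpu;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_idle[cpu])
            return cpu;
    return this_cpu;
}

/* A rearming work item: each firing re-queues from the CPU it last ran
 * on. In the modeled worst case, every single firing migrates the work
 * to the other CPU -- the ping-pong effect. */
static int migrations(int firings, int start_cpu)
{
    int cpu = start_cpu, moves = 0;

    for (int i = 0; i < firings; i++) {
        cpu_idle[cpu] = true;         /* we just went idle... */
        cpu_idle[1 - cpu] = false;    /* ...and the other CPU is busy */
        int next = pick_cpu(cpu);
        if (next != cpu)
            moves++;
        cpu = next;
    }
    return moves;
}
```

Under this (adversarial) idleness pattern the work never settles, which is exactly why idle alone may be too weak a heuristic.]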




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Vincent Guittot
On 27 November 2012 16:04, Steven Rostedt rost...@goodmis.org wrote:
> On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
> > On 27 November 2012 14:59, Steven Rostedt rost...@goodmis.org wrote:
> > > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> > > > On 27 November 2012 18:56, Steven Rostedt rost...@goodmis.org wrote:
> > > > > A couple of things. The sched_select_cpu() is not cheap. It has a double
> > > > > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> > > > > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> > > > > searching 1023 CPUs before we come up with our own.
> > > >
> > > > Not sure if you missed the first check sched_select_cpu()
> > > >
> > > > +int sched_select_cpu(unsigned int sd_flags)
> > > > +{
> > > > +   /* If Current cpu isn't idle, don't migrate anything */
> > > > +   if (!idle_cpu(cpu))
> > > > +   return cpu;
> > > >
> > > > We aren't going to search if we aren't idle.
> > >
> > > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> > > heh we are idle we can spin. But then why go through this in the first
> > > place ;-)
> >
> > By migrating it now, it will create its activity and wake up on the
> > right CPU next time.
> >
> > If migrating on any CPUs seems a bit risky, we could restrict the
> > migration on a CPU on the same node. We can pass such contraints on
> > sched_select_cpu
> >
>
> That's assuming that the CPUs stay idle. Now if we move the work to
> another CPU and it goes idle, then it may move that again. It could end
> up being a ping pong approach.
>
> I don't think idle is a strong enough heuristic for the general case. If
> interrupts are constantly going off on a CPU that happens to be idle
> most of the time, it will constantly be moving work onto CPUs that are
> currently doing real work, and by doing so, it will be slowing those
> CPUs down.

I agree that idle is probably not enough, but it's the heuristic that
is currently used for selecting a CPU for a timer, and the timer also
uses sched_select_cpu in this series. So in order to go step by step,
a common interface has been introduced for selecting a CPU, and this
function uses the same algorithm that the timer already does.
Once we agree on an interface, the heuristic could be updated.

> -- Steve




Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-26 Thread Viresh Kumar
Hi Tejun,

On 26 November 2012 22:45, Tejun Heo  wrote:
> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:

> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
> assignment has always guaranteed that the work item will be executed
> on the same CPU.  I don't think there are too many users depending on
> that but am not sure at all that there are none.  I asked you last
> time that you would at the very least need to audit most users but it
> doesn't seem like there has been any effort there.

My bad. I completely missed/forgot that comment from your earlier mails.
Will do it.

> That said, if the obtained benefit is big enough, sure, we can
> definitely change the behavior, which isn't all that great to begin
> with, and try to shake out the bugs quicky by e.g. forcing all work
> items to execute on different CPUs, but you're presenting a few
> percent of work items being migrated to a different CPU from an
> already active CPU, which doesn't seem like such a big benefit to me
> even if the migration target CPU is somewhat more efficient.  How much
> powersaving are we talking about?

Hmm.. What I implemented actually addresses the problem discussed here:
(I know you have seen this earlier :) )

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf

Specifically slides: 12 & 19.

I haven't done many power measurements with it, and have tested it more from
a functionality point of view.

@Vincent: Can you add some comments here?

--
viresh


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-26 Thread Tejun Heo
Hello, Viresh.

On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
> Workqueues queue work on the current CPU if the caller hasn't passed a
> preferred CPU. This may wake up an idle CPU, which is actually not required.
>
> This work can be processed by any CPU, and so we must select a non-idle CPU
> here. This patch adds support in the workqueue framework to get the
> preferred CPU from the scheduler, instead of using the current CPU.
>
> Most of the time when a work is queued, the current CPU isn't idle, and so
> we will choose it. There are cases when a CPU is idle when it queues some
> work. For example, consider the following scenario:
> - A CPU has programmed a timer and is IDLE now.
> - The CPU gets into the interrupt handler due to the timer and queues a
>   work. As the CPU is currently IDLE, we can queue this work to some
>   other CPU.

I'm pretty skeptical about this.  queue_work() w/o explicit CPU
assignment has always guaranteed that the work item will be executed
on the same CPU.  I don't think there are too many users depending on
that but am not sure at all that there are none.  I asked you last
time that you would at the very least need to audit most users but it
doesn't seem like there has been any effort there.  Given the
seemingly low rate of migration and subtlety of such bugs, things
could get nasty to debug.

That said, if the obtained benefit is big enough, sure, we can
definitely change the behavior, which isn't all that great to begin
with, and try to shake out the bugs quickly by e.g. forcing all work
items to execute on different CPUs, but you're presenting a few
percent of work items being migrated to a different CPU from an
already active CPU, which doesn't seem like such a big benefit to me
even if the migration target CPU is somewhat more efficient.  How much
powersaving are we talking about?

Thanks.

-- 
tejun


[PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-06 Thread Viresh Kumar
Workqueues queue work on the current CPU if the caller hasn't passed a preferred
CPU. This may wake up an idle CPU, which is actually not required.

This work can be processed by any CPU, and so we must select a non-idle CPU here.
This patch adds support in the workqueue framework to get the preferred CPU
from the scheduler, instead of using the current CPU.

Most of the time when a work is queued, the current CPU isn't idle, and so we
will choose it. There are cases when a CPU is idle when it queues some work.
For example, consider the following scenario:
- A CPU has programmed a timer and is IDLE now.
- The CPU gets into the interrupt handler due to the timer and queues a work.
  As the CPU is currently IDLE, we can queue this work to some other CPU.

Signed-off-by: Viresh Kumar 
---
 kernel/workqueue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 042d221..d32efa2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1238,7 +1238,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
struct global_cwq *last_gcwq;
 
if (cpu == WORK_CPU_UNBOUND)
-   cpu = raw_smp_processor_id();
+   cpu = sched_select_cpu(0);
 
/*
 * It's multi cpu.  If @work was previously on a different
@@ -1383,7 +1383,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
if (gcwq)
lcpu = gcwq->cpu;
if (lcpu == WORK_CPU_UNBOUND)
-   lcpu = raw_smp_processor_id();
+   lcpu = sched_select_cpu(0);
} else {
lcpu = WORK_CPU_UNBOUND;
}
-- 
1.7.12.rc2.18.g61b472e



