Re: [OMPI users] High CPU usage with yield_when_idle =1 on CFS

2011-09-03 Thread Randolph Pullen
I have already implemented test/sleep code, but the main problem is with the
broadcasts that send out the SIMD instructions: these are blocking, so when the
system is idle it is these calls that consume the CPU while waiting for work.
Implementing
echo "1" > /proc/sys/kernel/sched_compat_yield
helps quite a bit (thanks Jeff) in that it makes yield more aggressive, but the
fundamental problem still remains.

A non-blocking broadcast would fix it, as would a better scheduler.
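To be concrete about what I mean: since there is no non-blocking broadcast in the
MPI that 1.4.1 implements, the best workaround I can see is to turn the broadcast
into point-to-point sends from the root and let each worker poll its receive with
MPI_Test plus a short nanosleep.  The sketch below is illustrative only, not our
real code; the tag, command-buffer size and 1 ms poll interval are placeholders:

/* Illustrative sketch only, not our real code: emulate a non-blocking
 * broadcast of the SIMD command buffer with point-to-point messages.
 * CMD_TAG, CMD_BYTES and the 1 ms poll interval are placeholders. */
#include <mpi.h>
#include <time.h>

#define CMD_TAG   101        /* placeholder tag for command messages */
#define CMD_BYTES 4096       /* placeholder command buffer size      */

/* Worker side: wait for the next command without spinning on a core. */
void wait_for_command(void *buf, int root, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;
    struct timespec ts = { 0, 1000000 };          /* sleep 1 ms per poll */

    MPI_Irecv(buf, CMD_BYTES, MPI_BYTE, root, CMD_TAG, comm, &req);
    while (!done) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        if (!done)
            nanosleep(&ts, NULL);                 /* give the CPU back */
    }
}

/* Root side: "broadcast" the command by sending it to every other rank. */
void send_command(void *buf, MPI_Comm comm)
{
    int i, nprocs, me;
    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_rank(comm, &me);
    for (i = 0; i < nprocs; i++)
        if (i != me)
            MPI_Send(buf, CMD_BYTES, MPI_BYTE, i, CMD_TAG, comm);
}

The linear fan-out at the root loses the log-P scaling of a real broadcast, so
this only looks attractive when the command messages are small and infrequent.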

Do other MPIs use busy loops so extensively in their comms?



From: Jeff Squyres <jsquy...@cisco.com>
To: Open MPI Users <us...@open-mpi.org>
Sent: Friday, 2 September 2011 9:45 PM
Subject: Re: [OMPI users] High CPU usage with yield_when_idle =1 on CFS

This might also be in reference to the issue that sched_yield() really does
nothing in recent Linux kernels (there was a big debate about this at
kernel.org).

IIRC, there's some kernel parameter that you can tweak to make it behave 
better, but I'm afraid I don't remember what it is.  Some googling might find 
it...?


On Sep 1, 2011, at 10:06 PM, Eugene Loh wrote:

> On 8/31/2011 11:48 PM, Randolph Pullen wrote:
>> I recall a discussion some time ago about yield, the Completely F%’d 
>> Scheduler (CFS) and OpenMPI.
>> 
>> My system is currently suffering from massive CPU use while busy waiting.  
>> This gets worse as I try to bump up user concurrency.
> Yup.
>> I am running with yield_when_idle, but it's not enough.
> Yup.
>> Is there anything else I can do to release some CPU resource?
>> I recall seeing one post where usleep(1) was inserted around the yields; is 
>> this still feasible?
>> 
>> I'm using 1.4.1 - is there a fix to be found in upgrading?
>> Unfortunately I am stuck with the CFS as I need Linux.  Currently it's
>> Ubuntu 10.10 with 2.6.32.14
> I think OMPI doesn't yet do (much/any) better than what you've observed.  You 
> might be able to hack something up yourself.  In something I did recently, I 
> replaced blocking sends and receives with test/nanosleep loops.  An "optimum" 
> solution (minimum latency, optimal performance at arbitrary levels of under 
> and oversubscription) might be elusive, but hopefully you'll quickly be able 
> to piece together something for your particular purposes.  In my case, I was 
> lucky and the results were very gratifying... my bottleneck vaporized for 
> modest levels of oversubscription (2-4 more processes than processors).


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] High CPU usage with yield_when_idle =1 on CFS

2011-09-02 Thread Jeff Squyres
This might also be in reference to the issue that sched_yield() really does
nothing in recent Linux kernels (there was a big debate about this at
kernel.org).

IIRC, there's some kernel parameter that you can tweak to make it behave 
better, but I'm afraid I don't remember what it is.  Some googling might find 
it...?


On Sep 1, 2011, at 10:06 PM, Eugene Loh wrote:

> On 8/31/2011 11:48 PM, Randolph Pullen wrote:
>> I recall a discussion some time ago about yield, the Completely F%’d 
>> Scheduler (CFS) and OpenMPI.
>> 
>> My system is currently suffering from massive CPU use while busy waiting.  
>> This gets worse as I try to bump up user concurrency.
> Yup.
>> I am running with yield_when_idle, but it's not enough.
> Yup.
>> Is there anything else I can do to release some CPU resource?
>> I recall seeing one post where usleep(1) was inserted around the yields; is 
>> this still feasible?
>> 
>> I'm using 1.4.1 - is there a fix to be found in upgrading?
>> Unfortunately I am stuck with the CFS as I need Linux.  Currently it's
>> Ubuntu 10.10 with 2.6.32.14
> I think OMPI doesn't yet do (much/any) better than what you've observed.  You 
> might be able to hack something up yourself.  In something I did recently, I 
> replaced blocking sends and receives with test/nanosleep loops.  An "optimum" 
> solution (minimum latency, optimal performance at arbitrary levels of under 
> and oversubscription) might be elusive, but hopefully you'll quickly be able 
> to piece together something for your particular purposes.  In my case, I was 
> lucky and the results were very gratifying... my bottleneck vaporized for 
> modest levels of oversubscription (2-4 more processes than processors).


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] High CPU usage with yield_when_idle =1 on CFS

2011-09-01 Thread Eugene Loh

On 8/31/2011 11:48 PM, Randolph Pullen wrote:
I recall a discussion some time ago about yield, the Completely F%’d 
Scheduler (CFS) and OpenMPI.


My system is currently suffering from massive CPU use while busy 
waiting.  This gets worse as I try to bump up user concurrency.

Yup.

I am running with yield_when_idle, but it's not enough.

Yup.

Is there anything else I can do to release some CPU resource?
I recall seeing one post where usleep(1) was inserted around the 
yields; is this still feasible?


I'm using 1.4.1 - is there a fix to be found in upgrading?
Unfortunately I am stuck with the CFS as I need Linux.  Currently it's
Ubuntu 10.10 with 2.6.32.14
I think OMPI doesn't yet do (much/any) better than what you've 
observed.  You might be able to hack something up yourself.  In 
something I did recently, I replaced blocking sends and receives with 
test/nanosleep loops.  An "optimum" solution (minimum latency, optimal 
performance at arbitrary levels of under and oversubscription) might be 
elusive, but hopefully you'll quickly be able to piece together 
something for your particular purposes.  In my case, I was lucky and the 
results were very gratifying... my bottleneck vaporized for modest 
levels of oversubscription (2-4 more processes than processors).
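
For concreteness, the replacement had roughly this shape (a sketch only, not the
actual code I used; the 1 ms interval is an arbitrary placeholder, and tuning it
trades a little extra latency for CPU handed back to other processes):

/* Sketch of the test/nanosleep pattern, not the code I actually used:
 * swap the blocking call for its non-blocking form and poll it, sleeping
 * briefly between polls so an idle process releases its core. */
#include <mpi.h>
#include <time.h>

/* Instead of:  MPI_Recv(buf, count, type, src, tag, comm, &status);  */
void recv_with_nanosleep(void *buf, int count, MPI_Datatype type,
                         int src, int tag, MPI_Comm comm, MPI_Status *status)
{
    MPI_Request req;
    int done = 0;
    struct timespec ts = { 0, 1000000 };      /* 1 ms between polls */

    MPI_Irecv(buf, count, type, src, tag, comm, &req);
    while (!done) {
        MPI_Test(&req, &done, status);
        if (!done)
            nanosleep(&ts, NULL);             /* hand the core back */
    }
}
/* The send side is the same shape, with MPI_Isend in place of MPI_Irecv. */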