Hi all -

I'm running my MPI program under Open MPI on a 4-core Opteron machine.
I am trying to run 5 processes, where one of them is simply a
coordinating process.  It does very little work other than receiving
and sending a short ping to the other processes once every second or
two.

I've seen some of the other threads regarding oversubscription, and I
read about it in the FAQ.  When I don't set my process to
yield_when_idle, naturally I get terrible performance because Open MPI
stays in aggressive mode even though the node is oversubscribed.

Eventually I was able to get it to act degraded some of the time by
using yield_when_idle and setting the number of slots to 4 (slots=4).
However, this seems to work only off and on.  If I run top while my
processes are running, sometimes it looks like this:

11405 budge     39  15  957m 383m 4132 R  100  9.6  83:14.78 RenderFish
11406 budge     39  15  959m 386m 4204 R  100  9.6  71:24.22 RenderFish
11408 budge     39  15  959m 386m 4200 R  100  9.6  56:12.23 RenderFish
11407 budge     39  15  959m 386m 4204 R   50  9.6  46:45.78 RenderFish
11409 budge     39  15  959m 386m 4208 R   50  9.6  75:14.43 RenderFish

and sometimes it looks more like this:

11323 budge     39  15  959m 386m 4132 R  100  9.6  83:14.78 RenderFish
11324 budge     39  15  959m 386m 4204 R  100  9.6  71:24.22 RenderFish
11325 budge     39  15  959m 386m 4200 R  100  9.6  83:12.23 RenderFish
11326 budge     39  15  959m 386m 4204 R   99  9.6  72:45.78 RenderFish

Obviously, the second is better.  I only wish it were the norm.

I used LAM for a long time, and it always seemed to run in "degraded"
mode, but I am switching to Open MPI for other reasons (better
InfiniBand support).

Does anyone have any idea why Open MPI wouldn't consider a process
idle 99% of the time when it does nothing 99% of the time?  One
possible clue is that my program nices itself.  Does this make a
difference?
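
For context, a program that nices itself typically does something like
the following at startup.  This is only a sketch: the helper name
lower_own_priority is hypothetical, and the increment of 15 mentioned
below is an assumption taken from the NI column of the top output
above, not the actual RenderFish code.

```c
#define _XOPEN_SOURCE 700   /* expose nice() and getpriority() */
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: raise the calling process's nice value by
 * `increment` and report the resulting value through `new_nice`.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int lower_own_priority(int increment, int *new_nice)
{
    /* nice() can legitimately return -1, so errno must be cleared
     * beforehand and checked afterwards to detect a real error. */
    errno = 0;
    if (nice(increment) == -1 && errno != 0)
        return -1;

    /* Read the effective nice value back; the same errno convention
     * applies to getpriority(). */
    errno = 0;
    int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0)
        return -1;

    *new_nice = value;
    return 0;
}
```

Calling lower_own_priority(15, &value) early in main(), before any MPI
work starts, would give an NI of 15 for a process that started at the
default nice value of 0.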

Thanks,
 Brian
