Hi all - I'm running my MPI program via Open MPI on a 4-core Opteron machine. I am trying to run 5 processes, where 1 of these processes is simply a coordinating process. It does very little work other than receiving and sending a short ping to the other processes once every second or two.
I've seen some of the other threads regarding oversubscription, and I read about it in the FAQ. When I don't set my processes to yield_when_idle, I naturally get terrible performance, because Open MPI uses aggressive mode when oversubscribed. I was finally able to get it to run in degraded mode some of the time by using yield_when_idle and setting slots=4. However, this seems to work only off and on. If I run top while my processes are running, sometimes it looks like this:

11405 budge  39  15  957m 383m 4132 R  100  9.6  83:14.78 RenderFish
11406 budge  39  15  959m 386m 4204 R  100  9.6  71:24.22 RenderFish
11408 budge  39  15  959m 386m 4200 R  100  9.6  56:12.23 RenderFish
11407 budge  39  15  959m 386m 4204 R   50  9.6  46:45.78 RenderFish
11409 budge  39  15  959m 386m 4208 R   50  9.6  75:14.43 RenderFish

and sometimes it looks more like this:

11323 budge  39  15  959m 386m 4132 R  100  9.6  83:14.78 RenderFish
11324 budge  39  15  959m 386m 4204 R  100  9.6  71:24.22 RenderFish
11325 budge  39  15  959m 386m 4200 R  100  9.6  83:12.23 RenderFish
11326 budge  39  15  959m 386m 4204 R   99  9.6  72:45.78 RenderFish

Obviously, the second is better; I only wish it were the norm. I used LAM for a long time and it always seemed to run in "degraded" mode, but I am switching to Open MPI for other reasons (better InfiniBand support).

Does anyone have any idea why Open MPI wouldn't consider a process idle when it does nothing 99% of the time? One possible clue is that my program nices itself. Does this make a difference?

Thanks,
Brian
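
For reference, a launch of the kind described above might look roughly like the sketch below. The hostfile name `hosts` and the `./RenderFish` path are assumptions for illustration, not the exact command that was run; `mpi_yield_when_idle` is the Open MPI MCA parameter behind the yield_when_idle setting. Newer Open MPI releases may additionally need `--oversubscribe` to start 5 processes on 4 slots.

```shell
# Contents of the hostfile (hypothetical name: hosts), declaring 4 slots:
#   localhost slots=4

# Start 5 processes on the 4 declared slots and ask idle processes
# to yield the CPU instead of polling aggressively.
mpirun --hostfile hosts --mca mpi_yield_when_idle 1 -np 5 ./RenderFish
```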