Ursula Winkler <[email protected]> writes: > Ursula Winkler wrote: >> Reuti wrote: >> >>> Am 11.04.2012 um 11:15 schrieb Ursula Winkler: >>> >>> >>>> Reuti wrote: >>>> >>>>> This could also be a problem of the MPI implementations. Which one do you >>>>> use - you use a plain mpiexec? >>>>> >>>>> -- Reuti >>>>> >>>>> >>>> I found that setting "MV2_ENABLE_AFFINITY=0" could be an option. But this >>>> should be set per default. Is this right? >>>> >>> I wouldn't bet on it. I found some sites where they suggest to set it to >>> zero. >>> >>> http://www.osc.edu/supercomputing/faq.shtml >>> >>> -- Reuti >>> >> >> Thanks. I told my users they should try it out. I hope it helps. >> > > Unfortunately it did not help. So, any ideas?
I don't know what's happening here in detail, but I can explain the general picture if it isn't documented for mvapich.

First of all, core binding matters for performance, particularly on NUMA systems, and you should _not_ leave it to the operating system. It sounds as if that's not the issue here, though: mvapich has just done the binding badly.

What should happen on nodes which run multiple jobs is that gridengine binds each job to specific cores (see -binding for qsub, e.g. http://arc.liv.ac.uk/SGE/htmlman/htmlman1/submit.html for up-to-date doc); there's a job script sketch at the end of this mail. As far as I know, you need the SGE from the site in my sig (http://arc.liv.ac.uk/SGE/) to get the behaviour where different numbers of cores can be bound on different hosts ("linear:slots"), if that matters. You also need that one, or another version based on the hwloc library, for binding to work properly on recent hardware or non-Linux kernels.

The gridengine binding (which gridengine keeps track of) separates jobs from each other, and it should be noticed by the MPI, which should then bind the individual processes to the cores it has been given. I don't know mvapich, but I know it uses hwloc, so it should be able to do this properly, like openmpi does (modulo issues with recent hardware, sigh). I thought mvapich would do the right thing automatically -- openmpi is often said to look bad performance-wise because it doesn't do core binding by default.

If your MPI jobs have exclusive access to the nodes, it's simpler: the MPI system can do the binding itself without worrying about what else is running (e.g. the old paffinity_alone setting in openmpi).
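For that exclusive-node case, the old openmpi invocation was roughly as below -- again only a sketch: mpi_paffinity_alone is the old parameter (newer openmpi versions spell their binding options differently), and $NSLOTS and ./a.out are placeholders:

    # whole node(s) to ourselves: let openmpi bind each rank itself
    mpiexec --mca mpi_paffinity_alone 1 -n $NSLOTS ./a.out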
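And to make the shared-node case concrete, the sort of job script I have in mind looks roughly like this. It's only a sketch: the PE name "mpi", the 16 slots and ./a.out are made up, "linear:slots" needs the SGE from my sig (stock 6.2u5 wants a fixed count such as linear:4), it assumes a tightly integrated PE so mpiexec finds its hosts, and whether MVAPICH2 then behaves sensibly with MV2_ENABLE_AFFINITY=0 is exactly the bit to check against its documentation:

    #!/bin/sh
    #$ -cwd
    #$ -pe mpi 16
    # Have gridengine pick, record and set specific cores for this job on
    # each host, so jobs sharing a node don't land on the same cores.
    #$ -binding linear:slots
    # Keep MVAPICH2 from overriding that with its own affinity; the ranks
    # then at least inherit the core mask gridengine set (ideally the MPI
    # would bind each rank within it).
    export MV2_ENABLE_AFFINITY=0
    mpiexec -n $NSLOTS ./a.out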
-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/