On 10/06/16 00:35, Jason Bacon wrote:

> I can imagine this issue going unnoticed most of the time, because it
> will only cause a problem when an OMPI job shares a node with another
> job using core binding, which is infrequent on our clusters.

To reiterate: you *really* want to be using Linux cgroups in Slurm then.
That will prevent this from happening, as the processes will only ever be
able to bind to the cores they've been allocated.

IIRC Open MPI uses hwloc, so it'll only see the cores that are in its
cgroup and just try to use them; it'll ignore anything else.

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
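
One way to check the point above from inside a job is to ask the kernel which cores the current process is allowed to run on. The following is only a minimal sketch, assuming Linux and Python 3; `allowed_cpus` is a hypothetical helper name, not part of Slurm, Open MPI or hwloc. With cgroup core confinement in effect, the list should contain only the cores allocated to the job rather than every core on the node.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU ids that process `pid` may run on.

    pid=0 means the calling process. Linux only: the affinity mask
    reported here is already restricted by any cpuset cgroup the
    process has been placed in.
    """
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    cpus = allowed_cpus()
    print(f"pid {os.getpid()} may run on {len(cpus)} CPU(s): {cpus}")
```

Running it under `srun` in two jobs that share a node should, if confinement is working, show two non-overlapping core lists.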
