Re: [OMPI users] slurm openmpi 1.8.3 core bindings

2015-02-01 Thread Ralph Castain
Yeah, I don’t think the slurm bindings will work for you. The problem is that
the slurm directive gets applied to the launch of our daemon, not to the
application procs. So what you’ve done is bind our daemon to 3 cpus. This has
nothing to do with the OMPI-Slurm integration - you told slurm to bind any
process it launches to 3 cpus, and the only “processes” slurm launches are our
daemons.

The only way to get what you want is to have slurm make the allocation without
specifying cpus-per-task, and then have mpirun apply the pe=N modifier itself.
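
Something like this should do it (an untested sketch, just reusing the pe
modifier from your non-slurm run):

sbatch -N2 -n 8 --ntasks-per-node=4 -w node1,node2 program.sbatch

and then, inside program.sbatch:

mpirun -n $SLURM_NTASKS --map-by node:pe=3 --bind-to core --report-bindings program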


> On Jan 30, 2015, at 8:20 AM, Michael Di Domenico wrote:
> 
> I'm trying to get slurm and openmpi to cooperate when running
> multi-threaded jobs.  i'm sure i'm doing something wrong, but i can't
> figure out what
> 
> my node configuration is
> 
> 2 nodes
> 2 sockets
> 6 cores per socket
> 
> i want to run
> 
> sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2
> program.sbatch
> 
> inside the program.sbatch i'm calling openmpi
> 
> mpirun -n $SLURM_NTASKS --report-bindings program
> 
> when the bindings report comes out i get
> 
> node1 rank 0 socket 0 core 0
> node1 rank 1 socket 1 core 6
> node1 rank 2 socket 0 core 1
> node1 rank 3 socket 1 core 7
> node2 rank 4 socket 0 core 0
> node2 rank 5 socket 1 core 6
> node2 rank 6 socket 0 core 1
> node2 rank 7 socket 1 core 7
> 
> which is semi-fine, but when the job runs the resulting threads from
> the program are locked (according to top) to those eight cores rather
> than spreading themselves over the 24 cores available
> 
> i tried a few incantations of the map-by, bind-to, etc, but openmpi
> basically complained about everything i tried for one reason or
> another
> 
> my understanding is that slurm should be passing the requested config to
> openmpi (or openmpi is pulling it from the environment somehow) and it
> should magically work
> 
> if i skip slurm and run
> 
> mpirun -n 8 --map-by node:pe=3 -bind-to core -host node1,node2
> --report-bindings program
> 
> node1 rank 0 socket 0 core 0
> node2 rank 1 socket 0 core 0
> node1 rank 2 socket 0 core 3
> node2 rank 3 socket 0 core 3
> node1 rank 4 socket 1 core 6
> node2 rank 5 socket 1 core 6
> node1 rank 6 socket 1 core 9
> node2 rank 7 socket 1 core 9
> 
> i do get the behavior i want (i would prefer a -npernode switch in
> there, but openmpi complains).  the bindings look better and the
> threads are not locked to particular cores
> 
> therefore i'm pretty sure this is a problem between openmpi and slurm
> and not necessarily with either individually
> 
> i did compile openmpi with the slurm support switch and we're using
> the cgroups taskplugin within slurm
> 
> i guess ancillary to this, is there a way to turn off core
> binding/placement routines and control the placement manually?
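
As for the ancillary question: yes - --bind-to none turns off OMPI's binding
entirely, so the procs come up unbound and you can manage placement yourself.
An untested sketch against your two nodes:

mpirun -n 8 --bind-to none -host node1,node2 --report-bindings program

Note that with the cgroups taskplugin in play, slurm's cpuset will still limit
which cpus the procs can touch.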