FWIW: You don't need the --mpi openmpi option on the srun command line with that OMPI configuration.

On Oct 22, 2013, at 9:47 AM, Lior Amar <[email protected]> wrote:

> Hi,
>
> My name is Lior Amar and I am trying to learn some internal stuff in slurm.
>
> I am running a test cluster of 1 node (both slurmctld and slurmd on the same
> node) with the latest slurm from git.
>
> I configured slurm with just ./configure
>
> I compiled openmpi 1.6.5 to run with slurm and pmi:
> CFLAGS=-I/usr/local/include/slurm LDFLAGS=-L/usr/local/lib LIBS=-lpmi
> ./configure --prefix /usr/local/software/mpi/openmpi-1.6.5 --with-ft=cr
> --enable-opal-multi-threads --with-pmi --with-slurm
>
> I have a very simple config file and I changed the MpiParams to
>
> MpiDefault=openmpi
> MpiParams=ports=12000-12999
>
> When I run a simple mpi program with mpirun there are no problems.
> When I run salloc and then mpirun ... again there are no problems.
>
> When I try to run the mpi program directly with srun (from within an salloc
> or directly) I get the following:
>
> srun -n 2 --mpi openmpi ./app1
> /home/lior/src/migslurm/app1/./app1: symbol lookup error:
> /usr/local/lib/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
> /home/lior/src/migslurm/app1/./app1: symbol lookup error:
> /usr/local/lib/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
> srun: error: slurm_receive_msg[127.0.0.1]: Zero Bytes were transmitted or
> received
> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
> srun: error: slurm_receive_msg[127.0.0.1]: Zero Bytes were transmitted or
> received
> srun: error: pictor: tasks 0-1: Exited with exit code 127
>
> I noticed that slurm_auth_get_arg_desc is not present in the auth_munge.so
> plugin but comes from libslurm.so. So when using the linker preload option
> I manage to run the application:
>
> srun -n 2 --mpi openmpi env LD_PRELOAD=/usr/local/lib/libslurm.so ./app1
>
> The mpi application manages to run, but I don't see any output (only after I
> Ctrl-c the srun do I see the output).
>
> Please advise.
>
> Regards
> --lior
>
> --
> ----------------------oo--o(:-:)o--oo----------------
> Lior
> ----------------------------------------------------------
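
The observation in the quoted message, that slurm_auth_get_arg_desc is referenced by auth_munge.so but provided by libslurm.so, can be checked directly. Below is a minimal sketch, not a definitive diagnostic. It assumes binutils nm is installed and reuses the library paths from the message above; the helper name sym_status is hypothetical and is not part of Slurm.

```shell
#!/bin/sh
# sym_status LIB SYMBOL  (hypothetical helper, not part of Slurm)
# Prints one word describing SYMBOL in LIB's dynamic symbol table:
#   missing   - LIB is not a readable file
#   no-nm     - the nm tool is not installed
#   undefined - LIB references SYMBOL but does not define it (nm type "U")
#   defined   - SYMBOL is listed with any other nm type (a simplification)
#   absent    - SYMBOL does not appear at all
sym_status() {
    lib=$1
    sym=$2
    if [ ! -r "$lib" ]; then
        echo "missing"
        return 0
    fi
    if ! command -v nm >/dev/null 2>&1; then
        echo "no-nm"
        return 0
    fi
    # nm -D lists dynamic symbols: the last field is the name (possibly with
    # an @VERSION suffix) and the field before it is the one-letter type.
    nm -D "$lib" 2>/dev/null | awk -v s="$sym" '
        NF >= 2 {
            name = $NF
            sub(/@.*/, "", name)
            if (name == s) {
                print ($(NF-1) == "U" ? "undefined" : "defined")
                found = 1
                exit
            }
        }
        END { if (!found) print "absent" }'
}

# Paths are taken from the quoted message; adjust them for other installs.
for lib in /usr/local/lib/slurm/auth_munge.so /usr/local/lib/libslurm.so; do
    printf '%s: %s\n' "$lib" "$(sym_status "$lib" slurm_auth_get_arg_desc)"
done
```

If the description in the quoted message holds, the plugin line should not say "defined" while the libslurm.so line should. That would match the symbol lookup error and explain why preloading libslurm.so lets the application start.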
