Hi, my name is Lior Amar and I am trying to learn some of the internals of Slurm.
I am running a test cluster of 1 node (both slurmctld and slurmd on the same node) with the latest Slurm from git. I configured Slurm with just:

    ./configure

I compiled Open MPI 1.6.5 to run with Slurm and PMI:

    CFLAGS=-I/usr/local/include/slurm LDFLAGS=-L/usr/local/lib LIBS=-lpmi ./configure --prefix /usr/local/software/mpi/openmpi-1.6.5 --with-ft=cr --enable-opal-multi-threads --with-pmi --with-slurm

I have a very simple config file, in which I set the MPI parameters to:

    MpiDefault=openmpi
    MpiParams=ports=12000-12999

When I run a simple MPI program with mpirun, there are no problems.
When I run salloc and then mpirun ..., again there are no problems.

When I try to run the MPI program directly with srun (from within an salloc or directly), I get the following:

    srun -n 2 --mpi openmpi ./app1
    /home/lior/src/migslurm/app1/./app1: symbol lookup error: /usr/local/lib/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
    srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
    /home/lior/src/migslurm/app1/./app1: symbol lookup error: /usr/local/lib/slurm/auth_munge.so: undefined symbol: slurm_auth_get_arg_desc
    srun: error: slurm_receive_msg[127.0.0.1]: Zero Bytes were transmitted or received
    srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
    srun: error: slurm_receive_msg[127.0.0.1]: Zero Bytes were transmitted or received
    srun: error: pictor: tasks 0-1: Exited with exit code 127

I noticed that the symbol slurm_auth_get_arg_desc is not defined in the auth_munge.so plugin but comes from libslurm.so. So, by using the linker preload option, I managed to run the application:

    srun -n 2 --mpi openmpi env LD_PRELOAD=/usr/local/lib/libslurm.so ./app1

The MPI application now runs, but I don't see any output while it is running; the output appears only after I press Ctrl-C on the srun.

Please advise.

Regards,
--lior

--
----------------------oo--o(:-:)o--oo----------------
Lior
----------------------------------------------------------
