Hi Yiannis,
thanks for your reply, but unfortunately I still seem to be having issues.
I've rebuilt mpich2-3.0.2 with:
./configure --with-slurm=/local1/slurm-2.5.4_INSTALL/ --with-pmi=pmi2
--enable-pmiport --prefix=/local1/mpich-3.0.2_SLURM/ --enable-shared
--enable-cxx ;
Now it crashes right away in the MPI_Init call:
srun --mpi=pmi2 -N2 myprogram
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(+0xe7927)[0x7f5daff73927]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(PMI2_Init+0x7ff)[0x7f5daff7806f]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPID_Init+0xac)[0x7f5daff371ac]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPIR_Init_thread+0x240)[0x7f5dafff90d0]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPI_Init+0xb1)[0x7f5dafff8a41]
I'm not sure what I'm doing wrong or what else I'm missing. Any help
would be highly appreciated.
Cheers,
Christoph
On 26/03/13 22:06, yiannis georgiou wrote:
Hi Christoph,
you need to use the PMI2 version of slurm to test the MPI_Comm_spawn
primitive of mpich2.
In more detail, you have to rebuild your mpich2, adding the following
flags to your configure:
--enable-pmiport --with-pmi=pmi2 --with-slurm=$YOUR_SLURM
and when you run jobs with slurm you need to pass the following
parameter to your srun:
--mpi=pmi2
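(Editor's note: putting the two pieces of advice together, the rebuild and launch would look roughly like the following; $YOUR_SLURM and $YOUR_MPICH_PREFIX are placeholder paths, not values from the thread.)

```shell
# rebuild mpich2 against slurm's PMI2 support (paths are placeholders)
./configure --enable-pmiport --with-pmi=pmi2 --with-slurm=$YOUR_SLURM \
    --prefix=$YOUR_MPICH_PREFIX
make && make install

# launch through slurm, selecting the PMI2 plugin explicitly
srun --mpi=pmi2 -N2 myprogram
```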
By the way, since you are testing MPI_Comm_spawn, the following slurm
feature, which allows jobs to change size while they are executing,
may be interesting for you:
http://www.schedmd.com/slurmdocs/faq.html#job_size
Best Regards,
Yiannis
On 03/26/2013 09:18 AM, Christoph Sprenger wrote:
Hi,
I've been trying to test the MPI_Comm_spawn interface with slurm and
pm=srun, as opposed to the hydra process manager provided by mpich2.
I've rebuilt my mpich2 version with these flags:
./configure --with-pmi=slurm --with-pm=no --enable-shared
Whatever I've tried so far, I can't get it to spawn new commands:
int err = MPI_Comm_spawn(cmd, &char_argv[0], 1, NULL, 0, mpi_mgr.comm(),
&intracomm, MPI_ERRCODES_IGNORE);
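(Editor's note: for comparison, a minimal standalone spawn sketch is shown below; "spawn_tst" is the child binary named in the error message. One detail worth checking: the MPI standard's null handle for the info argument is MPI_INFO_NULL, not a bare NULL, and in MPICH these are not the same value.)

```c
/* Minimal MPI_Comm_spawn sketch. The parent spawns one instance of
 * "spawn_tst" into a new intercommunicator. MPI_INFO_NULL, not NULL,
 * is the portable null info handle. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("spawn_tst", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}
```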
results in:
Fatal error in MPI_Comm_spawn: Other MPI error, error stack:
MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_tst",
argv=0x7faea00276b0, maxprocs=1, info=0x9c000000, root=0,
MPI_COMM_WORLD, intercomm=0x7fff96511850, errors=(nil)) failed
MPIDI_Comm_spawn_multiple(240): PMI_Spawn_multiple returned -1
All the code works fine via hydra. I'm curious whether people are
using the srun pm with MPI_Comm_spawn successfully, or whether there
are caveats/known issues I need to look out for? I can't seem to make
even the basic examples work, so I'm sure I must be doing something wrong.
I'm using slurm-2.5.4 and mpich2-1.5rc1, and launching like this:
srun -N 2 -B '*:*:*' --exclusive mycmd args
Any help would be highly appreciated.
Kind Regards,
Christoph