Hi Yiannis,

thanks for your reply, but unfortunately I still seem to be having issues.
I've rebuilt mpich2-3.0.2 with:
./configure --with-slurm=/local1/slurm-2.5.4_INSTALL/ --with-pmi=pmi2 --enable-pmiport --prefix=/local1/mpich-3.0.2_SLURM/ --enable-shared --enable-cxx

Now it crashes right away in the MPI_Init call:
 srun --mpi=pmi2 -N2 myprogram

/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(+0xe7927)[0x7f5daff73927]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(PMI2_Init+0x7ff)[0x7f5daff7806f]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPID_Init+0xac)[0x7f5daff371ac]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPIR_Init_thread+0x240)[0x7f5dafff90d0]
/tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPI_Init+0xb1)[0x7f5dafff8a41]


I'm not sure what I'm doing wrong or what else I'm missing. Any help would be highly appreciated.

Cheers,
Christoph

On 26/03/13 22:06, yiannis georgiou wrote:
Hi Christoph,

you need to use the PMI2 interface of slurm to test the MPI_Comm_spawn primitive of mpich2.

In more detail, you have to rebuild your mpich2, adding the following flags to your configure:

--enable-pmiport --with-pmi=pmi2 --with-slurm=$YOUR_SLURM

and when you run jobs with slurm you need to pass the following parameter to your srun:

--mpi=pmi2
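Putting the steps together, the rebuild-and-run sequence would look roughly like this (a sketch; the slurm path, install prefix, and program name are placeholders you should replace with your own):

```shell
# Rebuild mpich2 against slurm's PMI2 library (paths are placeholders)
./configure --enable-pmiport --with-pmi=pmi2 \
            --with-slurm=/path/to/slurm \
            --prefix=/path/to/mpich2-install
make && make install

# Sanity check: list the MPI plugin types this slurm build supports;
# "pmi2" should appear in the output
srun --mpi=list

# Launch the job through slurm's PMI2 interface
srun --mpi=pmi2 -N2 ./myprogram
```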

By the way, since you are testing MPI_Comm_spawn, the following slurm feature, which allows jobs to change their size while they are executing, may be interesting for you:

http://www.schedmd.com/slurmdocs/faq.html#job_size

Best Regards,
Yiannis

On 03/26/2013 09:18 AM, Christoph Sprenger wrote:
Hi,

I've been trying to test the MPI_Comm_spawn interface with slurm and
pm=srun, as opposed to the hydra process manager provided by mpich2.

I've rebuilt my mpich2 version with these flags:
./configure --with-pmi=slurm --with-pm=no --enable-shared

Whatever I've tried so far, I can't get it to spawn new commands for me:

int err = MPI_Comm_spawn(cmd, &char_argv[0], 1, NULL, 0, mpi_mgr.comm(),
                         &intracomm, MPI_ERRCODES_IGNORE);

results in:

Fatal error in MPI_Comm_spawn: Other MPI error, error stack:
MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_tst",
argv=0x7faea00276b0, maxprocs=1, info=0x9c000000, root=0,
MPI_COMM_WORLD, intercomm=0x7fff96511850, errors=(nil)) failed
MPIDI_Comm_spawn_multiple(240): PMI_Spawn_multiple returned -1

All the code works fine via hydra. I'm curious whether people are using the srun
pm with MPI_Comm_spawn successfully, or whether there are caveats/known
issues I need to look out for. I can't seem to make even the basic
examples work, so I'm sure I must be doing something wrong.
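For reference, the basic pattern I've been trying boils down to roughly this (a sketch; "spawn_child" is a placeholder child-binary name, and it passes MPI_INFO_NULL rather than a bare NULL for the info argument):

```c
/* Minimal MPI_Comm_spawn sketch: the parent spawns one instance of a
 * child binary. "spawn_child" is a placeholder name; error handling
 * is elided for brevity. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    /* Spawn one copy of the child from rank 0 of MPI_COMM_WORLD.
     * MPI_INFO_NULL is the portable empty info handle. */
    MPI_Comm_spawn("spawn_child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Comm_free(&intercomm);
    MPI_Finalize();
    return 0;
}
```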

I'm using slurm-2.5.4 and mpich2-1.5rc1.

Any help would be highly appreciated. For reference, the launch command is:

srun -N 2 -B '*:*:*' --exclusive mycmd args


Kind Regards,
Christoph


