Hello,
Here are some related notes that I found during further investigation:


1. PMI2_Init returns appnum=-1, and this is the value it gets from the SLURM
PMI server.
2. The application hangs if it calls PMI2_Init twice. I think this is due to
a lack of response from the PMI2 server. The correct behavior would be to
return an error.
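
To illustrate item 2, here is a minimal sketch of the behavior I would expect: a second init call fails immediately instead of waiting on the server. This is not the SLURM code. The names sketch_init, sketch_initialized, SKETCH_SUCCESS and SKETCH_ERR_INIT are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; these are NOT the real PMI2 constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_INIT = -1 };

/* Hypothetical flag remembering that init has already run. */
static int sketch_initialized = 0;

/* Stand-in for an init routine that refuses to run twice. */
int sketch_init(int *appnum)
{
    assert(appnum != NULL);

    if (sketch_initialized) {
        /* Fail fast rather than block on a reply that never arrives. */
        return SKETCH_ERR_INIT;
    }

    sketch_initialized = 1;
    *appnum = -1;  /* mirrors the appnum=-1 value reported in item 1 */
    return SKETCH_SUCCESS;
}
```

With a guard like this the caller gets an error code on the second call and can decide how to proceed.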


2014-05-07 9:44 GMT+07:00 Artem Polyakov <artpo...@gmail.com>:

> Hello, all.
>
> I am experiencing problems with SLURM PMI2 support. Here is my
> configuration:
> 1. SLURM 2.6.3
> 2. Open MPI current trunk (1.8.1 also affected).
>
> Starting from 1.8.x, Open MPI supports PMI2 and tries to use it whenever
> possible. However, the PMI2 MPI module is not guaranteed to be enabled in
> the SLURM configuration, and the user can forget to pass the --mpi=pmi2
> option to srun (this was my case initially). In this case Open MPI aborts
> abnormally because SLURM's PMI2 assumes that this is a singleton
> application and leaves PMI_fd == -1. Also, PMI does not initialize rank
> and size in PMI2_Init().
> Later Open MPI calls PMI2_Job_GetId, which results in the following call
> to PMIi_WriteSimpleCommand:
> PMIi_WriteSimpleCommand (fd=-1, resp=0x7fff11ed2890, cmd=0x7f3a9388b748
> "job-getid", pairs=0xdc7780, npairs=0) at pmi2_api.c:1471
>
> I checked other versions of SLURM up to the latest, and it seems that
> this bug remains. Here is the fix for slurm-14.03.3-2. I checked its
> functionality on my SLURM 2.6.3 installation.
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
>
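
For the fd=-1 case in the quoted message, a minimal sketch of the kind of guard that would turn the abort into an ordinary error. This is not the attached fix and not the SLURM function. The names sketch_write_command, SKETCH_OK and SKETCH_FAIL are hypothetical, and the command is written as a plain string only for illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical return codes; these are NOT the real PMI2 constants. */
enum { SKETCH_OK = 0, SKETCH_FAIL = -1 };

/* Hypothetical stand-in for a command writer that validates its fd. */
int sketch_write_command(int fd, const char *cmd)
{
    assert(cmd != NULL);

    if (fd < 0) {
        /* Singleton case: there is no server connection, so report an
         * error to the caller instead of writing to an invalid fd. */
        return SKETCH_FAIL;
    }

    size_t len = strlen(cmd);
    ssize_t written = write(fd, cmd, len);

    /* Treat a short or failed write as an error. */
    return (written == (ssize_t)len) ? SKETCH_OK : SKETCH_FAIL;
}
```

Called with fd=-1, this returns SKETCH_FAIL at once, which the caller can treat as "PMI2 not available".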



-- 
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov

