Hello, here are some related notes that I found during further investigation:
1. PMI2_Init returns appnum=-1, and this is the value it gets from the SLURM PMI server.
2. The application hangs if it tries to call PMI2_Init twice. I think this is due to the lack of a response from the PMI2 server. The correct behavior would be to return an error.

2014-05-07 9:44 GMT+07:00 Artem Polyakov <artpo...@gmail.com>:
> Hello, all.
>
> I am experiencing problems with SLURM PMI2 support. Here is my
> configuration:
> 1. SLURM 2.6.3
> 2. Open MPI current trunk (1.8.1 is also affected).
>
> Starting from 1.8.x, Open MPI supports PMI2 and tries to use it whenever
> possible. However, the PMI2 MPI module is not guaranteed to be enabled in
> the configuration, and the user can forget to pass the --mpi=pmi2 option
> to srun (this was my case initially). In this case Open MPI aborts
> abnormally because SLURM's PMI2 assumes that this is a singleton
> application and leaves PMI_fd == -1. Also, PMI2_Init() does not
> initialize rank and size.
> Later Open MPI calls PMI2_Job_GetId, which results in the following call
> to PMIi_WriteSimpleCommand:
> PMIi_WriteSimpleCommand (fd=-1, resp=0x7fff11ed2890, cmd=0x7f3a9388b748
> "job-getid", pairs=0xdc7780, npairs=0) at pmi2_api.c:1471
>
> I checked other versions of SLURM, up to the latest, and it seems that
> this bug remains. Here is the fix for slurm-14.03.3-2. I checked its
> functionality on my SLURM 2.6.3 installation.
>
> --
> Best regards, Artem Y. Polyakov

--
Best regards, Artem Y. Polyakov
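
To illustrate the two behaviors discussed above (returning an error on a second init instead of hanging, and refusing to write a command when the fd is -1), here is a rough C sketch. It is not the SLURM patch and does not use the real PMI2 API: all names (sketch_init, sketch_write_command, sketch_fd, the SKETCH_* codes) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical error codes for this sketch; not the real PMI2 values. */
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_NOT_CONNECTED = -1,
    SKETCH_ERR_ALREADY_INIT = -2,
    SKETCH_ERR_IO = -3
};

static int sketch_fd = -1;          /* stands in for PMI_fd; -1 means no server */
static int sketch_initialized = 0;  /* set once the first init has completed */

/* Hypothetical init: a second call returns an error instead of
 * contacting the server again. */
int sketch_init(int server_fd)
{
    if (sketch_initialized)
        return SKETCH_ERR_ALREADY_INIT;
    sketch_fd = server_fd;          /* may stay -1 in the singleton case */
    sketch_initialized = 1;
    return SKETCH_OK;
}

/* Hypothetical command writer: checks the fd before any write(). */
int sketch_write_command(const char *cmd)
{
    assert(cmd != NULL);
    if (sketch_fd < 0)
        return SKETCH_ERR_NOT_CONNECTED;

    size_t left = strlen(cmd);
    while (left > 0) {
        ssize_t n = write(sketch_fd, cmd, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            return SKETCH_ERR_IO;
        }
        if (n == 0)
            return SKETCH_ERR_IO;   /* no progress, give up */
        cmd += n;
        left -= (size_t)n;
    }
    return SKETCH_OK;
}
```

With a guard like this, a caller in the singleton case (fd == -1) gets SKETCH_ERR_NOT_CONNECTED back from a "job-getid" request and can fall back or report the problem, rather than attempting a write on an invalid descriptor.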