2014-05-21 10:50 GMT+07:00 Artem Polyakov <[email protected]>:

> Here are two exact examples:
>
> 1. "appnum = -1" problem:
> Program pmi_appnum.c (attached) is launched using the batch script
> pmi_appnum.job (attached) and produces the following results:
>
> PMI2_Init(0, 16, 0, -1)
> PMI2_Init(0, 16, 1, -1)
> PMI2_Init(0, 16, 2, -1)
> PMI2_Init(0, 16, 3, -1)
> PMI2_Init(0, 16, 5, -1)
> PMI2_Init(0, 16, 9, -1)
> PMI2_Init(0, 16, 12, -1)
> PMI2_Init(0, 16, 7, -1)
> PMI2_Init(0, 16, 4, -1)
> PMI2_Init(0, 16, 6, -1)
> PMI2_Init(0, 16, 11, -1)
> PMI2_Init(0, 16, 10, -1)
> PMI2_Init(0, 16, 8, -1)
> PMI2_Init(0, 16, 14, -1)
> PMI2_Init(0, 16, 15, -1)
> PMI2_Init(0, 16, 13, -1)
>
> 2. "Double init hang" problem:
> Program pmi_double_init.c (attached) is launched with the script
> pmi_double_init.job (attached) and it just hangs. Here is what GDB shows on
> one of the processes:
>
> (gdb) bt
> #0  0x0000003b722db730 in __read_nocancel () from /lib64/libc.so.6
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0
> "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
> size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at
> pmi2_api.c:221
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
>
> (gdb) frame 3
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
> 17          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
>
> (gdb) frame 1
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0
> "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> 72                      n = read(fd, readbuf, sizeof(readbuf) - 1);
> (gdb) l
> 67          p = buf;
> 68          curlen = 1; /* Make room for the null */
> 69          while (curlen < maxlen) {
> 70              if (nextChar == lastChar) {
> 71                  do {
> 72                      n = read(fd, readbuf, sizeof(readbuf) - 1);
> 73                  } while (n == -1 && errno == EINTR);
> 74                  if (n == 0) {
> 75                      /* EOF */
> 76                      break;
>
> (gdb) frame 2
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
> size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at
> pmi2_api.c:221
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> (gdb) l
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
> "**intern %s", "failed to generate init line");
> 217
> 218         ret = PMI2U_writeline(PMI2_fd, buf);
> 219         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
> "**pmi2_init_send");
> 220
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> 222         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
> "**pmi2_initack %s", strerror(pmi2_errno));
> 223
> 224         PMI2U_parse_keyvals(buf);
> 225         cmdline[0] = 0;
>
> So the apps hang waiting for a response from the PMI server while
> performing the repeated "init".
>
> And in the error output I see the following messages:
> ------------ 8< ------------------------------------------------
> slurmd[cn01]: mpi/pmi2: request not begin with 'cmd='
> slurmd[cn01]: mpi/pmi2: full request is:
> slurmd[cn01]: mpi/pmi2: invalid client request
> ------------ 8< ------------------------------------------------
>
> If I attach before the second PMI2_Init call I can see that buf is not empty:
> ... [ GDB attach right before PMI2_Init] ....
> (gdb) n
> 21          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
> ------------------------ 8< -------------------------------------
> (gdb)
> 203         if (PMI2_fd == -1) {
> (gdb) p PMI2_fd
> $1 = 12
> (gdb) n
> 215         ret = snprintf(buf, PMI2_MAXLINE, "cmd=init pmi_version=%d
> pmi_subversion=%d\n", PMI_VERSION, PMI_SUBVERSION);
> (gdb)
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
> "**intern %s", "failed to generate init line");
> (gdb) p buf
> $2 = "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000 ...
> "...
>
> According to _handle_task_request, SLURM uses the following logic:
> _handle_task_request(int fd, int lrank)
> {
>     if (initialized[lrank] == 0) {
>         rc = _handle_pmi1_init(fd, lrank);
>         initialized[lrank] = 1;
>     } else if (is_pmi11()) {
>         rc = handle_pmi1_cmd(fd, lrank);
>     } else if (is_pmi20()) {
>         rc = handle_pmi2_cmd(fd, lrank);
>     }
> So once we call PMI2_Init the first time, the next duplicate request will
> be routed to handle_pmi2_cmd (since that is what was set up on the first
> call). And finally handle_pmi2_cmd uses safe_read (!!) in two steps:
>     safe_read(fd, len_buf, 6);
>     len_buf[6] = '\0';
>     len = atoi(len_buf);
>     buf = xmalloc(len + 1);
>     safe_read(fd, buf, len);
>     buf[len] = '\0';
>
> and, having "cmd=init pmi_version=2
> pmi_subversion=0\n\000mi_subversion\000" on the wire, we cut the first 6
> characters from it and get:
> len_buf="cmd=in\000"
> fd remains: "it pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
> len = atoi("cmd=in\000") = 0;
> We then read a 0-length buffer and return (as I can see in stderr). This
> repeats until the buffer is exhausted.
> However, it doesn't explain why we hang, but it is probably a good
> starting point for further debugging.
>

Actually it does! The PMI server refuses the request and sends nothing back
to the client. And the client waits forever for the response!


>
> *I think an additional check in PMI2_Init for the "already-initialized"
> case will solve the problem.*
>
>
> 2014-05-21 7:54 GMT+07:00 Artem Polyakov <[email protected]>:
>
>>
>>
>> On Wednesday, May 21, 2014, Artem Polyakov wrote:
>>
>>
>>>
>>> On Wednesday, May 21, 2014, David Bigagli wrote:
>>>
>>>>
>>>> The srun --mpi=pmi2 option has to be specified if Open MPI was built
>>>> with the --with-pmi option; otherwise Slurm will not load the pmi2 plugin
>>>> and the MPI job will fail in MPI_Init().
>>>
>>>
>>>
>> It is also important to note that this is for SLURM 2.6.3, so not the
>> newest one. However, the problem may remain in newer versions.
>>
>>> Thank you.
>>> I need to add that I am working on PMI support in Open MPI, so I dig
>>> slightly deeper than a regular user. So here is some additional info:
>>>
>>> 1. I get appnum = -1 with the --mpi=pmi2 option. Not sure if this is
>>> configuration-specific or not. Other things like rank and size were correct.
>>>
>>> 2. I caught the PMI2_Init hang during development when I forgot to add
>>> the check for the PMI-already-initialized case. I'm also not sure whether
>>> this is cluster-specific. But I am ready to help with additional
>>> debugging. However, I have no admin access on my cluster, so I can only
>>> debug at the libpmi2 level but not on the server side.
>>>
>>>
>>>
>>>> On 05/17/2014 07:50 PM, Artem Polyakov wrote:
>>>>
>>>>> Hello,
>>>>> Here are some related notes that I found during further investigation:
>>>>>
>>>>>
>>>>> 1. PMI2_Init returns appnum = -1, and this is what it gets from the
>>>>> SLURM PMI server.
>>>>> 2. The application hangs if you try to call PMI2_Init twice. I think
>>>>> this is due to the lack of a response from the PMI2 server. The
>>>>> correct behavior would be to return an error.
>>>>>
>>>>>
>>>>> 2014-05-07 9:44 GMT+07:00 Artem Polyakov <[email protected]>:
>>>>>
>>>>>     Hello, all.
>>>>>
>>>>>     I am experiencing problems with SLURM PMI2 support. Here is my
>>>>>     configuration:
>>>>>     1. SLURM 2.6.3
>>>>>     2. Open MPI current trunk (1.8.1 also affected).
>>>>>
>>>>>     Starting from 1.8.x, Open MPI supports PMI2 and tries to use it
>>>>>     whenever possible. However, the pmi2 MPI plugin is not guaranteed
>>>>>     to be enabled in the configuration, and the user can forget to
>>>>>     pass the --mpi=pmi2 option to srun (this was my case initially).
>>>>>     In this case Open MPI aborts abnormally, because SLURM's PMI2
>>>>>     library assumes that this is a singleton application and leaves
>>>>>     PMI2_fd == -1. Also, PMI won't initialize rank and size in
>>>>>     PMI2_Init().
>>>>>     Later Open MPI calls PMI2_Job_GetId, which results in the
>>>>>     following call of PMIi_WriteSimpleCommand:
>>>>>     PMIi_WriteSimpleCommand (fd=-1, resp=0x7fff11ed2890,
>>>>>     cmd=0x7f3a9388b748 "job-getid", pairs=0xdc7780, npairs=0) at
>>>>>     pmi2_api.c:1471
>>>>>
>>>>>     I checked other versions of SLURM up to the latest, and it seems
>>>>>     that this bug remains. Here is the fix for slurm-14.03.3-2. I
>>>>>     checked its functionality on my SLURM 2.6.3 installation.
>>>>>
>>>>>     --
>>>>>     С Уважением, Поляков Артем Юрьевич
>>>>>     Best regards, Artem Y. Polyakov
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> С Уважением, Поляков Артем Юрьевич
>>>>> Best regards, Artem Y. Polyakov
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>>       /David/Bigagli
>>>>
>>>> www.schedmd.com
>>>>
>>>
>>>
>>> --
>>> С Уважением, Поляков Артем Юрьевич
>>> Best regards, Artem Y. Polyakov
>>>
>>
>>
>> --
>> С Уважением, Поляков Артем Юрьевич
>> Best regards, Artem Y. Polyakov
>>
>
>
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
>



-- 
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov
