Hi Artem,

Do you know if a fix for this was ever committed? We ran into this with a code base that builds non-MPI apps with mpicc and then attempts to run them multiple times from within a single SLURM task.
-Aaron

On Wed, May 21, 2014 at 9:12 AM, Artem Polyakov wrote:
> 2014-05-21 19:28 GMT+07:00 Hongjia Cao:
>
>> Your debugging and analysis are correct.
>>
>> PMI2_Init() initializes PMI in two steps. First, a PMI 1.1 init command is
>> sent to the server and the version is negotiated with the server. After
>> that, a PMI 2.0 fullinit command is sent. Everything goes well so far.
>> But once the version number is decided, the server no longer expects
>> another PMI 1.1 init command, which is in a different format (see
>> http://wiki.mpich.org/mpich/index.php/PMI_v2_Wire_Protocol).
>>
>> The mpi/pmi2 plugin does not yet implement all functions of the PMI2
>> protocol (http://wiki.mpich.org/mpich/index.php/PMI_v2_API). I just
>> tested it with MPICH programs. It is not clearly specified whether a
>> program may call PMI2_Init() twice. I think this could be handled more
>> easily on the client side: just return the old values on the second
>> call.
>
> I agree: check PMI2_initialized and return immediately if it is set to
> something other than PMI2_UNINITIALIZED.
>
>> On Tue, 2014-05-20 at 20:52 -0700, Artem Polyakov wrote:
>> > 2. "Double init hang" problem: the program pmi_double_init.c (attached)
>> > is launched with the script pmi_double_init.job (attached) and it just
>> > hangs.
>> > Here is what GDB shows on one of the processes:
>> >
>> > (gdb) bt
>> > #0  0x0000003b722db730 in __read_nocancel () from /lib64/libc.so.6
>> > #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12,
>> >     buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n",
>> >     maxlen=1024) at pmi2_util.c:72
>> > #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
>> >     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8)
>> >     at pmi2_api.c:221
>> > #3  0x0000000000400626 in main () at pmi_double_init.c:17
>> >
>> > (gdb) frame 3
>> > #3  0x0000000000400626 in main () at pmi_double_init.c:17
>> > 17      rc = PMI2_Init(&spawned, &size, &rank, &appnum);
>> >
>> > (gdb) frame 1
>> > #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12,
>> >     buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n",
>> >     maxlen=1024) at pmi2_util.c:72
>> > 72      n = read(fd, readbuf, sizeof(readbuf) - 1);
>> > (gdb) l
>> > 67      p = buf;
>> > 68      curlen = 1;    /* Make room for the null */
>> > 69      while (curlen < maxlen) {
>> > 70          if (nextChar == lastChar) {
>> > 71              do {
>> > 72                  n = read(fd, readbuf, sizeof(readbuf) - 1);
>> > 73              } while (n == -1 && errno == EINTR);
>> > 74              if (n == 0) {
>> > 75                  /* EOF */
>> > 76                  break;
>> >
>> > (gdb) frame 2
>> > #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
>> >     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8)
>> >     at pmi2_api.c:221
>> > 221     ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
>> > (gdb) l
>> > 216     PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
>> > 217
>> > 218     ret = PMI2U_writeline(PMI2_fd, buf);
>> > 219     PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_init_send");
>> > 220
>> > 221     ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
>> > 222     PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_initack %s", strerror(pmi2_errno));
>> > 223
>> > 224     PMI2U_parse_keyvals(buf);
>> > 225     cmdline[0] = 0;
>> >
>> > So the apps hang waiting for a response from the PMI server while doing
>> > the non-full "init".
>> >
>> > And in the error output I see the following messages:
>> > ------------ 8< ------------------------------------------------
>> > slurmd[cn01]: mpi/pmi2: request not begin with 'cmd='
>> > slurmd[cn01]: mpi/pmi2: full request is:
>> > slurmd[cn01]: mpi/pmi2: invalid client request
>> > ------------ 8< ------------------------------------------------
>> >
>> > If I attach before the second PMI2_Init call, I can see that buf is not
>> > empty:
>> > ... [ GDB attach right before PMI2_Init] ....
>> > (gdb) n
>> > 21      rc = PMI2_Init(&spawned, &size, &rank, &appnum);
>> > ------------------------ 8< -------------------------------------
>> > (gdb)
>> > 203     if (PMI2_fd == -1) {
>> > (gdb) p PMI2_fd
>> > $1 = 12
>> > (gdb) n
>> > 215     ret = snprintf(buf, PMI2_MAXLINE, "cmd=init pmi_version=%d pmi_subversion=%d\n", PMI_VERSION, PMI_SUBVERSION);
>> > (gdb)
>> > 216     PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
>> > (gdb) p buf
>> > $2 = "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000 ... "...
>> >
>> > According to _handle_task_request, SLURM uses the following logic:
>> >
>> > _handle_task_request(int fd, int lrank)
>> >     if (initialized[lrank] == 0) {
>> >         rc = _handle_pmi1_init(fd, lrank);
>> >         initialized[lrank] = 1;
>> >     } else if (is_pmi11()) {
>> >         rc = handle_pmi1_cmd(fd, lrank);
>> >     } else if (is_pmi20()) {
>> >         rc = handle_pmi2_cmd(fd, lrank);
>> >     }
>> >
>> > So once we have called PMI2_Init the first time, the next, duplicate
>> > request is routed to handle_pmi2_cmd (since this is what was set up by
>> > the first call). And finally, handle_pmi2_cmd uses safe_read (!!) in
>> > two steps:
>> >
>> >     safe_read(fd, len_buf, 6);
>> >     len_buf[6] = '\0';
>> >     len = atoi(len_buf);
>> >     buf = xmalloc(len + 1);
>> >     safe_read(fd, buf, len);
>> >     buf[len] = '\0';
>> >
>> > Given "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000",
>> > we cut the first 6 characters from it and get:
>> >
>> >     len_buf = "cmd=in\000"
>> >     fd remains: "it pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
>> >     len = atoi("cmd=in\000") = 0;
>> >
>> > We then read a 0-length buffer and return (as I can see in stderr).
>> > This is repeated until the buffer is exhausted. This does not explain
>> > why we hang, but it is probably a good starting point for further
>> > debugging.
>> >
>> > I think an additional check in PMI2_Init for the "already-initialized"
>> > case will solve the problem.
>
> --
> Best regards, Artem Y. Polyakov
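The framing mismatch described in the quoted analysis can be reproduced without SLURM. Below is a minimal sketch in C, assuming only what the quoted handle_pmi2_cmd snippet shows: a 6-byte decimal length prefix that is parsed with atoi(). The helper name parse_len_prefix and the macro LEN_FIELD are made up for illustration and are not SLURM identifiers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LEN_FIELD 6  /* width of the length prefix in the quoted snippet */

/*
 * Hypothetical helper (not SLURM code): treat the first LEN_FIELD bytes of
 * `wire` as a decimal length prefix, mirroring the safe_read() + atoi()
 * sequence quoted in the thread. Returns the parsed length and sets *rest
 * to the bytes that follow the prefix.
 */
static int parse_len_prefix(const char *wire, const char **rest)
{
    char len_buf[LEN_FIELD + 1];

    assert(wire != NULL && rest != NULL && strlen(wire) >= LEN_FIELD);

    memcpy(len_buf, wire, LEN_FIELD);
    len_buf[LEN_FIELD] = '\0';
    *rest = wire + LEN_FIELD;

    /* atoi() returns 0 when the string does not start with a number */
    return atoi(len_buf);
}
```

Feeding it the PMI 1.1 style line "cmd=init pmi_version=2 pmi_subversion=0\n" gives a length of 0 and leaves "it pmi_version=2 ..." unread, which matches the len = 0 result in the analysis. A line that really starts with a 6-digit length, such as "000016", parses as 16.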

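The client-side guard suggested in the thread (return early when PMI2_initialized is anything other than PMI2_UNINITIALIZED) might look roughly like the sketch below. This is not the actual pmi2_api.c code. The names PMI2_initialized and PMI2_UNINITIALIZED come from the thread, but their values here are assumptions, and sketch_pmi2_init, sketch_handshake, SKETCH_INITIALIZED, the cached_* variables, and handshake_count are hypothetical stand-ins. The wire exchange is simulated with made-up values.

```c
#include <assert.h>

/* State names follow the thread; the numeric values are assumptions. */
enum { PMI2_UNINITIALIZED = 0, SKETCH_INITIALIZED = 1 };

static int PMI2_initialized = PMI2_UNINITIALIZED;
static int cached_spawned, cached_size, cached_rank, cached_appnum;
static int handshake_count;  /* how many simulated wire handshakes ran */

/* Stand-in for the real "init" + "fullinit" exchange with the server. */
static void sketch_handshake(void)
{
    handshake_count++;
    cached_spawned = 0;
    cached_size = 4;    /* made-up job size */
    cached_rank = 1;    /* made-up rank */
    cached_appnum = 0;
}

/* Hypothetical idempotent init: a repeated call skips the wire exchange. */
static int sketch_pmi2_init(int *spawned, int *size, int *rank, int *appnum)
{
    assert(spawned && size && rank && appnum);

    if (PMI2_initialized == PMI2_UNINITIALIZED) {
        sketch_handshake();
        PMI2_initialized = SKETCH_INITIALIZED;
    }

    /* Both the first and any later call return the cached values. */
    *spawned = cached_spawned;
    *size = cached_size;
    *rank = cached_rank;
    *appnum = cached_appnum;
    return 0;
}
```

Calling sketch_pmi2_init() twice runs the simulated handshake once and returns the same values both times. The guard only covers repeated calls inside one process, since the flag lives in that process's memory.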