Your debugging and analysis are correct.

PMI2_Init() initializes PMI in two steps. First, a PMI 1.1 init command
is sent to the server and the version is negotiated. After that, a
PMI 2.0 fullinit command is sent. Everything goes well up to this point.
But once the version has been decided, the server no longer expects
another PMI 1.1 init command, which uses a different format (see
http://wiki.mpich.org/mpich/index.php/PMI_v2_Wire_Protocol).
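
The format mismatch can be shown with a small sketch. It assumes, as in
the handle_pmi2_cmd excerpt quoted below, that a PMI 2.0 request starts
with a 6-character ASCII length field that the server converts with
atoi(). The helper name parse_len_header and the framed sample message
are made up for illustration; this is not the plugin's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LEN_FIELD 6  /* assumed width of the PMI 2.0 length prefix */

/* Hypothetical helper: mimic the server's first read by taking the
 * first 6 bytes of a request and converting them with atoi().
 * Returns -1 if the message is shorter than the length field. */
int parse_len_header(const char *msg)
{
    char len_buf[LEN_FIELD + 1];

    assert(msg != NULL);
    if (strlen(msg) < LEN_FIELD)
        return -1;
    memcpy(len_buf, msg, LEN_FIELD);
    len_buf[LEN_FIELD] = '\0';
    /* atoi() yields 0 when the text does not start with a number */
    return atoi(len_buf);
}
```

With a length-prefixed message such as "    13cmd=fullinit;" the helper
returns 13. With the PMI 1.1 line "cmd=init pmi_version=2
pmi_subversion=0\n" the first six bytes are "cmd=in", so it returns 0
and the server ends up handling an empty request.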

The mpi/pmi2 plugin does not yet implement all functions of the PMI2
protocol (http://wiki.mpich.org/mpich/index.php/PMI_v2_API). I have only
tested it with MPICH programs. It is not clearly specified whether a
program may call PMI2_Init() twice. I think this could be handled more
easily on the client side: just return the old values on the second
call.
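
A minimal sketch of that client-side guard follows. All names here
(guarded_init, do_wire_init, the cached_* variables, wire_init_calls)
are hypothetical, and do_wire_init is only a stand-in for the real
two-step handshake with made-up values; this is not the libpmi2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache of the values obtained by the first init call. */
static int cached_valid = 0;
static int cached_spawned, cached_size, cached_rank, cached_appnum;

/* Counts how often the (stand-in) handshake actually runs. */
static int wire_init_calls = 0;

/* Stand-in for the real handshake (PMI 1.1 init, then PMI 2.0
 * fullinit). The returned values are invented for the demonstration. */
static int do_wire_init(int *spawned, int *size, int *rank, int *appnum)
{
    wire_init_calls++;
    *spawned = 0;
    *size = 4;
    *rank = 1;
    *appnum = 0;
    return 0;
}

/* Idempotent wrapper: only the first successful call talks to the
 * server; later calls return the cached values. */
int guarded_init(int *spawned, int *size, int *rank, int *appnum)
{
    assert(spawned != NULL && size != NULL && rank != NULL && appnum != NULL);

    if (!cached_valid) {
        int rc = do_wire_init(&cached_spawned, &cached_size,
                              &cached_rank, &cached_appnum);
        if (rc != 0)
            return rc;  /* do not cache a failed init */
        cached_valid = 1;
    }
    *spawned = cached_spawned;
    *size = cached_size;
    *rank = cached_rank;
    *appnum = cached_appnum;
    return 0;
}
```

Calling guarded_init() twice leaves wire_init_calls at 1, so no second
PMI 1.1 init line would reach the server in this scheme.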


On Tue, 2014-05-20 at 20:52 -0700, Artem Polyakov wrote:
> 2. "Double init hang" problem: program pmi_double_init.c (attached) is
> launched with script pmi_double_init.job (attached) and it just hangs.
> Here is what GDB shows on one of the processes: 
> 
> (gdb) bt
> #0  0x0000003b722db730 in __read_nocancel () from /lib64/libc.so.6
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12,
>     buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n",
>     maxlen=1024) at pmi2_util.c:72
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
>     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8)
>     at pmi2_api.c:221
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
> 
> (gdb) frame 3
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
> 17          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
> 
> (gdb) frame 1
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12,
>     buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n",
>     maxlen=1024) at pmi2_util.c:72
> 72                      n = read(fd, readbuf, sizeof(readbuf) - 1);
> (gdb) l
> 67          p = buf;
> 68          curlen = 1; /* Make room for the null */
> 69          while (curlen < maxlen) {
> 70              if (nextChar == lastChar) {
> 71                  do {
> 72                      n = read(fd, readbuf, sizeof(readbuf) - 1);
> 73                  } while (n == -1 && errno == EINTR);
> 74                  if (n == 0) {
> 75                      /* EOF */
> 76                      break;
> 
> (gdb) frame 2
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
>     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8)
>     at pmi2_api.c:221
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> (gdb) l
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
> 217
> 218         ret = PMI2U_writeline(PMI2_fd, buf);
> 219         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_init_send");
> 220
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> 222         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_initack %s", strerror(pmi2_errno));
> 223
> 224         PMI2U_parse_keyvals(buf);
> 225         cmdline[0] = 0;
> 
> So the apps hang waiting for a response from the PMI server while
> doing the non-full "init".
> 
> And in the error output I see the following messages:
> ------------ 8< ------------------------------------------------
> slurmd[cn01]: mpi/pmi2: request not begin with 'cmd='
> slurmd[cn01]: mpi/pmi2: full request is:
> slurmd[cn01]: mpi/pmi2: invalid client request
> ------------ 8< ------------------------------------------------
> 
> 
> 
> If I attach before the second PMI2_Init call I can see that buf is not
> empty:
> ... [ GDB attach right before PMI2_Init] ....
> (gdb) n
> 21          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
> ------------------------ 8< -------------------------------------
> (gdb)
> 203         if (PMI2_fd == -1) {
> (gdb) p PMI2_fd
> $1 = 12
> (gdb) n
> 215         ret = snprintf(buf, PMI2_MAXLINE, "cmd=init pmi_version=%d pmi_subversion=%d\n", PMI_VERSION, PMI_SUBVERSION);
> (gdb)
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
> (gdb) p buf
> $2 = "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000 ... "...
> 
> According to _handle_task_request, SLURM uses the following logic:
>
> _handle_task_request(int fd, int lrank)
>     if (initialized[lrank] == 0) {
>         rc = _handle_pmi1_init(fd, lrank);
>         initialized[lrank] = 1;
>     } else if (is_pmi11()) {
>         rc = handle_pmi1_cmd(fd, lrank);
>     } else if (is_pmi20()) {
>         rc = handle_pmi2_cmd(fd, lrank);
>     }
>
> So once we call PMI2_Init the first time, the next duplicate request
> is routed to handle_pmi2_cmd (since this is what was set up by the
> first call). Finally, handle_pmi2_cmd uses safe_read (!!) in two
> steps:
>
>     safe_read(fd, len_buf, 6);
>     len_buf[6] = '\0';
>     len = atoi(len_buf);
>     buf = xmalloc(len + 1);
>     safe_read(fd, buf, len);
>     buf[len] = '\0';
> 
> and having
> "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
> we will cut the first 6 characters from it and get:
>
>     len_buf = "cmd=in\000"
>     fd remains: "it pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
>     len = atoi("cmd=in\000") = 0;
>
> We then read a 0-length buffer and return (as I can see in stderr).
> This will be repeated until we finish the buffer. However, it doesn't
> explain why we hang, but it is probably a good start for continued
> debugging.
> 
> I think an additional check in PMI2_Init for the "already-initialized"
> case will solve the problem.
> 
> 
