Your debugging and analysis are correct. PMI2_Init() initializes PMI in two steps. First, a PMI 1.1 init command is sent to the server and the version is negotiated. After that, a PMI 2.0 fullinit command is sent. Everything works up to this point. But once the version has been decided, the server no longer expects another PMI 1.1 init command, whose wire format differs from that of PMI 2.0 commands (see http://wiki.mpich.org/mpich/index.php/PMI_v2_Wire_Protocol).
The mpi/pmi2 plugin does not yet implement all functions of the PMI2 API (http://wiki.mpich.org/mpich/index.php/PMI_v2_API); I have only tested it with MPICH programs. It is not clearly specified whether a program may call PMI2_Init() twice. I think this would be easier to handle on the client side: simply return the old values on the second call.

On Tue, 2014-05-20 at 20:52 -0700, Artem Polyakov wrote:
> 2. "Double init hang" problem: program pmi_double_init.c (attached) is
> launched with script pmi_double_init.job (attached) and it just hangs.
> Here is what GDB shows on one of the processes:
>
> (gdb) bt
> #0  0x0000003b722db730 in __read_nocancel () from /lib64/libc.so.6
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404, size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at pmi2_api.c:221
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
>
> (gdb) frame 3
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
> 17          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
>
> (gdb) frame 1
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0 "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> 72                  n = read(fd, readbuf, sizeof(readbuf) - 1);
> (gdb) l
> 67      p = buf;
> 68      curlen = 1;    /* Make room for the null */
> 69      while (curlen < maxlen) {
> 70          if (nextChar == lastChar) {
> 71              do {
> 72                  n = read(fd, readbuf, sizeof(readbuf) - 1);
> 73              } while (n == -1 && errno == EINTR);
> 74              if (n == 0) {
> 75                  /* EOF */
> 76                  break;
>
> (gdb) frame 2
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404, size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at pmi2_api.c:221
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> (gdb) l
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
> 217
> 218         ret = PMI2U_writeline(PMI2_fd, buf);
> 219         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_init_send");
> 220
> 221         ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> 222         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**pmi2_initack %s", strerror(pmi2_errno));
> 223
> 224         PMI2U_parse_keyvals(buf);
> 225         cmdline[0] = 0;
>
> So the apps hang waiting for a response from the PMI server while doing
> the non-full "init".
>
> And in the error output I see the following messages:
> ------------ 8< ------------------------------------------------
> slurmd[cn01]: mpi/pmi2: request not begin with 'cmd='
> slurmd[cn01]: mpi/pmi2: full request is:
> slurmd[cn01]: mpi/pmi2: invalid client request
> ------------ 8< ------------------------------------------------
>
> If I attach before the second PMI2_Init call, I can see that buf is not
> empty:
> ... [ GDB attach right before PMI2_Init] ....
> (gdb) n
> 21          rc = PMI2_Init(&spawned, &size, &rank, &appnum);
> ------------------------ 8< -------------------------------------
> (gdb)
> 203         if (PMI2_fd == -1) {
> (gdb) p PMI2_fd
> $1 = 12
> (gdb) n
> 215         ret = snprintf(buf, PMI2_MAXLINE, "cmd=init pmi_version=%d pmi_subversion=%d\n", PMI_VERSION, PMI_SUBVERSION);
> (gdb)
> 216         PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER, "**intern %s", "failed to generate init line");
> (gdb) p buf
> $2 = "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000 ... "...
>
> According to _handle_task_request, SLURM uses the following logic:
>
>     _handle_task_request(int fd, int lrank)
>         if (initialized[lrank] == 0) {
>             rc = _handle_pmi1_init(fd, lrank);
>             initialized[lrank] = 1;
>         } else if (is_pmi11()) {
>             rc = handle_pmi1_cmd(fd, lrank);
>         } else if (is_pmi20()) {
>             rc = handle_pmi2_cmd(fd, lrank);
>         }
>
> So once we call PMI2_Init the first time, the next (duplicate) request is
> routed to handle_pmi2_cmd, since that is what the first call set up. And
> finally, handle_pmi2_cmd uses safe_read (!!) in two steps:
>
>     safe_read(fd, len_buf, 6);
>     len_buf[6] = '\0';
>     len = atoi(len_buf);
>     buf = xmalloc(len + 1);
>     safe_read(fd, buf, len);
>     buf[len] = '\0';
>
> and given "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
> we cut the first 6 characters from it and get:
>
>     len_buf = "cmd=in\000"
>     fd remains: "it pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
>     len = atoi("cmd=in\000") = 0;
>
> We then read a 0-length buffer and return (as I can see in stderr). This
> is repeated until we exhaust the buffer. However, this doesn't explain
> why we hang, but it is probably a good starting point for further
> debugging.
>
> I think an additional check in PMI2_Init for the "already-initialized"
> case will solve the problem.
