2014-05-21 10:50 GMT+07:00 Artem Polyakov <[email protected]>:
> Here are exact examples:
>
> 1. "appnum = -1" problem:
> Program pmi_appnum.c (attached) is allocated using batch script
> pmi_appnum.job (attached) and produces the following results:
>
> PMI2_Init(0, 16, 0, -1)
> PMI2_Init(0, 16, 1, -1)
> PMI2_Init(0, 16, 2, -1)
> PMI2_Init(0, 16, 3, -1)
> PMI2_Init(0, 16, 5, -1)
> PMI2_Init(0, 16, 9, -1)
> PMI2_Init(0, 16, 12, -1)
> PMI2_Init(0, 16, 7, -1)
> PMI2_Init(0, 16, 4, -1)
> PMI2_Init(0, 16, 6, -1)
> PMI2_Init(0, 16, 11, -1)
> PMI2_Init(0, 16, 10, -1)
> PMI2_Init(0, 16, 8, -1)
> PMI2_Init(0, 16, 14, -1)
> PMI2_Init(0, 16, 15, -1)
> PMI2_Init(0, 16, 13, -1)
>
> 2. "Double init hang" problem:
> Program pmi_double_init.c (attached) is launched with script
> pmi_double_init.job (attached) and it just hangs. Here is what GDB shows on
> one of the processes:
>
> (gdb) bt
> #0  0x0000003b722db730 in __read_nocancel () from /lib64/libc.so.6
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0
>     "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> #2  0x00007f201cbcf74c in PMI2_Init (spawned=0x7fffa4f81404,
>     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at
>     pmi2_api.c:221
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
>
> (gdb) frame 3
> #3  0x0000000000400626 in main () at pmi_double_init.c:17
> 17        rc = PMI2_Init(&spawned, &size, &rank, &appnum);
>
> (gdb) frame 1
> #1  0x00007f201cbd5ee4 in PMI2U_readline (fd=12, buf=0x7fffa4f80ba0
>     "cmd=init pmi_version=2 pmi_subversion=0\n", maxlen=1024) at pmi2_util.c:72
> 72        n = read(fd, readbuf, sizeof(readbuf) - 1);
> (gdb) l
> 67        p = buf;
> 68        curlen = 1;    /* Make room for the null */
> 69        while (curlen < maxlen) {
> 70            if (nextChar == lastChar) {
> 71                do {
> 72                    n = read(fd, readbuf, sizeof(readbuf) - 1);
> 73                } while (n == -1 && errno == EINTR);
> 74                if (n == 0) {
> 75                    /* EOF */
> 76                    break;
>
> (gdb) frame 2
> #2  0x00007f201cbcf74c in PMI2_Init
>     (spawned=0x7fffa4f81404,
>     size=0x7fffa4f81400, rank=0x7fffa4f813fc, appnum=0x7fffa4f813f8) at
>     pmi2_api.c:221
> 221       ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> (gdb) l
> 216       PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
>               "**intern %s", "failed to generate init line");
> 217
> 218       ret = PMI2U_writeline(PMI2_fd, buf);
> 219       PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
>               "**pmi2_init_send");
> 220
> 221       ret = PMI2U_readline(PMI2_fd, buf, PMI2_MAXLINE);
> 222       PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
>               "**pmi2_initack %s", strerror(pmi2_errno));
> 223
> 224       PMI2U_parse_keyvals(buf);
> 225       cmdline[0] = 0;
>
> So the apps hang waiting for a response from the PMI server while doing a
> non-full "init".
>
> And in the error output I see the following messages:
> ------------ 8< ------------------------------------------------
> slurmd[cn01]: mpi/pmi2: request not begin with 'cmd='
> slurmd[cn01]: mpi/pmi2: full request is:
> slurmd[cn01]: mpi/pmi2: invalid client request
> ------------ 8< ------------------------------------------------
>
> If I attach before the second PMI2_Init call, I can see that buf is not empty:
> ... [ GDB attach right before PMI2_Init ] ...
> (gdb) n
> 21        rc = PMI2_Init(&spawned, &size, &rank, &appnum);
> ------------------------ 8< -------------------------------------
> (gdb)
> 203       if (PMI2_fd == -1) {
> (gdb) p PMI2_fd
> $1 = 12
> (gdb) n
> 215       ret = snprintf(buf, PMI2_MAXLINE, "cmd=init pmi_version=%d
>               pmi_subversion=%d\n", PMI_VERSION, PMI_SUBVERSION);
> (gdb)
> 216       PMI2U_ERR_CHKANDJUMP(ret < 0, pmi2_errno, PMI2_ERR_OTHER,
>               "**intern %s", "failed to generate init line");
> (gdb) p buf
> $2 = "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000 ... "...
>
> According to _handle_task_request, SLURM uses the following logic:
>
>     _handle_task_request(int fd, int lrank)
>         if (initialized[lrank] == 0) {
>             rc = _handle_pmi1_init(fd, lrank);
>             initialized[lrank] = 1;
>         } else if (is_pmi11()) {
>             rc = handle_pmi1_cmd(fd, lrank);
>         } else if (is_pmi20()) {
>             rc = handle_pmi2_cmd(fd, lrank);
>         }
>
> So once we call PMI2_Init the first time, the next (duplicate) request is
> routed to handle_pmi2_cmd (since that is what we set up on the first call).
> And finally handle_pmi2_cmd uses safe_read (!!) in two steps:
>
>     safe_read(fd, len_buf, 6);
>     len_buf[6] = '\0';
>     len = atoi(len_buf);
>     buf = xmalloc(len + 1);
>     safe_read(fd, buf, len);
>     buf[len] = '\0';
>
> Having "cmd=init pmi_version=2 pmi_subversion=0\n\000mi_subversion\000",
> we cut the first 6 symbols from it and get:
>
>     len_buf = "cmd=in\000"
>     fd remains: "it pmi_version=2 pmi_subversion=0\n\000mi_subversion\000"
>     len = atoi("cmd=in\000") = 0;
>
> We then read a 0-length buffer and return (as I can see in stderr). This
> will be repeated until we exhaust the buffer.
> However, it doesn't explain why we hang, but it is probably a good starting
> point to continue debugging.
>
Actually it does! The PMI server refuses the request and has nothing to tell
the client. And the client waits forever for the response!

> *I think an additional check in PMI2_Init for the "already-initialized" case
> will solve the problem.*
>
> 2014-05-21 7:54 GMT+07:00 Artem Polyakov <[email protected]>:
>
>> On Wednesday, May 21, 2014, Artem Polyakov wrote:
>>
>>> On Wednesday, May 21, 2014, David Bigagli wrote:
>>>
>>>> The srun --mpi=pmi2 option has to be specified if Open MPI was built
>>>> with the --with-pmi option, otherwise Slurm will not load the pmi2
>>>> plugin and the MPI job will fail in MPI_Init().
>>>
>> Also important to note: this is for SLURM 2.6.3, so not the newest
>> version. However, the problem may remain in newer versions.
>>
>>> Thank you.
>>> I need to add that I am working on PMI support in Open MPI, so I go
>>> slightly deeper than a regular user. So here is additional info:
>>>
>>> 1. I get appnum = -1 with the --mpi=pmi2 option. Not sure whether this
>>> is configuration-specific or not. Other things like rank and size were
>>> correct.
>>>
>>> 2. I caught the PMI2_Init hang during development when I forgot to put
>>> in the check for the PMI-already-initialized case. I'm also not sure
>>> that this is not cluster-specific, but I am ready to help with
>>> additional debugging. However, I have no admin access on my cluster,
>>> so I can only debug at the libpmi2 level, not the server.
>>>
>>>> On 05/17/2014 07:50 PM, Artem Polyakov wrote:
>>>>
>>>>> Hello,
>>>>> Here are some related notes that I found during further investigation:
>>>>>
>>>>> 1. PMI2_Init returns appnum = -1, and this is what it gets from the
>>>>> SLURM PMI server.
>>>>> 2. The application hangs if PMI2_Init is called twice. I think this
>>>>> is due to the lack of a response from the PMI2 server. Correct
>>>>> behavior would be to return an error.
>>>>>
>>>>> 2014-05-07 9:44 GMT+07:00 Artem Polyakov <[email protected]>:
>>>>>
>>>>>     Hello, all.
>>>>>
>>>>>     I am experiencing problems with SLURM PMI2 support. Here is my
>>>>>     configuration:
>>>>>     1. SLURM 2.6.3
>>>>>     2. Open MPI current trunk (1.8.1 also affected).
>>>>>
>>>>>     Starting from 1.8.x, Open MPI supports PMI2 and tries to use it
>>>>>     whenever possible. However, the PMI2 MPI module is not guaranteed
>>>>>     to be enabled in the configuration, and a user can forget to pass
>>>>>     the --mpi=pmi2 option to srun (this was my case initially). In
>>>>>     this case Open MPI aborts abnormally, because SLURM's PMI2 assumes
>>>>>     that this is a singleton application and leaves PMI2_fd == -1.
>>>>>     Also, PMI won't initialize rank and size in PMI2_Init().
>>>>>     Later Open MPI will call PMI2_Job_GetId, which results in the
>>>>>     following call of PMIi_WriteSimpleCommand:
>>>>>     PMIi_WriteSimpleCommand (fd=-1, resp=0x7fff11ed2890,
>>>>>     cmd=0x7f3a9388b748 "job-getid", pairs=0xdc7780, npairs=0) at
>>>>>     pmi2_api.c:1471
>>>>>
>>>>>     I checked other versions of SLURM up to the latest, and it seems
>>>>>     that this bug remains. Here is the fix for slurm-14.03.3-2. I
>>>>>     checked its functionality on my SLURM 2.6.3 installation.
>>>>>
>>>>>     --
>>>>>     Best regards, Artem Y. Polyakov
>>>>
>>>> --
>>>> Thanks,
>>>> /David/ Bigagli
>>>>
>>>> www.schedmd.com

--
Best regards, Artem Y. Polyakov
