Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-23 Thread Gilles Gouaillardet
Folks, I committed 248acbbc3ba06c2bef04f840e07816f71f864959 to fix a hang in coll/ml when using srun (both pmi1 and pmi2). Could you please give it a try? Cheers, Gilles On 2014/10/22 23:03, Joshua Ladd wrote: > Privet, Artem > > ML is the collective component that is invoking the

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-22 Thread Joshua Ladd
Privet, Artem ML is the collective component that is invoking the calls into BCOL. The triplet basesmuma,basesmuma,ptpcoll, for example, means I want three levels of hierarchy - socket level, UMA level, and then network level. I am guessing (only a guess after a quick glance) that maybe srun is

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Hey, Lena :). 2014-10-17 22:07 GMT+07:00 Elena Elkina: > Hi Artem, > > Actually some time ago there was a known issue with coll ml. I used to run > my command lines with -mca coll ^ml to avoid these problems, so I don't > know if it was fixed or not. It looks like you

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Elena Elkina
Hi Artem, Actually some time ago there was a known issue with coll ml. I used to run my command lines with -mca coll ^ml to avoid these problems, so I don't know if it was fixed or not. It looks like you have the same problem. Best regards, Elena On Fri, Oct 17, 2014 at 7:01 PM, Artem Polyakov

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Gilles, I checked your patch and it doesn't solve the problem I observe. I think the reason is somewhere else. 2014-10-17 19:13 GMT+07:00 Gilles Gouaillardet < gilles.gouaillar...@gmail.com>: > Artem, > > There is a known issue #235 with modex and i made PR #238 with a tentative > fix. > >

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Gilles Gouaillardet
Artem, There is a known issue #235 with modex, and I made PR #238 with a tentative fix. Could you please give it a try and report whether it solves your problem? Cheers Gilles Artem Polyakov wrote: >Hello, I have troubles with latest trunk if I use PMI1. > > >For example, if

[OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Hello, I am having trouble with the latest trunk when I use PMI1. For example, if I use 2 nodes, the application hangs. See the backtraces from both nodes below. From them I can see that the second (non-launching) node hangs in bcol component selection. Here is the default setting of the bcol_base_string parameter: