Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
I'm not sure, but I guess it's related to Gilles's ticket. It's quite a bad binding pattern, as Ralph pointed out, so checking for that condition and disqualifying coll/ml could be a practical solution as well. Tetsuya > It is related, but it means that coll/ml has a higher degree of sensitivity

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Gilles Gouaillardet
Ralph, my test VM is single socket, four cores. Here is something odd I just found when running mpirun -np 2 intercomm_create: tasks [0,1] are bound on cpus [0,1] => OK; tasks [2-3] (first spawn) are bound on cpus [2,3] => OK; tasks [4-5] (second spawn) are not bound (and cpuset is [0-3]) => OK in omp

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Gilles Gouaillardet
Ralph, attached is a patch that fixes/works around my issue. This is more of a proof of concept, so I did not commit it to the trunk. Basically: opal_hwloc_base_get_relative_locality (topo, set1, set2) sets the locality based on the deepest element that is part of both set1 and set2. in m

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Ralph Castain
Hmmm...this is a tough one. It basically comes down to what we mean by relative locality. Initially, we meant "at what level do these procs share cpus" - however, coll/ml is using it as "at what level are these procs commonly bound". Subtle difference, but significant. Your proposed version imp
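The difference between the two readings of "relative locality" can be sketched with a toy model. This is only an illustration under stated assumptions: it does not use hwloc or any Open MPI code, the topology is a hand-written single-socket, four-core layout like the test VM mentioned earlier in the thread, and the function names `deepest_shared_level` and `deepest_common_binding_level` are hypothetical. Reading "share cpus" as overlap and "commonly bound" as containment is likewise an interpretation, not something the thread spells out.

```python
# Toy topology, shallowest level first. Each level lists the cpuset of
# every object at that level (hypothetical single-socket, four-core box).
TOPOLOGY = [
    ("node", [frozenset({0, 1, 2, 3})]),
    ("socket", [frozenset({0, 1, 2, 3})]),
    ("core", [frozenset({c}) for c in range(4)]),
]


def deepest_shared_level(set1, set2):
    """Deepest level with one object that overlaps both cpusets.

    Assumed reading of "at what level do these procs share cpus".
    """
    deepest = None
    for name, objects in TOPOLOGY:
        if any(obj & set1 and obj & set2 for obj in objects):
            deepest = name
    return deepest


def deepest_common_binding_level(set1, set2):
    """Deepest level with one object that fully contains both cpusets.

    Assumed reading of "at what level are these procs commonly bound".
    """
    deepest = None
    for name, objects in TOPOLOGY:
        if any(set1 <= obj and set2 <= obj for obj in objects):
            deepest = name
    return deepest


bound = frozenset({0, 1})          # a task bound to cpus [0,1]
unbound = frozenset({0, 1, 2, 3})  # an unbound task, cpuset [0-3]

print(deepest_shared_level(bound, unbound))
print(deepest_common_binding_level(bound, unbound))
```

In this toy model a bound task and an unbound task overlap at the core level, yet the deepest object containing both cpusets is the socket, so the two readings can give different answers for the same pair of processes.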

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
Hi Ralph, By the way, something is wrong with your latest rmaps_rank_file.c. I've got the error below. I'm trying to find the problem. But, you could find it more quickly... [mishima@manage trial]$ cat rankfile rank 0=node05 slot=0-1 rank 1=node05 slot=3-4 rank 2=node05 slot=6-7 [mishima@manage

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Ralph Castain
Should be fixed with r32058 On Jun 20, 2014, at 4:13 AM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > > By the way, something is wrong with your latest rmaps_rank_file.c. > I've got the error below. I'm trying to find the problem. But, you > could find it more quickly... > > [mishima@m

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
Thanks Ralph. I'll check it next Monday. Tetsuya > Should be fixed with r32058 > > > On Jun 20, 2014, at 4:13 AM, tmish...@jcity.maeda.co.jp wrote: > > > > > > > Hi Ralph, > > > > By the way, something is wrong with your latest rmaps_rank_file.c. > > I've got the error below. I'm trying to fi