Re: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]

2013-02-10 Thread Eugene Loh


On 2/10/2013 1:14 AM, Siegmar Gross wrote:

> > I don't think the problem is related to Solaris.  I think it's also on Linux.
> > E.g., I can reproduce the problem with 1.9a1r28035 on Linux using GCC compilers.
> >
> > Siegmar: can you confirm this is a problem also on Linux?  E.g.,
> > with OMPI 1.9, on one of your Linux nodes (linpc0?) try
> >
> >   % cat myrankfile
> >   rank 0=linpc0 slot=0:1
> >   % mpirun --report-bindings --rankfile myrankfile numactl --show
> >
> > For me, the binding I get is not 0:1 but 0,1.

> I get the following outputs for openmpi-1.6.4rc4 (without your patch)


Okay, thanks, but 1.6 is not the issue here.  There is something going on
in 1.9/trunk that is very different.  Thanks for the 1.6 output; that
behavior looks correct.



> and openmpi-1.9 (both compiled with Sun C 5.12).


Thanks for the confirmation.  Your output shows the same problem on Linux,
so it looks like the bindings are wrong in 1.9.  Ralph says he's taking a
look.  The rankfile says "0:1" (socket 0, core 1), but you're getting
"0,1" (cores 0 and 1).
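
In other words, the slot spec seems to be getting treated as a plain list
of CPUs instead of socket:core.  Roughly the difference between these two
readings (an illustrative C sketch only, not the actual rmaps_rank_file.c
logic):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Illustrative sketch only -- not Open MPI code.  In a rankfile,
   * "0:1" means socket 0, core 1 (a single core).  Treating the same
   * string as a delimiter-separated list of CPU ids instead yields
   * cpus 0 and 1, i.e. two cores. */
  int main(void)
  {
      const char *slot = "0:1";
      int socket, core;

      /* Reading 1: socket:core -> one core */
      if (sscanf(slot, "%d:%d", &socket, &core) == 2)
          printf("socket:core -> socket %d, core %d\n", socket, core);

      /* Reading 2: list of CPU ids -> two cores */
      char buf[16];
      strncpy(buf, slot, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';
      printf("cpu list    ->");
      for (char *tok = strtok(buf, ":,"); tok; tok = strtok(NULL, ":,"))
          printf(" %d", atoi(tok));
      printf("\n");
      return 0;
  }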



> linpc1 rankfiles 96 mpirun --report-bindings --rankfile rf_1_linux numactl --show
> [linpc1:16061] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> physcpubind: 0 1
> linpc1 rankfiles 97 ompi_info | grep "MPI:"
>  Open MPI: 1.9a1r28035
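
One more way to cross-check, independent of numactl: have each process
print the affinity mask it actually sees.  A quick sketch (untested,
Linux-specific, uses sched_getaffinity):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      cpu_set_t mask;
      char host[64];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      gethostname(host, sizeof(host));

      /* Ask the kernel which CPUs this process may run on. */
      CPU_ZERO(&mask);
      if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
          printf("rank %d on %s bound to cpus:", rank, host);
          for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
              if (CPU_ISSET(cpu, &mask))
                  printf(" %d", cpu);
          printf("\n");
      }

      MPI_Finalize();
      return 0;
  }

Compiled with mpicc and run with the rf_1_linux rankfile, it should print
only cpu 1 when the binding is correct, and cpus 0 and 1 with the 1.9
behavior quoted above.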


Re: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]

2013-02-10 Thread Siegmar Gross
Hi

> > You'll want to look at orte/mca/rmaps/rank_file/rmaps_rank_file.c 
> > - the bit map is now computed in mpirun and then sent to the daemons
> 
> Actually, I'm getting lost in this code.  Anyhow, I don't think
> the problem is related to Solaris.  I think it's also on Linux. 
> E.g., I can reproduce the problem with 1.9a1r28035 on Linux using 
> GCC compilers.
> 
> Siegmar: can you confirm this is a problem also on Linux?  E.g.,
> with OMPI 1.9, on one of your Linux nodes (linpc0?) try
> 
>  % cat myrankfile
>  rank 0=linpc0 slot=0:1
>  % mpirun --report-bindings --rankfile myrankfile numactl --show
> 
> For me, the binding I get is not 0:1 but 0,1.

I get the following outputs for openmpi-1.6.4rc4 (without your patch)
and openmpi-1.9 (both compiled with Sun C 5.12).

linpc1 rankfiles 96 cat rf_1_linux
rank 0=linpc1 slot=0:1
linpc1 rankfiles 97 mpirun --report-bindings --rankfile rf_1_linux \
  numactl --show
[linpc1:15882] MCW rank 0 bound to socket 0[core 1]:
  [. B][. .] (slot list 0:1)
policy: preferred
preferred node: 0
physcpubind: 1 
cpubind: 0 
nodebind: 0 
membind: 0 1 
linpc1 rankfiles 98 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc4r28022



linpc1 rankfiles 96 mpirun --report-bindings --rankfile rf_1_linux \
  numactl --show
[linpc1:16061] MCW rank 0 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]]: [B/B][./.]
policy: default
preferred node: current
physcpubind: 0 1 
cpubind: 0 
nodebind: 0 
membind: 0 1 
linpc1 rankfiles 97 ompi_info | grep "MPI:"
Open MPI: 1.9a1r28035


Today I will build the current versions of openmpi-1.6.4rc4 and
openmpi-1.9, so that I can test a rankfile with two machines
tomorrow.
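
For that test I will probably start from a rankfile along the same lines
as above, one rank per host, e.g.

  rank 0=linpc0 slot=0:1
  rank 1=linpc1 slot=0:1

(the exact slot values do not matter), so that I can see whether the
bindings are also wrong across nodes.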


Kind regards

Siegmar