Re: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]
On 2/10/2013 1:14 AM, Siegmar Gross wrote:
>> I don't think the problem is related to Solaris. I think it's also
>> on Linux. E.g., I can reproduce the problem with 1.9a1r28035 on
>> Linux using GCC compilers.
>>
>> Siegmar: can you confirm this is a problem also on Linux? E.g.,
>> with OMPI 1.9, on one of your Linux nodes (linpc0?) try
>>
>> % cat myrankfile
>> rank 0=linpc0 slot=0:1
>> % mpirun --report-bindings --rankfile myrankfile numactl --show
>>
>> For me, the binding I get is not 0:1 but 0,1.
>
> I get the following outputs for openmpi-1.6.4rc4 (without your patch)

Okay thanks, but 1.6 is not the issue here. There is something going on
in 1.9/trunk that is very different. Thanks for the 1.6 output, but
it's all right.

> and openmpi-1.9 (both compiled with Sun C 5.12).

Thanks for the confirmation. You, too, are showing Linux demonstrating
this problem. It looks like bindings are wrong in 1.9. Ralph says he's
taking a look. The rankfile says "0:1", but you're getting "0,1".

> linpc1 rankfiles 96 mpirun --report-bindings --rankfile rf_1_linux \
>   numactl --show
> [linpc1:16061] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> physcpubind: 0 1
> linpc1 rankfiles 97 ompi_info | grep "MPI:"
>                 Open MPI: 1.9a1r28035
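[Editorial note: the distinction being debated above — "0:1" as socket:core versus "0,1" as a list of cpus — can be sketched as a tiny parser. This is an illustration of the intended reading only, not Open MPI source; `parse_slot` and its return shape are hypothetical.]

```python
def parse_slot(spec):
    """Illustrative reading of a rankfile slot spec (not Open MPI code).

    "0:1"  -> socket:core form: socket 0, core 1 (a single core)
    "0,1"  -> cpu-list form: logical cpus 0 and 1 (two cpus)
    """
    if ':' in spec:
        socket, cores = spec.split(':', 1)
        return ('socket_core', int(socket), [int(c) for c in cores.split(',')])
    return ('cpu_list', [int(c) for c in spec.split(',')])

print(parse_slot("0:1"))  # ('socket_core', 0, [1]) -- one core on socket 0
print(parse_slot("0,1"))  # ('cpu_list', [0, 1])    -- two cpus
```

The reported 1.9 behavior corresponds to treating the first spec as if it were the second, which is why the binding comes out as cpus 0 and 1 instead of socket 0, core 1.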
Re: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]
Hi

>> You'll want to look at orte/mca/rmaps/rank_file/rmaps_rank_file.c
>> - the bit map is now computed in mpirun and then sent to the daemons
>
> Actually, I'm getting lost in this code. Anyhow, I don't think
> the problem is related to Solaris. I think it's also on Linux.
> E.g., I can reproduce the problem with 1.9a1r28035 on Linux using
> GCC compilers.
>
> Siegmar: can you confirm this is a problem also on Linux? E.g.,
> with OMPI 1.9, on one of your Linux nodes (linpc0?) try
>
> % cat myrankfile
> rank 0=linpc0 slot=0:1
> % mpirun --report-bindings --rankfile myrankfile numactl --show
>
> For me, the binding I get is not 0:1 but 0,1.

I get the following outputs for openmpi-1.6.4rc4 (without your patch)
and openmpi-1.9 (both compiled with Sun C 5.12).

linpc1 rankfiles 96 cat rf_1_linux
rank 0=linpc1 slot=0:1
linpc1 rankfiles 97 mpirun --report-bindings --rankfile rf_1_linux \
  numactl --show
[linpc1:15882] MCW rank 0 bound to socket 0[core 1]: [. B][. .] (slot list 0:1)
policy: preferred
preferred node: 0
physcpubind: 1
cpubind: 0
nodebind: 0
membind: 0 1
linpc1 rankfiles 98 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc4r28022

linpc1 rankfiles 96 mpirun --report-bindings --rankfile rf_1_linux \
  numactl --show
[linpc1:16061] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
policy: default
preferred node: current
physcpubind: 0 1
cpubind: 0
nodebind: 0
membind: 0 1
linpc1 rankfiles 97 ompi_info | grep "MPI:"
                Open MPI: 1.9a1r28035

Today I will build the current versions of openmpi-1.6.4rc4 and
openmpi-1.9, so that I can test a rankfile with two machines tomorrow.

Kind regards

Siegmar
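[Editorial note: for anyone reproducing this thread, the two slot notations being compared look like this in a rankfile. The hostnames are examples from the thread; the interpretation of the two forms follows the mpirun rankfile documentation, and the socket:core form is the one reported above as mishandled in 1.9.]

```
# socket:core form -- rank 0 on linpc1, socket 0, core 1 (one core).
# 1.6.4rc4 binds this correctly; 1.9a1r28035 reportedly binds cpus 0 and 1.
rank 0=linpc1 slot=0:1

# cpu-list form -- rank 0 on linpc1, logical cpus 0 and 1 (two cpus).
rank 0=linpc1 slot=0,1
```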