for whatever it's worth, running the test program on my OPA cluster seems to work. well, it keeps spitting out [INFO MEMORY] lines, and i'm not sure if it's supposed to stop at some point
i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, without-{psm,ucx,verbs}

On Tue, Jan 26, 2021 at 3:44 PM Patrick Begou via users <users@lists.open-mpi.org> wrote:
>
> Hi Michael
>
> indeed I'm a little bit lost with all these parameters in OpenMPI, mainly
> because for years it has worked just fine out of the box in all my deployments
> on various architectures, interconnects and linux flavors. Some weeks ago I
> deployed OpenMPI4.0.5 in Centos8 with gcc10, slurm and UCX on an AMD epyc2
> cluster with connectX6, and it just works fine. It is the first time I've
> had such trouble deploying this library.
>
> If you have my mail posted the 25/01/2021 in this discussion at 18h54 (may
> be Paris TZ) there is a small test case attached that shows the problem. Did
> you get it or did the list strip these attachments? I can provide it again.
>
> Many thanks
>
> Patrick
>
> On 26/01/2021 at 19:25, Heinz, Michael William wrote:
>
> Patrick how are you using original PSM if you’re using Omni-Path hardware?
> The original PSM was written for QLogic DDR and QDR Infiniband adapters.
>
> As far as needing openib - the issue is that the PSM2 MTL doesn’t support a
> subset of MPI operations that we previously used the pt2pt BTL for. For
> recent versions of OMPI, the preferred BTL to use with PSM2 is OFI.
>
> Is there any chance you can give us a sample MPI app that reproduces the
> problem? I can’t think of another way I can give you more help without being
> able to see what’s going on. It’s always possible there’s a bug in the PSM2
> MTL but it would be surprising at this point.
>
> Sent from my iPad
>
> On Jan 26, 2021, at 1:13 PM, Patrick Begou via users
> <users@lists.open-mpi.org> wrote:
>
> Hi all,
>
> I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged
> with Nix was running using openib. So I added the --with-verbs option to set up
> this module.
>
> What I can see now is that:
>
> mpirun -hostfile $OAR_NODEFILE --mca mtl psm -mca btl_openib_allow_ib true
> ....
>
> - the testcase test_layout_array is running without error
>
> - the bandwidth measured with osu_bw is half of what it should be:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                0.54
> 2                1.13
> 4                2.26
> 8                4.51
> 16               9.06
> 32              17.93
> 64              33.87
> 128             69.29
> 256            161.24
> 512            333.82
> 1024           682.66
> 2048          1188.63
> 4096          1760.14
> 8192          2166.08
> 16384         2036.95
> 32768         3466.63
> 65536         6296.73
> 131072        7509.43
> 262144        9104.78
> 524288        6908.55
> 1048576       5530.37
> 2097152       4489.16
> 4194304       3498.14
>
> mpirun -hostfile $OAR_NODEFILE --mca mtl psm2 -mca btl_openib_allow_ib true
> ...
>
> - the testcase test_layout_array is not giving correct results
>
> - the bandwidth measured with osu_bw is the right one:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                3.73
> 2                7.96
> 4               15.82
> 8               31.22
> 16              51.52
> 32             107.61
> 64             196.51
> 128            438.66
> 256            817.70
> 512           1593.90
> 1024          2786.09
> 2048          4459.77
> 4096          6658.70
> 8192          8092.95
> 16384         8664.43
> 32768         8495.96
> 65536        11458.77
> 131072       12094.64
> 262144       11781.84
> 524288       12297.58
> 1048576      12346.92
> 2097152      12206.53
> 4194304      12167.00
>
> But yes, I know openib is deprecated too in 4.0.5.
>
> Patrick
>
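
in case it helps to put a number on the psm vs psm2 gap quoted above, here is a minimal Python sketch that parses two osu_bw outputs and computes the psm/psm2 bandwidth ratio per message size. The function names `parse_osu_bw` and `bandwidth_ratio` are hypothetical helpers made up for illustration, and only three rows from each quoted table are pasted in to keep it short.

```python
def parse_osu_bw(text):
    """Parse osu_bw output into {message_size_bytes: bandwidth_MB_per_s}.

    Assumes the two-column layout quoted in this thread; lines starting
    with '#' (headers) and blank lines are skipped.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, bandwidth = line.split()
        results[int(size)] = float(bandwidth)
    return results


def bandwidth_ratio(run_a, run_b):
    """Return run_a/run_b for every message size present in both runs."""
    common_sizes = sorted(run_a.keys() & run_b.keys())
    return {size: run_a[size] / run_b[size] for size in common_sizes}


# A few rows copied from the "--mca mtl psm" run quoted above.
PSM_RUN = """\
# OSU MPI Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
262144        9104.78
1048576       5530.37
4194304       3498.14
"""

# The same message sizes from the "--mca mtl psm2" run quoted above.
PSM2_RUN = """\
# OSU MPI Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
262144       11781.84
1048576      12346.92
4194304      12167.00
"""

ratios = bandwidth_ratio(parse_osu_bw(PSM_RUN), parse_osu_bw(PSM2_RUN))
for size, ratio in ratios.items():
    print(f"{size:>8} bytes: psm/psm2 = {ratio:.2f}")
```

pasting the full tables into the two strings gives the ratio for every message size, which shows where the psm run starts falling behind.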