For whatever it's worth, running the test program on my OPA cluster
seems to work. Well, it keeps printing [INFO MEMORY] lines; I'm not
sure whether it's supposed to stop at some point.

I'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, without-{psm,ucx,verbs}

On Tue, Jan 26, 2021 at 3:44 PM Patrick Begou via users
<users@lists.open-mpi.org> wrote:
>
> Hi Michael
>
> indeed, I'm a little bit lost with all these parameters in OpenMPI, mainly
> because for years it has worked just fine out of the box in all my deployments
> on various architectures, interconnects and Linux flavors. Some weeks ago I
> deployed OpenMPI 4.0.5 on CentOS 8 with gcc10, slurm and UCX on an AMD epyc2
> cluster with connectX6, and it just works fine. This is the first time I've
> had such trouble deploying this library.
>
> If you have my mail posted on 25/01/2021 in this discussion at 18h54 (maybe
> Paris TZ), there is a small test case attached that shows the problem. Did
> you get it, or did the list strip the attachments? I can provide it again.
>
> Many thanks
>
> Patrick
>
> Le 26/01/2021 à 19:25, Heinz, Michael William a écrit :
>
> Patrick, how are you using the original PSM if you’re using Omni-Path hardware?
> The original PSM was written for QLogic DDR and QDR InfiniBand adapters.
>
> As far as needing openib - the issue is that the PSM2 MTL doesn’t support a 
> subset of MPI operations that we previously used the pt2pt BTL for. For
> recent versions of OMPI, the preferred BTL to use with PSM2 is OFI.
>
> Is there any chance you can give us a sample MPI app that reproduces the 
> problem? I can’t think of another way I can give you more help without being 
> able to see what’s going on. It’s always possible there’s a bug in the PSM2 
> MTL but it would be surprising at this point.
>
> Sent from my iPad
>
> On Jan 26, 2021, at 1:13 PM, Patrick Begou via users 
> <users@lists.open-mpi.org> wrote:
>
> 
> Hi all,
>
> I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged
> with Nix was running using openib. So I added the --with-verbs option to set
> up this module.
>
> What I can see now is that:
>
> mpirun -hostfile $OAR_NODEFILE  --mca mtl psm -mca btl_openib_allow_ib true 
> ....
>
> - the testcase test_layout_array is running without error
>
> - the bandwidth measured with osu_bw is half of what it should be:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                       0.54
> 2                       1.13
> 4                       2.26
> 8                       4.51
> 16                      9.06
> 32                     17.93
> 64                     33.87
> 128                    69.29
> 256                   161.24
> 512                   333.82
> 1024                  682.66
> 2048                 1188.63
> 4096                 1760.14
> 8192                 2166.08
> 16384                2036.95
> 32768                3466.63
> 65536                6296.73
> 131072               7509.43
> 262144               9104.78
> 524288               6908.55
> 1048576              5530.37
> 2097152              4489.16
> 4194304              3498.14
>
> mpirun -hostfile $OAR_NODEFILE  --mca mtl psm2 -mca btl_openib_allow_ib true 
> ...
>
> - the testcase test_layout_array is not giving correct results
>
> - the bandwidth measured with osu_bw is the right one:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                       3.73
> 2                       7.96
> 4                      15.82
> 8                      31.22
> 16                     51.52
> 32                    107.61
> 64                    196.51
> 128                   438.66
> 256                   817.70
> 512                  1593.90
> 1024                 2786.09
> 2048                 4459.77
> 4096                 6658.70
> 8192                 8092.95
> 16384                8664.43
> 32768                8495.96
> 65536               11458.77
> 131072              12094.64
> 262144              11781.84
> 524288              12297.58
> 1048576             12346.92
> 2097152             12206.53
> 4194304             12167.00
>
> But yes, I know openib is deprecated too in 4.0.5.
>
> Patrick
>
>
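To put a number on the gap between the two osu_bw runs quoted above, a small script can compute the per-size ratio. This is only a sketch: the helper names parse_osu_bw and bandwidth_ratio are made up for illustration, and only three rows from each quoted table are pasted in to keep it short.

```python
def parse_osu_bw(text):
    """Parse osu_bw output into {message_size_bytes: bandwidth_MB_per_s}."""
    results = {}
    for line in text.splitlines():
        # Tolerate mail quote markers ("> ") in front of each row.
        line = line.lstrip("> ").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Size ..." headers
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            results[int(fields[0])] = float(fields[1])
        except ValueError:
            continue  # not a "size bandwidth" row
    return results


def bandwidth_ratio(run_a, run_b):
    """Return {size: run_a[size] / run_b[size]} for sizes present in both runs."""
    common = sorted(run_a.keys() & run_b.keys())
    return {size: run_a[size] / run_b[size] for size in common if run_b[size] > 0}


# Three rows copied from the "--mca mtl psm" run quoted in this thread.
psm = parse_osu_bw("""
# OSU MPI Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
1024                  682.66
262144               9104.78
4194304              3498.14
""")

# The same three sizes from the "--mca mtl psm2" run.
psm2 = parse_osu_bw("""
# OSU MPI Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
1024                 2786.09
262144              11781.84
4194304             12167.00
""")

for size, ratio in bandwidth_ratio(psm, psm2).items():
    print(f"{size:>8} bytes: psm/psm2 = {ratio:.2f}")
```

Pasting the full tables into the two strings gives the ratio at every message size, which shows whether the slowdown is uniform or concentrated at particular sizes.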
