Hi,

I ran the allgather benchmark and got the values below, which also show a step-wise performance drop as a function of message size. Could this be linked to the underlying algorithm used for the collective operation?
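One way to test that (a sketch only, assuming the coll/tuned component is the one selecting the allgather implementation) is to lock the algorithm down explicitly and re-run. MCA parameters can also be set as OMPI_MCA_* environment variables, so this works under srun as well; the algorithm value 4 below is just an illustrative choice, and the srun line needs to be adapted to the actual job setup:

    # list the allgather algorithms known to the tuned component
    ompi_info --all | grep -A 3 coll_tuned_allgather_algorithm

    # pin one algorithm and re-run the benchmark
    export OMPI_MCA_coll_tuned_use_dynamic_rules=1
    export OMPI_MCA_coll_tuned_allgather_algorithm=4
    srun -N 200 --ntasks-per-node=1 ./osu_allgather

If the jumps at 256 bytes and 8192 bytes move or disappear when the algorithm is changed, the selection logic is the likely cause. For reference, the measured allgather latencies were: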
# OSU MPI Allgather Latency Test v5.7.1
# Size       Avg Latency(us)
1                      70.36
2                      47.01
4                      72.42
8                      49.62
16                     57.93
32                     50.11
64                     57.29
128                    74.05
256                   454.41
512                   544.04
1024                  580.96
2048                  711.40
4096                  905.14
8192                 2002.32
16384                2652.59
32768                4034.35
65536                6816.29
131072              14280.11
262144              28451.46
524288              54719.41
1048576            106607.19

I use srun and not mpirun; how do I activate the verbosity flag in that
case? (A sketch of one way to do this with environment variables is
appended after the quoted messages below.)

Best
Denis

________________________________
From: Benson Muite <benson_mu...@emailplus.org>
Sent: Monday, February 7, 2022 4:59:45 PM
To: Bertini, Denis Dr.; Open MPI Users
Subject: Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

Following
https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php

mpirun --verbose --display-map

Have you tried newer OpenMPI versions?

Do you get similar behavior for the osu_reduce and osu_gather benchmarks?

Typically internal buffer sizes as well as your hardware will affect
performance. Can you give specifications similar to what is available at
http://mvapich.cse.ohio-state.edu/performance/collectives/
where the operating system, switch, node type and memory are indicated?

If you need good performance, you may want to also specify the algorithm
used. You can find some of the parameters you can tune using:

ompi_info --all

A particularly helpful parameter is:

MCA coll tuned: parameter "coll_tuned_allreduce_algorithm" (current
value: "ignore", data source: default, level: 5 tuner/detail, type: int)
    Which allreduce algorithm is used. Can be locked down to any of:
    0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned bcast),
    3 recursive doubling, 4 ring, 5 segmented ring
    Valid values: 0:"ignore", 1:"basic_linear", 2:"nonoverlapping",
    3:"recursive_doubling", 4:"ring", 5:"segmented_ring", 6:"rabenseifner"
MCA coll tuned: parameter "coll_tuned_allreduce_algorithm_segmentsize"
(current value: "0", data source: default, level: 5 tuner/detail, type: int)

For OpenMPI 4.0, there is a tuning program [2] that might also be helpful.

[1] https://stackoverflow.com/questions/36635061/how-to-check-which-mca-parameters-are-used-in-openmpi
[2] https://github.com/open-mpi/ompi-collectives-tuning

On 2/7/22 4:49 PM, Bertini, Denis Dr. wrote:
> Hi
>
> When I repeat the test, I always get the huge discrepancy at the
> message size of 16384.
>
> Maybe there is a way to run MPI in verbose mode in order
> to further investigate this behaviour?
>
> Best
> Denis
>
> ------------------------------------------------------------------------
> *From:* users <users-boun...@lists.open-mpi.org> on behalf of Benson
> Muite via users <users@lists.open-mpi.org>
> *Sent:* Monday, February 7, 2022 2:27:34 PM
> *To:* users@lists.open-mpi.org
> *Cc:* Benson Muite
> *Subject:* Re: [OMPI users] Using OSU benchmarks for checking Infiniband
> network
> Hi,
> Do you get similar results when you repeat the test? Another job could
> have interfered with your run.
> Benson
> On 2/7/22 3:56 PM, Bertini, Denis Dr. via users wrote:
>> Hi
>>
>> I am using the OSU microbenchmarks compiled with OpenMPI 3.1.6 in order
>> to check/benchmark the InfiniBand network for our cluster.
>>
>> For that I use the collective all_reduce benchmark and run over 200
>> nodes, using 1 process per node.
>>
>> And these are the results I obtained 😎
>>
>> ################################################################
>>
>> # OSU MPI Allreduce Latency Test v5.7.1
>> # Size     Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
>> 4                   114.65             83.22            147.98        1000
>> 8                   133.85            106.47            164.93        1000
>> 16                  116.41             87.57            150.58        1000
>> 32                  112.17             93.25            130.23        1000
>> 64                  106.85             81.93            134.74        1000
>> 128                 117.53             87.50            152.27        1000
>> 256                 143.08            115.63            173.97        1000
>> 512                 130.34            100.20            167.56        1000
>> 1024                155.67            111.29            188.20        1000
>> 2048                151.82            116.03            198.19        1000
>> 4096                159.11            122.09            199.24        1000
>> 8192                176.74            143.54            221.98        1000
>> 16384             48862.85          39270.21          54970.96        1000
>> 32768              2737.37           2614.60           2802.68        1000
>> 65536              2723.15           2585.62           2813.65        1000
>>
>> ####################################################################
>>
>> Could someone explain to me what is happening at message size 16384?
>> One can notice a huge latency (~300 times larger) compared to message
>> size 8192.
>> I do not really understand what could create such an increase in the
>> latency.
>> The reason I use the OSU microbenchmarks is that we sporadically
>> experience a drop in the bandwidth for typical collective operations
>> such as MPI_Reduce in our cluster, which is difficult to understand.
>> I would be grateful if somebody could share their expertise on such a
>> problem with me.
>>
>> Best,
>> Denis
>>
>>
>>
>> ---------
>> Denis Bertini
>> Abteilung: CIT
>> Ort: SB3 2.265a
>>
>> Tel: +49 6159 71 2240
>> Fax: +49 6159 71 2986
>> E-Mail: d.bert...@gsi.de
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>>
>> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
>> Managing Directors / Geschäftsführung:
>> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
>> Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
>> Ministerialdirigent Dr. Volkmar Dietz
>>
>
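A follow-up note on the srun verbosity question above (a sketch, not an authoritative recipe): srun does not understand mpirun's command-line options, but Open MPI also reads every MCA parameter from a matching OMPI_MCA_* environment variable, and srun propagates the environment to the ranks. Assuming the usual coll_base_verbose / btl_base_verbose parameters exist in this Open MPI build, something like the following could be used; the srun line again needs to be adapted to the real job setup:

    # verbose output from the collective and byte-transfer-layer frameworks
    export OMPI_MCA_coll_base_verbose=100
    export OMPI_MCA_btl_base_verbose=100
    srun -N 200 --ntasks-per-node=1 ./osu_allgather

The same mechanism can lock down the allreduce algorithm that Benson
mentions, e.g.:

    export OMPI_MCA_coll_tuned_use_dynamic_rules=1
    export OMPI_MCA_coll_tuned_allreduce_algorithm=3   # 3 = recursive doubling
    srun -N 200 --ntasks-per-node=1 ./osu_allreduce

Parameters can also be placed in $HOME/.openmpi/mca-params.conf (one
"name = value" per line) so that they apply to every run without touching
the job environment.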