Hi,

I am using the OSU micro-benchmarks, compiled with Open MPI 3.1.6, to check/benchmark the InfiniBand network of our cluster. For that I run the collective osu_allreduce benchmark over 200 nodes, with one process per node. These are the results I obtained:

################################################################
# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)   Iterations
4                     114.65             83.22            147.98         1000
8                     133.85            106.47            164.93         1000
16                    116.41             87.57            150.58         1000
32                    112.17             93.25            130.23         1000
64                    106.85             81.93            134.74         1000
128                   117.53             87.50            152.27         1000
256                   143.08            115.63            173.97         1000
512                   130.34            100.20            167.56         1000
1024                  155.67            111.29            188.20         1000
2048                  151.82            116.03            198.19         1000
4096                  159.11            122.09            199.24         1000
8192                  176.74            143.54            221.98         1000
16384               48862.85          39270.21          54970.96         1000
32768                2737.37           2614.60           2802.68         1000
65536                2723.15           2585.62           2813.65         1000
################################################################

Could someone explain to me what is happening at message size = 16384? One can notice a huge average latency (roughly 275 times larger) compared to message size = 8192. I do not really understand what could cause such an increase in latency.

The reason I use the OSU micro-benchmarks is that we sporadically experience a drop in bandwidth for typical collective operations such as MPI_Reduce on our cluster, which is difficult to understand.

I would be grateful if somebody could share their expertise on such a problem with me.

Best,
Denis

---------
Denis Bertini
Department: CIT
Location: SB3 2.265a
Tel: +49 6159 71 2240
Fax: +49 6159 71 2986
E-Mail: d.bert...@gsi.de

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register: Amtsgericht Darmstadt, HRB 1528
Managing Directors: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the GSI Supervisory Board: Ministerialdirigent Dr. Volkmar Dietz
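[Editor's note: as a quick arithmetic check of the jump reported above, the average latencies for the 8192-byte and 16384-byte rows can be compared directly; the values are taken from the table in the mail.]

```python
# Average latencies (in microseconds) copied from the OSU osu_allreduce
# output above, for message sizes 8192 and 16384 bytes.
avg_8k = 176.74
avg_16k = 48862.85

# Ratio between the two adjacent message sizes: the jump is ~276x,
# not a gradual increase as between all the other sizes.
ratio = avg_16k / avg_8k
print(round(ratio))  # -> 276
```

A discontinuity at exactly one power-of-two message size like this is typically a protocol or algorithm switchover point rather than a network fault, which is why pinning down the exact size where it happens is useful.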