Collecting data during execution is possible in Open MPI either with an external tool, such as mpiP, or with the internal infrastructure, SPC (software-based performance counters). Take a look at ./examples/spc_example.c or ./test/spc/spc_test.c to see how to use it.
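In case a self-contained illustration helps: the SPC counters are exported through the standard MPI_T performance-variable interface, so they can be listed and read with code along the lines of the sketch below. This is written from memory of the MPI_T API, not copied from spc_example.c; the substring match on "spc" and the assumed long-long layout of the counters are my assumptions, and SPC must be enabled at run time (if I remember correctly via something like "--mca mpi_spc_attach all"; please check ompi_info and the files mentioned above for the exact names).

  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv) {
      int provided, rank, num_pvars;
      MPI_Init(&argc, &argv);
      MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* ... do some communication here so the counters have something to count ... */

      MPI_T_pvar_get_num(&num_pvars);
      for (int i = 0; i < num_pvars; i++) {
          char name[256], desc[256];
          int name_len = sizeof(name), desc_len = sizeof(desc);
          int verbosity, var_class, bind, readonly, continuous, atomic, count;
          MPI_Datatype dtype;
          MPI_T_enum enumtype;
          MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                              &dtype, &enumtype, desc, &desc_len,
                              &bind, &readonly, &continuous, &atomic);
          /* keep only counters whose name looks like an SPC counter (assumption) */
          if (strstr(name, "spc") == NULL || bind != MPI_T_BIND_NO_OBJECT)
              continue;
          MPI_T_pvar_session session;
          MPI_T_pvar_handle handle;
          MPI_T_pvar_session_create(&session);
          MPI_T_pvar_handle_alloc(session, i, NULL, &handle, &count);
          if (!continuous)
              MPI_T_pvar_start(session, handle);  /* continuous counters need no explicit start */
          if (count == 1 &&
              (dtype == MPI_LONG_LONG || dtype == MPI_UNSIGNED_LONG_LONG)) {
              long long value = 0;  /* assumed layout: one 64-bit integer per counter */
              MPI_T_pvar_read(session, handle, &value);
              if (rank == 0) printf("%s = %lld\n", name, value);
          }
          MPI_T_pvar_handle_free(session, &handle);
          MPI_T_pvar_session_free(&session);
      }

      MPI_T_finalize();
      MPI_Finalize();
      return 0;
  }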
George.

On Fri, Feb 11, 2022 at 9:43 AM Bertini, Denis Dr. via users <users@lists.open-mpi.org> wrote:

> I have seen in the OSU INAM paper:
>
> "While we chose MVAPICH2 for implementing our designs, any MPI runtime
> (e.g.: OpenMPI [12]) can be modified to perform similar data collection
> and transmission."
>
> But I do not know what is meant by a "modified" OpenMPI?
>
> Cheers,
> Denis
>
> ------------------------------
> From: Joseph Schuchart <schuch...@icl.utk.edu>
> Sent: Friday, February 11, 2022 3:02:36 PM
> To: Bertini, Denis Dr.; Open MPI Users
> Subject: Re: [OMPI users] Using OSU benchmarks for checking Infiniband network
>
> I am not aware of anything similar in Open MPI. Maybe OSU-INAM can work
> with other MPI implementations? Would be worth investigating...
>
> Joseph
>
> On 2/11/22 06:54, Bertini, Denis Dr. wrote:
> >
> > Hi Joseph,
> >
> > Looking at MVAPICH I noticed that this MPI implementation provides an
> > InfiniBand network analysis and profiling tool:
> >
> > OSU-INAM
> >
> > Is there something equivalent using OpenMPI?
> >
> > Best
> > Denis
> >
> > ------------------------------------------------------------------------
> > From: users <users-boun...@lists.open-mpi.org> on behalf of Joseph
> > Schuchart via users <users@lists.open-mpi.org>
> > Sent: Tuesday, February 8, 2022 4:02:53 PM
> > To: users@lists.open-mpi.org
> > Cc: Joseph Schuchart
> > Subject: Re: [OMPI users] Using OSU benchmarks for checking Infiniband network
> >
> > Hi Denis,
> >
> > Sorry if I missed it in your previous messages, but could you also try
> > running a different MPI implementation (MVAPICH) to see whether Open MPI
> > is at fault or the system is somehow to blame for it?
> >
> > Thanks
> > Joseph
> >
> > On 2/8/22 03:06, Bertini, Denis Dr. via users wrote:
> > >
> > > Hi,
> > >
> > > Thanks for all this information!
> > >
> > > But I have to confess that in this multi-tuning-parameter space I got
> > > somehow lost. Furthermore, it sometimes mixes user space and kernel
> > > space, and I can only act on the user space.
> > >
> > > 1) On the system the max locked memory is
> > >    ulimit -l unlimited (default),
> > >    and I do not see any warnings/errors related to that when launching MPI.
> > >
> > > 2) I tried different algorithms for the MPI_Allreduce op, all showing
> > >    the drop in bandwidth at size = 16384.
> > >
> > > 3) I disabled openib (no RDMA) and used only TCP, and I noticed the
> > >    same behaviour.
> > >
> > > 4) I realized that increasing the so-called warm-up parameter of the
> > >    OSU benchmark (argument -x, 200 by default) reduces the discrepancy.
> > >    On the contrary, a lower value (-x 10) can increase this BW
> > >    discrepancy up to a factor of 300 at message size 16384 compared to
> > >    message size 8192, for example. So does that mean there are some
> > >    caching effects in the inter-node communication?
> > >
> > > From my experience, tuning parameters is a time-consuming and
> > > cumbersome task.
> > >
> > > Could it also be that the problem is not really in the OpenMPI
> > > implementation but in the system?
> > >
> > > Best
> > > Denis
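As an aside on points 3 and 4 above: for reference, this is roughly the kind of command line I would use to make such a comparison reproducible, pinning one rank per node, forcing the TCP BTL, and varying only the warm-up count (the hostfile name and the path to osu_allreduce are placeholders):

  # TCP only, 1 rank per node, few warm-up iterations
  mpirun --hostfile hosts -np 200 --map-by ppr:1:node \
         --mca btl self,tcp ./osu_allreduce -x 10 -i 1000

  # same run, but with a long warm-up phase
  mpirun --hostfile hosts -np 200 --map-by ppr:1:node \
         --mca btl self,tcp ./osu_allreduce -x 1000 -i 1000

If the 16384-byte outlier only shows up with a short warm-up, that could point at connection setup or caching effects rather than at the network itself.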
> > > ------------------------------------------------------------------------
> > > From: users <users-boun...@lists.open-mpi.org> on behalf of Gus
> > > Correa via users <users@lists.open-mpi.org>
> > > Sent: Monday, February 7, 2022 9:14:19 PM
> > > To: Open MPI Users
> > > Cc: Gus Correa
> > > Subject: Re: [OMPI users] Using OSU benchmarks for checking Infiniband network
> > >
> > > This may have changed since, but these used to be relevant points.
> > > Overall, the Open MPI FAQ has lots of good suggestions:
> > > https://www.open-mpi.org/faq/
> > > some specific to performance tuning:
> > > https://www.open-mpi.org/faq/?category=tuning
> > > https://www.open-mpi.org/faq/?category=openfabrics
> > >
> > > 1) Make sure you are not using Ethernet TCP/IP, which is widely
> > > available in compute nodes:
> > > mpirun --mca btl self,sm,openib ...
> > > https://www.open-mpi.org/faq/?category=tuning#selecting-components
> > > However, this may have changed lately:
> > > https://www.open-mpi.org/faq/?category=tcp#tcp-auto-disable
> > >
> > > 2) Maximum locked memory used by IB and its system limit. Start here:
> > > https://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
> > >
> > > 3) The eager vs. rendezvous message size threshold. I wonder if it may
> > > sit right where you see the latency spike.
> > > https://www.open-mpi.org/faq/?category=all#ib-locked-pages-user
> > >
> > > 4) Processor and memory locality/affinity and binding (please check
> > > the current options and syntax):
> > > https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4
> > >
> > > On Mon, Feb 7, 2022 at 11:01 AM Benson Muite via users
> > > <users@lists.open-mpi.org> wrote:
> > >
> > >     Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php
> > >
> > >     mpirun --verbose --display-map
> > >
> > >     Have you tried newer OpenMPI versions?
> > >
> > >     Do you get similar behavior for the osu_reduce and osu_gather
> > >     benchmarks?
> > >
> > >     Typically internal buffer sizes as well as your hardware will affect
> > >     performance. Can you give specifications similar to what is available at:
> > >     http://mvapich.cse.ohio-state.edu/performance/collectives/
> > >     where the operating system, switch, node type and memory are indicated.
> > >
> > >     If you need good performance, you may want to also specify the
> > >     algorithm used. You can find some of the parameters you can tune using:
> > >
> > >     ompi_info --all
> > >
> > >     A particularly helpful parameter is:
> > >
> > >     MCA coll tuned: parameter "coll_tuned_allreduce_algorithm"
> > >         (current value: "ignore", data source: default,
> > >         level: 5 tuner/detail, type: int)
> > >         Which allreduce algorithm is used. Can be locked down to any of:
> > >         0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned
> > >         bcast), 3 recursive doubling, 4 ring, 5 segmented ring
> > >         Valid values: 0:"ignore", 1:"basic_linear", 2:"nonoverlapping",
> > >         3:"recursive_doubling", 4:"ring", 5:"segmented_ring",
> > >         6:"rabenseifner"
> > >     MCA coll tuned: parameter "coll_tuned_allreduce_algorithm_segmentsize"
> > >         (current value: "0", data source: default,
> > >         level: 5 tuner/detail, type: int)
> > >
> > >     For OpenMPI 4.0, there is a tuning program [2] that might also be
> > >     helpful.
> > >
> > >     [1] https://stackoverflow.com/questions/36635061/how-to-check-which-mca-parameters-are-used-in-openmpi
> > >     [2] https://github.com/open-mpi/ompi-collectives-tuning
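Building on that last suggestion: the way I would pin the allreduce algorithm for a test run looks roughly like this (from memory; if I recall correctly the "tuned" component only honours the forced algorithm when coll_tuned_use_dynamic_rules is also set, so please double-check both parameter names with "ompi_info --all | grep coll_tuned"):

  # force recursive doubling (algorithm 3 in the list quoted above)
  mpirun -np 200 --map-by ppr:1:node \
         --mca coll_tuned_use_dynamic_rules 1 \
         --mca coll_tuned_allreduce_algorithm 3 \
         ./osu_allreduce

Repeating the run with values 1-6 and comparing the 16384-byte line is a quick way to see whether the spike is tied to one particular algorithm or shows up for all of them.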
> > > On 2/7/22 4:49 PM, Bertini, Denis Dr. wrote:
> > > > Hi,
> > > >
> > > > When I repeat, I always get the huge discrepancy at the message size
> > > > of 16384.
> > > >
> > > > Maybe there is a way to run MPI in verbose mode in order to further
> > > > investigate this behaviour?
> > > >
> > > > Best
> > > > Denis
> > > >
> > > > ------------------------------------------------------------------------
> > > > From: users <users-boun...@lists.open-mpi.org> on behalf of Benson
> > > > Muite via users <users@lists.open-mpi.org>
> > > > Sent: Monday, February 7, 2022 2:27:34 PM
> > > > To: users@lists.open-mpi.org
> > > > Cc: Benson Muite
> > > > Subject: Re: [OMPI users] Using OSU benchmarks for checking Infiniband network
> > > >
> > > > Hi,
> > > > Do you get similar results when you repeat the test? Another job
> > > > could have interfered with your run.
> > > > Benson
> > > >
> > > > On 2/7/22 3:56 PM, Bertini, Denis Dr. via users wrote:
> > > > > Hi,
> > > > >
> > > > > I am using the OSU microbenchmarks compiled with OpenMPI 3.1.6 in
> > > > > order to check/benchmark the InfiniBand network of our cluster.
> > > > >
> > > > > For that I use the collective all_reduce benchmark and run over
> > > > > 200 nodes, using 1 process per node.
> > > > >
> > > > > And these are the results I obtained 😎
> > > > >
> > > > > ################################################################
> > > > > # OSU MPI Allreduce Latency Test v5.7.1
> > > > > # Size    Avg Latency(us)  Min Latency(us)  Max Latency(us)  Iterations
> > > > > 4                  114.65            83.22           147.98        1000
> > > > > 8                  133.85           106.47           164.93        1000
> > > > > 16                 116.41            87.57           150.58        1000
> > > > > 32                 112.17            93.25           130.23        1000
> > > > > 64                 106.85            81.93           134.74        1000
> > > > > 128                117.53            87.50           152.27        1000
> > > > > 256                143.08           115.63           173.97        1000
> > > > > 512                130.34           100.20           167.56        1000
> > > > > 1024               155.67           111.29           188.20        1000
> > > > > 2048               151.82           116.03           198.19        1000
> > > > > 4096               159.11           122.09           199.24        1000
> > > > > 8192               176.74           143.54           221.98        1000
> > > > > 16384            48862.85         39270.21         54970.96        1000
> > > > > 32768             2737.37          2614.60          2802.68        1000
> > > > > 65536             2723.15          2585.62          2813.65        1000
> > > > > ####################################################################
> > > > >
> > > > > Could someone explain to me what is happening at message size 16384?
> > > > > One can notice a huge latency (~300 times larger) compared to
> > > > > message size 8192. I do not really understand what could create
> > > > > such an increase in the latency.
> > > > > The reason I use the OSU microbenchmarks is that we sporadically
> > > > > experience a drop in the bandwidth for typical collective
> > > > > operations such as MPI_Reduce in our cluster, which is difficult
> > > > > to understand.
> > > > > I would be grateful if somebody could share their expertise on
> > > > > such a problem with me.
> > > > > Best,
> > > > > Denis
> > > > >
> > > > > ---------
> > > > > Denis Bertini
> > > > > Abteilung: CIT
> > > > > Ort: SB3 2.265a
> > > > >
> > > > > Tel: +49 6159 71 2240
> > > > > Fax: +49 6159 71 2986
> > > > > E-Mail: d.bert...@gsi.de
> > > > >
> > > > > GSI Helmholtzzentrum für Schwerionenforschung GmbH
> > > > > Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
> > > > >
> > > > > Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
> > > > > Managing Directors / Geschäftsführung:
> > > > > Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
> > > > > Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
> > > > > Ministerialdirigent Dr. Volkmar Dietz
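One more remark on Gus's point 3 in the quoted thread (the eager vs. rendezvous threshold): if I remember correctly, the default openib eager limit sits somewhere between 8 KiB and 16 KiB, which would put the switch-over exactly between the 8192 and 16384 measurements. It may be worth checking where that limit actually is on your build and, as an experiment, moving it past 16384. The parameter name below is from memory, so please verify it first:

  # list the eager-limit parameters of the build
  ompi_info --all | grep -i eager

  # example experiment: raise the openib eager limit above 16 KiB
  mpirun --mca btl_openib_eager_limit 65536 ... ./osu_allreduce

If the spike moves (or disappears) together with the limit, that would be a strong hint at the eager/rendezvous switch-over; otherwise the algorithm selection discussed above remains the more likely candidate.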