This may have changed since, but these used to be relevant points.
Overall, the Open MPI FAQ has lots of good suggestions:
https://www.open-mpi.org/faq/
some specifically for performance tuning:
https://www.open-mpi.org/faq/?category=tuning
https://www.open-mpi.org/faq/?category=openfabrics
1) Make
Hi Jose,
I bet this device has not been tested with ucx.
You may want to join the ucx users mail list at
https://elist.ornl.gov/mailman/listinfo/ucx-group
and ask whether this Marvell device has been tested, and whether there are
workarounds for disabling features that this device doesn't support.
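As a first step, something along these lines might help confirm what UCX
actually detects on the node and restrict it to transports that are known
to work (the transport list below is only an example to adapt):

# List the devices and transports UCX has detected on this node
ucx_info -d

# If one transport misbehaves, restrict UCX to a known-good subset
# (tcp,self,sm are placeholders; pick from what ucx_info reports)
mpirun --mca pml ucx -x UCX_TLS=tcp,self,sm ./osu_allreduce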
Again
Sorry for the late answer.
I thought the same, but after more testing I no longer do, since re-running
the same code on the same data on the same node with the same parameters
sometimes works and sometimes doesn't.
The user says it works (reliably) unmodified on other clusters.
We'll try contacting
Hi,
Do you get similar results when you repeat the test? Another job could
have interfered with your run.
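If it is easy to re-run, a quick loop along these lines would show whether
the outlier is reproducible (node count, mapping and binary path below are
placeholders for your actual setup):

# Repeat the allreduce benchmark several times on the same allocation
for i in 1 2 3 4 5; do
    mpirun -np 200 --map-by node ./osu_allreduce
done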
Benson
On 2/7/22 3:56 PM, Bertini, Denis Dr. via users wrote:
Hi
I am using the OSU micro-benchmarks compiled with Open MPI 3.1.6 in order to
check/benchmark
the InfiniBand network for our
Hi
When I repeat it, I always get the huge discrepancy at the
message size of 16384.
Maybe there is a way to run MPI in verbose mode in order
to further investigate this behaviour?
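In the meantime, one thing I could try is to zoom in on that size with the
benchmark's own options, roughly like this (node count, iteration and warmup
values are only examples):

# Focus on the suspicious message size and raise the iteration count
mpirun -np 200 --map-by node ./osu_allreduce -m 16384:16384 -i 1000 -x 200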
Best
Denis
From: users on behalf of Benson Muite via users
Sent: Monday,
Hi
I changed the algorithm used to the ring algorithm (number 4, for example)
and the scan changed to
# OSU MPI Allreduce Latency Test v5.7.1
# Size Avg Latency(us) Min Latency(us) Max Latency(us) Iterations
4 59.39 51.04 65.36 1
8
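For reference, this is roughly how the algorithm can be pinned via the
coll/tuned MCA parameters (a sketch, not necessarily the exact command I
used; algorithm numbers should be double-checked with ompi_info on your
install):

# Force the tuned component to a fixed allreduce algorithm (4 = ring here)
mpirun -np 200 --map-by node \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_allreduce_algorithm 4 \
    ./osu_allreduce

# List the available algorithm numbers
ompi_info --param coll tuned --level 9 | grep allreduce_algorithm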
Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php
mpirun --verbose --display-map
Have you tried newer Open MPI versions?
Do you get similar behavior for the osu_reduce and osu_gather benchmarks?
Typically internal buffer sizes as well as your hardware will affect
performance. Can
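If it helps, the collective framework can also be made more talkative, so
you can see which component and algorithm get selected; a minimal sketch
(the verbosity level and launch options are only examples):

# Show which collective component/algorithm is selected at run time
mpirun -np 200 --map-by node --mca coll_base_verbose 10 ./osu_reduce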
Hi Bernd,
Thanks for your valuable input! Your suggested approach indeed seems
like the correct one and is actually what I've always wanted to do. In
the past, I've also asked our cluster support whether this was possible,
but they always suggested the following approach:
export
Hi,
I ran the allgather benchmark and got these values,
which also show a step-wise performance drop as a function
of message size.
Could this be linked to the underlying algorithm used for the collective operation?
# OSU MPI Allgather Latency Test v5.7.1
# Size Avg Latency(us)
1
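One way to check that hypothesis would be to pin the allgather algorithm the
same way as for allreduce; a sketch (the parameter names are the coll/tuned
ones, and the algorithm numbers should be verified with ompi_info first):

# See which allgather algorithms the tuned component offers
ompi_info --param coll tuned --level 9 | grep allgather_algorithm

# Then pin one explicitly and re-run the benchmark
mpirun -np 200 --map-by node \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_allgather_algorithm 4 \
    ./osu_allgather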
Hello,
For information, glibc is struggling with the problem of the precise
meaning of get_nprocs, get_nprocs_conf, _SC_NPROCESSORS_CONF and
_SC_NPROCESSORS_ONLN:
https://sourceware.org/pipermail/libc-alpha/2022-February/136177.html
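A quick way to compare the configured and online counts on a node, next to
what nproc reports (plain getconf/coreutils, nothing Open MPI specific):

# Processors configured vs. currently online, and what nproc sees
getconf _NPROCESSORS_CONF
getconf _NPROCESSORS_ONLN
nproc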
Samuel
Hi
I am using the OSU micro-benchmarks compiled with Open MPI 3.1.6 in order to
check/benchmark
the InfiniBand network for our cluster.
For that I use the collective all_reduce benchmark and run over 200 nodes,
using 1 process per node.
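The launch line is roughly of this form (a sketch of the kind of command
used; the hostfile path and binary location are placeholders):

# 200 ranks, one per node
mpirun -np 200 --map-by node --hostfile hosts.txt ./osu_allreduce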
And these are the results I obtained