Hi,
Have you tried anything with Open MPI 4.0.1?
What are the specifications of the InfiniBand system you are using?
Benson
On 5/9/19 9:37 AM, Joseph Schuchart via users wrote:
Nathan,
Over the last couple of weeks I made some more interesting
observations regarding the latencies of
Benson,
I just gave 4.0.1 a shot and the behavior is the same (the reason I'm
stuck with 3.1.2 is a regression with `osc_rdma_acc_single_intrinsic` on
4.0 [1]).
The IB cluster has both Mellanox ConnectX-3 (w/ Haswell CPU) and
ConnectX-4 (w/ Skylake CPU) nodes; the effect is visible on both
Stdout forwarding should continue to work in v4.0.x just like it did in v3.0.x.
I.e., printf's from your app should appear in the stdout of mpirun.
Sometimes they can get buffered, however, such as if you redirect the stdout to
a file or to a pipe. Such shell buffering may only emit output
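
To illustrate the buffering point above: when stdout is not a terminal, the C
library typically buffers printf output in full blocks, so lines can show up
late. The sketch below shows two common workarounds on the application side,
assuming a C program. The helper names use_line_buffered_stdout and
report_progress are hypothetical and are not part of Open MPI; only setvbuf,
printf and fflush are standard C calls.

```c
#include <stdio.h>

/* Hypothetical helper (not an Open MPI function): switch stdout to line
 * buffering so each completed line is passed on immediately, even when
 * stdout is a file or a pipe. Call it before the first write to stdout.
 * Returns 0 on success, nonzero on failure. */
int use_line_buffered_stdout(void)
{
    return setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: print one progress line and flush explicitly.
 * The explicit flush works regardless of the buffering mode in effect.
 * Returns the number of characters printed, or -1 on error. */
int report_progress(int rank, int step)
{
    int n = printf("rank %d finished step %d\n", rank, step);
    if (n < 0 || fflush(stdout) != 0) {
        return -1;
    }
    return n;
}
```

In an MPI program, one would call use_line_buffered_stdout() once at the start
of main and use report_progress(rank, step) wherever timely output matters.
Whether this removes all delays also depends on how the launcher forwards the
output, which this sketch does not address.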
You might want to try two things:
1. Upgrade to Open MPI v4.0.1.
2. Use the UCX PML instead of the openib BTL.
You may need to download/install UCX first.
Then configure Open MPI:
./configure --with-ucx --without-verbs --enable-mca-no-build=btl-uct ...
This will build the UCX PML, and that
I will try to take a look at it today.
-Nathan
> On May 9, 2019, at 12:37 AM, Joseph Schuchart via users
> wrote:
>
> Nathan,
>
> Over the last couple of weeks I made some more interesting observations
> regarding the latencies of accumulate operations on both Aries and InfiniBand
>
Hi all,
I am trying to run MPI in distributed mode. The setup is an 8-machine
cluster with Debian 8 (Jessie), Intel Xeon E5-2609 2.40 GHz CPUs, and Mellanox
QDR InfiniBand HCAs. My MPI version is 3.0.4. I can successfully run a simple
command on all nodes that doesn't use InfiniBand but
Nathan,
Over the last couple of weeks I made some more interesting observations
regarding the latencies of accumulate operations on both Aries and
InfiniBand systems:
1) There seems to be a significant difference between 64-bit and 32-bit
operations: on Aries, the average latency for