Hi all,

I ran into a problem while testing the performance of OpenMPI over 100Gbps
RoCE.

I have two servers, each with a Mellanox ConnectX-4 100Gbps RoCE NIC,
connected to each other.
I used the Intel MPI Benchmarks (IMB) to test the performance of OpenMPI
(1.10.3) over RDMA.
The PingPong benchmark (2 ranks, one rank per server) reached only about
6GB/s with the openib BTL.
With the OSU micro-benchmarks, the bandwidth reached only about 6.5GB/s.
However, when I run two benchmarks at the same time (two ranks per server),
the total bandwidth reaches about 11GB/s.
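
For reference, I launched the single-rank-per-node run with something along
these lines (host names are placeholders, and the exact flags may differ
slightly from what I actually used; the OSU run was launched the same way
with ./osu_bw as the executable):

    # one rank per node, forcing the openib BTL (flags approximate)
    mpirun -np 2 --map-by node -host host1,host2 \
        --mca btl openib,self,sm ./IMB-MPI1 PingPong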

Since adding a second rank per server nearly doubles the aggregate bandwidth,
it seems that the CPU is the bottleneck for a single rank.
The bottleneck is clearly not memcpy, and RDMA itself should not consume much
CPU, since the ib_write_bw test from perftest easily reaches 11GB/s.
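
For comparison, the raw RDMA numbers came from perftest invoked roughly like
this (the device name is a placeholder):

    # server side; mlx5_0 is a placeholder device name
    ib_write_bw -d mlx5_0
    # client side
    ib_write_bw -d mlx5_0 <server_ip>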

Is this bandwidth limit normal?
Does anyone know what the real bottleneck is?

Thanks in advance for your kind help.

Regards,
Zhaogeng