Hi Martin,

Can you check if it is any better with "-x MXM_TLS=rc,shm,self"?
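
For example, added to your yalla run (the IMB binary path and benchmark
selection below are only placeholders):

    # placeholder benchmark binary; MCA settings are the ones from your tests
    mpirun -np 2 --mca pml yalla,ob1 --mca btl vader,self \
        -x MXM_RDMA_PORTS=mlx4_0:1 -x MXM_TLS=rc,shm,self ./IMB-MPI1 PingPong Sendrecv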

-Devendar


On Tue, Aug 16, 2016 at 11:28 AM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:

> Hi Josh,
>
> Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my
> MPI processes, and it did improve performance, but the results are still not
> completely satisfying.
>
> When I run the IMB 4.1 PingPong and Sendrecv benchmarks between two nodes
> with Open MPI 1.10.3, I get:
>
> without MXM_RDMA_PORTS
>
>    comm       lat_min      bw_max      bw_max
>               pingpong     pingpong    sendrecv
>               (us)         (MB/s)      (MB/s)
>    -------------------------------------------
>    openib     1.79         5947.07    11534
>    mxm        2.51         5166.96     8079.18
>    yalla      2.47         5167.29     8278.15
>
>
> with MXM_RDMA_PORTS=mlx4_0:1
>
>    comm       lat_min      bw_max      bw_max
>               pingpong     pingpong    sendrecv
>               (us)         (MB/s)      (MB/s)
>    -------------------------------------------
>    openib     1.79         5827.93    11552.4
>    mxm        2.23         5191.77     8201.76
>    yalla      2.18         5200.55     8109.48
>
>
> openib means: pml=ob1                  btl=openib,vader,self  btl_openib_include_if=mlx4_0
> mxm    means: pml=cm,ob1     mtl=mxm   btl=vader,self
> yalla  means: pml=yalla,ob1            btl=vader,self
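>
> (For reference, the openib case above corresponds to a command line roughly
> like the following, where the IMB binary path is only a placeholder:
>
>     # MCA settings as listed above; benchmark binary is a placeholder
>     mpirun -np 2 --mca pml ob1 --mca btl openib,vader,self \
>         --mca btl_openib_include_if mlx4_0 ./IMB-MPI1 PingPong Sendrecv
>
> The mxm and yalla cases only change the pml/mtl/btl settings accordingly.)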
>
> lspci reports for our FDR InfiniBand HCA:
>   Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>
> and 16 lines like:
>   Infiniband controller: Mellanox Technologies MT27500/MT27520 Family
>                          [ConnectX-3/ConnectX-3 Pro Virtual Function]
>
> The nodes have two octa-core Xeon E5-2650 v2 (Ivy Bridge-EP, 2.67 GHz) sockets.
>
> ofed_info reports that the mxm version is 3.4.3cce223-0.32200.
>
> As you can see, the results are not very good. I would expect mxm and yalla
> to perform better than openib in terms of both latency and bandwidth (note:
> the sendrecv bandwidth is full duplex). In particular, I would expect the
> yalla latency to be around 1.1 us, as shown in
> https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf (page 33).
>
> I also ran mxm_perftest (located in /opt/mellanox/bin) and it reports the
> following latency between two nodes:
>
> without MXM_RDMA_PORTS                1.92 us
> with    MXM_RDMA_PORTS=mlx4_0:1       1.65 us
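>
> (For reference, mxm_perftest follows the usual server/client pattern,
> something like:
>
>     node1$ /opt/mellanox/bin/mxm_perftest                     # server side
>     node2$ /opt/mellanox/bin/mxm_perftest node1 -t send_lat   # client side
>
> the exact test-selection option may differ between MXM versions.)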
>
> Again, I think we should be able to get better latency with our
> configuration; 1.65 us is not a very good result.
>
> Note however that the 0.27 us reduction in raw mxm latency (1.92 - 1.65 =
> 0.27) corresponds to the reduction in the Open MPI latencies observed above
> with mxm (2.51 - 2.23 = 0.28) and yalla (2.47 - 2.18 = 0.29).
>
> Another detail: everything runs inside LXC containers, and SR-IOV is
> probably used.
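>
> To double-check which verbs device the containers actually see, one could
> run ibv_devinfo inside a container, assuming the libibverbs utilities are
> installed there:
>
>     # lists the verbs devices and their ports as seen from inside the container
>     ibv_devinfo | grep -E 'hca_id|link_layer'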
>
> Does anyone have any idea what's wrong with our cluster?
>
> Martin Audet
>
>
> > Hi, Martin
> >
> > The environment variable:
> >
> > MXM_RDMA_PORTS=device:port
> >
> > is what you're looking for. You can specify a device/port pair on your
> OMPI
> > command line like:
> >
> > mpirun -np 2 ... -x MXM_RDMA_PORTS=mlx4_0:1 ...
> >
> >
> > Best,
> >
> > Josh
>
>



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
