Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet
Jackson, one more thing: how did you build Open MPI? If you built from git (and without VPATH), then --enable-debug is automatically set, and this hurts performance. If not already done, I recommend you download the latest Open MPI tarball (1.10.2) and ./configure
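A minimal sketch of the rebuild Gilles suggests (the install prefix is an assumption; a tarball build defaults to a non-debug configuration, and passing --disable-debug just makes the intent explicit):

```shell
# Sketch: rebuild from the 1.10.2 tarball rather than git, so --enable-debug
# is not silently turned on. Prefix path is a placeholder.
version=1.10.2
configure_cmd="./configure --prefix=$HOME/openmpi-$version --disable-debug"
echo "$configure_cmd"
# After 'make install', 'ompi_info | grep -i debug' should report
# "Internal debug support: no" for a performance build.
```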

Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Rayson Ho
If you are using instance types that support SR-IOV (a.k.a. "enhanced networking" in AWS), then turn it on. We saw huge differences when SR-IOV was enabled: http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
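One common way to check for enhanced networking from inside the instance is the interface driver name reported by `ethtool -i eth0`: `ixgbevf` (or `ena` on newer instance types) indicates SR-IOV is active. The snippet below parses a simulated ethtool line, since real output depends on the instance:

```shell
# Sketch: detect enhanced networking from the NIC driver name.
# 'ethtool_line' stands in for the first line of 'ethtool -i eth0' output.
ethtool_line="driver: ixgbevf"
driver=${ethtool_line#driver: }
case "$driver" in
  ixgbevf|ena) echo "enhanced networking: on" ;;
  *)           echo "enhanced networking: off" ;;
esac
```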

Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet
Jackson, I am surprised by the MTU value ... IIRC, the MTU for Ethernet jumbo frames is 9000, not 9001. Can you run tracepath on both boxes (to check which MTU is actually used)? Then, can you try to set MTU=1500 on both boxes (warning: get ready to lose the connection) and try again with Open MPI and
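A sketch of the two checks suggested above; `eth0` and the peer address are placeholders for your actual interface and the other node:

```shell
# Sketch: check the path MTU, then lower the local MTU for the retry.
peer=10.0.0.2                                   # placeholder peer address
check_cmd="tracepath $peer"                     # reports the path MTU in use
lower_cmd="sudo ip link set dev eth0 mtu 1500"  # careful: may drop remote sessions
echo "$check_cmd"
echo "$lower_cmd"
```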

Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Jackson, Gary L.
Nope, just one ethernet interface:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 0E:47:0E:0B:59:27
          inet addr:xxx.xxx.xxx.xxx  Bcast:xxx.xxx.xxx.xxx  Mask:255.255.252.0
          inet6 addr: fe80::c47:eff:fe0b:5927/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001

Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Nathan Hjelm
See https://github.com/open-mpi/ompi/pull/1439. I was seeing this problem when enabling CUDA support, as it sets btl_openib_max_send_size to 128k but does not change the receive queue settings. I tested the commit in #1439 and it fixes the issue for me. -Nathan On Tue, Mar 08, 2016 at 03:57:39PM

Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Nathan Hjelm
This is a bug we need to deal with. If we are getting queue pair settings from an ini file and max_send_size is the default value, we should set the max send size to the size of the largest queue pair. I will work on a fix. -Nathan On Tue, Mar 08, 2016 at 03:57:39PM +0900, Gilles

Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Gilles Gouaillardet
Jackson, how many Ethernet interfaces are there? If there are several, can you try again with only one: mpirun --mca btl_tcp_if_include eth0 ... Cheers, Gilles On Tuesday, March 8, 2016, Jackson, Gary L. wrote: > > I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm
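Spelled out, the suggested run pins Open MPI's TCP BTL to a single interface so it cannot pick or stripe across the wrong one (`eth0`, the host names, and `./a.out` are placeholders):

```shell
# Sketch: restrict the TCP BTL to one interface for the retry.
iface=eth0
cmd="mpirun --mca btl_tcp_if_include $iface -np 2 -host node1,node2 ./a.out"
echo "$cmd"
```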

[OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Jackson, Gary L.
I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half the throughput with MPI over TCP compared to raw TCP. Before I start digging into this more deeply, does anyone know what might cause that? For what it's worth, I see the same issue with MPICH, but I do not see it
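For readers reproducing the comparison: NetPIPE ships both a raw-TCP and an MPI driver (`NPtcp` and `NPmpi`, built via `make tcp` and `make mpi`); the hostnames below are placeholders:

```shell
# Sketch of the two measurements being compared.
raw_tcp="NPtcp -h node2"                          # on node1; plain 'NPtcp' listens on node2
over_mpi="mpirun -np 2 -host node1,node2 ./NPmpi" # same pair of nodes, via MPI
echo "$raw_tcp"
echo "$over_mpi"
```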

Re: [OMPI users] iWARP usage issue

2016-03-08 Thread Gilles Gouaillardet
Per the error message, can you try mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ... and see whether it helps? You can also try various settings for the receive queues; for example, edit your /.../share/openmpi/mca-btl-openib-device-params.ini and set
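For the receive-queue experiment, the relevant stanza in mca-btl-openib-device-params.ini looks roughly like the sketch below; the section name matches Chelsio's PCI vendor ID, but the queue specification shown is purely illustrative, not a tuned recommendation:

```ini
# Hypothetical sketch of a receive_queues override for a Chelsio T3 device
# section in share/openmpi/mca-btl-openib-device-params.ini.
[Chelsio T3]
vendor_id = 0x1425
# ... existing entries for this section ...
# One per-peer queue large enough for the 64k max send size (values illustrative):
receive_queues = P,65536,256
```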

[OMPI users] iWARP usage issue

2016-03-08 Thread dpchoudh .
Hello all, I am asking for help with the following situation: I have two (mostly identical) nodes. Each of them has (completely identical) 1. QLogic 4x DDR InfiniBand and 2. Chelsio S310E (T3-based) 10GbE iWARP cards. Both are connected back-to-back, without a switch. The connection is