Gary, I previously missed the fact that you are running on a 10GbE network, and I still assume you are not running a debug build.
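(To double-check the debug-build question, ompi_info reports whether debug support was compiled in; this assumes the ompi_info on your PATH belongs to the same install you launch with:)

```shell
# A debug build reports "Internal debug support: yes"
ompi_info | grep -i "debug support"
```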
maybe you need to increase send/recv buffer sizes.

ompi_info --all | grep btl_tcp

will list the parameters that can be tuned, then you can

mpirun --mca btl_tcp_<name> <value>

Cheers,

Gilles

On Friday, March 11, 2016, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:
> I re-ran all experiments with 1.10.2 configured the way you specified. My
> results are here:
>
> https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0
>
> Some remarks:
>
> 1. OpenMPI had poor performance relative to raw TCP and IMPI across
> all MTUs.
> 2. Those issues appeared at larger message sizes.
> 3. Intel MPI and raw TCP were comparable across message sizes and MTUs.
>
> With respect to some other concerns:
>
> 1. I verified that the MTU values I'm using are correct with tracepath.
> 2. I am using a placement group.
>
> --
> Gary Jackson
>
> From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
> Reply-To: Open MPI Users <us...@open-mpi.org>
> Date: Tuesday, March 8, 2016 at 11:07 PM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP
>
> Jackson,
>
> one more thing, how did you build openmpi ?
>
> if you built from git (and without VPATH), then --enable-debug is
> automatically set, and this is hurting performance.
> if not already done, i recommend you download the latest openmpi tarball
> (1.10.2) and
> ./configure --with-platform=contrib/platform/optimized --prefix=...
> last but not least, you can
> mpirun --mca mpi_leave_pinned 1 <your benchmark>
> (that being said, i am not sure this is useful with TCP networks ...)
>
> Cheers,
>
> Gilles
>
>
> On 3/9/2016 11:34 AM, Rayson Ho wrote:
>
> If you are using instance types that support SR-IOV (aka.
"enhanced > networking" in AWS), then turn it on. We saw huge differences when SR-IOV > is enabled > > > http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html > > http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html > > Make sure you start your instances with a placement group -- otherwise, > the instances can be data centers apart! > > And check that jumbo frames are enabled properly: > > http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html > > But still, it is interesting that Intel MPI is getting a 2X speedup with > the same setup! Can you post the raw numbers so that we can take a deeper > look?? > > Rayson > > ================================================== > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html > > > > > On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. < > <javascript:_e(%7B%7D,'cvml','gary.jack...@jhuapl.edu');> > gary.jack...@jhuapl.edu > <javascript:_e(%7B%7D,'cvml','gary.jack...@jhuapl.edu');>> wrote: > >> >> I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about >> half the performance for MPI over TCP as I do with raw TCP. Before I start >> digging in to this more deeply, does anyone know what might cause that? >> >> For what it's worth, I see the same issues with MPICH, but I do not see >> it with Intel MPI. 
>>
>> --
>> Gary Jackson
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28659.php
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28665.php
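(As a concrete sketch of the buffer-tuning suggestion at the top of this message: the 4 MB values below are illustrative only, not recommendations, and NPmpi stands in for whatever NetPIPE binary you are running; confirm the exact parameter names with ompi_info on your install:)

```shell
# List the tunable TCP BTL parameters and their current values
ompi_info --all | grep btl_tcp

# Re-run the benchmark with larger socket buffers
# (4194304 = 4 MB is only an example value; tune against measurements)
mpirun --mca btl_tcp_sndbuf 4194304 --mca btl_tcp_rcvbuf 4194304 ./NPmpi

# The kernel caps socket buffer sizes; on Linux, check (and if needed
# raise) these limits or larger requested buffers will be clamped
cat /proc/sys/net/core/wmem_max /proc/sys/net/core/rmem_max
```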