Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-13 Thread Rayson Ho
On Tue, Jan 10, 2012 at 10:02 AM, Roberto Rey  wrote:
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.

- There are three instance types that can use 10 GbE. Are you using
"cc1.4xlarge", "cc2.8xlarge", or "cg1.4xlarge"?

- Did you set up a placement group?

- Also, which AMI are you using?


> I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
> outperforms raw TCP performance for small messages (40us of difference).
>
> Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
> optimization in BTL TCP?

It is indeed interesting!

If we run strace with timestamps (e.g. strace -tt) and compare the
output of NPmpi and NPtcp, we can get a better idea of what's
happening.

It is possible that one is doing more busy polling than the other,
and/or triggering Xen to handle things a bit differently. We should
also check the socket options, and check the system call latency to
see whether the network is really responsible for the extra 40 us
delay.
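As a minimal sketch of the socket-option check (not from the thread, and using Python sockets purely for illustration): Open MPI's TCP BTL typically disables Nagle's algorithm via TCP_NODELAY, and if a raw-TCP benchmark left Nagle enabled, small messages could see extra delay. Verifying the option on each side would look roughly like:

```python
import socket

# Illustrative only: inspect TCP_NODELAY on a socket, the way one might
# instrument a benchmark. Whether NPtcp or NPmpi actually sets this is
# exactly what needs checking on the EC2 instances.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("TCP_NODELAY before:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle
print("TCP_NODELAY after:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
s.close()
```

The same getsockopt/setsockopt calls exist in C (see tcp(7)), so the check can be dropped into either benchmark's source.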


> The results for OpenMPI aren't so good but we must take into account the
> network virtualization overhead under Xen

If you are running Cluster Compute Instances, then you are using HVM.
If things are set up properly (HVM & placement group), then you can
even get a Top500-class system on EC2... Amazon uses a similar setup for
their TOP500 submission:

http://i.top500.org/site/50321

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-13 Thread Bogdan Costescu
On Thu, Jan 12, 2012 at 16:10, Jeff Squyres  wrote:
> It's very strange to me that Open MPI is getting *better* than raw TCP 
> performance.  I don't have an immediate explanation for that -- if you're 
> using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe 
> and the others.

Could it be that the difference is not in the transfer but in the timing?

Cheers,
Bogdan



Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
With ifconfig I can only see one Ethernet card (eth0) as well as the
loopback interface.

2012/1/12 teng ma 

> Is it possible your EC2 cluster has another "unknown" crappy Ethernet
> card (e.g. a 1 Gb Ethernet card)? For small messages, they may go through
> different paths in NPtcp and NPmpi.
>
> Teng Ma



-- 
Roberto Rey Expósito


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread teng ma
Is it possible your EC2 cluster has another "unknown" crappy Ethernet
card (e.g. a 1 Gb Ethernet card)? For small messages, they may go through
different paths in NPtcp and NPmpi.

Teng Ma

On Thu, Jan 12, 2012 at 10:28 AM, Roberto Rey  wrote:




-- 
| Teng Ma  Univ. of Tennessee |
| t...@cs.utk.eduKnoxville, TN |
| http://web.eecs.utk.edu/~tma/   |


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
Thanks for your reply!

I'm using the TCP BTL because I don't have any other option on Amazon with
10 Gbit Ethernet.

I also tried with MPICH2 1.4 and I got 60 microseconds, so I am very
confused about it.

Regarding hyperthreading and process binding settings: I am using only one
MPI process on each node (2 nodes, for a classical ping-pong latency
benchmark). I don't know how that could affect this test, but I'm happy to
try anything anyone suggests.

2012/1/12 Jeff Squyres 




-- 
Roberto Rey Expósito


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Jeff Squyres
Hi Roberto.

We've had strange reports of performance from EC2 before; it's actually been on 
my to-do list to go check this out in detail.  I made contact with the EC2 
folks at Supercomputing late last year.  They've hooked me up with some credits 
on EC2 to go check out what's happening, but the pent-up email deluge from the 
Christmas vacation and my travel to the MPI Forum this week prevented me from 
testing yet.

I hope to be able to get time to test Open MPI on EC2 next week and see what's 
going on.

It's very strange to me that Open MPI is getting *better* than raw TCP 
performance.  I don't have an immediate explanation for that -- if you're using 
the TCP BTL, then OMPI should be using TCP sockets, just like netpipe and the 
others.

You *might* want to check hyperthreading and process binding settings in all 
your tests.


On Jan 12, 2012, at 7:04 AM, Roberto Rey wrote:



-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
Hi again,

Today I tried another TCP benchmark, included in the hpcbench suite, and
with a ping-pong test I'm also getting 100 us of latency. Then I tried
netperf, with the same result.

So, in summary, I'm measuring TCP latency with message sizes between 1 and
32 bytes:

Netperf over TCP             -> 100 us
Netpipe over TCP (NPtcp)     -> 100 us
HPCbench over TCP            -> 100 us
Netpipe over OpenMPI (NPmpi) -> 60 us
HPCbench over OpenMPI        -> 60 us

Any clues?

Thanks a lot!

2012/1/10 Roberto Rey 




-- 
Roberto Rey Expósito


[OMPI users] Strange TCP latency results on Amazon EC2

2012-01-10 Thread Roberto Rey
Hi,

I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
hardware and I'm getting strange latency results with Netpipe and OpenMPI.

If I run Netpipe over OpenMPI (NPmpi) I get a network latency of around 60
microseconds for small messages (less than 2 kbytes). However, when I run
Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger
messages everything seems to be OK.

I'm using the TCP BTL in OpenMPI, so I can't understand why OpenMPI
outperforms raw TCP performance for small messages (a 40 us difference). I
have also run the PingPong test from the Intel MPI Benchmarks, and the
latency results for OpenMPI are very similar (60 us) to those obtained with
NPmpi.

Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI doing any
optimization in the TCP BTL?

The results for OpenMPI aren't so good, but we must take into account the
network virtualization overhead under Xen.

Thanks for your reply
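For readers wanting to reproduce the measurement methodology, the ping-pong test described above can be sketched in a few lines of Python over loopback TCP. This is an illustrative stand-in for NPtcp, not the actual NetPIPE code; the 32-byte message size and repetition count are arbitrary choices:

```python
import socket
import threading
import time

MSG = b"x" * 32   # small message, in the 1-32 byte range discussed above
REPS = 1000

def echo_server(listener):
    # Accept one connection and echo REPS fixed-size messages back.
    conn, _ = listener.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    with conn:
        for _ in range(REPS):
            buf = b""
            while len(buf) < len(MSG):
                buf += conn.recv(len(MSG) - len(buf))
            conn.sendall(buf)

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
t = threading.Thread(target=echo_server, args=(server,))
t.start()

client = socket.create_connection(server.getsockname())
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

start = time.perf_counter()
for _ in range(REPS):
    client.sendall(MSG)
    buf = b""
    while len(buf) < len(MSG):
        buf += client.recv(len(MSG) - len(buf))
elapsed = time.perf_counter() - start

# Half the average round-trip time, as ping-pong benchmarks report it.
print(f"one-way latency ~ {elapsed / REPS / 2 * 1e6:.1f} us")
client.close()
t.join()
server.close()
```

Run between two EC2 instances (replacing the loopback address with the peer's), this gives an independent raw-TCP baseline to set against the NPtcp and NPmpi numbers.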