Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Thiago, I updated your bug report with my own tests and I don't experience your performance issues. George

On Tue, Nov 19, 2013 at 6:53 PM, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: [...]

___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Yup :)

On 18 Nov 2013, at 22:09, Martinx - ジェームズ wrote: Guys, can I file a bug about this issue?! If yes, where?! The Neutron Launchpad page? Tks, Thiago [...]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Okay! Bug filed: https://bugs.launchpad.net/neutron/+bug/1252900 Regards, Thiago

On 19 November 2013 16:00, Razique Mahroua razique.mahr...@gmail.com wrote: [...]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
I suddenly have the identical situation occurring here. Of note: I am using Grizzly, and there have been two changes to the environment that have seemingly caused this: an upgrade of OVS to 1.11, and an upgrade of quantum-* from 2013.1.2 to 2013.1.3. I haven't tried the default OVS 1.4 from 12.04, and I can't, as this is a prod system. However, if the OpenStack update is causing it, then here is the place to start, I suspect: https://launchpad.net/neutron/grizzly/2013.1.3 Performance of 1.4 in my env makes that unusable. -- Geraint Jones

On 11/11/13 2:47 am, Jay Pipes jaypi...@gmail.com wrote: [...]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 11/09/2013 07:09 PM, Martinx - ジェームズ wrote: [...]

I'd just like to point out that it is indeed possible to achieve good (bi-directional) network performance with Ubuntu 12.04, OVS 1.11, and OpenStack Grizzly with Neutron and GRE tunnels. We've deployed two zones with it, and after upgrading to OVS 1.11 we are seeing pretty good performance. We use the OpenStack Chef cookbooks to configure Neutron: https://github.com/stackforge/cookbook-openstack-network You may want to go through the above cookbook and check the default settings that are in the attributes and written to the configuration file templates.

I don't know of anything that changed between Grizzly and Havana that would have had an impact on network performance, but perhaps someone from the Neutron dev community could chime in here if there's been anything added in the Havana timeframe that may affect it. Best, -jay
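For context on the MTU = 1400 that comes up repeatedly in this thread: GRE and VXLAN encapsulation consume part of the physical MTU, so guest MTUs have to be lowered to avoid fragmentation or black-holed packets. The arithmetic, as a quick sketch (assuming a 1500-byte underlay, an IPv4 outer header, and Ethernet frames carried inside the tunnel, as OVS does):

```shell
# Tunnel overhead against a 1500-byte physical MTU:
#   GRE:   outer IP (20) + GRE (4)             + inner Ethernet (14) = 38 bytes
#   VXLAN: outer IP (20) + UDP (8) + VXLAN (8) + inner Ethernet (14) = 50 bytes
phys_mtu=1500

gre_mtu=$((phys_mtu - 38))
vxlan_mtu=$((phys_mtu - 50))

echo "max guest MTU over GRE:   $gre_mtu"    # 1462
echo "max guest MTU over VXLAN: $vxlan_mtu"  # 1450
```

1400 is simply a safe round number below both limits.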
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Jay! Thank you! I'll definitely take a look at those cookbooks, but I already tried Havana (Cloud Archive) with OVS 1.11.0, with the same poor results. Also, my previous region, based on Grizzly / Quantum / GRE, worked perfectly for months (except for needing MTU = 1400), and Havana is somehow different. Thanks! Thiago

On 10 November 2013 15:21, Jay Pipes jaypi...@gmail.com wrote: [...]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote: [...] Interesting. Well, we're just beginning the process of our Havana deployment testing and changes, so we'll certainly be double-checking performance based on the above feedback. Best, -jay
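One common way to make instances actually use a reduced MTU such as 1400 is to push it via DHCP option 26 (interface MTU) through the DHCP agent's dnsmasq. A sketch; the file paths are typical but should be treated as assumptions for your deployment:

```ini
# /etc/neutron/dhcp_agent.ini (assumed path)
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf (assumed path)
# DHCP option 26 = interface MTU; guests that honor it will use MTU 1400
dhcp-option-force=26,1400
```

Restarting the neutron-dhcp-agent is needed for the change to take effect, and only instances that renew their DHCP lease will pick it up.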
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Cool! Let me know what you'll need. I'll make a tenant / project / user for you here at my cloud, and I can give you root access to the network node (or any OpenStack node). Let me know if that is enough for you to debug / test it. Cheers! Thiago

On 10 November 2013 07:34, James Page james.p...@ubuntu.com wrote: On 10/11/13 00:09, Martinx - ジェームズ wrote: [...] BTW, I can give full access into my environment for you guys, no problem... I can build a lab from scratch, following your instructions; I can also give root access to OpenStack experts... Just let me know... =)

Hey, if you can set this up I can spare some time to help you debug tomorrow (Monday) between 0900 and 1800 UTC. Cheers, James -- James Page Ubuntu and Debian Developer james.p...@ubuntu.com jamesp...@debian.org
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Guys, this problem is kind of a deal breaker... I was counting on OpenStack Havana (with Ubuntu) for my first public cloud, which I'm (was) about to announce / launch, but this problem changed everything. I cannot put Havana with Ubuntu LTS into production because of this network issue. This is a very serious problem for me, since all sites, and even SSH connections, that pass through the Floating IPs into the tenants' subnets are very slow, and all the connections freeze for seconds, every minute.

Again, I'm seeing that there is no way to put Havana into production (using Per-Tenant Routers with Private Networks), *because the Network Node is broken*. At least with Ubuntu... I'll try it with Debian 7, or CentOS (I don't like it), just to see if the problem persists, but I have preferred the Ubuntu distro since Warty Warthog... :-/

So, what is being done to fix it? I already tried everything I could, without any kind of success... Also, I followed this doc (to triple-triple re-check my env): http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html but it does not work as expected.

BTW, I can give full access into my environment for you guys, no problem... I can build a lab from scratch, following your instructions; I can also give root access to OpenStack experts... Just let me know... =) Thanks! Thiago

On 6 November 2013 09:20, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: Hello Stackers! Sorry for not getting back to this topic last week; too many things to do... So, instead of trying this and that, reply this, reply again... I made a video about this problem. I hope that helps more than those e-mails I've been writing! =P Honestly, I don't know the source of this problem, whether it is with OpenStack / Neutron, or with Linux / Namespaces / OVS...

It would be great to test it alone, Ubuntu Linux + Namespace + OVS (without Neutron), to see if the problem persists, but I have no idea how to set everything up just like Neutron does. Maybe I just need to reproduce the Namespace and OVS bridges / ports / VXLAN, as is, without Neutron?! I can try that...

Also, my Grizzly setup is gone; I deleted it... Sorry about that... I know it works because this is the first time I'm seeing this problem... I had used Grizzly for ~5 months with only 1 problem (related to MTU 1400), but this problem with Havana is totally different...

Video: OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS: http://www.youtube.com/watch?v=jVjiphMuuzM * After 5 minutes, I inserted a new video, showing how I fixed it by running Squid within the Tenant router. You guys can see that, using the default Tenant router (10:30), it will take about 1 hour to finish the apt-get download, and, with Squid (09:27), it goes down to about 3 minutes (no, it is still not cached; I clean it for each test). Sorry about the size of the video; it is about 12 minutes and high-res (to see the screen details), but it is a serious problem and I think it is worth watching...

NOTE: Sorry about my English! It is very hard to speak a non-native language while handling an Android phone and typing on the keyboard... :-) Best! Thiago

On 28 October 2013 07:00, Darragh O'Reilly dara2002-openst...@yahoo.com wrote: [...]
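The idea of reproducing the Namespace + OVS path without Neutron can be sketched roughly as below. This is a hypothetical minimal setup, not Neutron's exact wiring: all names (testns, br-test, qg-test) and addresses are made up, it requires root, and it assumes br-ex already exists and leads to the external gateway.

```shell
# Create a namespace playing the role of a Neutron router:
ip netns add testns

# An "internal" bridge with an internal port moved into the namespace
# (stands in for the qr- interface on br-int):
ovs-vsctl add-br br-test
ovs-vsctl add-port br-test tap-test -- set interface tap-test type=internal
ip link set tap-test netns testns
ip netns exec testns ip addr add 10.99.0.1/24 dev tap-test
ip netns exec testns ip link set tap-test up

# A second internal port on the existing br-ex, mimicking the qg- interface
# of a Neutron router (addresses are examples):
ovs-vsctl add-port br-ex qg-test -- set interface qg-test type=internal
ip link set qg-test netns testns
ip netns exec testns ip addr add 192.168.1.250/24 dev qg-test
ip netns exec testns ip link set qg-test up
ip netns exec testns ip route add default via 192.168.1.1
ip netns exec testns iptables -t nat -A POSTROUTING -s 10.99.0.0/24 -j MASQUERADE
```

Benchmarking through this hand-built namespace (e.g. with iperf) and comparing against the qrouter- namespace on the same box would help separate a kernel/OVS issue from a Neutron one.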
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Thiago, some more answers below. Btw: I saw the problem with a qemu-nbd -c process using all the CPU on the compute. It happened just once; must be a bug in it. You can disable libvirt injection, if you don't want it, by setting libvirt_inject_partition = -2 in nova.conf.

On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: Hi Darragh, yes, on the same net-node machine, Grizzly works and Havana doesn't... But, for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.

[Darragh:] So we don't know if the problem is due to Neutron, the Ubuntu kernel, or OVS. I suspect the kernel, as it implements the routing/NAT, interfaces, and namespaces. I don't think Neutron Havana changes how these things are set up too much. Can you try running Havana on a network node with the Linux 3.2 kernel?

[Thiago:] If I replace the Havana net-node hardware entirely, the problem persists (i.e. it follows the Havana net-node), so, I think, it cannot be related to the hardware. I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg). My logs (including Open vSwitch) right after starting an Instance (nothing in the OVS logs): http://paste.openstack.org/show/49870/ I tried everything, including installing the Network Node on top of a KVM virtual machine or directly on a dedicated server; same result, the problem follows the Havana node (virtual or physical). The Grizzly Network Node works both on a KVM VM and on a dedicated server. Regards, Thiago

On 26 October 2013 06:28, Darragh OReilly wrote: [...]
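For reference, the libvirt injection workaround Darragh mentions is a one-line nova.conf change. The file path and section below are as commonly used at the time; verify against your own deployment:

```ini
# /etc/nova/nova.conf on the compute node (assumed path)
[DEFAULT]
# -2 disables file/key injection entirely, so qemu-nbd is never used to
# mount the guest image at boot
libvirt_inject_partition = -2
```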
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Stackers, I have a small report from my latest tests.

Tests:
* Namespace (br-ex) *-* Internet - OK
* Namespace (vxlan, gre, vlan) *-* Tenant - OK
* Tenant *-* Namespace *-* Internet - *NOT OK* (very slow / unstable / intermittent)

Since the connectivity from a Tenant to its Namespace is fine AND from its Namespace to the Internet is also fine, it came to my mind: hey, why not run Squid WITHIN the Tenant Namespace as a workaround?! And... Voilà! There, I fixed it! =P

New Test:
* Tenant *-* *Namespace with Squid* *-* Internet - OK!

*NOTE:* I'm sure that the entire Ethernet path (without L3, Namespace, OVS, VXLANs, GREs, or Linux bridges; just plain Linux + IPs), *from the hypervisor to the Internet*, *passing through the same Network Node hardware / path*, is working smoothly. I mean, I tested the entire path BEFORE installing OpenStack Havana... So, it cannot be an infrastructure / hardware issue; it must be something else, located at the software layer running within the Network Node itself. I'm about to send more info about this problem. Thanks! Thiago

On 26 October 2013 13:57, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: [...]
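The Squid-in-the-Namespace workaround can be reproduced with ip netns exec. A hedged sketch: the router UUID is a placeholder, the router's internal IP is an example, and the Squid config path assumes Ubuntu's squid3 package.

```shell
# Find the qrouter namespace for the tenant router (UUID is a placeholder):
ip netns list
# e.g. qrouter-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Run Squid inside that namespace, so its upstream traffic originates from
# the router's own interfaces instead of being forwarded through them:
ip netns exec qrouter-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
    squid -f /etc/squid3/squid.conf

# Instances then point at the router's internal IP on Squid's default port
# (3128), e.g. for apt (address is an example):
#   echo 'Acquire::http::Proxy "http://10.0.0.1:3128";' \
#       > /etc/apt/apt.conf.d/01proxy
```

That this helps at all is itself a diagnostic: it suggests the forwarding/NAT path through the namespace is the slow part, not the namespace's own connectivity.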
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Darragh, Yes, on the same net-node machine, Grizzly works and Havana doesn't... But, for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6. If I replace the Havana net-node hardware entirely, the problem persists (i.e. it follows the Havana net-node), so I think it cannot be related to the hardware. I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg). My logs (including Open vSwitch) right after starting an Instance (nothing at OVS logs): http://paste.openstack.org/show/49870/ I tried everything, including installing the Network Node on top of a KVM virtual machine or directly on a dedicated server, same result, the problem follows the Havana node (virtual or physical). The Grizzly Network Node works both on a KVM VM and on a dedicated server. Regards, Thiago
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Thiago, you have configured DHCP to push out an MTU of 1400. Can you confirm that the 1400 MTU is actually getting out to the instances by running 'ip link' on them? There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeat the iperf test between instances on different compute-nodes. What NICs (maker+model) are you using? You could try disabling any off-load functionality - 'ethtool -k iface-used-for-gre'. What kernel are you using: 'uname -a'? Re, Darragh. Hi Daniel, I followed that page, my Instances' MTU is lowered by the DHCP Agent but, same result: poor network performance (internal between Instances and when trying to reach the Internet). No matter if I use dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf + dhcp-option-force=26,1400 for my Neutron DHCP agent, or not (i.e. MTU = 1500), the result is almost the same. I'll try VXLAN (or just VLANs) this weekend to see if I can get better results... Thanks! Thiago
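As an aside on why the thread keeps coming back to dhcp-option-force=26,1400: the instance (inner) MTU has to leave room for the tunnel headers on a 1500-byte underlay. A minimal sketch, assuming the standard IPv4 header sizes (the helper name max_inner_mtu is mine, not from the thread):

```python
# Hypothetical helper: largest instance (inner) MTU that fits a tunnelled
# Ethernet frame into the physical network's MTU without fragmentation.
OUTER_IPV4 = 20      # outer IPv4 header
GRE_WITH_KEY = 8     # 4-byte GRE header + 4-byte key, as OVS tunnels use
UDP = 8              # UDP header (VXLAN only)
VXLAN = 8            # VXLAN header
INNER_ETH = 14       # inner Ethernet header carried inside the tunnel

def max_inner_mtu(phys_mtu, encap):
    """Largest inner-packet MTU for a given encapsulation."""
    if encap == "gre":
        overhead = OUTER_IPV4 + GRE_WITH_KEY + INNER_ETH   # 42 bytes
    elif encap == "vxlan":
        overhead = OUTER_IPV4 + UDP + VXLAN + INNER_ETH    # 50 bytes
    else:
        raise ValueError("unknown encapsulation: %s" % encap)
    return phys_mtu - overhead

print(max_inner_mtu(1500, "gre"))    # 1458
print(max_inner_mtu(1500, "vxlan"))  # 1450
```

So 1400 leaves comfortable headroom under either encapsulation; anything above 1458 (GRE with key) or 1450 (VXLAN) would force fragmentation or silent drops on a 1500-byte physical path.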
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Thiago, for the VIF error: you will need to change qemu.conf as described here: http://openvswitch.org/openstack/documentation/ Re, Darragh. On Friday, 25 October 2013, 15:14, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: Hi Darragh, Yes, Instances are getting MTU 1400. I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check bug 1223267 right now! The LibvirtOpenVswitchDriver doesn't work, look: http://paste.openstack.org/show/49709/ http://paste.openstack.org/show/49710/ My NICs are RTL8111/8168/8411 PCI Express Gigabit Ethernet, the Hypervisors' motherboard is an MSI-890FXA-GD70. The command ethtool -K eth1 gro off did not have any effect on the communication between instances on different hypervisors, still poor, around 248Mbit/sec, while its physical path reaches 1Gbit/s (where the GRE is built). My Linux version is Linux hypervisor-1 3.8.0-32-generic #47~precise1-Ubuntu, same kernel on the Network Node and the other nodes too (Ubuntu 12.04.3 installed from scratch for this Havana deployment). The only difference I can see right now between my two hypervisors is that the second is just a spare machine with a slow CPU but, I don't think it will have a negative impact on the network throughput, since I have only 1 Instance running on it (plus a qemu-nbd process eating 90% of its CPU). I'll replace this CPU tomorrow to redo these tests but, I don't think this is the source of my problem. The MOBOs of the two hypervisors are identical, with 1 3Com (manageable) switch connecting the two.
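For readers without access to that openvswitch.org page (it has since gone offline): as best I recall, the qemu.conf change it described was letting qemu open the tap device by extending the device cgroup ACL in /etc/libvirt/qemu.conf. Treat the fragment below as a reconstruction from memory, to be verified against your libvirt version:

```
# /etc/libvirt/qemu.conf -- reconstructed sketch, not the original page's text.
# The key addition for the OVS VIF driver is "/dev/net/tun".
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc", "/dev/hpet", "/dev/net/tun"
]
```

Restart the libvirt daemon after editing for the change to take effect.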
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
the uneven ssh performance is strange - maybe learning on the tunnel mesh is not stabilizing. It is easy to mess it up by giving a wrong local_ip in the ovs-plugin config file. Check the tunnel ports on br-tun with 'ovs-vsctl show'. Is each one using the correct IPs? Br-tun should have N-1 gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on paste.openstack if there are not too many? Re, Darragh. On Friday, 25 October 2013, 16:20, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: I think I can say... YAY!! :-D With LibvirtOpenVswitchDriver my internal communication is double now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to 400Mbit/s (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path limit) but, more acceptable now. The command ethtool -K eth1 gro off still makes no difference. So, there is only 1 remaining problem: when traffic passes through L3 / Namespace, it is still useless. Even the SSH connection into my Instances, via their Floating IPs, is slow as hell; sometimes it just stops responding for a few seconds, and comes back online again out of nothing... I just detected a weird behavior: when I run apt-get update from instance-1, it is slow as I said plus, its ssh connection (where I'm running apt-get update) stops responding right after I run apt-get update AND, all my other ssh connections also stop working too! For a few seconds... This means that when I run apt-get update from within instance-1, the SSH session of instance-2 is affected too!! There is something pretty bad going on at L3 / Namespace. BTW, do you think that ~400MBit/sec intra-vm-communication (GRE tunnel) on top of 1Gbit ethernet is acceptable?! It is still less than half... Thank you! Thiago
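Darragh's "N-1 gre ports" rule of thumb can be checked mechanically against the pasted outputs. A small sketch (the function name and the sample text are mine, modeled on the plain-text layout ovs-vsctl show used circa OVS 1.10):

```python
import re

def gre_remote_ips(ovs_vsctl_show_output):
    """Extract the remote_ip of every gre-* tunnel port on br-tun.

    Assumes lines like:
        options: {in_key=flow, local_ip="10.20.2.52", remote_ip="10.20.2.53"}
    nested under a 'Bridge br-tun' section.
    """
    ips = []
    in_br_tun = False
    for line in ovs_vsctl_show_output.splitlines():
        if line.strip().startswith("Bridge"):
            in_br_tun = "br-tun" in line
        m = re.search(r'remote_ip="([\d.]+)"', line)
        if in_br_tun and m:
            ips.append(m.group(1))
    return ips

# Illustrative sample standing in for one node's real paste (net-node-1).
sample = """
    Bridge br-tun
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.53"}
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.57"}
"""
nodes = ["10.20.2.52", "10.20.2.53", "10.20.2.57"]  # net-node-1 + 2 hypervisors
ips = gre_remote_ips(sample)
# With 3 nodes in a full mesh, each br-tun should carry exactly N-1 = 2
# tunnels, one to each *other* node, and never one back to its own local_ip.
assert len(ips) == len(nodes) - 1
assert "10.20.2.52" not in ips
```

Running the same check over each node's paste would confirm the mesh is complete and free of stale or self-pointing tunnel ports.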
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Here we go:

---
root@net-node-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.52
root@net-node-1:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.52
---
---
root@hypervisor-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.53
root@hypervisor-1:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.53
---
---
root@hypervisor-2:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.57
root@hypervisor-2:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.57
---

Each ovs-vsctl show: net-node-1: http://paste.openstack.org/show/49727/ hypervisor-1: http://paste.openstack.org/show/49728/ hypervisor-2: http://paste.openstack.org/show/49729/ Best, Thiago
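Those grep/ip-route pairs lend themselves to an automated consistency check per node. A hedged sketch (the sample strings are taken from the outputs above; the sed pattern is illustrative, not from the thread):

```shell
# Sketch: confirm the plugin's local_ip matches the source address the kernel
# actually uses on the GRE-carrying subnet, per Darragh's warning that a
# wrong local_ip quietly breaks the tunnel mesh.
local_ip="10.20.2.52"
route_line='10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.52'

src=$(printf '%s\n' "$route_line" | sed -n 's/.* src \([0-9.]*\).*/\1/p')
if [ "$src" = "$local_ip" ]; then
    echo "ok: local_ip matches the address on eth1"
else
    echo "MISMATCH: config says $local_ip but the route uses $src"
fi
# On a live node, local_ip would come from the ini file and route_line from:
#   ip r | grep '10\.20'
```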
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
ok, the tunnels look fine. One thing that looks funny on the network node is these untagged tap* devices. I guess you switched to using veths and then switched back to not using them. I don't know if they matter, but you should clean them up by stopping everything, running neutron-ovs-cleanup (check bridges empty) and rebooting.

Bridge br-int
    Port tapa1376f61-05
        Interface tapa1376f61-05
    ...
    Port qr-a1376f61-05
        tag: 1
        Interface qr-a1376f61-05
            type: internal

Re, Darragh.
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Okay, cool! tap* devices removed, neutron-ovs-cleanup ok, bridges empty, all nodes rebooted. BUT, there is still poor performance when reaching the External network from within an Instance (plus SSH lags)... I'll install a new Network Node, on other hardware, to test it more... The weird thing is, my Grizzly Network Node works perfectly on this very same hardware (same OpenStack Network topology, of course)... Hardware of my current net-node-1: * Grizzly - Okay * Havana - Fails... ;-( Best, Thiago
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Rick, On 25 October 2013 13:44, Rick Jones rick.jon...@hp.com wrote: On 10/25/2013 08:19 AM, Martinx - ジェームズ wrote: I think I can say... YAY!! :-D With LibvirtOpenVswitchDriver my internal communication is double now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to *400Mbit/s* (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path limit) but, more acceptable now. The command ethtool -K eth1 gro off still makes no difference. Does GRO happen if there isn't RX CKO on the NIC? Ouch! I missed that lesson... hehe No idea, how can I check / test this? If I disable RX CKO (using ethtool?) on the NIC, how can I verify whether GRO is actually happening or not? Anyway, I'm googling about all this stuff right now. Thanks for pointing it out! Refs: * JLS2009: Generic receive offload - http://lwn.net/Articles/358910/ Can your NIC peer into a GRE tunnel (?) to do CKO on the encapsulated traffic? Again, no idea... :-/ Listen, maybe this sounds too dumb on my part but, it is the first time I'm talking about this stuff (like NIC peering into GRE?, or GRO / CKO)... GRE tunnels sound too damn complex and problematic... I guess it is time to try VXLAN (or NVP?)... If you guys say: VXLAN is a completely different beast (i.e. it does not touch ANY GRE tunnel), and it works smoothly (without GRO / CKO / MTU / lags / low-speed troubles and issues), I'll move to it right now (are the VXLAN docs ready?). NOTE: I don't want to hijack this thread because of other (internal communication VS the Directional network performance issues with Neutron + OpenvSwitch thread subject) problems with my OpenStack environment, please let me know if this becomes a problem for you guys.
I just detect a weird behavior, when I run apt-get update from instance-1, it is slow as I said plus, its ssh connection (where I'm running apt-get update), stops responding right after I run apt-get update AND, _all my others ssh connections also stops working too!_ For a few seconds... This means that when I run apt-get update from within instance-1, the SSH session of instance-2 is affected too!! There is something pretty bad going on at L3 / Namespace. BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than a half... I would suggest checking for individual CPUs maxing-out during the 400 Mbit/s transfers. Okay, I'll. rick jones Thiago ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
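Rick's suggestion to check for individual CPUs maxing out during the transfer can be done roughly like this (a sketch; mpstat comes from the sysstat package, which may not be installed, and eth1 / the 400 Mbit/s figure are from the thread):

```shell
# While the iperf transfer runs, watch per-CPU utilization once per second.
# A single core pinned near 100% (often in %soft, i.e. softirq time spent
# on GRE encap/decap) would explain a throughput ceiling well below line rate.
mpstat -P ALL 1

# Without sysstat installed, running "top" and pressing "1" shows the
# same per-CPU breakdown interactively.
```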
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
You can use "ethtool -k eth0" to view the settings and "ethtool -K eth0 gro off" to turn off GRO.

On Fri, Oct 25, 2013 at 3:03 PM, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: [full quote of the previous message trimmed]
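Rick's question (does GRO happen without RX CKO on the NIC?) can be probed directly with ethtool; a sketch, assuming eth1 is the GRE-carrying NIC as elsewhere in the thread:

```shell
# List the offload settings relevant to this discussion.
ethtool -k eth1 | grep -E 'rx-checksumming|generic-receive-offload|large-receive-offload|tcp-segmentation-offload'

# Toggle them independently. GRO is implemented in the kernel stack, but
# it relies on the NIC having validated the receive checksum (RX CKO), so
# turning rx checksumming off should effectively stop GRO coalescing too.
ethtool -K eth1 rx off      # disable RX checksum offload (CKO)
ethtool -K eth1 gro off     # disable GRO explicitly
```

Comparing iperf results with each combination is one way to verify which offload is actually in play.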
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Listen, maybe this sounds too dumb on my part, but it is the first time I'm dealing with this stuff ("NIC peer into GRE"?, GRO / CKO...).

No worries. So, a slightly brief history of stateless offloads in NICs. It may be too basic, and I may get some details wrong, but it should give the gist.

Go back to the old days - 10 Mbit/s Ethernet was it (all you Token Ring fans can keep quiet :). Systems got faster than 10 Mbit/s. By a fair margin. 100BT came out, and it wasn't all that long before systems were faster than that, but things like interrupt rates were starting to become an issue for performance, so 100BT NICs started implementing interrupt-avoidance heuristics. The next bump in network speed, to 1000 Mbit/s, managed to get well out ahead of the systems.

All this time, while the link speeds were increasing, the IEEE was doing little to nothing to make sending and receiving Ethernet traffic any easier on the end stations (e.g. increasing the MTU). It was taking just as many CPU cycles to send/receive a frame over 1000BT as it did over 100BT as it did over 10BT. (Insert segue here about how FDDI was doing things to make life easier, as well as what the FDDI NIC vendors were doing to enable copy-free networking.)

So the Ethernet NIC vendors started getting creative and started borrowing some techniques from FDDI. The base of it all is CKO - ChecKsum Offload: offloading the checksum calculation for the TCP and UDP checksums. In broad handwaving terms, for inbound packets the NIC is made either smart enough to recognize an incoming frame as a TCP segment (UDP datagram), or it performs the Internet Checksum across the entire frame and leaves it to the driver to fix up. For outbound traffic, the stack, via the driver, tells the NIC a starting value (perhaps), where to start computing the checksum, how far to go, and where to stick it... So we can save the CPU cycles used calculating/verifying the checksums.
In rough terms, in the presence of copies, that is perhaps a 10% or 15% savings.

Systems still needed more. It was just as many trips up and down the protocol stack in the host to send a MB of data as it was before - the IEEE hanging on to the 1500-byte MTU. So some NIC vendors came up with Jumbo Frames - I think the first may have been Alteon with their AceNICs and switches. A 9000-byte MTU allows one to send bulk data across the network in ~1/6 the number of trips up and down the protocol stack. But that has problems - in particular, you have to have support for Jumbo Frames from end to end.

So someone, I don't recall who, had the flash of inspiration: What if... the NIC could perform the TCP segmentation on behalf of the stack? When sending a big chunk of data over TCP in one direction, the only things which change from TCP segment to TCP segment are the sequence number and the checksum (insert some handwaving about the IP datagram ID here). The NIC already knows how to compute the checksum, so let's teach it how to very simply increment the TCP sequence number. Now we can give it A Lot of Data (tm) in one trip down the protocol stack and save even more CPU cycles than Jumbo Frames. Now the NIC has to know a little bit more about the traffic - it has to know that it is TCP so it can know where the TCP sequence number goes. We also tell it the MSS to use when it is doing the segmentation on our behalf. Thus was born TCP Segmentation Offload, aka TSO, or "Poor Man's Jumbo Frames".

That works pretty well for servers at the time - they tend to send more data than they receive. The clients receiving the data don't need to be able to keep up at 1000 Mbit/s, and the server can be sending to multiple clients. However, we get another order of magnitude bump in link speeds, to 10 Gbit/s. Now people need/want to receive at the higher speeds too. So some 10 Gbit/s NIC vendors come up with the mirror image of TSO and call it LRO - Large Receive Offload.
The LRO NIC will coalesce several consecutive TCP segments into one uber-segment and hand that to the host. There are some issues with LRO, though - for example when a system is acting as a router - so in Linux, and perhaps other stacks, LRO is taken out of the hands of the NIC and given to the stack in the form of GRO - Generic Receive Offload. GRO operates above the NIC/driver but below IP. It detects the consecutive segments and coalesces them before passing them further up the stack. It becomes possible to receive data at link rate over 10 GbE. All is happiness and joy.

OK, so now we have all these stateless offloads that know about the basic traffic flow. They are all built on the foundation of CKO. They are all dealing with *un*encapsulated traffic. (They also don't do anything for small packets.) Now, toss in some encapsulation. Take your pick; in the abstract it doesn't really matter which, I suspect, at least for a little longer. What is arriving at the NIC on inbound is no longer a TCP segment in an IP
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
WOW!! Thank you for your time Rick! Awesome answer!! =D

I'll do these tests (with ethtool GRO / CKO) tonight but, do you think this is the main root of the problem?! I mean, I'm seeing two distinct problems here:

1- Slow connectivity to the External network plus SSH lags all over the cloud (everything that passes through L3 / Namespace is problematic), and;

2- Communication between two Instances on different hypervisors (i.e. maybe it is related to this GRO / CKO thing).

So, two different problems, right?!

Thanks!
Thiago

On 25 October 2013 18:56, Rick Jones rick.jon...@hp.com wrote: [Rick's history of stateless offloads quoted in full; trimmed]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
LOL... One day, Internet via Quantum Entanglement! Oops, Neutron! =P

I'll ignore the problems related to the performance between two instances on different hypervisors for now. My priority is the connectivity issue with the External networks... At least internal is slow but it works.

I'm about to remove the L3 Agent / Namespaces entirely from my topology... It is a shame, because it is pretty cool! With Grizzly I had no problems at all. Plus, I need to put Havana into production ASAP! :-/

Why am I giving it up (L3 / NS) for now? Because I tried the option tenant_network_type with gre, vxlan and vlan (range physnet1:206:256, configured at the 3Com switch as tagged). From the instances, the connection with the External network *is always slow*, no matter whether I choose GRE, VXLAN or VLAN for Tenants. For example, right now I'm using VLAN; same problem.

Don't you guys think this could be a problem with the bridge br-ex and its internals? Since I swapped the Tenant Network Type 3 times with the same result... But I still have not removed br-ex from the scene. If someone wants to debug it, I can give the root password, no problem, it is just a lab... =)

Thanks!
Thiago

On 25 October 2013 19:45, Rick Jones rick.jon...@hp.com wrote: On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote: WOW!! Thank you for your time Rick! [...] So, two different problems, right?!

One or two problems I cannot say. Certainly if one got the benefit of stateless offloads in one direction and not the other, one could see different performance limits in each direction. All I can really say is I liked it better when we were called Quantum, because then I could refer to it as "Spooky networking at a distance". Sadly, describing Neutron as "Networking with no inherent charge" doesn't work as well :)

rick jones
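One way to separate the two suspected problems is to measure each path in both directions; a sketch using iperf (the router ID and addresses are placeholders, not values from the thread):

```shell
# 1) Instance <-> instance on different hypervisors (GRE path, no L3 agent):
#    on instance-2:   iperf -s
#    on instance-1:   iperf -c <instance-2-fixed-ip> -r
# The -r flag runs the test in both directions in one go, which matters
# here since the reported problem is directional.

# 2) Instance <-> external side, through the router namespace:
#    on the network node, inside the tenant router's namespace:
ip netns exec qrouter-<router-id> iperf -s
#    on the instance:  iperf -c <qrouter-gateway-ip> -r

# If (1) is roughly symmetric but (2) chokes in one direction, the problem
# sits in the L3 / namespace path rather than in GRE offload behavior.
```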
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
I was able to enable ovs_use_veth and start Instances (VXLAN / DHCP / Metadata okay)... But, same problem when accessing the External network. BTW, I have valid Floating IPs and easy access to the Internet from the Network Node; if someone wants to debug, just ping me a message.

On 26 October 2013 02:25, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: [full quote of the previous two messages trimmed]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Ok, so that says that PMTUD is failing, probably due to a bug/limitation in openvswitch. Can we please make sure a bug is filed - both on Neutron and on the upstream component - as soon as someone tracks it down: manual MTU lowering is only needed when a network component is failing to report failed delivery of DF packets correctly.

-Rob

On 25 October 2013 08:38, Speichert,Daniel djs...@drexel.edu wrote:

We managed to bring the upload speed back to maximum on the instances through the use of this guide: http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html

Basically, the MTU needs to be lowered for GRE tunnels. It can be done with DHCP as explained in the new trunk manual.

Regards,
Daniel

From: annegen...@justwriteclick.com [mailto:annegen...@justwriteclick.com] On Behalf Of Anne Gentle
Sent: Thursday, October 24, 2013 12:08 PM
To: Martinx - ジェームズ
Cc: Speichert,Daniel; openstack@lists.openstack.org
Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ thiagocmarti...@gmail.com wrote:

Precisely! The doc currently says to disable Namespaces when using GRE; I never did this before, look: http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html But on this very same doc, they say to enable it... Who knows?! =P http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html I'll stick with Namespaces enabled...

Just a reminder, /trunk/ links are works in progress. Thanks for bringing the mismatch to our attention; we already have a doc bug filed: https://bugs.launchpad.net/openstack-manuals/+bug/1241056 Review this patch: https://review.openstack.org/#/c/53380/

Anne

Let me ask you something: when you enable ovs_use_veth, do the Metadata and DHCP still work?!

Cheers!
Thiago

On 24 October 2013 12:22, Speichert,Daniel djs...@drexel.edu wrote:

Hello everyone,

It seems we also ran into the same issue. We are running Ubuntu Saucy with OpenStack Havana from the Ubuntu Cloud archives (precise-updates). The download speed to the VMs increased from 5 Mbps to maximum after enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1 Mbps, usually 0.04 Mbps).

Here is the iperf between the instance and the L3 agent (network node) inside the namespace:

root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a iperf -c 10.1.0.24 -r
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
Client connecting to 10.1.0.24, TCP port 5001
TCP window size:  585 KByte (default)
[  7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
[ ID] Interval       Transfer     Bandwidth
[  7]  0.0-10.0 sec   845 MBytes   708 Mbits/sec
[  6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
[  6]  0.0-31.4 sec   256 KBytes  66.7 Kbits/sec

We are using Neutron OpenVSwitch with GRE and namespaces.

A side question: the documentation says to disable namespaces with GRE and enable them with VLANs. It was always working well for us on Grizzly with GRE and namespaces, and we could never get it to work without namespaces. Is there any specific reason why the documentation advises to disable it?

Regards,
Daniel

From: Martinx - ジェームズ [mailto:thiagocmarti...@gmail.com]
Sent: Thursday, October 24, 2013 3:58 AM
To: Aaron Rosen
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

Hi Aaron,

Thanks for answering! =) Let's work...
--- TEST #1 - iperf between Network Node and its uplink router (data center's Internet gateway) - OVS br-ex / eth2

# Tenant Namespace route table
root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
default via 172.16.0.1 dev qg-50b615b7-c2
172.16.0.0/20 dev qg-50b615b7-c2  proto kernel  scope link  src 172.16.0.2
192.168.210.0/24 dev qr-a1376f61-05  proto kernel  scope link  src 192.168.210.1

# there is an iperf -s running at 172.16.0.1 (Internet side), testing it
root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 22.9 KByte (default)
[  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0
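Daniel's fix above (lowering the MTU for GRE and pushing it via DHCP) comes down to simple arithmetic plus one dnsmasq option; a sketch (the 1454 value is the one the cited guide uses, which also leaves room for VXLAN's larger overhead, and the file paths are the usual packaging defaults, not taken from the thread):

```shell
# GRE over IPv4 adds a 20-byte outer IP header plus a GRE header:
# 4 bytes basic, 8 bytes when the tunnel key is used (as OVS GRE does).
PHYS_MTU=1500
echo $((PHYS_MTU - 20 - 8))   # largest safe inner MTU with keyed GRE: 1472

# To hand the lowered MTU to instances over DHCP, point the DHCP agent
# at a custom dnsmasq config:
#   /etc/neutron/dhcp_agent.ini:
#       dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
#   /etc/neutron/dnsmasq-neutron.conf:
#       dhcp-option-force=26,1454    # DHCP option 26 = interface MTU
```

As Robert notes, this is a workaround; with working PMTUD the manual lowering would not be needed.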
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
James, I think I'm hitting this problem. I'm using Per-Tenant Routers with Private Networks, GRE tunnels and an L3+DHCP Network Node. The connectivity from behind my Instances is very slow. It takes an eternity to finish apt-get update. If I run apt-get update from within the tenant's Namespace, it goes fine.

If I enable ovs_use_veth, Metadata (and/or DHCP) stops working and I am unable to start new Ubuntu Instances and log in to them... Look:

--
cloud-init start running: Tue, 22 Oct 2013 05:57:39 +. up 4.01 seconds
2013-10-22 06:01:42,989 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: url error [[Errno 113] No route to host]
2013-10-22 06:01:45,988 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]: url error [[Errno 113] No route to host]
--

Is this problem still around?! Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3? Is it possible to re-enable Metadata when ovs_use_veth = true?

Thanks!
Thiago

On 3 October 2013 06:27, James Page james.p...@ubuntu.com wrote: On 02/10/13 22:49, James Page wrote:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 traceroute -n 10.5.0.2 -p 4 --mtu
traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
 1  10.5.0.2  0.950 ms F=1500  0.598 ms  0.566 ms

The PMTU from the l3 gateway to the instance looks OK to me.

I spent a bit more time debugging this; performance from within the router netns on the L3 gateway node looks good in both directions when accessing via the tenant network (10.5.0.2) over the qr-X interface, but when accessing through the external network from within the netns I see the same performance choke upstream into the tenant network. Which would indicate that my problem lies somewhere around the qg-X interface in the router netns - just trying to figure out exactly what - maybe iptables is doing something wonky?
OK - I found a fix, but I'm not sure why this makes a difference; neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth = True'. I switched this on, cleared everything down, rebooted, and now I see symmetric good performance across all neutron routers. This would point to some sort of underlying bug when ovs_use_veth = False.

- --
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
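James's workaround, expressed as a config fragment (file paths assumed from the standard Ubuntu packaging; restart the l3 and dhcp agents afterwards):

```ini
; /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
[DEFAULT]
ovs_use_veth = True
```

Note that elsewhere in this thread ovs_use_veth was reported to break Metadata and DHCP on some Havana setups, so test it before relying on it in production.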
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 03/10/13 04:43, Martinx - ジェームズ wrote: Mmm... I am unable to compile openvswitch-datapath-dkms from Havana Ubuntu Cloud Archive (on top of a fresh install of Ubuntu 12.04.3), look:

There is a bug in that version; I'm deploying from ppa:ubuntu-cloud-archive/havana-staging which has a version that does work - we are testing everything prior to push through to proposed and updates for rc1 (i.e. this week).

- --
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
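For anyone following along, switching to the staging archive looks roughly like this (a sketch; the PPA name is from James's message, while the exact package set to reinstall is an assumption):

```shell
# Add the staging cloud archive PPA and pull the fixed packages.
sudo add-apt-repository ppa:ubuntu-cloud-archive/havana-staging
sudo apt-get update
sudo apt-get install openvswitch-datapath-dkms openvswitch-switch

# dkms rebuilds the datapath module against the running kernel; verify:
dkms status
```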
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Cool! The `ppa:ubuntu-cloud-archive/havana-staging' is the repository I was looking for. It works now... Thanks!

On 3 October 2013 03:02, James Page james.p...@ubuntu.com wrote: [full quote of the previous message trimmed]
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 02/10/13 22:49, James Page wrote:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 traceroute -n 10.5.0.2 -p 4 --mtu
traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
 1  10.5.0.2  0.950 ms F=1500  0.598 ms  0.566 ms

The PMTU from the l3 gateway to the instance looks OK to me.

I spent a bit more time debugging this; performance from within the router netns on the L3 gateway node looks good in both directions when accessing via the tenant network (10.5.0.2) over the qr-X interface, but when accessing through the external network from within the netns I see the same performance choke upstream into the tenant network. Which would indicate that my problem lies somewhere around the qg-X interface in the router netns - just trying to figure out exactly what - maybe iptables is doing something wonky?

OK - I found a fix but I'm not sure why this makes a difference; neither my l3-agent or dhcp-agent configuration had 'ovs_use_veth = True'; I switched this on, clearing everything down, rebooted and now I see symmetric good performance across all neutron routers. This would point to some sort of underlying bug when ovs_use_veth = False.
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 10/02/2013 02:14 AM, James Page wrote: I tcpdump'ed the traffic and I see a lot of duplicate acks, which makes me suspect some sort of packet fragmentation, but it's got me puzzled. Anyone have any ideas about how to debug this further? Or has anyone seen anything like this before?

Duplicate ACKs can be triggered by missing or out-of-order TCP segments. Presumably that would show up in the tcpdump trace, though it might be easier to see if you run the .pcap file through tcptrace -G. Iperf may have a similar option, but if there are actual TCP retransmissions during the run, netperf can be told to tell you about them (when running under Linux):

netperf -H remote -t TCP_STREAM -- -o throughput,local_transport_retrans,remote_transport_retrans

will give the to-remote direction, and

netperf -H remote -t TCP_MAERTS -- -o throughput,local_transport_retrans,remote_transport_retrans

will give the from-remote direction. Or you can take snapshots of netstat -s output from before and after your iperf run(s) and do the math by hand.

rick jones

If the netperf in multiverse isn't new enough to grok the -o option, you can grab the top-of-trunk from http://www.netperf.org/svn/netperf2/trunk via svn.
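Rick's "do the math by hand" step can be scripted. A minimal sketch (the snapshot strings and function name are my own; note that many Linux kernels spell the netstat field "retransmited", so the pattern accepts both spellings):

```python
import re

def tcp_retrans(netstat_s_output: str) -> int:
    """Extract the TCP segments-retransmitted counter from `netstat -s` text."""
    # Matches both "segments retransmited" (common kernel spelling)
    # and "segments retransmitted".
    m = re.search(r"(\d+) segments? retransmit?ted", netstat_s_output)
    return int(m.group(1)) if m else 0

# Abbreviated example snapshots taken before and after an iperf run:
before = "Tcp:\n    123456 segments sent out\n    17 segments retransmited\n"
after = "Tcp:\n    234567 segments sent out\n    942 segments retransmited\n"

# Retransmissions that occurred during the run:
print(tcp_retrans(after) - tcp_retrans(before))  # → 925
```

A large delta in the push direction but not the pull direction would corroborate the duplicate-ACK/loss theory from the tcpdump.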
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi James, have you tried setting the MTU to a lower number of bytes, instead of a higher-than-1500 setting? Say... 1454 instead of 1546? Curious to see if that resolves the issue. If it does, then perhaps there is a path somewhere that had a 1546 PMTU?

-jay

On 10/02/2013 05:14 AM, James Page wrote:

Hi Folks

I'm seeing an odd directional performance issue with my Havana test rig which I'm struggling to debug; details: Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive (currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE overlay networks. I've configured the MTUs on all of the physical host network interfaces to 1546 to add capacity for the GRE network headers.

Performance between instances within a single tenant network on different physical hosts is as I would expect (near 1Gbit/s), but I see issues when data transits the Neutron L3 gateway - in the example below churel is a physical host on the same network as the layer 3 gateway:

ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
hardware.dump 100% 67MB 4.8MB/s 00:14

ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
hardware.dump 100% 67MB 66.8MB/s 00:01

As you can see, pushing data to the instance (via a floating ip 10.98.191.103) is painfully slow, whereas pulling the same data is 10x+ faster (and closer to what I would expect).
iperf confirms the same:

ubuntu@churel:~$ iperf -c 10.98.191.103 -m
Client connecting to 10.98.191.103, TCP port 5001
TCP window size: 22.9 KByte (default)
[ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m
Client connecting to 10.98.191.11, TCP port 5001
TCP window size: 23.3 KByte (default)
[ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

918Mbit/s vs 50Mbit/s. I tcpdump'ed the traffic and I see a lot of duplicate acks, which makes me suspect some sort of packet fragmentation, but it's got me puzzled. Anyone have any ideas about how to debug this further? Or has anyone seen anything like this before?

Cheers

James
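As a side note, the MSS iperf reports is consistent with a plain 1500-byte path MTU rather than the raised 1546: with standard 20-byte IPv4 and TCP headers plus the 12-byte TCP timestamp option, 1500 - 52 = 1448. A quick sanity check (the header sizes are the usual IPv4/TCP values and the function is my own, not anything from this thread):

```python
IP_HDR = 20   # IPv4 header, no options
TCP_HDR = 20  # TCP header, no options
TS_OPT = 12   # TCP timestamp option (10 bytes, padded to 12)

def expected_mss(mtu: int, timestamps: bool = True) -> int:
    """Largest TCP segment payload that fits in one IP packet of `mtu` bytes."""
    return mtu - IP_HDR - TCP_HDR - (TS_OPT if timestamps else 0)

print(expected_mss(1500))  # → 1448, matching iperf's "MSS size 1448 bytes"
print(expected_mss(1546))  # → 1494, what you'd expect if 1546 were the end-to-end MTU
```

So the instance-facing path is negotiating against a 1500-byte MTU, which fits Jay's PMTU line of questioning below.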
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi Gangur

On 02/10/13 17:24, Gangur, Hrushikesh (R D HP Cloud) wrote:
http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html

Yeah - I read that already:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 traceroute -n 10.5.0.2 -p 4 --mtu
traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
 1 10.5.0.2 0.950 ms F=1500 0.598 ms 0.566 ms

The PMTU from the l3 gateway to the instance looks OK to me.

On 02/10/13 16:37, Jay Pipes wrote: Hi James, have you tried setting the MTU to a lower number of bytes, instead of a higher-than-1500 setting? Say... 1454 instead of 1546? Curious to see if that resolves the issue. If it does, then perhaps there is a path somewhere that had a 1546 PMTU?

Do you mean in instances, or on the physical servers? For context, I hit this problem prior to tweaking MTUs (defaults of 1500 everywhere).
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 10/02/2013 12:17 PM, James Page wrote:

Hi Jay

On 02/10/13 16:37, Jay Pipes wrote: Hi James, have you tried setting the MTU to a lower number of bytes, instead of a higher-than-1500 setting? Say... 1454 instead of 1546? Curious to see if that resolves the issue. If it does, then perhaps there is a path somewhere that had a 1546 PMTU?

Do you mean in instances, or on the physical servers?

I mean on the instance vNICs.

For context I hit this problem prior to tweaking MTUs (defaults of 1500 everywhere).

Right, I'm just curious :)

-jay
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 02/10/13 17:28, Jay Pipes wrote:

On 02/10/13 16:37, Jay Pipes wrote: Hi James, have you tried setting the MTU to a lower number of bytes, instead of a higher-than-1500 setting? Say... 1454 instead of 1546? Curious to see if that resolves the issue. If it does, then perhaps there is a path somewhere that had a 1546 PMTU?

Do you mean in instances, or on the physical servers?

I mean on the instance vNICs.

Yeah - that's what I thought - that makes no difference either.

--
James Page
Technical Lead
Ubuntu Server Team
james.p...@canonical.com
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
On 02/10/13 17:33, James Page wrote:

On 02/10/13 17:24, Gangur, Hrushikesh (R D HP Cloud) wrote:
http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html

Yeah - I read that already:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 traceroute -n 10.5.0.2 -p 4 --mtu
traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
 1 10.5.0.2 0.950 ms F=1500 0.598 ms 0.566 ms

The PMTU from the l3 gateway to the instance looks OK to me.

I spent a bit more time debugging this; performance from within the router netns on the L3 gateway node looks good in both directions when accessing via the tenant network (10.5.0.2) over the qr-X interface, but when accessing through the external network from within the netns I see the same performance choke upstream into the tenant network. Which would indicate that my problem lies somewhere around the qg-X interface in the router netns - just trying to figure out exactly what - maybe iptables is doing something wonky?
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Hi James,

Let me ask you something... Are you using the package `openvswitch-datapath-dkms` from the Havana Ubuntu Cloud Archive with Linux 3.8? I am unable to compile that module on top of Ubuntu 12.04.3 (with Linux 3.8) and I'm wondering if it is still required or not...

Thanks!
Thiago
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
I believe it's still needed: the upstream kernel has pushed back against the modules it provides, but Neutron needs them to deliver the GRE tunnels.

-Rob

On 3 October 2013 13:15, Martinx - ジェームズ thiagocmarti...@gmail.com wrote: Hi James, Let me ask you something... Are you using the package `openvswitch-datapath-dkms` from the Havana Ubuntu Cloud Archive with Linux 3.8? I am unable to compile that module on top of Ubuntu 12.04.3 (with Linux 3.8) and I'm wondering if it is still required or not...
--
Robert Collins
rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch
Mmm... I am unable to compile openvswitch-datapath-dkms from the Havana Ubuntu Cloud Archive (on top of a fresh install of Ubuntu 12.04.3), look:

root@havabuntu-1:~# uname -a
Linux havabuntu-1 3.8.0-31-generic #46~precise1-Ubuntu SMP Wed Sep 11 18:21:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

root@havabuntu-1:~# dpkg -l | grep openvswitch-datapath-dkms
ii openvswitch-datapath-dkms 1.10.2-0ubuntu1~cloud0 Open vSwitch datapath module source - DKMS version

root@havabuntu-1:~# dpkg-reconfigure openvswitch-datapath-dkms
Deleting module version: 1.10.2 completely from the DKMS tree.
Done.
Creating symlink /var/lib/dkms/openvswitch/1.10.2/source -> /usr/src/openvswitch-1.10.2
DKMS: add completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area (bad exit status: 2)
./configure --with-linux='/lib/modules/3.8.0-31-generic/build'
make -C datapath/linux (bad exit status: 2)
Error! Bad return status for module build on kernel: 3.8.0-31-generic (x86_64)
Consult /var/lib/dkms/openvswitch/1.10.2/build/make.log for more information.

Contents of /var/lib/dkms/openvswitch/1.10.2/build/make.log: http://paste.openstack.org/show/47888/

I also have the packages build-essential, linux-headers, etc. installed... So, James, do you have this module compiled on your test environment? I mean, does the command dpkg-reconfigure openvswitch-datapath-dkms work for you?!

NOTE: It also doesn't compile with Linux 3.2 (Ubuntu 12.04.1).

Thanks,
Thiago

On 2 October 2013 22:28, Robert Collins robe...@robertcollins.net wrote: I believe it's still needed: the upstream kernel has pushed back against the modules it provides, but Neutron needs them to deliver the GRE tunnels. -Rob