OK, so that says that PMTUD is failing, probably due to a bug/limitation in Open vSwitch. Can we please make sure a bug is filed - both on Neutron and on the upstream component - as soon as someone tracks it down? Manual MTU lowering is only needed when a network component fails to correctly report failed delivery of DF packets.
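To confirm the PMTUD diagnosis, a quick hand-check from the router namespace is enough. A minimal sketch - the qrouter ID and 10.5.0.2 target are taken from James's traceroute further down the thread and are only placeholders; the executable part below is just the GRE overhead arithmetic:

```shell
# GRE over IPv4 adds 24 bytes of encapsulation (20-byte outer IPv4 header
# + 4-byte GRE header), so a 1500-byte underlay leaves 1476 for the inner
# packet - anything bigger must trigger an ICMP "Frag needed" if PMTUD works.
INNER_MTU=$((1500 - 20 - 4))
echo "max inner MTU over GRE: $INNER_MTU"

# Manual PMTUD probe (shown as comments; needs the real namespace/instance):
#   ip netns exec qrouter-<id> ping -M do -s 1472 -c 3 10.5.0.2  # 1500-byte probe
#   ip netns exec qrouter-<id> ping -M do -s 1426 -c 3 10.5.0.2  # 1454-byte probe
# If the large DF-flagged probe silently blackholes while the small one gets
# through, some component on the path is eating the "Frag needed" replies.
```

The `-s` values add the 8-byte ICMP and 20-byte IP headers back on, giving 1500- and 1454-byte packets on the wire.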
-Rob On 25 October 2013 08:38, Speichert,Daniel <djs...@drexel.edu> wrote: > We managed to bring the upload speed back to maximum on the instances > through the use of this guide: > > http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html > > > > Basically, the MTU needs to be lowered for GRE tunnels. It can be done with > DHCP as explained in the new trunk manual. > > > > Regards, > > Daniel > > > > From: annegen...@justwriteclick.com [mailto:annegen...@justwriteclick.com] > On Behalf Of Anne Gentle > Sent: Thursday, October 24, 2013 12:08 PM > To: Martinx - ジェームズ > Cc: Speichert,Daniel; openstack@lists.openstack.org > > > Subject: Re: [Openstack] Directional network performance issues with Neutron > + OpenvSwitch > > > > > > > > On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ > <thiagocmarti...@gmail.com> wrote: > > Precisely! > > > > The doc currently says to disable Namespace when using GRE; I never did this > before, look: > > > > http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html > > > > But on this very same doc, they say to enable it... Who knows?! =P > > > > http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html > > > > I stick with Namespace enabled... > > > > > > Just a reminder, /trunk/ links are works in progress; thanks for bringing > the mismatch to our attention, and we already have a doc bug filed: > > > > https://bugs.launchpad.net/openstack-manuals/+bug/1241056 > > > > Review this patch: https://review.openstack.org/#/c/53380/ > > > > Anne > > > > > > > > Let me ask you something: when you enable ovs_use_veth, do the Metadata and > DHCP still work?! > > > > Cheers! > > Thiago > > > > On 24 October 2013 12:22, Speichert,Daniel <djs...@drexel.edu> wrote: > > Hello everyone, > > > > It seems we also ran into the same issue. 
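For reference, the MTU lowering Daniel describes is done by handing the DHCP agent's dnsmasq an extra config file that pushes DHCP option 26 (interface MTU) to instances. A minimal sketch, assuming the usual Ubuntu file locations (written to /tmp here so it is side-effect free to run):

```shell
# Stage a dnsmasq snippet pushing option 26; 1454 leaves headroom for the
# GRE + outer-IP encapsulation overhead. In a real deployment this would
# live at /etc/neutron/dnsmasq-neutron.conf (assumed path - adjust as needed).
cat > /tmp/dnsmasq-neutron.conf <<'EOF'
dhcp-option-force=26,1454
EOF

# Then point the DHCP agent at it and restart (shown as comments only):
#   /etc/neutron/dhcp_agent.ini:
#     [DEFAULT]
#     dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
#   service neutron-dhcp-agent restart

grep 'dhcp-option-force' /tmp/dnsmasq-neutron.conf
```

Instances pick up the lower MTU on their next DHCP lease renewal, so existing VMs may need a reboot or a manual `dhclient` run.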
> > > > We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud archives > (precise-updates). > > > > The download speed to the VMs increased from 5 Mbps to maximum after > enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1 > Mbps, usually 0.04 Mbps). > > > > Here is the iperf between the instance and L3 agent (network node) inside > namespace. > > > > root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a > iperf -c 10.1.0.24 -r > > ------------------------------------------------------------ > > Server listening on TCP port 5001 > > TCP window size: 85.3 KByte (default) > > ------------------------------------------------------------ > > ------------------------------------------------------------ > > Client connecting to 10.1.0.24, TCP port 5001 > > TCP window size: 585 KByte (default) > > ------------------------------------------------------------ > > [ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec > > [ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006 > > [ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec > > > > We are using Neutron OpenVSwitch with GRE and namespaces. > > > A side question: the documentation says to disable namespaces with GRE and > enable them with VLANs. It was always working well for us on Grizzly with > GRE and namespaces and we could never get it to work without namespaces. Is > there any specific reason why the documentation is advising to disable it? > > > > Regards, > > Daniel > > > > From: Martinx - ジェームズ [mailto:thiagocmarti...@gmail.com] > Sent: Thursday, October 24, 2013 3:58 AM > To: Aaron Rosen > Cc: openstack@lists.openstack.org > > > Subject: Re: [Openstack] Directional network performance issues with Neutron > + OpenvSwitch > > > > Hi Aaron, > > > > Thanks for answering! =) > > > > Lets work... 
> > > > --- > > > > TEST #1 - iperf between Network Node and its Uplink router (Data Center's > gateway "Internet") - OVS br-ex / eth2 > > > > # Tenant Namespace route table > > > > root@net-node-1:~# ip netns exec > qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route > > default via 172.16.0.1 dev qg-50b615b7-c2 > > 172.16.0.0/20 dev qg-50b615b7-c2 proto kernel scope link src 172.16.0.2 > > 192.168.210.0/24 dev qr-a1376f61-05 proto kernel scope link src > 192.168.210.1 > > > > # there is a "iperf -s" running at 172.16.0.1 "Internet", testing it > > > > root@net-node-1:~# ip netns exec > qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1 > > ------------------------------------------------------------ > > Client connecting to 172.16.0.1, TCP port 5001 > > TCP window size: 22.9 KByte (default) > > ------------------------------------------------------------ > > [ 5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 5] 0.0-10.0 sec 668 MBytes 559 Mbits/sec > > --- > > > > --- > > > > TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink > router > > > > # iperf server running within Tenant's Namespace router > > > > root@net-node-1:~# ip netns exec > qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s > > > > - > > > > # from instance-1 > > > > ubuntu@instance-1:~$ ip route > > default via 192.168.210.1 dev eth0 metric 100 > > 192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.2 > > > > # instance-1 performing tests against net-node-1 Namespace above > > > > ubuntu@instance-1:~$ iperf -c 192.168.210.1 > > ------------------------------------------------------------ > > Client connecting to 192.168.210.1, TCP port 5001 > > TCP window size: 21.0 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-10.0 
sec 484 MBytes 406 Mbits/sec > > > > # still on instance-1, now against "External IP" of its own Namespace / > Router > > > > ubuntu@instance-1:~$ iperf -c 172.16.0.2 > > ------------------------------------------------------------ > > Client connecting to 172.16.0.2, TCP port 5001 > > TCP window size: 21.0 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-10.0 sec 520 MBytes 436 Mbits/sec > > > > # still on instance-1, now against the Data Center UpLink Router > > > > ubuntu@instance-1:~$ iperf -c 172.16.0.1 > > ------------------------------------------------------------ > > Client connecting to 172.16.0.1, TCP port 5001 > > TCP window size: 21.0 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-10.0 sec 324 MBytes 271 Mbits/sec > > --- > > > > This latest test shows only 271 Mbits/s! I think it should be at least, > 400~430 MBits/s... Right?! 
> > > > --- > > > > TEST #3 - Two instances on the same hypervisor > > > > # iperf server > > > > ubuntu@instance-2:~$ ip route > > default via 192.168.210.1 dev eth0 metric 100 > > 192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.4 > > > > ubuntu@instance-2:~$ iperf -s > > ------------------------------------------------------------ > > Server listening on TCP port 5001 > > TCP window size: 85.3 KByte (default) > > ------------------------------------------------------------ > > [ 4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800 > > [ ID] Interval Transfer Bandwidth > > [ 4] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec > > > > # iperf client > > > > ubuntu@instance-1:~$ iperf -c 192.168.210.4 > > ------------------------------------------------------------ > > Client connecting to 192.168.210.4, TCP port 5001 > > TCP window size: 21.0 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec > > --- > > > > --- > > > > TEST #4 - Two instances on different hypervisors - over GRE > > > > root@instance-2:~# iperf -s > > ------------------------------------------------------------ > > Server listening on TCP port 5001 > > TCP window size: 85.3 KByte (default) > > ------------------------------------------------------------ > > [ 4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640 > > [ ID] Interval Transfer Bandwidth > > [ 4] 0.0-10.0 sec 237 MBytes 198 Mbits/sec > > > > > > root@instance-1:~# iperf -c 192.168.210.4 > > ------------------------------------------------------------ > > Client connecting to 192.168.210.4, TCP port 5001 > > TCP window size: 21.0 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001 > > [ ID] 
Interval Transfer Bandwidth > > [ 3] 0.0-10.0 sec 237 MBytes 198 Mbits/sec > > --- > > > > I just realized how slow my intra-cloud (intra-VM) communication is... :-/ > > > > --- > > > > TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip > > > > # Same path as "TEST #4", but testing the physical GRE path (where GRE > traffic flows) > > > > root@hypervisor-2:~$ iperf -s > > ------------------------------------------------------------ > > Server listening on TCP port 5001 > > TCP window size: 85.3 KByte (default) > > ------------------------------------------------------------ > > [ 4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694 > > [ ID] Interval Transfer Bandwidth > > [ 4] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec > > > > root@hypervisor-1:~# iperf -c 10.20.2.57 > > ------------------------------------------------------------ > > Client connecting to 10.20.2.57, TCP port 5001 > > TCP window size: 22.9 KByte (default) > > ------------------------------------------------------------ > > [ 3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec > > --- > > > > About Test #5, I don't know why the GRE traffic (Test #4) doesn't reach > 1Gbit/sec (only ~200Mbit/s ?), since its physical path is much faster > (GIGALan). Plus, Test #3 shows a pretty fast speed when traffic flows only > within a hypervisor (3.96Gbit/sec). > > > > Tomorrow, I'll do these tests with netperf. > > > > NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3, via > "dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade > to 1.10.2 from Havana Cloud Archive, same results... I can downgrade it, if > you guys tell me to do so. 
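One plausible (hedged) explanation for the Test #4 vs Test #5 gap: with the instance MTU still at 1500, every full-size tenant packet exceeds the physical MTU once encapsulated, so the underlay must fragment (or drop) each one. A back-of-the-envelope check:

```shell
# Each full-size inner packet grows by the GRE encapsulation overhead:
INNER=1500
OVERHEAD=$((20 + 4))           # outer IPv4 header + GRE header
ENCAP=$((INNER + OVERHEAD))    # bytes on the wire per tenant packet
echo "encapsulated size: $ENCAP"

# 1524 > 1500, so the outer packet is fragmented: its payload is split
# into 1480-byte chunks (1500 minus the 20-byte outer IP header).
PAYLOAD=$((ENCAP - 20))
FRAGS=$(( (PAYLOAD + 1479) / 1480 ))
echo "underlay frames per packet: $FRAGS"
```

Two frames per packet - one full, one tiny - plus reassembly on the far side is consistent with GRE throughput collapsing to a fraction of the 939 Mbit/s the bare NICs achieve, and it is exactly what the DHCP MTU lowering discussed earlier in the thread avoids.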
> > > > BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with > exactly the same configuration as my current Havana + Ubuntu 12.04.3, on > top of the same hardware, to see if the problem still persists. > > > > Regards, > > Thiago > > > > On 23 October 2013 22:40, Aaron Rosen <aro...@nicira.com> wrote: > > > > > > On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ > <thiagocmarti...@gmail.com> wrote: > > James, > > > > I think I'm hitting this problem. > > > > I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and > L3+DHCP Network Node. > > > > The connectivity from behind my Instances is very slow. It takes an eternity > to finish "apt-get update". > > > > > > I'm curious if you can do the following tests to help pinpoint the bottleneck: > > > > Run iperf or netperf between: > > two instances on the same hypervisor - if the performance is bad, this will determine whether it's a > virtualization driver issue. > > two instances on different hypervisors. > > one instance to the namespace of the l3 agent. > > > > > > > > > > > > > > If I run "apt-get update" from within the tenant's Namespace, it goes fine. > > > > If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am > unable to start new Ubuntu Instances and log in to them... Look: > > > > -- > > cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds > > 2013-10-22 06:01:42,989 - util.py[WARNING]: > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: > url error [[Errno 113] No route to host] > > 2013-10-22 06:01:45,988 - util.py[WARNING]: > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]: > url error [[Errno 113] No route to host] > > -- > > > > > > Do you see anything interesting in the neutron-metadata-agent log? Or it > looks like your instance doesn't have a route to the default gw? > > > > > > Is this problem still around?! 
> > > > Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3? > > > > Is it possible to re-enable Metadata when ovs_use_veth = true ? > > > > Thanks! > > Thiago > > > > On 3 October 2013 06:27, James Page <james.p...@ubuntu.com> wrote: > > On 02/10/13 22:49, James Page wrote: >>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 >>>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2 >>>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950 >>>> ms F=1500 0.598 ms 0.566 ms >>>> >>>> The PMTU from the l3 gateway to the instance looks OK to me. >> I spent a bit more time debugging this; performance from within >> the router netns on the L3 gateway node looks good in both >> directions when accessing via the tenant network (10.5.0.2) over >> the qr-XXXXX interface, but when accessing through the external >> network from within the netns I see the same performance choke >> upstream into the tenant network. >> >> Which would indicate that my problem lies somewhere around the >> qg-XXXXX interface in the router netns - just trying to figure out >> exactly what - maybe iptables is doing something wonky? > > OK - I found a fix, but I'm not sure why it makes a difference; > neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth = > True'; I switched this on, cleared everything down, rebooted, and now > I see good symmetric performance across all neutron routers. > > This would point to some sort of underlying bug when ovs_use_veth = False. 
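James's workaround boils down to one flag in both agent configs. A minimal sketch, writing to /tmp so it is harmless to run as-is (the real files are /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini):

```shell
# Switch the L3 and DHCP agents from OVS internal ports to veth pairs
# for their qr-/qg-/tap interfaces.
for f in /tmp/l3_agent.ini /tmp/dhcp_agent.ini; do
  cat > "$f" <<'EOF'
[DEFAULT]
ovs_use_veth = True
EOF
done

# After editing the real files, restart the agents so the ports are
# recreated (James cleared everything down and rebooted):
#   service neutron-l3-agent restart
#   service neutron-dhcp-agent restart

grep -H 'ovs_use_veth' /tmp/l3_agent.ini /tmp/dhcp_agent.ini
```

Note the caveat reported elsewhere in this thread: with ovs_use_veth enabled, Thiago saw the Metadata/DHCP path break, so test 169.254.169.254 reachability from a fresh instance after flipping the flag.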
> > > > - -- > James Page > Ubuntu and Debian Developer > james.p...@ubuntu.com > jamesp...@debian.org > > _______________________________________________ > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack > Post to : openstack@lists.openstack.org > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack -- Robert Collins <rbtcoll...@hp.com> Distinguished Technologist HP Converged Cloud