At the risk of over-simplifying: is there ever a reason to NOT enable jumbo frames in a cloud/SDN context where most of the traffic is between virtual elements that all support it? I understand that some switches do not support it and traffic from the web doesn't support it either, but besides that, it seems like a default "jumboframes = 1" concept would work just fine to me.
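To make that concrete, the kind of default I'm imagining is just a couple of operator-facing knobs rather than anyone typing "8950" by hand. A rough sketch using the MTU options Neutron has been growing (the option names and files below are from memory and vary by release, so treat this as an illustration, not a recipe):

# /etc/neutron/neutron.conf
[DEFAULT]
# MTU of the underlying physical network(s)
global_physnet_mtu = 9000
# push the calculated per-network MTU to instances via DHCP/RA
advertise_mtu = true

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
# ceiling for networks whose traffic crosses layer-3 infrastructure (VXLAN, GRE)
path_mtu = 9000

Neutron would then derive each network's MTU itself, e.g. 9000 minus the VXLAN overhead, and hand the result to instances, which is about as close to "jumboframes = 1" as we can get.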
Then again, I'm all about making OpenStack easier to consume, so my ideas tend to gloss over special use cases with special requirements.

*Adam Lawson*

AQORN, Inc.
427 North Tatnall Street
Ste. 58461
Wilmington, Delaware 19801-2230
Toll-free: (844) 4-AQORN-NOW ext. 101
International: +1 302-387-4660
Direct: +1 916-246-2072

On Fri, Jan 22, 2016 at 7:13 PM, Matt Kassawara <[email protected]> wrote:

> The fun continues, now using an OpenStack deployment on physical hardware that supports jumbo frames with 9000 MTU and IPv4/IPv6. This experiment still uses Linux bridge for consistency. I'm planning to run similar experiments with Open vSwitch and Open Virtual Network (OVN) in the next week.
>
> I highly recommend reading further, but here's the TL;DR: Using physical network interfaces with MTUs larger than 1500 reveals an additional problem with the veth pair for the neutron router interface on the public network. Additionally, IP protocol version does not impact MTU calculation for Linux bridge.
>
> First, review the OpenStack bits and resulting network components in the environment [1]. In the first experiment, public cloud network limitations prevented truly seeing how Linux bridge (actually the kernel) handles physical network interfaces with MTUs larger than 1500. In this experiment, we see that it automatically calculates the proper MTU for bridges and VXLAN interfaces using the MTU of parent devices. Also, see that a regular 'ping' works between the host outside of the deployment and the VM [2].
>
> [1] https://gist.github.com/ionosphere80/a3725066386d8ca4c6d7
> [2] https://gist.github.com/ionosphere80/a8d601a356ac6c6274cb
>
> Note: The tcpdump output in each case references up to six points: the neutron router gateway on the public network (qg), the namespace end of the veth pair for the neutron router interface on the private network (qr), the bridge end of the veth pair for the router interface on the private network (tap), the controller node end of the VXLAN network (underlying interface), the compute node end of the VXLAN network (underlying interface), and the bridge end of the tap for the VM (tap).
>
> In the first experiment, SSH "stuck" because of an MTU mismatch on the veth pair between the router namespace and the private network bridge. In this experiment, SSH works because the VM network interface uses a 1500 MTU and all devices along the path between the host and the VM use a 1500 or larger MTU. So, let's configure the VM network interface to use the proper MTU of 9000 minus the VXLAN protocol overhead of 50 bytes... 8950... and try SSH again.
>
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc pfifo_fast qlen 1000
>     link/ether fa:16:3e:46:ac:d3 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.3/24 brd 172.16.1.255 scope global eth0
>     inet6 fd00:100:52:1:f816:3eff:fe46:acd3/64 scope global dynamic
>        valid_lft 86395sec preferred_lft 14395sec
>     inet6 fe80::f816:3eff:fe46:acd3/64 scope link
>        valid_lft forever preferred_lft forever
>
> SSH doesn't work with IPv4 or IPv6. Adding a slight twist to the first experiment, I don't even see the large packet traversing the neutron router gateway on the public network. So, I began a tcpdump closer to the source, on the bridge end of the veth pair for the neutron router interface on the public network.
>
> Looking at [3], the veth pair between the router namespace and the private network bridge drops the packet.
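A quick aside on the 50-byte figure Matt uses above; this is my own breakdown, assuming a VXLAN segment over an IPv4 underlay with untagged Ethernet:

outer Ethernet header   14 bytes
outer IPv4 header       20 bytes
outer UDP header         8 bytes
VXLAN header             8 bytes
total overhead          50 bytes  ->  9000 - 50 = 8950

An IPv6 underlay grows the outer IP header to 40 bytes (70 total), and an 802.1Q tag on the underlay adds another 4, so the usable tenant MTU shrinks accordingly.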
> The MTU changes over a layer-2 connection without a router, similar to connecting two switches with different MTUs. Even if it could participate in PMTUD, the veth pair lacks an IP address and therefore cannot originate ICMP messages.
>
> [3] https://gist.github.com/ionosphere80/ec83d0955c79b05ea381
>
> Using observations from the first experiment, let's configure the MTU of the interfaces in the qrouter namespace to match the other end of their respective veth pairs. The public network (gateway) interface MTU becomes 9000 and the private network router interfaces (IPv4 and IPv6) become 8950.
>
> 2: qr-49b27408-04: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>     link/ether fa:16:3e:e5:43:1c brd ff:ff:ff:ff:ff:ff
> 3: qr-b7e0ef22-32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>     link/ether fa:16:3e:16:01:92 brd ff:ff:ff:ff:ff:ff
> 4: qg-7bbe8e38-cc: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>     link/ether fa:16:3e:2b:c1:fd brd ff:ff:ff:ff:ff:ff
>
> Let's ping with a payload size of 8922 for IPv4 and 8902 for IPv6, the maximum for a VXLAN segment with 8950 MTU, and look at the tcpdump output [4]. For brevity, I'm only showing tcpdump output from the VM tap interface. Ping operates normally.
>
> # ping -c 1 -s 8922 -M do 10.100.52.104
>
> # ping -c 1 -s 8902 -M do fd00:100:52:1:f816:3eff:fe46:acd3
>
> [4] https://gist.github.com/ionosphere80/85339b587bb9b2693b07
>
> Let's ping with a payload size of 8923 for IPv4 and 8903 for IPv6, one byte larger than the maximum for a VXLAN segment with 8950 MTU. The router namespace, operating at layer 3, sees the MTU discrepancy between the two interfaces in the namespace and returns an ICMP "fragmentation needed" or "packet too big" message to the sender. The sender uses the MTU value in the ICMP packet to recalculate the length of the first packet and caches it for future packets.
>
> # ping -c 1 -s 8923 -M do 10.100.52.104
> PING 10.100.52.104 (10.100.52.104) 8923(8951) bytes of data.
> From 10.100.52.104 icmp_seq=1 Frag needed and DF set (mtu = 8950)
>
> --- 10.100.52.104 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
>
> # ping6 -c 1 -s 8903 -M do fd00:100:52:1:f816:3eff:fe46:acd3
> PING fd00:100:52:1:f816:3eff:fe46:acd3(fd00:100:52:1:f816:3eff:fe46:acd3) 8903 data bytes
> From fd00:100:52::101 icmp_seq=1 Packet too big: mtu=8950
>
> --- fd00:100:52:1:f816:3eff:fe46:acd3 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
>
> # ip route get to 10.100.52.104
> 10.100.52.104 dev eth1 src 10.100.52.45
>     cache expires 596sec mtu 8950
>
> # ip route get to fd00:100:52:1:f816:3eff:fe46:acd3
> fd00:100:52:1:f816:3eff:fe46:acd3 from :: via fd00:100:52::101 dev eth1 src fd00:100:52::45 metric 0
>     cache expires 556sec mtu 8950
>
> Finally, let's try SSH.
>
> # ssh [email protected]
> [email protected]'s password:
> $
>
> # ssh cirros@fd00:100:52:1:f816:3eff:fe46:acd3
> cirros@fd00:100:52:1:f816:3eff:fe46:acd3's password:
> $
>
> SSH works for both IPv4 and IPv6.
>
> This experiment reaches the same conclusion as the first experiment. However, using physical hardware that supports jumbo frames reveals an additional problem with the veth pair for the neutron router interface on the public network.
> For any MTU, we can address the egress MTU disparity (from the VM) by advertising the MTU of the overlay network to the VM via DHCP/RA or using manual interface configuration. Additionally, IP protocol version does not impact MTU calculation for Linux bridge.
>
> Hopefully moving to physical hardware makes this experiment easier to understand and the conclusion more useful for realistic networks.
>
> Matt
>
> On Wed, Jan 20, 2016 at 11:18 AM, Rick Jones <[email protected]> wrote:
>
>> On 01/20/2016 08:56 AM, Sean M. Collins wrote:
>>
>>> On Tue, Jan 19, 2016 at 08:15:18AM EST, Matt Kassawara wrote:
>>>
>>>> No. However, we ought to determine what happens when both DHCP and RA advertise it.
>>>
>>> We'd have to look at the RFCs for how hosts are supposed to behave since IPv6 has a minimum MTU of 1280 bytes while IPv4's minimum MTU is 576 (what is this, an MTU for ants?).
>>
>> Quibble - 576 is the IPv4 minimum maximum MTU. That is to say, a compliant IPv4 implementation must be able to reassemble datagrams of at least 576 bytes.
>>
>> If memory serves, the actual minimum MTU for IPv4 is 68 bytes.
>>
>> rick jones
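Coming back to Matt's point about advertising the overlay MTU via DHCP/RA: in practice this boils down to DHCP option 26 (interface MTU) and the MTU option in router advertisements. A rough sketch of what the agents could push, reusing the 8950 value plus the qr interface and prefix from Matt's output above (the file names and tags are placeholders, not necessarily what neutron writes):

# dnsmasq: an entry in the per-network --dhcp-optsfile
tag:private-subnet,option:mtu,8950

# radvd: run inside the qrouter namespace with a radvd.conf like
interface qr-49b27408-04
{
    AdvSendAdvert on;
    AdvLinkMTU 8950;
    prefix fd00:100:52:1::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};

Instances that honor option 26 and the RA MTU option then land on 8950 automatically, the same end state Matt reached by configuring the VM interface manually.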
