Hi Ihar,

This reminds me of a mailing list thread from a while back about moving OVS
ports between namespaces being considered harmful [1]. Do you know if that
was ever resolved by the OVS folks? Or is this MTU bug just a further
indication that doing so is harmful?
Another comment inline.

Rawlin Peters

[1] http://lists.openstack.org/pipermail/openstack-dev/2015-February/056834.html

On Monday, June 13, 2016 10:50 AM, Ihar Hrachyshka wrote:
>
> Hi all,
>
> In Mitaka, we introduced a bunch of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path, starting from
> the instance's internal interface, through the hybrid bridge, into br-int,
> as well as the router data path (qr), has the proper MTU value set on all
> participating devices. On the hypervisor side, both Nova and Neutron take
> part in it, setting the MTU with the ip-link tool based on what the Neutron
> plugin calculates for us. So far so good.
>
> It turns out that for OVS, this does not work as expected with regard to
> br-int. A bug was reported recently: https://launchpad.net/bugs/1590397
>
> Briefly, when we try to set the MTU on a device that is plugged into a
> bridge, and the bridge already has another port with a lower MTU, the
> bridge itself inherits the MTU from that latter port, and the Linux kernel
> (?) does not allow setting the MTU on the first device at all, making the
> ip link calls ineffective.
>
> AFAIU this behaviour is consistent with Linux bridging rules: you can't
> have ports of different MTU plugged into the same bridge.
>
> Now, that's a huge problem for Neutron, because we plug ports that belong
> to different networks (and that hence may have different MTUs) into the
> same br-int bridge.
>
> So I played with the code locally a bit and spotted that currently, we set
> the MTU for router ports before we move their devices into router
> namespaces. And once the device is in a namespace, ip-link actually works.
> So I wrote a fix with a functional test that proves the point:
> https://review.openstack.org/#/c/327651/
> The fix was validated by the reporter of the original bug and seems to fix
> the issue for him.
>
> It's suspicious that it works from inside a namespace but not when the
> device is still in the root namespace. So I reached out to Jiri Benc from
> our local Open vSwitch team, and here is a quote:
>
> ===
>
> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in other
> netns and thus cannot enforce the correct MTU.
>
> We'll hopefully fix it and disallow incorrect MTU setting even across
> namespaces. However, it requires significant effort and rework of ovs name
> space handling.
>
> You should not depend on the current buggy behavior. Don't set MTU of the
> internal interfaces higher than the rest of the bridge, it's not supported.
> Hacking this around by moving the interface to a netns is exploiting of a
> bug.
>
> We can certainly discuss whether this limitation could be relaxed.
> Honestly, I don't know, it's for a discussion upstream. But as of now, it's
> not supported and you should not do it."
>
> So basically, as long as we try to plug ports with different MTUs into the
> same bridge, we are relying on a bug in Open vSwitch that may break us at
> any time.
>
> I guess our alternatives are:
> - either redesign the bridge setup for openvswitch to e.g. maintain a
>   bridge per network;
> - or talk to the OVS folks about whether they could support that for us.
>

It seems like another alternative would be to always use veth devices by
default rather than internal OVS ports (i.e. ovs_use_veth = True), but that
would likely mean taking a large performance hit that no one will be happy
about.

> I understand the former option is too scary. It raises lots of questions,
> including upgrade impact, since it will obviously introduce dataplane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with the vswitch folks. Any better
> ideas?
>
> It's also not clear whether we want to proceed with my immediate fix.
> Advice is welcome.
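
To make the constraint discussed above concrete, here is a minimal Python
sketch. It only models the behaviour as described in this thread (the bridge
ends up with the lowest MTU among its ports, and a higher MTU requested on
any port is not honoured); it does not talk to OVS or the kernel. The
function names, port names and MTU values are made up for illustration.

```python
from typing import Dict, List


def effective_bridge_mtu(port_mtus: Dict[str, int]) -> int:
    """Return the MTU the bridge ends up with: the lowest among its ports."""
    if not port_mtus:
        raise ValueError("bridge has no ports")
    return min(port_mtus.values())


def ports_exceeding_bridge_mtu(port_mtus: Dict[str, int]) -> List[str]:
    """List ports whose requested MTU is higher than the bridge MTU."""
    limit = effective_bridge_mtu(port_mtus)
    return sorted(name for name, mtu in port_mtus.items() if mtu > limit)


# Hypothetical ports from two networks with different MTUs on one br-int.
br_int_ports = {"qr-net-a": 1450, "qr-net-b": 9000, "tap-net-a": 1450}

print(effective_bridge_mtu(br_int_ports))        # 1450
print(ports_exceeding_bridge_mtu(br_int_ports))  # ['qr-net-b']
```

In this model, a bridge per network would mean each dictionary holds ports
of a single MTU, so the second function would return an empty list.
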
>
> Thanks,
> Ihar
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev