> So basically, as long as we try to plug ports with different MTUs into the
> same bridge, we are relying on a bug in Open vSwitch that may break us at
> any time.
>
> I guess our alternatives are:
> - either redesign bridge setup for openvswitch to e.g. maintain a bridge per 
> network;
> - or talk to ovs folks on whether they may support that for us.
>
> I understand the former option is too scary. It opens lots of questions, 
> including upgrade impact since it will obviously introduce a dataplane 
> downtime. That would be a huge shift in paradigm, probably too huge to 
> swallow. The latter option may not fly with vswitch folks. Any better ideas?

I know I've heard from people who'd like to be able to support both
DPDK and non-DPDK workloads on the same node. The current
implementation with a single br-int (and thus datapath) makes that
impossible to pull off with good performance. So there may be other
reasons to consider introducing multiple isolated bridges: MTUs,
datapath_types, etc.
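
To make the MTU part concrete, here is a rough Python sketch. It is not
Neutron or OVS code: it only models the rule described in the quoted
message (a bridge ends up with the lowest MTU among its ports), and the
port names, the br-mtu-* bridge names, and the grouping key are all made
up for illustration. The same grouping could in principle use
(mtu, datapath_type) as the key.

```python
from collections import defaultdict


def effective_bridge_mtu(port_mtus):
    """Model of the rule from this thread: the bridge takes the lowest port MTU."""
    return min(port_mtus.values())


def clamped_ports(port_mtus):
    """Return the ports whose desired MTU is above what the bridge would allow."""
    bridge_mtu = effective_bridge_mtu(port_mtus)
    return sorted(name for name, mtu in port_mtus.items() if mtu > bridge_mtu)


def group_ports_by_mtu(port_mtus):
    """Hypothetical layout: one isolated bridge per distinct MTU value."""
    bridges = defaultdict(dict)
    for name, mtu in port_mtus.items():
        bridges[f"br-mtu-{mtu}"][name] = mtu
    return dict(bridges)


# Hypothetical ports from three networks with different MTUs.
ports = {"qr-aaa": 9000, "tap-bbb": 1500, "tap-ccc": 1450}

# Single shared bridge: every port above the minimum cannot get its MTU.
single_bridge_clamped = clamped_ports(ports)

# One bridge per MTU: no port is held back by a neighbour.
per_mtu_bridges = group_ports_by_mtu(ports)
per_mtu_clamped = {name: clamped_ports(members)
                   for name, members in per_mtu_bridges.items()}
```

With a single bridge, the model reports two of the three ports as held
back; with one bridge per MTU, none are. It ignores the real costs
(patch ports between bridges, flow duplication, upgrade downtime).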

Terry

On Mon, Jun 13, 2016 at 11:49 AM, Ihar Hrachyshka <[email protected]> wrote:
> Hi all,
>
> in Mitaka, we introduced a bunch of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path (starting from
> the instance internal interface, through the hybrid bridge, into br-int),
> as well as the router data path (qr), has the proper MTU value set on all
> participating devices. On the hypervisor side, both Nova and Neutron take
> part in it, setting the MTU with the ip-link tool based on what the Neutron
> plugin calculates for us. So far so good.
>
> It turns out that for OVS, this does not work as expected with regard to
> br-int. A bug was reported recently: https://launchpad.net/bugs/1590397
>
> Briefly, when we try to set the MTU on a device that is plugged into a
> bridge, and the bridge already has another port with a lower MTU, the bridge
> itself inherits the MTU from that latter port, and the Linux kernel (?) does
> not allow setting the MTU on the first device at all, making ip link calls
> ineffective.
>
> AFAIU this behaviour is consistent with Linux bridging rules: you can’t have
> ports with different MTUs plugged into the same bridge.
>
> Now, that’s a huge problem for Neutron, because we plug ports that belong to 
> different networks (and that hence may have different MTUs) into the same 
> br-int bridge.
>
> So I played with the code locally a bit and spotted that currently, we set 
> MTU for router ports before we move their devices into router namespaces. And 
> once the device is in a namespace, ip-link actually works. So I wrote a fix 
> with a functional test that proves the point: 
> https://review.openstack.org/#/c/327651/ The fix was validated by the 
> reporter of the original bug and seems to fix the issue for him.
>
> It’s suspicious that it works from inside a namespace but not when the device 
> is still in the root namespace. So I reached out to Jiri Benc from our local 
> Open vSwitch team, and here is a quote:
>
> ===
>
> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in
> other netns and thus cannot enforce the correct MTU.
>
> We'll hopefully fix it and disallow incorrect MTU setting even across
> namespaces. However, it requires significant effort and rework of ovs
> name space handling.
>
> You should not depend on the current buggy behavior. Don't set MTU of
> the internal interfaces higher than the rest of the bridge, it's not
> supported. Hacking this around by moving the interface to a netns is
> exploiting of a bug.
>
> We can certainly discuss whether this limitation could be relaxed.
> Honestly, I don't know, it's for a discussion upstream. But as of now,
> it's not supported and you should not do it."
>
> So basically, as long as we try to plug ports with different MTUs into the
> same bridge, we are relying on a bug in Open vSwitch that may break us at
> any time.
>
> I guess our alternatives are:
> - either redesign bridge setup for openvswitch to e.g. maintain a bridge per 
> network;
> - or talk to ovs folks on whether they may support that for us.
>
> I understand the former option is too scary. It opens lots of questions, 
> including upgrade impact since it will obviously introduce a dataplane 
> downtime. That would be a huge shift in paradigm, probably too huge to 
> swallow. The latter option may not fly with vswitch folks. Any better ideas?
>
> It’s also not clear whether we want to proceed with my immediate fix. Advice
> is welcome.
>
> Thanks,
> Ihar
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
