On 12/18/14, 2:06 PM, "Mike Kolesnik" <mkole...@redhat.com> wrote:
>Hi Neutron community members.
>
>I wanted to query the community about a proposal of how to fix HA routers
>not working with L2Population (bug 1365476[1]).
>This bug is important to fix especially if we want to have HA routers and
>DVR routers working together.
>
>[1] https://bugs.launchpad.net/neutron/+bug/1365476
>
>What's happening now?
>* HA routers use distributed ports, i.e. the port with the same IP & MAC
>  details is applied on all nodes where an L3 agent is hosting this
>  router.
>* Currently, the port details have a binding pointing to an arbitrary node
>  and this is not updated.
>* L2pop takes this "potentially stale" information and uses it to create:
>  1. A tunnel to the node.
>  2. An FDB entry that directs traffic for that port to that node.
>  3. If ARP responder is on, ARP requests will not traverse the network.
>* Problem is, the master router wouldn't necessarily be running on the
>  reported agent.
>  This means that traffic would not reach the master node but some
>  arbitrary node where the router master might be running, but might be in
>  another state (standby, fail).
>
>What is proposed?
>Basically the idea is not to do L2Pop for HA router ports that reside on
>the tenant network.
>Instead, we would create a tunnel to each node hosting the HA router so
>that the normal learning switch functionality would take care of switching
>the traffic to the master router.

In Neutron we just ensure that the MAC address is unique per network.
Could a duplicate MAC address cause problems here?

>This way no matter where the master router is currently running, the data
>plane would know how to forward traffic to it.
>This solution requires changes on the controller only.
>
>What's to gain?
>* Data plane only solution, independent of the control plane.
>* Lowest failover time (same as HA routers today).
>* High backport potential:
>  * No APIs changed/added.
>  * No configuration changes.
>  * No DB changes.
>  * Changes localized to a single file and limited in scope.
>
>What's the alternative?
>An alternative solution would be to have the controller update the port
>binding on the single port so that the plain old L2Pop happens and notifies
>about the location of the master router.
>This basically negates all the benefits of the proposed solution, but is
>wider.
>This solution depends on the report-ha-router-master spec which is
>currently in the implementation phase.
>
>It's important to note that these two solutions don't collide and could
>be done independently. The one I'm proposing just makes more sense from an
>HA viewpoint because of its benefits which fit the HA methodology of being
>fast & having as little outside dependency as possible.
>It could be done as an initial solution which solves the bug for mechanism
>drivers that support normal learning switch (OVS), and later kept as an
>optimization to the more general, controller based, solution which will
>solve the issue for any mechanism driver working with L2Pop (Linux Bridge,
>possibly others).
>
>Would love to hear your thoughts on the subject.
>
>Regards,
>Mike
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev@lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
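
To check my understanding of the proposal, here is a rough sketch of the
controller-side decision as I read it. This is not Neutron's l2pop code:
Port, is_ha_router_port, build_fdb and the host/tunnel-IP mappings are all
hypothetical names made up for illustration. The only behaviour taken from
the mail above is that HA router ports on the tenant network get no unicast
FDB entry, and that a tunnel is created to every node hosting the router so
the learning switch can find the master.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Port:
    """Hypothetical, simplified view of a port as l2pop might see it."""
    mac: str
    ip: str
    bound_host: str                  # host from the (possibly stale) binding
    is_ha_router_port: bool = False  # assumption: some way to spot HA ports


def build_fdb(
    ports: Iterable[Port],
    ha_hosts: Iterable[str],
    host_tunnel_ip: Dict[str, str],
) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]:
    """Return (flood_tunnels, unicast_entries) for one network.

    flood_tunnels   -- tunnel endpoint IPs that should exist for flooding
    unicast_entries -- {mac: (tunnel_ip, port_ip)} pre-populated entries
    """
    ha_hosts = list(ha_hosts)
    flood_tunnels: Set[str] = set()
    unicast_entries: Dict[str, Tuple[str, str]] = {}

    for port in ports:
        if port.is_ha_router_port:
            # Proposed behaviour: ignore the binding, add no unicast entry,
            # and open a tunnel to every node hosting the HA router. The
            # data plane then learns where the master actually is.
            for host in ha_hosts:
                flood_tunnels.add(host_tunnel_ip[host])
        else:
            # Regular port: plain L2pop, trust the binding.
            tunnel_ip = host_tunnel_ip[port.bound_host]
            flood_tunnels.add(tunnel_ip)
            unicast_entries[port.mac] = (tunnel_ip, port.ip)

    return flood_tunnels, unicast_entries
```

If that reading is right, the router's MAC is only ever learned from
traffic on the tunnels, never pre-populated from the binding, which is
why I am asking about duplicate MAC addresses above.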