[ovs-dev] [PATCH] lacp: Select a may-enable IF as the lead IF

2016-12-23 Thread Torgny Lindberg
On Wed Dec 21 17:27:05 UTC 2016, Ben Pfaff wrote:
> On Thu, Dec 01, 2016 at 02:01:59PM +0100, Torgny Lindberg wrote:
> > A reboot of one switch in an MC-LAG bond makes all bond links
> > go down, causing a total connectivity loss for 3 seconds.
> >
> > Packet capture shows that spurious LACP PDUs are sent to OVS with
> > a different MAC address (partner system id) during the final
> > stages of the MC-LAG switch reboot.
> >
> > The current code selects a lead interface based on information
> > in the LACP PDU, regardless of its synchronization state. If a
> > non-synchronized interface is selected as the OVS lead interface
> > then all other interfaces are forced down as their stored partner
> > system id differs and the bond ends up with no working interface.
> > The bond recovers within three seconds after the last spurious
> > message.
> >
> > To avoid the problem, this commit requires a lead interface
> > to be synchronized. In case no synchronized interface exists,
> > the selection of lead interface is done as in the current code.
> >
> > Signed-off-by: Torgny Lindberg 
> 
> I think I understand what's going on here now.  I made some changes that
> better reflect my understanding:
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326567.html
> Does this work for you?  If so, I'll apply it.
> 
> Thanks,
> 
> Ben.

Hi Ben,

Yes, the changed code gives the same result, so it works for me.

Thanks,
Torgny



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
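
The selection rule described in the patch can be illustrated with a small
standalone sketch. This is not the code from the patch or from OVS's lacp.c:
the struct, its fields, and the helpers select_lead() and count_attached() are
hypothetical simplifications, and a single integer stands in for the full LACP
priority information. It only shows the idea that a synchronized (may-enable)
interface is preferred as lead, with the old priority-only choice as the
fallback when no interface is synchronized.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one LACP member interface. */
struct iface {
    const char *name;
    bool may_enable;                 /* Assumed: true if in sync with partner. */
    unsigned int priority;           /* Assumed: lower value is preferred. */
    unsigned char partner_sys_id[6]; /* Partner system id (a MAC address). */
};

/* Picks the lead interface.  A synchronized interface always beats a
 * non-synchronized one; among interfaces with the same synchronization
 * state, the lower priority value wins.  If no interface is synchronized,
 * this degenerates to the priority-only choice.  Returns NULL if n is 0. */
static const struct iface *
select_lead(const struct iface *ifaces, size_t n)
{
    const struct iface *lead = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct iface *c = &ifaces[i];

        if (!lead
            || (c->may_enable && !lead->may_enable)
            || (c->may_enable == lead->may_enable
                && c->priority < lead->priority)) {
            lead = c;
        }
    }
    return lead;
}

/* Counts the interfaces whose stored partner system id matches the lead's.
 * In the scenario from the patch description, the others are forced down. */
static size_t
count_attached(const struct iface *ifaces, size_t n,
               const struct iface *lead)
{
    size_t count = 0;

    for (size_t i = 0; lead && i < n; i++) {
        if (!memcmp(ifaces[i].partner_sys_id, lead->partner_sys_id,
                    sizeof lead->partner_sys_id)) {
            count++;
        }
    }
    return count;
}
```

With three interfaces where the one with the best priority is not synchronized
and carries a spurious partner system id, select_lead() picks one of the two
synchronized interfaces, so both of them stay attached. A priority-only choice
would pick the non-synchronized interface, and count_attached() would then
report only that interface itself.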


Re: [ovs-dev] [PATCH] lacp: Select a may-enable IF as the lead IF

2016-12-21 Thread Ben Pfaff
On Thu, Dec 01, 2016 at 02:01:59PM +0100, Torgny Lindberg wrote:
> A reboot of one switch in an MC-LAG bond makes all bond links
> go down, causing a total connectivity loss for 3 seconds.
> 
> Packet capture shows that spurious LACP PDUs are sent to OVS with
> a different MAC address (partner system id) during the final
> stages of the MC-LAG switch reboot.
> 
> The current code selects a lead interface based on information
> in the LACP PDU, regardless of its synchronization state. If a
> non-synchronized interface is selected as the OVS lead interface
> then all other interfaces are forced down as their stored partner
> system id differs and the bond ends up with no working interface.
> The bond recovers within three seconds after the last spurious
> message.
> 
> To avoid the problem, this commit requires a lead interface
> to be synchronized. In case no synchronized interface exists,
> the selection of lead interface is done as in the current code.
> 
> Signed-off-by: Torgny Lindberg 

I think I understand what's going on here now.  I made some changes that
better reflect my understanding:
https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326567.html
Does this work for you?  If so, I'll apply it.

Thanks,

Ben.


Re: [ovs-dev] [PATCH] lacp: Select a may-enable IF as the lead IF

2016-12-02 Thread Ben Pfaff
On Fri, Dec 02, 2016 at 03:31:24PM +0100, Simon Horman wrote:
> On Thu, Dec 01, 2016 at 02:01:59PM +0100, Torgny Lindberg wrote:
> > A reboot of one switch in an MC-LAG bond makes all bond links
> > go down, causing a total connectivity loss for 3 seconds.
> > 
> > Packet capture shows that spurious LACP PDUs are sent to OVS with
> > a different MAC address (partner system id) during the final
> > stages of the MC-LAG switch reboot.
> > 
> > The current code selects a lead interface based on information
> > in the LACP PDU, regardless of its synchronization state. If a
> > non-synchronized interface is selected as the OVS lead interface
> > then all other interfaces are forced down as their stored partner
> > system id differs and the bond ends up with no working interface.
> > The bond recovers within three seconds after the last spurious
> > message.
> > 
> > To avoid the problem, this commit requires a lead interface
> > to be synchronized. In case no synchronized interface exists,
> > the selection of lead interface is done as in the current code.
> > 
> > Signed-off-by: Torgny Lindberg 
> 
> This patch looks good to me and it appears to me that it is applicable all
> the way back to branch-2.0. I am however wondering what the views of others
> are to applying it to branches all the way back to there as it is arguably
> a behavioural change. In particular I wonder if there is any chance people
> could be relying on this behaviour for some reason?

Usually, we don't apply patches beyond the last long-term stable
release, unless it's a severe bug.  Sometimes I go a bit farther and for
me, lately, that means back to branch-2.3.  It's hard to ever get any
testing for any releases older than that, anyway.

I don't understand LACP well enough to judge this change, but if you
understand it and believe that it is correct then I'd suggest applying
it back to branch-2.3.