The Linux host was not sending anything to the old port after changing the active slave to eno4.
I tried to abuse the existing grat_arp_lock timer to avoid neighbour updates
for a while after receiving gratuitous ARPs. It actually helped with the
problem of the revalidator relearning the old port. Unfortunately there is
still a random delay of 100-500 ms after the port change, during which
traffic is not going through.

While investigating this I started polling the neighbour info with the
ovs-appctl fdb/show command. When I polled it at 100 ms intervals, the ARP
delay was also reduced to 100 ms. How the ovs-appctl program can affect the
neighbour update timing is beyond my understanding.

BR,
Mika

On 6.4.2017 at 22.15, "Joe Stringer" <[email protected]> wrote:
> On 6 April 2017 at 11:57, Mika Väisänen <[email protected]> wrote:
> > Hello,
> >
> > Is it normal OVS behaviour that a neighbour update (MAC moving from one
> > OVS switch port to another) can cause a 100-500 ms break in traffic? Is
> > there any way to configure it to be faster?
> >
> > In my case a Linux host is connected with two bonded Ethernets to a
> > server running an OVS 2.5.2 switch. When I change the active bonding
> > slave from the Linux host, it causes a 100-500 ms break in traffic
> > between the Linux host and other hosts on the network. When I run the
> > same test with a HW switch, there is no noticeable traffic break at all.
> >
> > While investigating this, I found some strangeness in the way the MAC is
> > learned by OVS. In the following example I have moved the active slave
> > from interface eno3 to eno4.
> > It seems correct (handler51), but then revalidator56 updates the MAC to
> > be found on the old port again:
> >
> > 2017-04-06T09:51:25.179Z|00339|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 4
> > 2017-04-06T09:51:25.179Z|00344|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 5
> > 2017-04-06T09:51:25.179Z|00349|ofproto_dpif_xlate(handler51)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 64
> > 2017-04-06T09:51:25.247Z|00065|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno3 in VLAN 4
> > 2017-04-06T09:51:25.247Z|00066|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno3 in VLAN 5
> > 2017-04-06T09:51:25.249Z|00067|ofproto_dpif_xlate(revalidator56)|DBG|bridge swu0: learned that 02:11:61:61:70:25 is on port eno4 in VLAN 4
> >
> > Why is the revalidator refreshing old neighbour information? Could it be
> > causing the slowness or is it totally irrelevant?
>
> I wonder if there's a race that happens here.
>
> Let's say that at T0, revalidator runs, forwarding is all correct, and
> the datapath flows are all fine.
> At T1, a packet arrives for eno3 VLAN 4, and is forwarded correctly.
> There's now one packet attributed to the datapath flow which
> revalidator will need to translate and attribute stats for.
> At T2, the active slave is shifted from eno3 to eno4. Traffic starts
> to flow over eno4, so you see the handler threads setting up new flows
> to handle this traffic (correctly).
> At T3, the revalidator thread wakes up, and starts dumping all of the
> datapath flows. When it finds the flow that was hit in T1, it will
> translate this flow, attribute stats, and execute side effects such as
> learning the MAC.
> If it learnt the MAC at the exact moment that the
> packet arrived, then it would have correctly learned that the MAC
> existed on eno3. However, it's not aware that the traffic has since
> shifted to eno4, so it attributes and learns on eno3. The revalidator
> thread continues to dump the datapath flows and finds the one that
> handles the traffic now on eno4, and translates that one which also
> has traffic. This makes the learning happen again on eno4.
>
> Thereafter, I'm guessing that you don't send the traffic on eno3 so
> there will be no packets to attribute, no MAC should be learnt, and
> eventually the revalidator will time out the flow.
>
> It may be possible to mitigate this if there were to be some sort
> of 'learning ratelimit' where a MAC that shifts to a new interface
> cannot be relearnt for X seconds. It could try to be smart and track
> the previous interface, then if the MAC shifts to a new interface we
> don't perform learning for the previous interface, or it could be
> something a bit more general as just 'don't learn a particular MAC
> more than once a second'.
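
For illustration only, the "track the previous interface" variant of the
'learning ratelimit' idea quoted above could be sketched roughly like this.
It is a toy model in Python, not OVS code: the names MacTable, Entry, learn,
lookup and holddown are all hypothetical, and the 1 second hold-down is just
an assumed value for the "X seconds" in the suggestion.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MacKey = Tuple[str, int]  # (MAC address, VLAN ID)


@dataclass
class Entry:
    port: str                 # port the MAC is currently learned on
    prev_port: Optional[str]  # port it was on before the last move, if any
    moved_at: float           # time (seconds) of the last learn or move


class MacTable:
    """Toy MAC learning table with a hold-down against stale relearning.

    Hypothetical illustration only; this is not how OVS implements learning.
    """

    def __init__(self, holddown: float = 1.0) -> None:
        self.holddown = holddown  # assumed value for "X seconds"
        self.entries: Dict[MacKey, Entry] = {}

    def learn(self, mac: str, vlan: int, port: str, now: float) -> bool:
        """Try to learn mac/vlan on port; return True if the table changed."""
        key = (mac, vlan)
        entry = self.entries.get(key)
        if entry is None:
            # First sighting: always learn.
            self.entries[key] = Entry(port, None, now)
            return True
        if entry.port == port:
            # Already known on this port: nothing to change.
            return False
        recently_moved = (now - entry.moved_at) < self.holddown
        if port == entry.prev_port and recently_moved:
            # The MAC just moved away from this port, so treat the update
            # as stale (e.g. delayed stats attribution) and ignore it.
            return False
        # Genuine move to a different port.
        self.entries[key] = Entry(port, entry.port, now)
        return True

    def lookup(self, mac: str, vlan: int) -> Optional[str]:
        entry = self.entries.get((mac, vlan))
        return entry.port if entry is not None else None
```

Replaying the VLAN 4 sequence from the log with this model (times taken as
seconds within the minute): a learn on eno4 at 25.179 is accepted, the late
learn on eno3 at 25.247 is ignored, and a learn on eno3 after the hold-down
has expired would be accepted again as a genuine move back.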
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
