I am just wondering whether letting MAC_Binding table entries expire after a
certain timeout would help here, just like we do for OpenFlow flows
(idle_timeout and hard_timeout)? That could address the scale problem as well
as the stale entry problem. Even if we move the MAC_Binding table to the LS,
I think there is no guarantee that the table won't bloat over time, because
we never flush any of these MAC entries. I believe the kernel's networking
ARP cache uses a similar approach to maintain its entries.
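
Roughly what I have in mind, as a minimal sketch (struct and field names are
made up for illustration, this is not actual OVN code):

    #include <stdbool.h>
    #include <time.h>

    /* Hypothetical cache entry: one learned IP -> MAC binding. */
    struct mac_binding_entry {
        time_t created_at;   /* When the binding was first learned. */
        time_t last_used;    /* Refreshed whenever the binding is hit. */
        /* ... IP, MAC, datapath, etc. ... */
    };

    /* Expire an entry the same way OpenFlow expires flows:
     * - idle_timeout: nothing has refreshed it recently, or
     * - hard_timeout: it has simply been around too long.
     * A timeout of 0 disables that check, as in OpenFlow. */
    static bool
    mac_binding_expired(const struct mac_binding_entry *e, time_t now,
                        unsigned int idle_timeout, unsigned int hard_timeout)
    {
        if (idle_timeout && now - e->last_used >= idle_timeout) {
            return true;
        }
        if (hard_timeout && now - e->created_at >= hard_timeout) {
            return true;
        }
        return false;
    }

A background sweep could then drop expired entries together with the OpenFlow
flows generated from them, much like the kernel ages out neighbor entries.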
On Sun, Nov 29, 2020 at 10:08 PM Numan Siddique <[email protected]> wrote:
>
> On Mon, Nov 30, 2020 at 7:37 AM Han Zhou <[email protected]> wrote:
> >
> > On Sat, Nov 28, 2020 at 12:31 PM Tony Liu <[email protected]> wrote:
> > >
> > > Hi Renat,
> > >
> > > What's this "logical datapath patches that Ilya Maximets submitted"?
> > > Could you share some links?
> > >
> > > There were a couple of discussions of similar issues.
> > > [1] raised the issue and resulted in a new option,
> > > always_learn_from_arp_request, being added [2].
> > > [3] resulted in a patch to the OVN ML2 driver [4] to set the option
> > > added by [1].
> > >
> > > It seems that it helps to optimize the Logical_Flow table.
> > > I am not sure whether it helps with mac_binding as well.
> > >
> > > Is it the same issue we are trying to address here, by either
> > > Numan's local cache or the solution proposed by Dumitru?
> > >
> > > [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> > > [2] https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
> > > [3] http://osdir.com/openstack-discuss/msg16002.html
> > > [4] https://review.opendev.org/c/openstack/neutron/+/752678
> > >
> > > Thanks!
> > > Tony
> >
> > Thanks Tony for pointing to the old discussion [1]. I thought setting
> > the option always_learn_from_arp_request to "false" on the logical
> > routers should have solved this scale problem in the MAC_Binding table
> > in this scenario.
> >
> > However, it seems commit a2b88dc513 ("pinctrl: Directly update
> > MAC_Bindings created by self originated GARPs.") has overridden the
> > option. (I haven't tested, but maybe @Dumitru Ceara <[email protected]>
> > can confirm.)
> >
> > Similarly, the Logical_Flow explosion should have been solved by
> > setting the option dynamic_neigh_routers to "true".
> >
> > I think these two options are exactly for the scenario Renat is
> > reporting. @Renat, could you try setting these options as suggested
> > above, using an OVN version from before commit a2b88dc513, to see if
> > it solves your problem?
>
> When you test with the suggested commit, please delete the mac_binding
> entries manually, as neither ovn-northd nor ovn-controller ever deletes
> any entries from the mac_binding table.
>
> > Regarding the proposals in this thread:
> >
> > - Move MAC_Binding to LS (by Dumitru)
> > This sounds good to me, though I am not sure about all the implications
> > yet; I wonder why it was associated with the LRP in the first place.
> >
> > - Remove MAC_Binding from SB (by Numan)
> > I am a little concerned about this. The MAC_Binding table in the SB is
> > required for distributed LRs to do dynamic ARP resolving. Consider a
> > general use case: A - LS1 - LR1 - LS2 - B. A is on HV1 and B is on HV2.
> > Now A sends a packet to B's IP. Assume B's IP is unknown to OVN. The
> > packet is routed by LR1, and on the LRP facing LS2 an ARP request is
> > sent out over the LS2 logical network. The above steps happen on HV1.
> > Now the ARP request reaches HV2 and is received by B, so B sends an ARP
> > response. With the current implementation, HV2's OVS flows learn the
> > MAC-IP binding from the ARP response and update the SB DB, and HV1
> > gets the SB update and installs the MAC binding flow as a result of
> > the ARP resolving. The next time A sends a packet to B, HV1 resolves
> > the ARP directly from the MAC binding flows locally and sends the IP
> > packet to HV2.
> > The SB DB MAC_Binding table works as a distributed ARP/neighbor cache.
> > It is a mechanism to sync the ARP cache from the place where it is
> > learned to the place where it is initiated, and all HVs benefit from
> > this without needing to send ARPs themselves for the same LRP. In
> > other words, the LRP is distributed, so the ARP resolving is done in a
> > distributed fashion. Without this, each HV would initiate ARP requests
> > on behalf of the same LRP, which would greatly increase ARP traffic
> > unnecessarily - even more than in a traditional network (where one
> > physical router only needs to do one ARP resolution per neighbor and
> > maintain one copy of the ARP cache). And I am not sure if there are
> > other side effects when an endpoint sees unexpectedly frequent ARP
> > requests from the same LRP - would there be any rate limit that even
> > discards repeated ARP requests from the same source? Numan, maybe you
> > have already considered these. Would you share your thoughts?
>
> Thanks for the comments and for highlighting this use case, which I
> missed completely.
>
> I was thinking more along the lines of the N-S use case with a
> distributed gateway router port, and I completely missed the E-W
> scenario with an unknown address. If we don't consider the unknown
> address scenario, I think moving away from the MAC_Binding south DB
> table would be beneficial in the long run, for a few reasons:
>  1. Better scale.
>  2. It addresses the stale mac_binding entries (which the CMS presently
>     has to handle).
>
> For the N-S traffic scenario, the ovn-controller claiming the gateway
> router port will take care of generating the ARPs. For the floating IP
> DVR scenario, each compute node will have to generate the ARP request
> to learn a remote MAC. I think this should be fine, as it is just a
> one-time thing.
>
> Regarding the unknown address scenario, right now ovn-controller floods
> the packet to all the "unknown" logical ports of a switch if OVN doesn't
> know the MAC. All these unknown logical ports belong to a multicast
> group.
>
> I think we should solve this case. In the case of OpenStack, when port
> security is disabled for a Neutron port, the logical port will have an
> unknown address configured. There are a few related bugzilla/launchpad
> bugs [1].
>
> I think we should fix this behavior in OVN, and OVN should do the MAC
> learning on the switch for the unknown ports. If we do that, I think
> the scenario you mentioned will be addressed.
>
> Maybe we can extend Dumitru's suggestion and have just one approach
> that does the MAC learning on the switch (keeping the SB MAC_Binding
> table):
>  - for unknown logical ports
>  - for unknown MACs for the N-S routing.
>
> Any thoughts?
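
For reference, the two options Han mentions are set per logical router in
the NB DB, and the stale rows can be listed and removed from the SB DB by
hand. For example, assuming a router named lr0:

    # Only learn bindings from ARP requests that target the router's own
    # IPs (or from replies to its own requests), not from every ARP
    # request seen on the network:
    ovn-nbctl set Logical_Router lr0 options:always_learn_from_arp_request=false

    # Resolve neighbor routers' MACs dynamically instead of pre-building
    # static ARP-resolve logical flows for every router pair:
    ovn-nbctl set Logical_Router lr0 options:dynamic_neigh_routers=true

    # Inspect the learned bindings and delete stale rows by UUID:
    ovn-sbctl list MAC_Binding
    ovn-sbctl destroy MAC_Binding <uuid>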
>
> FYI - I have a PoC/RFC patch in progress which adds the mac binding
> cache support:
> https://github.com/numansiddique/ovn/commit/22082d04ca789155ea2edd3c1706bde509ae44da
>
> [1] https://review.opendev.org/c/openstack/neutron/+/763567/
>     https://bugzilla.redhat.com/show_bug.cgi?id=1888441
>     https://bugs.launchpad.net/neutron/+bug/1904412
>     https://bugzilla.redhat.com/show_bug.cgi?id=1672625
>
> Thanks
> Numan
>
> > Thanks,
> > Han
> >
> > > > -----Original Message-----
> > > > From: dev <[email protected]> On Behalf Of Numan Siddique
> > > > Sent: Thursday, November 26, 2020 11:36 AM
> > > > To: Daniel Alvarez Sanchez <[email protected]>
> > > > Cc: ovs-dev <[email protected]>
> > > > Subject: Re: [ovs-dev] Scaling of Logical_Flows and MAC_Binding tables
> > > >
> > > > On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique <[email protected]> wrote:
> > > > >
> > > > > On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <[email protected]> wrote:
> > > > > > >
> > > > > > > On 11/25/20 7:06 PM, Numan Siddique wrote:
> > > > > > > > On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev
> > > > > > > > <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> On 25.11.20 16:14, Dumitru Ceara wrote:
> > > > > > > >>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > > > > > > >>>> Hello folks,
> > > > > > > >>>>
> > > > > > > >>> Hi Renat,
> > > > > > > >>>
> > > > > > > >>>> we run a lab where we try to evaluate the scalability
> > > > > > > >>>> potential of OVN with OpenStack as the CMS.
> > > > > > > >>>> The current lab setup is the following:
> > > > > > > >>>>
> > > > > > > >>>> 500 networks
> > > > > > > >>>> 500 routers
> > > > > > > >>>> 1500 VM ports (3 per network/router)
> > > > > > > >>>> 1500 floating IPs (one per VM port)
> > > > > > > >>>>
> > > > > > > >>>> There is an external network, which is bridged to
> > > > > > > >>>> br-provider on the gateway nodes. There are 2000 ports
> > > > > > > >>>> connected to this external network (1500 floating IPs +
> > > > > > > >>>> 500 SNAT router ports). So the setup is not very big,
> > > > > > > >>>> we'd say, but after applying this configuration via the
> > > > > > > >>>> ML2/OVN plugin, northd kicks in and does its job, and
> > > > > > > >>>> after it's done, the Logical_Flow table gets 645877
> > > > > > > >>>> entries, which is way too much. But ok, we move on and
> > > > > > > >>>> start one controller on the gateway chassis, and here
> > > > > > > >>>> things get really messy. The MAC_Binding table grows
> > > > > > > >>>> from 0 to 999088 entries in one moment, and after that,
> > > > > > > >>>> the sizes of the biggest SB tables look like this:
> > > > > > > >>>>
> > > > > > > >>>> 999088 MAC_Binding
> > > > > > > >>>> 645877 Logical_Flow
> > > > > > > >>>>   4726 Port_Binding
> > > > > > > >>>>   1117 Multicast_Group
> > > > > > > >>>>   1068 Datapath_Binding
> > > > > > > >>>>   1046 Port_Group
> > > > > > > >>>>    551 IP_Multicast
> > > > > > > >>>>    519 DNS
> > > > > > > >>>>    517 HA_Chassis_Group
> > > > > > > >>>>    517 HA_Chassis
> > > > > > > >>>>    ...
> > > > > > > >>>>
> > > > > > > >>>> The MAC_Binding table gets huge: basically it now has an
> > > > > > > >>>> entry for every port connected to the external network,
> > > > > > > >>>> multiplied by the number of datapaths (roughly 2000
> > > > > > > >>>> learned IP/MAC pairs x ~500 logical router ports), which
> > > > > > > >>>> makes it about one million entries.
> > > > > > > >>>> This table by itself increases the size of the SB by
> > > > > > > >>>> 200 megabytes. The Logical_Flow table also gets very
> > > > > > > >>>> heavy; we have already played a bit with the logical
> > > > > > > >>>> datapath patches that Ilya Maximets submitted, and it
> > > > > > > >>>> looks much better, but the size of the MAC_Binding
> > > > > > > >>>> table still feels inadequate.
> > > > > > > >>>>
> > > > > > > >>>> We would like to start working at least on MAC_Binding
> > > > > > > >>>> table optimisation, but it is a bit difficult to start
> > > > > > > >>>> from scratch. Can someone help us with ideas on how
> > > > > > > >>>> this could be optimised?
> > > > > > > >>>>
> > > > > > > >>>> Maybe it would also make sense to group entries in the
> > > > > > > >>>> MAC_Binding table in the same way as is proposed for
> > > > > > > >>>> logical flows in Ilya's patch?
> > > > > > > >>>>
> > > > > > > >>> Maybe it would work, but I'm not really sure how, right
> > > > > > > >>> now. However, what if we change the way MAC_Bindings are
> > > > > > > >>> created?
> > > > > > > >>>
> > > > > > > >>> Right now a MAC_Binding is created for each logical
> > > > > > > >>> router port, but in your case there are a lot of logical
> > > > > > > >>> router ports connected to the single provider logical
> > > > > > > >>> switch, and they all learn the same ARPs.
> > > > > > > >>>
> > > > > > > >>> What if we instead store MAC_Bindings per logical
> > > > > > > >>> switch, basically sharing all these MAC_Bindings between
> > > > > > > >>> all router ports connected to the same LS?
> > > > > > > >>>
> > > > > > > >>> Do you see any problem with this approach?
> > > > > > > >>>
> > > > > > > >>> Thanks,
> > > > > > > >>> Dumitru
> > > > > > > >>>
> > > > > > > >> I believe that this approach is the way to go; at least
> > > > > > > >> nothing comes to my mind that could go wrong here. We
> > > > > > > >> will try to make a patch for that. However, if someone
> > > > > > > >> who is familiar with the code knows how to do it fast,
> > > > > > > >> that would also be very nice.
> > > > > > > >
> > > > > > > > This approach should work.
> > > > > > > >
> > > > > > > > I've another idea (I won't call it a solution yet). What
> > > > > > > > if we drop the usage of MAC_Binding altogether?
> > > > > > >
> > > > > > > This would be great!
> > > > > > >
> > > > > > > > - When ovn-controller learns a mac_binding, it will not
> > > > > > > >   create a row in the SB MAC_Binding table.
> > > > > > > > - Instead it will maintain the learnt mac binding in its
> > > > > > > >   memory.
> > > > > > > > - ovn-controller will still program table 66 with the flow
> > > > > > > >   to set the eth.dst (for the get_arp() action).
> > > > > > > >
> > > > > > > > This has a couple of advantages:
> > > > > > > > - Right now we never flush old/stale mac_binding entries.
> > > > > > > >   - If the MAC of an external IP changes, but OVN has an
> > > > > > > >     entry for that IP with the old MAC in the mac_binding
> > > > > > > >     table, we will keep using the old MAC, causing the
> > > > > > > >     packet to be sent to the wrong destination, and the
> > > > > > > >     packet might get lost.
> > > > > > > >   - So we would get rid of this problem.
> > > > > > > > - We will also save SB DB space.
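
Numan's in-memory cache could look roughly like this (a plain-C sketch with
made-up names and a fixed-size table, just to illustrate keying on
(datapath, IP) and keeping everything local to ovn-controller):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define CACHE_SIZE 4096          /* Illustrative fixed capacity. */

    /* One learned binding, keyed on (datapath tunnel key, IPv4 address). */
    struct local_mac_binding {
        bool     in_use;
        uint32_t dp_key;             /* Logical datapath of the binding. */
        uint32_t ip;                 /* Neighbor IP, network byte order. */
        uint8_t  mac[6];             /* Learned Ethernet address. */
    };

    static struct local_mac_binding cache[CACHE_SIZE];

    /* Naive open-addressing slot lookup, just to show the idea: the
     * binding lives only in this process's memory, and what is learned
     * here is what would be programmed into table 66 so that get_arp()
     * resolves locally. */
    static struct local_mac_binding *
    mac_binding_slot(uint32_t dp_key, uint32_t ip)
    {
        uint32_t h = (dp_key * 2654435761u) ^ ip;
        for (uint32_t i = 0; i < CACHE_SIZE; i++) {
            struct local_mac_binding *e = &cache[(h + i) % CACHE_SIZE];
            if (!e->in_use || (e->dp_key == dp_key && e->ip == ip)) {
                return e;
            }
        }
        return NULL;                 /* Cache full. */
    }

    static void
    mac_binding_learn(uint32_t dp_key, uint32_t ip, const uint8_t mac[6])
    {
        struct local_mac_binding *e = mac_binding_slot(dp_key, ip);
        if (e) {
            e->in_use = true;
            e->dp_key = dp_key;
            e->ip = ip;
            memcpy(e->mac, mac, 6);
        }
    }

The real implementation would presumably use the OVS hmap library and handle
IPv6 and eviction; the point is only that nothing here ever touches the SB DB.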
> > > > > > > >
> > > > > > > > There are a few disadvantages:
> > > > > > > > - Other ovn-controllers will not add the flows in table 66.
> > > > > > > >   I guess this should be fine, as each ovn-controller can
> > > > > > > >   generate the ARP request and learn the MAC.
> > > > > > > > - When ovn-controller restarts, we lose the learnt MACs
> > > > > > > >   and would need to learn them again.
> > > > > > > >
> > > > > > > > Any thoughts on this?
> > > > > >
> > > > > > It'd be great to have some sort of local ARP cache, but I'm
> > > > > > concerned about the performance implications.
> > > > > >
> > > > > > - How are you going to determine when an entry is stale?
> > > > > >   If you slow-path the packets to reset the timeout every time
> > > > > >   a packet with the source MAC is received, it doesn't look
> > > > > >   good. Maybe you have something else in mind.
> > > > >
> > > > > Right now we don't age out any mac_binding entries. If I
> > > > > understand you correctly, your concern is about the scenario
> > > > > where a floating IP is updated with a different MAC - how does
> > > > > the local cache get updated?
> > > > >
> > > > > Right now networking-ovn (in the case of OpenStack) updates the
> > > > > mac_binding entry in the south DB for such cases, right?
> > > >
> > > > FYI - I have started working on this approach as a PoC, i.e. using
> > > > a local mac_binding cache instead of the SB mac_binding table.
> > > >
> > > > I will update this thread about the progress.
> > > >
> > > > Thanks
> > > > Numan
> > > >
> > > > > Thanks
> > > > > Numan
> > > > >
> > > > > > > There's another scenario that we need to take care of, and
> > > > > > > it doesn't seem too obvious to address without MAC_Bindings.
> > > > > > >
> > > > > > > GARPs were being injected in the L2 broadcast domain of an
> > > > > > > LS for NAT addresses in case FIPs are reused by the CMS,
> > > > > > > introduced by:
> > > > > > > https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > > > >
> > > > > > Dumitru and I have been discussing the possibility of
> > > > > > reverting this patch and relying on CMSs to maintain the
> > > > > > MAC_Binding entries associated with the FIPs [0].
> > > > > > I'm against reverting this patch in OVN [1] for multiple
> > > > > > reasons, the most important one being that if we rely on
> > > > > > workarounds on the CMS side, we'll be creating a control plane
> > > > > > dependency for something that is purely dataplane (i.e., if
> > > > > > the Neutron server is down - outage, upgrades, etc. - traffic
> > > > > > is going to be disrupted). On the other hand, one could argue
> > > > > > that the same dependency now exists on ovn-controller being up
> > > > > > and running, but I believe that this is better than a) relying
> > > > > > on workarounds in CMSs, or b) relying on CMS availability.
> > > > > >
> > > > > > In the short term I think that moving the MAC_Binding entries
> > > > > > to the LS instead of the LRP, as was suggested up-thread, would
> > > > > > be a good idea, and in the long haul the ARP *local* cache
> > > > > > seems to be the right solution.
> > > > > > Brainstorming with Dumitru, he suggested inspecting the flows
> > > > > > regularly to see whether the packet count on the flows that
> > > > > > check if src_mac == X has stopped increasing for a while, and
> > > > > > then removing the ARP responder flows locally.
> > > > > >
> > > > > > [0] https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
> > > > > > [1] https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > > > >
> > > > > > > Recently, due to the dataplane scaling issue (the 4K resubmit
> > > > > > > limit being hit), we don't flood these packets on non-router
> > > > > > > ports and instead create the MAC_Bindings directly from
> > > > > > > ovn-controller:
> > > > > > > https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> > > > > > >
> > > > > > > Without the MAC_Binding table we'd need to find a way to
> > > > > > > update or flush stale bindings when an IP is used for a VIF
> > > > > > > or FIP.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dumitru
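
On Daniel's packet-count idea above: a chassis-local probe could be
prototyped by snapshotting OpenFlow statistics, along these lines (a rough
sketch; table 66 is the MAC-binding table mentioned earlier in the thread,
and the interval is arbitrary):

    #!/bin/sh
    # Snapshot the flow stats twice and report entries whose n_packets
    # did not change in between, i.e. candidates for stale bindings.
    snapshot() {
        ovs-ofctl dump-flows br-int table=66 \
            | sed -e 's/duration=[^,]*, //' -e 's/idle_age=[^,]*, //' \
            | sort
    }
    snapshot > /tmp/mb.before
    sleep 300
    snapshot > /tmp/mb.after
    # Lines present in both snapshots carry identical n_packets counters.
    comm -12 /tmp/mb.before /tmp/mb.after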
--
Thanks
Anil

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev