On Mon, Nov 30, 2020 at 12:28:56PM -0800, Han Zhou wrote:
> On Mon, Nov 30, 2020 at 12:13 PM Renat Nurgaliyev <[email protected]> wrote:
> >
> > On 30.11.20 07:07, Numan Siddique wrote:
> > > On Mon, Nov 30, 2020 at 7:37 AM Han Zhou <[email protected]> wrote:
> > >> On Sat, Nov 28, 2020 at 12:31 PM Tony Liu <[email protected]> wrote:
> > >>> Hi Renat,
> >
> > Hi folks,
> >
> > >>>
> > >>> What's this "logical datapath patches that Ilya Maximets submitted"?
> > >>> Could you share some links?
> > >>>
> > >>> There were a couple of discussions about a similar issue.
> > >>> [1] raised the issue and resulted in a new option,
> > >>> always_learn_from_arp_request, being added [2].
> > >>> [3] resulted in a patch to the OVN ML2 driver [4] that sets the option
> > >>> added by [1].
> > >>>
> > >>> It seems that it helps to optimize the Logical_Flow table.
> > >>> I am not sure if it helps with mac_binding as well.
> > >>>
> > >>> Is it the same issue we are trying to address here, by either
> > >>> Numan's local cache or the solution proposed by Dumitru?
> > >>>
> > >>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> > >>> [2] https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
> > >>> [3] http://osdir.com/openstack-discuss/msg16002.html
> > >>> [4] https://review.opendev.org/c/openstack/neutron/+/752678
> > >>>
> > >>>
> > >>> Thanks!
> > >>> Tony
> > >>
> > >> Thanks Tony for pointing to the old discussion [1]. I thought setting the
> > >> option always_learn_from_arp_request to "false" on the logical routers
> > >> should have solved this scale problem in the MAC_Binding table in this
> > >> scenario.
> > >>
> > >> However, it seems that commit a2b88dc513 ("pinctrl: Directly update
> > >> MAC_Bindings created by self originated GARPs.") has overridden the
> > >> option. (I haven't tested, but maybe @Dumitru Ceara <[email protected]> can
> > >> confirm.)
> > >>
> > >> Similarly, the Logical_Flow explosion should have been solved by
> > >> setting the option dynamic_neigh_routers to "true".
> > >>
> > >> I think these two options are exactly for the scenario Renat is
> > >> reporting. @Renat, could you try setting these options as suggested
> > >> above, using the OVN version before commit a2b88dc513, to see if it
> > >> solves your problem?
> > >>
> > > When you test it out with the suggested commit, please delete the
> > > mac_binding entries manually, as neither ovn-northd nor ovn-controller
> > > deletes any entries from the mac_binding table.
> >
> > We tested with dynamic_neigh_routers set to true, and we saw a very
> > positive change: the size of the Logical_Flow table decreased from 600k
> > entries to 100k. This is a huge difference, thanks for pointing this out!
> >
> > It did not affect the MAC_Binding table with commit a2b88dc513 ("pinctrl:
> > Directly update MAC_Bindings created by self originated GARPs."), but
> > that was expected.
> > Just for test purposes we commented out some code as follows:
> >
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index 291202c24..76047939c 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -4115,10 +4115,10 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                laddrs->ipv4_addrs[i].addr,
> >                                binding_rec->datapath->tunnel_key,
> >                                binding_rec->tunnel_key);
> > -            send_garp_locally(ovnsb_idl_txn,
> > -                              sbrec_mac_binding_by_lport_ip,
> > -                              local_datapaths, binding_rec, laddrs->ea,
> > -                              laddrs->ipv4_addrs[i].addr);
> > +            //send_garp_locally(ovnsb_idl_txn,
> > +            //                  sbrec_mac_binding_by_lport_ip,
> > +            //                  local_datapaths, binding_rec, laddrs->ea,
> > +            //                  laddrs->ipv4_addrs[i].addr);
> >
> >         }
> >         free(name);
> >
> > Together with dynamic_neigh_routers we achieved quite a stable setup,
> > with a 62 MiB SB database, which is a huge step forward after 1.9 GiB.
> > The MAC_Binding table stays at around 2000 entries, compared to almost
> > a million before.
> >
> > Would it make sense to make the behaviour introduced in a2b88dc513
> > toggleable via a command line option, until there is a better solution?
> >
> > Thanks,
> > Renat.
> >
> Thanks Renat for the testing. The result looks good. Just to confirm: in
> the final test with the code change above, did you also set
> "always_learn_from_arp_request" to "false"?
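
For reference, setting the two options and clearing the already-learned
entries amounts to roughly the following (untested as written here, from
memory; option names are per ovn-nb(5), socket paths may differ per
install):

    # For each logical router, in the northbound DB:
    ovn-nbctl set Logical_Router <router> \
        options:always_learn_from_arp_request=false \
        options:dynamic_neigh_routers=true

    # Drop the already-learned rows; nothing ages them out automatically:
    for uuid in $(ovn-sbctl --bare --columns=_uuid list mac_binding); do
        ovn-sbctl destroy mac_binding "$uuid"
    done

    # The on-disk SB file only shrinks after a compaction:
    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound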
Hi Han, yes, sorry for not making it clear initially:
always_learn_from_arp_request is set to false.

> I think the logic introduced in a2b88dc513 can add a check for the option
> "always_learn_from_arp_request" instead of overriding it.
>
> Also, regarding Winson's question:
> > We moved to the ovn 20.09 branch recently and the mac binding issues
> > happen again in our ovn-k8s scale test cluster.
> > Is there a quick workaround to make the option
> > "always_learn_from_arp_request" work again?
>
> Thanks Winson for confirming. As mentioned above, I think the logic of the
> patch "pinctrl: Directly update MAC_Bindings created by self originated
> GARPs." can be updated to add a check for this option, to restore the
> behavior. Before the fix, I think a quick workaround for you in 20.09
> could be reverting the following patches (I haven't tested, though):
> 1. "ovn-northd: Limit self originated ARP/ND broadcast domain."
> 2. "pinctrl: Fix segfault seen when creating mac_binding for local GARPs."
> 3. "pinctrl: Directly update MAC_Bindings created by self originated GARPs."
>
> Thanks,
> Han
>
> > >> Regarding the proposals in this thread:
> > >> - Move MAC_Binding to LS (by Dumitru)
> > >>   This sounds good to me, though I am not sure about all the
> > >> implications yet; I wonder why it was associated with the LRP instead
> > >> in the beginning.
> > >>
> > >> - Remove MAC_Binding from SB (by Numan)
> > >>   I am a little concerned about this. The MAC_Binding in SB is required
> > >> for a distributed LR to work for dynamic ARP resolving. Consider a general
> > >> use case: A - LS1 - LR1 - LS2 - B. A is on HV1 and B is on HV2. Now A sends
> > >> a packet to B's IP. Assume B's IP is unknown to OVN. The packet is routed
> > >> by LR1, and on the LRP facing LS2 an ARP is sent out over the LS2 logical
> > >> network. The above steps happen on HV1. Now the ARP request reaches HV2 and
> > >> is received by B, so B sends an ARP response. With the current
> > >> implementation, HV2's OVS flows learn the MAC-IP binding from the ARP
> > >> response and update the SB DB, and HV1 gets the SB update and installs the
> > >> MAC binding flow as a result of ARP resolving. The next time A sends a
> > >> packet to B, HV1 resolves the ARP directly from the MAC binding flows
> > >> locally and sends the IP packet to HV2. The SB DB MAC_Binding table
> > >> works as a distributed ARP/neighbor cache. It is a mechanism to sync the
> > >> ARP cache from the place where it is learned to the place where it is
> > >> initiated, and all HVs benefit from this without the need to send ARPs
> > >> themselves for the same LRP. In other words, the LRP is distributed, so the
> > >> ARP resolving is done in a distributed fashion. Without this, each HV would
> > >> initiate ARP requests on behalf of the same LRP, which would largely
> > >> increase the ARP traffic unnecessarily - even more than in a traditional
> > >> network (where one physical router only needs to do one ARP resolution for
> > >> each neighbor and maintain one copy of the ARP cache). And I am not sure if
> > >> there are other side effects when an endpoint sees unexpectedly frequent
> > >> ARP requests from the same LRP - would there be any rate limit that even
> > >> discards repeated ARP requests from the same source? Numan, maybe you have
> > >> already considered these. Would you share your thoughts?
> > > Thanks for the comments and for highlighting this use case, which I
> > > missed completely.
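
(Coming back to Han's suggestion above of making a2b88dc513 check the
option instead of overriding it: a very rough, untested sketch of the
shape such a check could take in pinctrl.c. The helper
lr_learns_from_arp_request() is invented for illustration - in a real
patch ovn-northd would have to propagate the NB option into the
southbound DB so that ovn-controller can look it up.)

    /* Only create MAC_Bindings from self-originated GARPs when the
     * router behind this binding still wants to learn from ARP
     * requests.  lr_learns_from_arp_request() is hypothetical. */
    if (lr_learns_from_arp_request(binding_rec)) {
        send_garp_locally(ovnsb_idl_txn,
                          sbrec_mac_binding_by_lport_ip,
                          local_datapaths, binding_rec, laddrs->ea,
                          laddrs->ipv4_addrs[i].addr);
    }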
> > >
> > > I was thinking more along the lines of the N-S use case with a
> > > distributed gateway router port,
> > > and I completely missed the E-W scenario with an unknown address. If
> > > we don't consider
> > > the unknown address scenario, I think moving away from the MAC_Binding
> > > south db table would
> > > be beneficial in the long run, for a few reasons:
> > > 1. Better scale.
> > > 2. Addressing stale mac_binding entries (which the CMS presently
> > >    has to handle).
> > >
> > > For the N-S traffic scenario, the ovn-controller claiming the gw router
> > > port will take care of generating the ARP.
> > > For the Floating IP dvr scenario, each compute node will have to generate
> > > an ARP request to learn a remote.
> > > I think this should be fine as it is just a one time thing.
> > >
> > > Regarding the unknown address scenario: right now ovn-controller
> > > floods the packet to all the unknown logical ports
> > > of a switch if OVN doesn't know the MAC. All these unknown logical
> > > ports belong to a multicast group.
> > >
> > > I think we should solve this case. In the case of OpenStack, when port
> > > security is disabled for a neutron port, the logical
> > > port will have an unknown address configured. There are a few related
> > > bugzillas/launchpad bugs [1].
> > >
> > > I think we should fix this behavior in OVN, and OVN should do the mac
> > > learning on the switch for the unknown ports. If we do that,
> > > I think the scenario you mentioned will be addressed.
> > >
> > > Maybe we can extend Dumitru's suggestion and have just one approach
> > > which does the mac learning on the switch (keeping
> > > the SB Mac_binding table):
> > > - for unknown logical ports
> > > - for unknown macs for the N-S routing.
> > >
> > > Any thoughts?
> > >
> > > FYI - I have a PoC/RFC patch in progress which adds the mac binding
> > > cache support -
> > > https://github.com/numansiddique/ovn/commit/22082d04ca789155ea2edd3c1706bde509ae44da
> > >
> > > [1] - https://review.opendev.org/c/openstack/neutron/+/763567/
> > >       https://bugzilla.redhat.com/show_bug.cgi?id=1888441
> > >       https://bugs.launchpad.net/neutron/+bug/1904412
> > >       https://bugzilla.redhat.com/show_bug.cgi?id=1672625
> > >
> > > Thanks
> > > Numan
> > >
> > >> Thanks,
> > >> Han
> > >>
> > >>>> -----Original Message-----
> > >>>> From: dev <[email protected]> On Behalf Of Numan Siddique
> > >>>> Sent: Thursday, November 26, 2020 11:36 AM
> > >>>> To: Daniel Alvarez Sanchez <[email protected]>
> > >>>> Cc: ovs-dev <[email protected]>
> > >>>> Subject: Re: [ovs-dev] Scaling of Logical_Flows and MAC_Binding tables
> > >>>>
> > >>>> On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique <[email protected]> wrote:
> > >>>>> On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
> > >>>>> <[email protected]> wrote:
> > >>>>>> On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <[email protected]> wrote:
> > >>>>>>> On 11/25/20 7:06 PM, Numan Siddique wrote:
> > >>>>>>>> On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev
> > >>>>>>>> <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On 25.11.20 16:14, Dumitru Ceara wrote:
> > >>>>>>>>>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > >>>>>>>>>>> Hello folks,
> > >>>>>>>>>>>
> > >>>>>>>>>> Hi Renat,
> > >>>>>>>>>>
> > >>>>>>>>>>> we run a lab where we try to evaluate the scalability potential
> > >>>>>>>>>>> of OVN with OpenStack as CMS.
> > >>>>>>>>>>> Current lab setup is the following:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 500 networks
> > >>>>>>>>>>> 500 routers
> > >>>>>>>>>>> 1500 VM ports (3 per network/router)
> > >>>>>>>>>>> 1500 Floating IPs (one per VM port)
> > >>>>>>>>>>>
> > >>>>>>>>>>> There is an external network, which is bridged to br-provider on
> > >>>>>>>>>>> gateway nodes. There are 2000 ports
> > >>>>>>>>>>> connected to this external network (1500 Floating IPs + 500 SNAT
> > >>>>>>>>>>> router ports). So the setup is not
> > >>>>>>>>>>> very big, we'd say, but after applying this configuration via the
> > >>>>>>>>>>> ML2/OVN plugin, northd kicks in and does its job, and after
> > >>>>>>>>>>> it's done, the Logical_Flow table gets 645877 entries, which is
> > >>>>>>>>>>> way too much. But ok, we move on and start one controller on
> > >>>>>>>>>>> the gateway chassis, and here things get really messy. The
> > >>>>>>>>>>> MAC_Binding table grows from 0 to 999088 entries in one
> > >>>>>>>>>>> moment, and after it's done, the sizes of the biggest SB tables
> > >>>>>>>>>>> look like this:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 999088 MAC_Binding
> > >>>>>>>>>>> 645877 Logical_Flow
> > >>>>>>>>>>>   4726 Port_Binding
> > >>>>>>>>>>>   1117 Multicast_Group
> > >>>>>>>>>>>   1068 Datapath_Binding
> > >>>>>>>>>>>   1046 Port_Group
> > >>>>>>>>>>>    551 IP_Multicast
> > >>>>>>>>>>>    519 DNS
> > >>>>>>>>>>>    517 HA_Chassis_Group
> > >>>>>>>>>>>    517 HA_Chassis
> > >>>>>>>>>>> ...
> > >>>>>>>>>>>
> > >>>>>>>>>>> The MAC_Binding table gets huge: basically it now has an entry
> > >>>>>>>>>>> for every port connected to the external network, per datapath
> > >>>>>>>>>>> (2000 ports x ~500 router datapaths), which roughly makes it one
> > >>>>>>>>>>> million entries.
> > >>>>>>>>>>> This table by itself increases the size of the SB by 200
> > >>>>>>>>>>> megabytes. The Logical_Flow table also gets very heavy; we have
> > >>>>>>>>>>> already played a bit with the logical datapath patches that Ilya
> > >>>>>>>>>>> Maximets submitted, and it looks much better, but the size of
> > >>>>>>>>>>> the MAC_Binding table still feels inadequate.
> > >>>>>>>>>>>
> > >>>>>>>>>>> We would like to start working at least on MAC_Binding table
> > >>>>>>>>>>> optimisation, but it is a bit difficult to start working from
> > >>>>>>>>>>> scratch. Can someone help us with ideas on how this could be
> > >>>>>>>>>>> optimised?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Maybe it would also make sense to group entries in the
> > >>>>>>>>>>> MAC_Binding table in
> > >>>>>>>>>>> the same way as is proposed for logical flows in Ilya's patch?
> > >>>>>>>>>>>
> > >>>>>>>>>> Maybe it would work, but I'm not really sure how, right now.
> > >>>>>>>>>> However, what if we change the way MAC_Bindings are created?
> > >>>>>>>>>>
> > >>>>>>>>>> Right now a MAC_Binding is created for each logical router
> > >>>>>>>>>> port, but in your case there are a lot of logical router ports
> > >>>>>>>>>> connected to the single provider logical switch and they all
> > >>>>>>>>>> learn the same ARPs.
> > >>>>>>>>>> What if we instead store MAC_Bindings per logical switch?
> > >>>>>>>>>> Basically sharing all these MAC_Bindings between all router
> > >>>>>>>>>> ports connected to the same LS.
> > >>>>>>>>>>
> > >>>>>>>>>> Do you see any problem with this approach?
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Dumitru
> > >>>>>>>>>>
> > >>>>>>>>> I believe that this approach is the way to go; at least nothing
> > >>>>>>>>> comes to my mind
> > >>>>>>>>> that could go wrong here. We will try to make a patch for that.
> > >>>>>>>>> However, if
> > >>>>>>>>> someone is familiar with the code and knows how to do it fast,
> > >>>>>>>>> that would also be very nice.
> > >>>>>>>> This approach should work.
> > >>>>>>>>
> > >>>>>>>> I've another idea (I won't call it a solution yet). What if we
> > >>>>>>>> drop the usage of MAC_Binding altogether?
> > >>>>>>> This would be great!
> > >>>>>>>
> > >>>>>>>> - When ovn-controller learns a mac binding, it will not create a
> > >>>>>>>>   row in the SB MAC_Binding table.
> > >>>>>>>> - Instead it will maintain the learnt mac binding in its memory.
> > >>>>>>>> - ovn-controller will still program table 66 with the flow
> > >>>>>>>>   to set the eth.dst (for the get_arp() action).
> > >>>>>>>>
> > >>>>>>>> This has a couple of advantages:
> > >>>>>>>> - Right now we never flush old/stale mac_binding entries.
> > >>>>>>>>   - If the mac of an external IP has changed, but OVN
> > >>>>>>>>     has an entry for that IP with the old mac in the mac_binding
> > >>>>>>>>     table, we will use the old mac, causing the packet to be sent
> > >>>>>>>>     out to the wrong destination, and the packet might get lost.
> > >>>>>>>>   - So we would get rid of this problem.
> > >>>>>>>> - We will also save SB DB space.
> > >>>>>>>>
> > >>>>>>>> There are a few disadvantages:
> > >>>>>>>> - Other ovn-controllers will not add the flows in table 66. I
> > >>>>>>>>   guess this should be fine, as each ovn-controller can generate
> > >>>>>>>>   the ARP request and learn the mac.
> > >>>>>>>> - When ovn-controller restarts we lose the learnt macs and
> > >>>>>>>>   would need to learn them again.
> > >>>>>>>>
> > >>>>>>>> Any thoughts on this?
> > >>>>>> It'd be great to have some sort of local ARP cache, but I'm concerned
> > >>>>>> about the performance implications.
> > >>>>>>
> > >>>>>> - How are you going to determine when an entry is stale?
> > >>>>>>   If you slow-path the packets to reset the timeout every time a pkt
> > >>>>>>   with the source mac is received, it doesn't look good. Maybe you
> > >>>>>>   have something else in mind.
> > >>>>> Right now we don't expire any mac_binding entries. If I understand you
> > >>>>> correctly, your concern is, for the scenario where a floating ip is
> > >>>>> updated with a different mac, how the local cache is updated?
> > >>>>>
> > >>>>> Right now networking-ovn (in the case of openstack) updates the
> > >>>>> mac_binding entry in the South db for such cases, right?
> > >>>>>
> > >>>> FYI - I have started working on this approach as a PoC, i.e. to use a
> > >>>> local mac_binding cache
> > >>>> instead of the SB mac_binding table.
> > >>>>
> > >>>> I will update this thread about the progress.
> > >>>>
> > >>>> Thanks
> > >>>> Numan
> > >>>>
> > >>>>> Thanks
> > >>>>> Numan
> > >>>>>
> > >>>>>> -
> > >>>>>>
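
(For what it's worth, the in-memory variant Numan describes above could be
as small as a per-controller hmap; a rough sketch of one cache entry, with
every name invented for illustration - the real structures are in Numan's
PoC branch linked earlier in the thread:)

    /* One locally-learned MAC binding, kept only in ovn-controller's
     * memory instead of as a SB MAC_Binding row. */
    struct local_mac_binding {
        struct hmap_node hmap_node; /* In an hmap hashed on (dp_key, ip). */
        uint32_t dp_key;            /* Tunnel key of the logical datapath. */
        struct in6_addr ip;         /* Learned IP (v4-mapped for IPv4). */
        struct eth_addr mac;        /* Learned MAC. */
        long long int learned_at;   /* For aging out stale entries. */
    };

On a hit, ovn-controller would program the table 66 flow locally; after a
restart the hmap is empty, which matches the re-learning caveat above.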
> > >>>>>>> There's another scenario that we need to take care of and that
> > >>>>>>> doesn't seem too obvious to address without MAC_Bindings.
> > >>>>>>>
> > >>>>>>> GARPs were being injected in the L2 broadcast domain of an LS for
> > >>>>>>> nat addresses in case FIPs are reused by the CMS, introduced by:
> > >>>>>>>
> > >>>>>>> https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > >>>>>>
> > >>>>>> Dumitru and I have been discussing the possibility of reverting this
> > >>>>>> patch and relying on CMSs to maintain the MAC_Binding entries
> > >>>>>> associated with the FIPs [0].
> > >>>>>> I'm against reverting this patch in OVN [1] for multiple reasons,
> > >>>>>> the most important one being the fact that if we rely on workarounds
> > >>>>>> on the CMS side, we'll be creating a control plane dependency for
> > >>>>>> something that is pure dataplane only (i.e. if the Neutron server is
> > >>>>>> down - outage, upgrades, etc. - traffic is going to be disrupted).
> > >>>>>> On the other hand, one could argue that the same dependency now
> > >>>>>> exists on ovn-controller being up & running, but I believe that this
> > >>>>>> is better than a) relying on workarounds on CMSs,
> > >>>>>> b) relying on CMS availability.
> > >>>>>>
> > >>>>>> In the short term I think that moving the MAC_Binding entries to the
> > >>>>>> LS instead of the LRP, as was suggested up-thread, would be a good
> > >>>>>> idea, and in the long haul the *local* ARP cache seems to be the
> > >>>>>> right solution. Brainstorming with Dumitru, he suggested inspecting
> > >>>>>> the flows regularly to see if the packet count on flows that check
> > >>>>>> if src_mac == X has not increased in a while, and then removing the
> > >>>>>> ARP responder flows locally.
> > >>>>>>
> > >>>>>> [0] https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
> > >>>>>> [1] https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > >>>>>>>
> > >>>>>>> Recently, due to the dataplane scaling issue (the 4K resubmit limit
> > >>>>>>> being hit), we don't flood these packets on non-router ports and
> > >>>>>>> instead create the MAC_Bindings directly from ovn-controller:
> > >>>>>>>
> > >>>>>>> https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> > >>>>>>>
> > >>>>>>> Without the MAC_Binding table we'd need to find a way to update or
> > >>>>>>> flush stale bindings when an IP is used for a VIF or FIP.
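
(On the flow-inspection idea above: the counters are already exposed over
OpenFlow, so a staleness check could amount to periodically sampling
n_packets on the relevant flows and dropping entries whose counters stop
moving. Roughly, and assuming the MAC binding lookup still lives in table
66 - table numbers shift between OVN versions:

    ovs-ofctl dump-flows br-int table=66

Each line of output carries an n_packets= counter. A real implementation
would presumably sample these over ovn-controller's own OpenFlow
connection rather than shelling out.)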
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
