On Mon, Nov 30, 2020 at 12:28:56PM -0800, Han Zhou wrote:
> On Mon, Nov 30, 2020 at 12:13 PM Renat Nurgaliyev <[email protected]>
> wrote:
> >
> > On 30.11.20 07:07, Numan Siddique wrote:
> > > On Mon, Nov 30, 2020 at 7:37 AM Han Zhou <[email protected]> wrote:
> > >> On Sat, Nov 28, 2020 at 12:31 PM Tony Liu <[email protected]>
> wrote:
> > >>> Hi Renat,
> >
> > Hi folks,
> > >>>
> > >>> What are these "logical datapath patches that Ilya Maximets submitted"?
> > >>> Could you share some links?
> > >>>
> > >>> There were a couple of discussions about a similar issue.
> > >>> [1] raised the issue and resulted in a new option,
> > >>> always_learn_from_arp_request, being added [2].
> > >>> [3] resulted in a patch to the OVN ML2 driver [4] to set the option
> > >>> added by [1].
> > >>>
> > >>> It seems that it helps to optimize the logical_flow table.
> > >>> I am not sure if it helps with mac_binding as well.
> > >>>
> > >>> Is it the same issue we are trying to address here, by either
> > >>> Numan's local cache or the solution proposed by Dumitru?
> > >>>
> > >>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> > >>> [2] https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
> > >>> [3] http://osdir.com/openstack-discuss/msg16002.html
> > >>> [4] https://review.opendev.org/c/openstack/neutron/+/752678
> > >>>
> > >>>
> > >>> Thanks!
> > >>> Tony
> > >> Thanks Tony for pointing to the old discussion [0]. I thought setting the
> > >> option always_learn_from_arp_request to "false" on the logical routers
> > >> should have solved this scale problem in the MAC_Binding table in this
> > >> scenario.
> > >>
> > >> However, it seems the commit a2b88dc513 ("pinctrl: Directly update
> > >> MAC_Bindings created by self originated GARPs.") has overridden the
> > >> option. (I haven't tested, but maybe @Dumitru Ceara <[email protected]>
> > >> can confirm.)
> > >>
> > >> Similarly, the Logical_Flow explosion should have been solved by
> > >> setting the option dynamic_neigh_routers to "true".
> > >>
> > >> I think these two options are exactly for the scenario Renat is
> > >> reporting. @Renat, could you try setting these options as suggested
> > >> above, using the OVN version before the commit a2b88dc513, to see if it
> > >> solves your problem?
> > >>
> > > When you test it out with the suggested commit, please delete the
> > > mac_binding entries manually, as neither ovn-northd nor ovn-controller
> > > deletes entries from the mac_binding table.
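> > > (E.g., ovn-sbctl understands the generic database commands, so listing
> > > the entries with "ovn-sbctl list mac_binding" and removing each with
> > > "ovn-sbctl destroy mac_binding <uuid>" should do it - untested.)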
> >
> > We tested with dynamic_neigh_routers set to true, and we saw a very
> > positive change: the size of the Logical_Flow table decreased from 600k
> > entries to 100k. This is a huge difference, thanks for pointing this
> > out!
> >
> > It did not affect the MAC_Binding table with commit a2b88dc513 ("pinctrl:
> > Directly update MAC_Bindings created by self originated GARPs."), but
> > that was expected. Just for test purposes we commented out some code
> > as follows:
> >
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index 291202c24..76047939c 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -4115,10 +4115,10 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                     laddrs->ipv4_addrs[i].addr,
> >                                     binding_rec->datapath->tunnel_key,
> >                                     binding_rec->tunnel_key);
> > -                    send_garp_locally(ovnsb_idl_txn,
> > -                                      sbrec_mac_binding_by_lport_ip,
> > -                                      local_datapaths, binding_rec, laddrs->ea,
> > -                                      laddrs->ipv4_addrs[i].addr);
> > +                    //send_garp_locally(ovnsb_idl_txn,
> > +                    //                  sbrec_mac_binding_by_lport_ip,
> > +                    //                  local_datapaths, binding_rec, laddrs->ea,
> > +                    //                  laddrs->ipv4_addrs[i].addr);
> >
> >                   }
> >                   free(name);
> >
> > Together with dynamic_neigh_routers we achieved quite a stable setup,
> > with a 62 MiB SB database, which is a huge step forward from 1.9 GiB.
> > The MAC_Binding table stays at around 2000 entries, compared to almost a
> > million before.
> >
> > Would it make sense to make the behaviour introduced in a2b88dc513
> > toggleable via a command line option, until there is a better
> > solution?
> >
> > Thanks,
> > Renat.
> >
> 
> Thanks Renat for the testing. The result looks good. Just to confirm: in
> the final test with the code change above, did you also set
> "always_learn_from_arp_request" to "false"?

Hi Han,

Yes, sorry for not making it clear initially: always_learn_from_arp_request
is set to false.
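
In case it helps others, the options can be set per logical router with
something like this (the router name is a placeholder):

  ovn-nbctl set Logical_Router <router> options:dynamic_neigh_routers=true
  ovn-nbctl set Logical_Router <router> options:always_learn_from_arp_request=false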

> I think the logic introduced in a2b88dc513 can add the check for the option
> "always_learn_from_arp_request" instead of overriding it.
> 
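
That would work for us. Something along these lines, perhaps (an untested
sketch; it assumes the option value can be looked up from pinctrl, here via
a hypothetical lr_learn_from_arp_request() helper):

    /* Only create the MAC_Binding directly if the router is configured
     * to learn from ARP requests/GARPs. */
    if (lr_learn_from_arp_request(binding_rec)) {
        send_garp_locally(ovnsb_idl_txn,
                          sbrec_mac_binding_by_lport_ip,
                          local_datapaths, binding_rec, laddrs->ea,
                          laddrs->ipv4_addrs[i].addr);
    }
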
> Also, regarding Winson's question:
> > We moved to the ovn 20.09 branch recently and the mac binding issues
> > happen again in our ovn-k8s scale test cluster.
> > Is there a quick workaround to make the option
> > "always_learn_from_arp_request" work again?
> >
> Thanks Winson for confirming. As mentioned above, I think the logic of the
> patch "pinctrl: Directly update MAC_Bindings created by self originated
> GARPs." can be updated to add the check for this option, to restore the
> behavior. Until that fix lands, I think a quick workaround for you in 20.09
> could be reverting the following patches (I haven't tested, though):
> 1. "ovn-northd: Limit self originated ARP/ND broadcast domain."
> 2. "pinctrl: Fix segfault seen when creating mac_binding for local GARPs."
> 3. "pinctrl: Directly update MAC_Bindings created by self originated GARPs."
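> (Roughly, reverting in reverse order, e.g. "git revert a2b88dc513
> <commit 2> <commit 1>" - untested as well.)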
> 
> Thanks,
> Han
> 
> > >> Regarding the proposals in this thread:
> > >> - Move MAC_Binding to LS (by Dumitru)
> > >>      This sounds good to me, though I am not sure about all the
> > >> implications yet; I wonder why it was associated with the LRP in the
> > >> first place.
> > >>
> > >> - Remove MAC_Binding from SB (by Numan)
> > >>      I am a little concerned about this. The MAC_Binding in SB is
> > >> required for a distributed LR to work for dynamic ARP resolving.
> > >> Consider a general use case: A - LS1 - LR1 - LS2 - B. A is on HV1 and B
> > >> is on HV2. Now A sends a packet to B's IP. Assume B's IP is unknown by
> > >> OVN. The packet is routed by LR1, and on the LRP facing LS2 an ARP is
> > >> sent out over the LS2 logical network. The above steps happen on HV1.
> > >> Now the ARP request reaches HV2 and is received by B, so B sends an ARP
> > >> response. With the current implementation, HV2 would learn the MAC-IP
> > >> binding from the ARP response and update the SB DB, and HV1 will get
> > >> the SB update and install the MAC Binding flow as a result of ARP
> > >> resolving. The next time A sends a packet to B, HV1 will directly
> > >> resolve the ARP from the MAC Binding flows locally and send the IP
> > >> packet to HV2. The SB DB MAC_Binding table works as a distributed
> > >> ARP/Neighbor cache. It is a mechanism to sync the ARP cache from the
> > >> place where it is learned to the place where it is initiated, and all
> > >> HVs benefit from this without the need to send ARPs themselves for the
> > >> same LRP. In other words, the LRP is distributed, so the ARP resolving
> > >> happens in a distributed fashion. Without this, each HV would initiate
> > >> ARP requests on behalf of the same LRP, which would largely increase
> > >> the ARP traffic unnecessarily - even more than in a traditional network
> > >> (where one physical router only needs to do one ARP resolution for each
> > >> neighbor and maintain one copy of the ARP cache). And I am not sure if
> > >> there are other side effects when an endpoint sees unexpectedly
> > >> frequent ARP requests from the same LRP - would there be any rate limit
> > >> that even discards repeated ARP requests from the same source? Numan,
> > >> maybe you have already considered these. Would you share your thoughts?
> > > Thanks for the comments and highlighting this use case which I missed
> > > completely.
> > >
> > > I was thinking more along the lines of the N-S use case with a
> > > distributed gateway router port, and I completely missed the E-W
> > > scenario with an unknown address. If we don't consider the unknown
> > > address scenario, I think moving away from the MAC_Binding south db
> > > table would be beneficial in the long run, for a few reasons:
> > >     1. For better scale.
> > >     2. To address the stale mac_binding entries (which the CMS presently
> > > has to handle).
> > >
> > > For the N-S traffic scenario, the ovn-controller claiming the gw router
> > > port will take care of generating the ARP.
> > > For the Floating IP DVR scenario, each compute node will have to generate
> > > the ARP request to learn a remote binding.
> > > I think this should be fine as it is just a one-time thing.
> > >
> > > Regarding the unknown address scenario, right now ovn-controller floods
> > > the packet to all the "unknown" logical ports of a switch if OVN doesn't
> > > know the MAC; all these unknown logical ports belong to a multicast
> > > group.
> > >
> > > I think we should solve this case. In the case of OpenStack, when port
> > > security is disabled for a neutron port, the logical port will have an
> > > unknown address configured. There are a few related bugzilla/launchpad
> > > bugs [1].
> > >
> > > I think we should fix this behavior in OVN: OVN should do the MAC
> > > learning on the switch for the unknown ports. And if we do that,
> > > I think the scenario you mentioned will be addressed.
> > >
> > > Maybe we can extend Dumitru's suggestion and have just one approach
> > > which does the MAC learning on the switch (keeping the SB MAC_Binding
> > > table):
> > >      -  for unknown logical ports
> > >      -  for unknown macs for the N-S routing.
> > >
> > > Any thoughts ?
> > >
> > > FYI - I have a PoC/RFC patch in progress which adds the mac binding
> > > cache support:
> > > https://github.com/numansiddique/ovn/commit/22082d04ca789155ea2edd3c1706bde509ae44da
> > >
> > > [1] - https://review.opendev.org/c/openstack/neutron/+/763567/
> > >       https://bugzilla.redhat.com/show_bug.cgi?id=1888441
> > >       https://bugs.launchpad.net/neutron/+bug/1904412
> > >       https://bugzilla.redhat.com/show_bug.cgi?id=1672625
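> > >
> > > Roughly, the idea is an hmap in ovn-controller keyed on (datapath key,
> > > IP), along these lines (a simplified illustration of the idea, not the
> > > actual code; it leans on OVS's lib/hmap.h, lib/hash.h and lib/packets.h):
> > >
> > > struct local_mac_binding {
> > >     struct hmap_node node;
> > >     uint32_t dp_key;        /* Tunnel key of the datapath. */
> > >     struct in6_addr ip;     /* Learnt IP. */
> > >     struct eth_addr mac;    /* Learnt MAC. */
> > > };
> > >
> > > static struct hmap local_macs = HMAP_INITIALIZER(&local_macs);
> > >
> > > /* Adds or refreshes a learnt binding in the local cache. */
> > > static void
> > > local_mac_binding_add(uint32_t dp_key, const struct in6_addr *ip,
> > >                       struct eth_addr mac)
> > > {
> > >     uint32_t hash = hash_bytes(ip, sizeof *ip, dp_key);
> > >     struct local_mac_binding *lmb;
> > >
> > >     HMAP_FOR_EACH_WITH_HASH (lmb, node, hash, &local_macs) {
> > >         if (lmb->dp_key == dp_key && ipv6_addr_equals(&lmb->ip, ip)) {
> > >             lmb->mac = mac;   /* Refresh an existing entry. */
> > >             return;
> > >         }
> > >     }
> > >
> > >     lmb = xmalloc(sizeof *lmb);
> > >     lmb->dp_key = dp_key;
> > >     lmb->ip = *ip;
> > >     lmb->mac = mac;
> > >     hmap_insert(&local_macs, &lmb->node, hash);
> > > }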
> > >
> > > Thanks
> > > Numan
> > >
> > >> Thanks,
> > >> Han
> > >>
> > >>>> -----Original Message-----
> > >>>> From: dev <[email protected]> On Behalf Of Numan
> Siddique
> > >>>> Sent: Thursday, November 26, 2020 11:36 AM
> > >>>> To: Daniel Alvarez Sanchez <[email protected]>
> > >>>> Cc: ovs-dev <[email protected]>
> > >>>> Subject: Re: [ovs-dev] Scaling of Logical_Flows and MAC_Binding
> tables
> > >>>>
> > >>>> On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique <[email protected]>
> wrote:
> > >>>>> On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
> > >>>>> <[email protected]> wrote:
> > >>>>>> On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <[email protected]>
> > >>>> wrote:
> > >>>>>>> On 11/25/20 7:06 PM, Numan Siddique wrote:
> > >>>>>>>> On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev
> > >>>>>>>> <[email protected]>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 25.11.20 16:14, Dumitru Ceara wrote:
> > >>>>>>>>>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > >>>>>>>>>>> Hello folks,
> > >>>>>>>>>>>
> > >>>>>>>>>> Hi Renat,
> > >>>>>>>>>>
> > >>>>>>>>>>> we run a lab where we try to evaluate the scalability potential
> > >>>>>>>>>>> of OVN with OpenStack as the CMS.
> > >>>>>>>>>>> The current lab setup is the following:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 500 networks
> > >>>>>>>>>>> 500 routers
> > >>>>>>>>>>> 1500 VM ports (3 per network/router)
> > >>>>>>>>>>> 1500 Floating IPs (one per VM port)
> > >>>>>>>>>>>
> > >>>>>>>>>>> There is an external network, which is bridged to br-provider
> > >>>>>>>>>>> on gateway nodes. There are 2000 ports connected to this
> > >>>>>>>>>>> external network (1500 Floating IPs + 500 SNAT router ports).
> > >>>>>>>>>>> So the setup is not very big, we'd say, but after applying this
> > >>>>>>>>>>> configuration via the ML2/OVN plugin, northd kicks in and does
> > >>>>>>>>>>> its job, and after it's done, the Logical_Flow table gets 645877
> > >>>>>>>>>>> entries, which is way too much. But ok, we move on and start one
> > >>>>>>>>>>> controller on the gateway chassis, and here things get really
> > >>>>>>>>>>> messy. The MAC_Binding table grows from 0 to 999088 entries in
> > >>>>>>>>>>> one go, and after it's done, the sizes of the biggest SB tables
> > >>>>>>>>>>> look like this:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 999088 MAC_Binding
> > >>>>>>>>>>> 645877 Logical_Flow
> > >>>>>>>>>>> 4726 Port_Binding
> > >>>>>>>>>>> 1117 Multicast_Group
> > >>>>>>>>>>> 1068 Datapath_Binding
> > >>>>>>>>>>> 1046 Port_Group
> > >>>>>>>>>>> 551 IP_Multicast
> > >>>>>>>>>>> 519 DNS
> > >>>>>>>>>>> 517 HA_Chassis_Group
> > >>>>>>>>>>> 517 HA_Chassis
> > >>>>>>>>>>> ...
> > >>>>>>>>>>>
> > >>>>>>>>>>> The MAC_Binding table gets huge: basically it now has an entry
> > >>>>>>>>>>> for every port connected to the external network times the
> > >>>>>>>>>>> number of datapaths, which makes roughly one million entries.
> > >>>>>>>>>>> This table by itself increases the size of the SB by 200
> > >>>>>>>>>>> megabytes. The Logical_Flow table also gets very heavy; we have
> > >>>>>>>>>>> already played a bit with the logical datapath patches that Ilya
> > >>>>>>>>>>> Maximets submitted, and it looks much better, but the size of
> > >>>>>>>>>>> the MAC_Binding table still feels inadequate.
> > >>>>>>>>>>>
> > >>>>>>>>>>> We would like to start working at least on MAC_Binding table
> > >>>>>>>>>>> optimisation, but it is a bit difficult to start from scratch.
> > >>>>>>>>>>> Can someone help us with ideas on how this could be optimised?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Maybe it would also make sense to group entries in the
> > >>>>>>>>>>> MAC_Binding table in the same way as is proposed for logical
> > >>>>>>>>>>> flows in Ilya's patch?
> > >>>>>>>>>>>
> > >>>>>>>>>> Maybe it would work but I'm not really sure how, right now.
> > >>>>>>>>>> However, what if we change the way MAC_Bindings are created?
> > >>>>>>>>>>
> > >>>>>>>>>> Right now a MAC_Binding is created for each logical router
> > >>>>>>>>>> port, but in your case there are a lot of logical router ports
> > >>>>>>>>>> connected to the single provider logical switch and they all
> > >>>>>>>>>> learn the same ARPs.
> > >>>>>>>>>> What if we instead store MAC_Bindings per logical switch,
> > >>>>>>>>>> basically sharing all these MAC_Bindings between all router
> > >>>>>>>>>> ports connected to the same LS?
> > >>>>>>>>>>
> > >>>>>>>>>> Do you see any problem with this approach?
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Dumitru
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>> I believe that this approach is the way to go; at least nothing
> > >>>>>>>>> comes to my mind that could go wrong here. We will try to make a
> > >>>>>>>>> patch for that. However, if someone is familiar with the code and
> > >>>>>>>>> knows how to do it fast, that would also be very nice.
> > >>>>>>>> This approach should work.
> > >>>>>>>>
> > >>>>>>>> I've another idea (I won't call it a solution yet). What if we
> > >>>>>>>> drop the usage of MAC_Binding altogether?
> > >>>>>>> This would be great!
> > >>>>>>>
> > >>>>>>>> - When ovn-controller learns a mac_binding, it will not create a
> > >>>>>>>> row in the SB MAC_Binding table.
> > >>>>>>>> - Instead it will maintain the learnt mac binding in its memory.
> > >>>>>>>> - ovn-controller will still program table 66 with the flow to set
> > >>>>>>>> the eth.dst (for the get_arp() action).
> > >>>>>>>>
> > >>>>>>>> This has a couple of advantages:
> > >>>>>>>>    - Right now we never flush the old/stale mac_binding entries.
> > >>>>>>>>    - Suppose the mac of an external IP has changed, but OVN has an
> > >>>>>>>>      entry for that IP with the old mac in the mac_binding table:
> > >>>>>>>>      we will use the old mac, causing the packet to be sent out to
> > >>>>>>>>      the wrong destination, and the packet might get lost.
> > >>>>>>>>    - So we would get rid of this problem.
> > >>>>>>>>    - We will also save SB DB space.
> > >>>>>>>>
> > >>>>>>>> There are a few disadvantages:
> > >>>>>>>>    - Other ovn-controllers will not add the flows in table 66. I
> > >>>>>>>>      guess this should be fine as each ovn-controller can generate
> > >>>>>>>>      the ARP request and learn the mac.
> > >>>>>>>>    - When ovn-controller restarts, we lose the learnt macs and
> > >>>>>>>>      would need to learn them again.
> > >>>>>>>>
> > >>>>>>>> Any thoughts on this ?
> > >>>>>> It'd be great to have some sort of local ARP cache but I'm
> > >>>>>> concerned about the performance implications.
> > >>>>>>
> > >>>>>> - How are you going to determine when an entry is stale?
> > >>>>>> If you slow-path the packets to reset the timeout every time a packet
> > >>>>>> with the source mac is received, it doesn't look good. Maybe you have
> > >>>>>> something else in mind.
> > >>>>> Right now we don't expire any mac_binding entries. If I understand you
> > >>>>> correctly, your concern is about the scenario where a floating ip is
> > >>>>> updated with a different mac - how would the local cache be updated?
> > >>>>>
> > >>>>> Right now networking-ovn (in the case of OpenStack) updates the
> > >>>>> mac_binding entry in the South DB for such cases, right?
> > >>>>>
> > >>>> FYI - I have started working on this approach as a PoC, i.e. using a
> > >>>> local mac_binding cache instead of the SB mac_binding table.
> > >>>>
> > >>>> I will update this thread about the progress.
> > >>>>
> > >>>> Thanks
> > >>>> Numan
> > >>>>
> > >>>>> Thanks
> > >>>>> Numan
> > >>>>>
> > >>>>>> -
> > >>>>>>
> > >>>>>>> There's another scenario that we need to take care of, and it
> > >>>>>>> doesn't seem too obvious to address without MAC_Bindings.
> > >>>>>>>
> > >>>>>>> GARPs were being injected in the L2 broadcast domain of an LS for
> > >>>>>>> nat addresses in case FIPs are reused by the CMS, introduced by:
> > >>>>>>>
> > >>>>>>> https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > >>>>>>
> > >>>>>> Dumitru and I have been discussing the possibility of reverting this
> > >>>>>> patch and relying on CMSs to maintain the MAC_Binding entries
> > >>>>>> associated with the FIPs [0].
> > >>>>>> I'm against reverting this patch in OVN [1] for multiple reasons, the
> > >>>>>> most important one being that if we rely on workarounds on the CMS
> > >>>>>> side, we'll be creating a control plane dependency for something that
> > >>>>>> is pure dataplane only (i.e. if the Neutron server is down - outage,
> > >>>>>> upgrades, etc. - traffic is going to be disrupted). On the other hand,
> > >>>>>> one could argue that the same dependency now exists on ovn-controller
> > >>>>>> being up & running, but I believe that this is better than a) relying
> > >>>>>> on workarounds on CMSs, and b) relying on CMS availability.
> > >>>>>>
> > >>>>>> In the short term I think that moving the MAC_Binding entries to the
> > >>>>>> LS instead of the LRP, as was suggested up-thread, would be a good
> > >>>>>> idea, and in the long haul the *local* ARP cache seems to be the right
> > >>>>>> solution. Brainstorming with Dumitru, he suggested inspecting the
> > >>>>>> flows regularly to see if the packet count on flows that check whether
> > >>>>>> src_mac == X has not increased in a while, and then removing the ARP
> > >>>>>> responder flows locally.
> > >>>>>>
> > >>>>>> [0] https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
> > >>>>>> [1] https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > >>>>>>>
> > >>>>>>> Recently, due to the dataplane scaling issue (the 4K resubmit limit
> > >>>>>>> being hit), we don't flood these packets on non-router ports and
> > >>>>>>> instead create the MAC_Bindings directly from ovn-controller:
> > >>>>>>>
> > >>>>>>> https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> > >>>>>>> Without the MAC_Binding table we'd need to find a way to update or
> > >>>>>>> flush stale bindings when an IP is used for a VIF or FIP.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Dumitru
> > >>>>>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
