On Wed, Dec 2, 2020 at 1:31 PM Han Zhou <[email protected]> wrote:
>
>
>
> On Wed, Dec 2, 2020 at 11:38 AM Anil Vishnoi <[email protected]> wrote:
> >
> > On Tue, Dec 1, 2020 at 12:20 AM Han Zhou <[email protected]> wrote:
> > >
> > >
> > >
> > > On Mon, Nov 30, 2020 at 11:11 PM Anil Vishnoi <[email protected]> 
> > > wrote:
> > > >
> > > > On Mon, Nov 30, 2020 at 9:26 PM Han Zhou <[email protected]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2020 at 8:22 PM Anil Vishnoi <[email protected]> 
> > > > > wrote:
> > > > > >
> > > > > > I am just wondering whether letting MAC_Binding table entries
> > > > > > expire after a certain timeout would help here, just like we do for
> > > > > > OpenFlow flows (idle_timeout and hard_timeout). That could help
> > > > > > address the scale problem as well as the stale-entry problem. Even
> > > > > > if we move the MAC_Binding table to the LS, I think it doesn't
> > > > > > guarantee that this table won't bloat over time, because we never
> > > > > > flush any of these MAC entries. I believe the kernel networking ARP
> > > > > > cache uses a similar approach to maintain its cache.
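> > > > > >
> > > > > > (For illustration, this is the kind of OpenFlow behavior I mean - a
> > > > > > hypothetical hand-written flow, not actual OVN output:
> > > > > >
> > > > > >   $ ovs-ofctl add-flow br-int "table=66,idle_timeout=300,priority=100,reg0=0xc0a80a05,actions=mod_dl_dst:00:00:00:00:10:05"
> > > > > >
> > > > > > OVS would remove such a flow by itself after 300 seconds without a
> > > > > > hit.)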
> > > > > >
> > > > >
> > > > > Hi Anil,
> > > > >
> > > > > This has been discussed before. It is just hard to implement a
> > > > > timeout mechanism in OVSDB without a significant performance penalty.
> > > > Even with a time granularity of minutes?
> > >
> > > Well, let me rephrase it. The difficulty is really about how to detect
> > > the aging. We want to time out the idle entries, not the active entries.
> > > It is expensive to keep track of the last usage time of a DB entry in a
> > > distributed fashion (the ARP cache hit happens on each node but the
> > > source of truth is in the central OVSDB), considering the potential
> > > volume of traffic.
> >
> > Agree, maintaining the state (timestamp) across the distributed system
> > will cause scale problems (each read would cause a write of the timestamp
> > to the db). I didn't get a chance to look at the implementation, but is a
> > read operation on a follower routed to the leader, or is it served
> > locally by the follower? If reads are routed to the leader, I am
> > wondering if the leader could maintain an in-memory cache of the
> > timestamps and keep updating it on every read that comes to it (from
> > itself or a follower). That way we don't have to write that data to the
> > db and can avoid the replication. In case of a leader change, the new
> > leader needs to start the caching from scratch again, but that should be
> > fine, because this state is only maintained for pruning idle entries, and
> > if pruning gets delayed by a leader election, that should not impact
> > functioning.
> >
> I think there is a misunderstanding. By "distributed fashion" I wasn't
> referring to the RAFT cluster of the SB DB itself. I was talking about
> mac_binding (distributed ARP cache) hits happening on each hypervisor node
> while the source of truth is maintained in the central SB DB (regardless of
> the details of leaders and followers of the DB cluster). Cache hits on all
> hypervisors need to be reported back to the MAC_Binding table in the
> central SB DB, so that we know which entries are alive and which are stale
> and should be cleared. This is where the cost would be introduced. There are
> definitely ways to do it, but the gains just don't justify the cost.

Thanks for the clarification, @Han Zhou; got it now. Yeah, any update from
the nodes to the SB DB is definitely an issue for scale.

>
> > > If we don't keep track of the usage but simply time out entries, then
> > > it could break live connections badly.
> >
> > Yeah, timeouts are disruptions for sure.
> >
> > > Any suggestions are welcome!
> >
> >
> > >
> > > > >
> > > > > For the scale problem, I think for most use cases the two options I
> > > > > mentioned were enough to solve the problem, although they now need
> > > > > some more fixes since they are broken, as discussed in the other
> > > > > replies. If they are not sufficient for some special use cases, e.g.
> > > > > a large number of routers needing E-W communication in a full-mesh
> > > > > fashion (I wonder if this scenario is realistic), then some more
> > > > > optimization might be needed, such as sharing the MAC_Binding
> > > > > entries per LS, which reduces the problem from O(n^2) to O(n).
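> > > > > (For a rough sense of the numbers, based on Renat's report in this
> > > > > thread: ~2000 ports on the provider network times ~500 router
> > > > > datapaths gives the ~1M MAC_Binding rows observed today, whereas
> > > > > keying the entries per LS would leave on the order of 2000 rows.)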
> > > > >
> > > > > For the stale entries, it is not a problem in most cases, because
> > > > > if an endpoint is gone but its entry remains in MAC_Binding, in the
> > > > > end there is not much difference when someone tries to send a packet
> > > > > to that endpoint. It is just unreachable. What matters is when an
> > > > > entry is updated but the update itself is lost for some reason (e.g.
> > > > > a control plane outage); then packets would always go to the stale
> > > > > MAC instead of the correct one. This can usually be mitigated by
> > > > > periodic GARPs from the endpoints.
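> > > > >
> > > > > (For example, an endpoint or its health-check tooling could
> > > > > periodically announce itself with something like iputils' arping;
> > > > > interface and address here are placeholders:
> > > > >
> > > > >   $ arping -U -I eth0 -s 10.0.0.5 10.0.0.5
> > > > >
> > > > > which sends an unsolicited/gratuitous ARP that refreshes the learned
> > > > > binding.)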
> > > >
> > > > I am not sure endpoints send periodic GARPs by default (I know
> > > > OSes/VIMs send one at bootup), unless you run something like
> > > > keepalived. Do container runtimes support periodic GARPs? I am
> > > > wondering whether processing periodic GARPs from many endpoints would
> > > > be any cheaper than periodic flushing (but I don't have any data
> > > > points either way :) ).
> > >
> > > It is not default behavior for endpoints. I think this is why using
> > > MAC bindings for east-west traffic is discouraged (in my opinion). If
> > > it is required, then it is better to have that kind of mitigation in
> > > place, which is more of an operational question (take into account how
> > > reliable the control plane is and an estimate of the number of stale
> > > entries left after such an outage, vs. the cost of periodic GARPs).
> > >
> > > MAC-binding is more critical for the North-South GW to work with
> > > physical routers. In such scenarios, the external router's IP-MAC
> > > binding changes much less frequently (mainly on router failover), thus
> > > the chance of a lost MAC_Binding update is quite low. But I agree it
> > > could still happen in extreme cases, so it may be good to have some
> > > mitigation here as well for the worst-case scenario (an external router
> > > failover happening at the same time the whole SB RAFT cluster is down).
> >
> > Yeah, I have similar concerns. These systems are supposed to run for a
> > long time, and over time, depending on the operations run by the CMS,
> > this table can bloat; that's a ticking time bomb. Even safely purging the
> > MAC_Binding table on a daily basis wouldn't be a bad thing to have here.
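> >
> > (A crude way to watch for that bloat with the standard db-ctl commands,
> > e.g. as a periodic check before deciding to purge:
> >
> >   $ ovn-sbctl --bare --columns=_uuid list MAC_Binding | grep -c .
> >
> > This should print the current row count; the exact purge policy is a
> > separate question.)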
> > >
> > > > >
> > > > > Thanks,
> > > > > Han
> > > > >
> > > > > > On Sun, Nov 29, 2020 at 10:08 PM Numan Siddique <[email protected]> 
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Nov 30, 2020 at 7:37 AM Han Zhou <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Sat, Nov 28, 2020 at 12:31 PM Tony Liu 
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi Renat,
> > > > > > > > >
> > > > > > > > > What are these "logical datapath patches that Ilya Maximets
> > > > > > > > > submitted"? Could you share some links?
> > > > > > > > >
> > > > > > > > > There were a couple of discussions about this same issue.
> > > > > > > > > [1] raised the issue and resulted in a new option,
> > > > > > > > > always_learn_from_arp_request, being added [2].
> > > > > > > > > [3] resulted in a patch to the OVN ML2 driver [4] that sets
> > > > > > > > > the option added by [1].
> > > > > > > > >
> > > > > > > > > It seems that this helps optimize the logical_flow table.
> > > > > > > > > I am not sure whether it helps with mac_binding as well.
> > > > > > > > >
> > > > > > > > > Is it the same issue we are trying to address here, by either
> > > > > > > > > Numan's local cache or the solution proposed by Dumitru?
> > > > > > > > >
> > > > > > > > > [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> > > > > > > > > [2] https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
> > > > > > > > > [3] http://osdir.com/openstack-discuss/msg16002.html
> > > > > > > > > [4] https://review.opendev.org/c/openstack/neutron/+/752678
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Tony
> > > > > > > >
> > > > > > > > Thanks Tony for pointing to the old discussion [0]. I thought
> > > > > > > > setting the option always_learn_from_arp_request to "false" on
> > > > > > > > the logical routers should have solved this scale problem in
> > > > > > > > the MAC_Binding table in this scenario.
> > > > > > > >
> > > > > > > > However, it seems that commit a2b88dc513 ("pinctrl: Directly
> > > > > > > > update MAC_Bindings created by self originated GARPs.") has
> > > > > > > > overridden the option. (I haven't tested, but maybe @Dumitru
> > > > > > > > Ceara <[email protected]> can confirm.)
> > > > > > > >
> > > > > > > > Similarly, the Logical_Flow explosion should have been solved
> > > > > > > > by setting the option dynamic_neigh_routers to "true".
> > > > > > > >
> > > > > > > > I think these two options are exactly for the scenario Renat is
> > > > > > > > reporting. @Renat, could you try setting these options as
> > > > > > > > suggested above, using an OVN version before commit a2b88dc513,
> > > > > > > > to see if it solves your problem?
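> > > > > > > >
> > > > > > > > For concreteness, the options are set per logical router in
> > > > > > > > the NB DB, along these lines (router name is an example; see
> > > > > > > > ovn-nb(5) for the option descriptions):
> > > > > > > >
> > > > > > > >   $ ovn-nbctl set Logical_Router lr0 options:always_learn_from_arp_request=false
> > > > > > > >   $ ovn-nbctl set Logical_Router lr0 options:dynamic_neigh_routers=true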
> > > > > > > >
> > > > > > >
> > > > > > > When you test it out with the suggested commit, please delete
> > > > > > > the mac_binding entries manually, as neither ovn-northd nor
> > > > > > > ovn-controller ever deletes entries from the mac_binding table.
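> > > > > > >
> > > > > > > (A blunt way to do that with the standard db-ctl commands,
> > > > > > > assuming it is safe to drop every learned binding in your lab:
> > > > > > >
> > > > > > >   $ for u in $(ovn-sbctl --bare --columns=_uuid list MAC_Binding); do
> > > > > > >         ovn-sbctl destroy MAC_Binding "$u"
> > > > > > >     done
> > > > > > >
> > > > > > > The routers will then re-learn bindings on demand.)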
> > > > > > >
> > > > > > > > Regarding the proposals in this thread:
> > > > > > > > - Move MAC_Binding to LS (by Dumitru)
> > > > > > > >     This sounds good to me, though I am not sure about all the
> > > > > > > > implications yet; I wonder why it was associated with the LRP
> > > > > > > > in the first place.
> > > > > > > >
> > > > > > > > - Remove MAC_Binding from SB (by Numan)
> > > > > > > >     I am a little concerned about this. The MAC_Binding table
> > > > > > > > in the SB is required for the distributed LR to do dynamic ARP
> > > > > > > > resolving. Consider a general use case: A - LS1 - LR1 - LS2 -
> > > > > > > > B. A is on HV1 and B is on HV2. Now A sends a packet to B's IP.
> > > > > > > > Assume B's IP is unknown to OVN. The packet is routed by LR1,
> > > > > > > > and from the LRP facing LS2 an ARP request is sent out over the
> > > > > > > > LS2 logical network. The above steps happen on HV1. Now the ARP
> > > > > > > > request reaches HV2 and is received by B, so B sends an ARP
> > > > > > > > response. With the current implementation, HV2's OVS flows
> > > > > > > > learn the MAC-IP binding from the ARP response and update the
> > > > > > > > SB DB, and HV1 gets the SB update and installs the MAC binding
> > > > > > > > flow as a result of the ARP resolving. The next time A sends a
> > > > > > > > packet to B, HV1 resolves the ARP directly from the MAC binding
> > > > > > > > flows locally and sends the IP packet to HV2. The SB DB
> > > > > > > > MAC_Binding table works as a distributed ARP/neighbor cache. It
> > > > > > > > is a mechanism to sync the ARP cache from the place where it is
> > > > > > > > learned to the place where it is initiated, and all HVs benefit
> > > > > > > > from this without needing to send ARPs themselves for the same
> > > > > > > > LRP. In other words, the LRP is distributed, so the ARP
> > > > > > > > resolving happens in a distributed fashion. Without this, each
> > > > > > > > HV would initiate ARP requests on behalf of the same LRP, which
> > > > > > > > would greatly and unnecessarily increase ARP traffic - even
> > > > > > > > more than in a traditional network (where one physical router
> > > > > > > > only needs to do one ARP resolution for each neighbor and
> > > > > > > > maintain one copy of the ARP cache). And I am not sure whether
> > > > > > > > there are other side effects when an endpoint sees unexpectedly
> > > > > > > > frequent ARP requests from the same LRP - would there be any
> > > > > > > > rate limit that discards repeated ARP requests from the same
> > > > > > > > source? Numan, maybe you have already considered these. Would
> > > > > > > > you share your thoughts?
> > > > > > >
> > > > > > > Thanks for the comments and for highlighting this use case,
> > > > > > > which I missed completely.
> > > > > > >
> > > > > > > I was thinking more along the lines of the N-S use case with a
> > > > > > > distributed gateway router port, and I completely missed the E-W
> > > > > > > scenario with an unknown address. If we don't consider the
> > > > > > > unknown address scenario, I think moving away from the SB
> > > > > > > MAC_Binding table would be beneficial in the long run, for a few
> > > > > > > reasons:
> > > > > > >    1. Better scale.
> > > > > > >    2. It addresses the stale mac_binding entries (which the CMS
> > > > > > > presently has to handle).
> > > > > > >
> > > > > > > For the N-S traffic scenario, the ovn-controller claiming the gw
> > > > > > > router port will take care of generating the ARPs.
> > > > > > > For the floating IP DVR scenario, each compute node will have to
> > > > > > > generate the ARP request to learn a remote binding.
> > > > > > > I think this should be fine, as it is just a one-time thing.
> > > > > > >
> > > > > > > Regarding the unknown address scenario: right now ovn-controller
> > > > > > > floods the packet to all the unknown logical ports of a switch
> > > > > > > if OVN doesn't know the MAC. All these unknown logical ports
> > > > > > > belong to a multicast group.
> > > > > > >
> > > > > > > I think we should solve this case. In the case of OpenStack,
> > > > > > > when port security is disabled for a Neutron port, the logical
> > > > > > > port will have an unknown address configured. There are a few
> > > > > > > related bugzillas/launchpad bugs [1].
> > > > > > >
> > > > > > > I think we should fix this behavior in OVN: OVN should do the
> > > > > > > MAC learning on the switch for the unknown ports. If we do that,
> > > > > > > I think the scenario you mentioned will be addressed.
> > > > > > >
> > > > > > > Maybe we can extend Dumitru's suggestion and have just one
> > > > > > > approach that does the MAC learning on the switch (keeping the
> > > > > > > SB MAC_Binding table):
> > > > > > >     -  for unknown logical ports
> > > > > > >     -  for unknown MACs for the N-S routing.
> > > > > > >
> > > > > > > Any thoughts?
> > > > > > >
> > > > > > > FYI - I have a PoC/RFC patch in progress that adds the MAC
> > > > > > > binding cache support:
> > > > > > > https://github.com/numansiddique/ovn/commit/22082d04ca789155ea2edd3c1706bde509ae44da
> > > > > > >
> > > > > > > [1] - https://review.opendev.org/c/openstack/neutron/+/763567/
> > > > > > >       https://bugzilla.redhat.com/show_bug.cgi?id=1888441
> > > > > > >       https://bugs.launchpad.net/neutron/+bug/1904412
> > > > > > >       https://bugzilla.redhat.com/show_bug.cgi?id=1672625
> > > > > > >
> > > > > > > Thanks
> > > > > > > Numan
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Han
> > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: dev <[email protected]> On Behalf Of 
> > > > > > > > > > Numan Siddique
> > > > > > > > > > Sent: Thursday, November 26, 2020 11:36 AM
> > > > > > > > > > To: Daniel Alvarez Sanchez <[email protected]>
> > > > > > > > > > Cc: ovs-dev <[email protected]>
> > > > > > > > > > Subject: Re: [ovs-dev] Scaling of Logical_Flows and 
> > > > > > > > > > MAC_Binding tables
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique 
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
> > > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara 
> > > > > > > > > > > > <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On 11/25/20 7:06 PM, Numan Siddique wrote:
> > > > > > > > > > > > > > On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev
> > > > > > > > > > > > > > <[email protected]>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On 25.11.20 16:14, Dumitru Ceara wrote:
> > > > > > > > > > > > > >>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > > > > > > > > > > > > >>>> Hello folks,
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>> Hi Renat,
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>> we run a lab where we try to evaluate the
> > > > > > > > > > > > > >>>> scalability potential of OVN with OpenStack as
> > > > > > > > > > > > > >>>> the CMS. The current lab setup is as follows:
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> 500 networks
> > > > > > > > > > > > > >>>> 500 routers
> > > > > > > > > > > > > >>>> 1500 VM ports (3 per network/router)
> > > > > > > > > > > > > >>>> 1500 Floating IPs (one per VM port)
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> There is an external network, which is bridged
> > > > > > > > > > > > > >>>> to br-provider on the gateway nodes. There are
> > > > > > > > > > > > > >>>> 2000 ports connected to this external network
> > > > > > > > > > > > > >>>> (1500 Floating IPs + 500 SNAT router ports). So
> > > > > > > > > > > > > >>>> the setup is not very big, we'd say, but after
> > > > > > > > > > > > > >>>> applying this configuration via the ML2/OVN
> > > > > > > > > > > > > >>>> plugin, northd kicks in and does its job, and
> > > > > > > > > > > > > >>>> after it's done, the Logical_Flow table gets
> > > > > > > > > > > > > >>>> 645877 entries, which is way too much. But ok,
> > > > > > > > > > > > > >>>> we move on and start one controller on the
> > > > > > > > > > > > > >>>> gateway chassis, and here things get really
> > > > > > > > > > > > > >>>> messy. The MAC_Binding table grows from 0 to
> > > > > > > > > > > > > >>>> 999088 entries in one moment, and after it's
> > > > > > > > > > > > > >>>> done, the sizes of the biggest SB tables look
> > > > > > > > > > > > > >>>> like this:
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> 999088 MAC_Binding
> > > > > > > > > > > > > >>>> 645877 Logical_Flow
> > > > > > > > > > > > > >>>> 4726 Port_Binding
> > > > > > > > > > > > > >>>> 1117 Multicast_Group
> > > > > > > > > > > > > >>>> 1068 Datapath_Binding
> > > > > > > > > > > > > >>>> 1046 Port_Group
> > > > > > > > > > > > > >>>> 551 IP_Multicast
> > > > > > > > > > > > > >>>> 519 DNS
> > > > > > > > > > > > > >>>> 517 HA_Chassis_Group
> > > > > > > > > > > > > >>>> 517 HA_Chassis
> > > > > > > > > > > > > >>>> ...
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> The MAC binding table gets huge; basically it
> > > > > > > > > > > > > >>>> now has an entry for every port connected to
> > > > > > > > > > > > > >>>> the external network * the number of datapaths,
> > > > > > > > > > > > > >>>> which makes roughly one million entries. This
> > > > > > > > > > > > > >>>> table by itself increases the size of the SB by
> > > > > > > > > > > > > >>>> 200 megabytes. The Logical_Flow table also gets
> > > > > > > > > > > > > >>>> very heavy; we have already played a bit with
> > > > > > > > > > > > > >>>> the logical datapath patches that Ilya Maximets
> > > > > > > > > > > > > >>>> submitted, and it looks much better, but the
> > > > > > > > > > > > > >>>> size of the MAC_Binding table still feels
> > > > > > > > > > > > > >>>> inadequate.
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> We would like to start working at least on
> > > > > > > > > > > > > >>>> MAC_Binding table optimisation, but it is a bit
> > > > > > > > > > > > > >>>> difficult to start from scratch. Can someone
> > > > > > > > > > > > > >>>> help us with ideas on how this could be
> > > > > > > > > > > > > >>>> optimised?
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> Maybe it would also make sense to group entries
> > > > > > > > > > > > > >>>> in the MAC_Binding table the same way as is
> > > > > > > > > > > > > >>>> proposed for logical flows in Ilya's patch?
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>> Maybe it would work but I'm not really sure how, 
> > > > > > > > > > > > > >>> right now.
> > > > > > > > > > > > > >>> However, what if we change the way MAC_Bindings 
> > > > > > > > > > > > > >>> are created?
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Right now a MAC Binding is created for each
> > > > > > > > > > > > > >>> logical router port, but in your case there are
> > > > > > > > > > > > > >>> a lot of logical router ports connected to the
> > > > > > > > > > > > > >>> single provider logical switch, and they all
> > > > > > > > > > > > > >>> learn the same ARPs.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> What if we instead store MAC_Bindings per
> > > > > > > > > > > > > >>> logical switch? Basically sharing all these
> > > > > > > > > > > > > >>> MAC_Bindings between all the router ports
> > > > > > > > > > > > > >>> connected to the same LS.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Do you see any problem with this approach?
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Thanks,
> > > > > > > > > > > > > >>> Dumitru
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >> I believe that this approach is the way to go; at
> > > > > > > > > > > > > >> least nothing comes to mind that could go wrong
> > > > > > > > > > > > > >> here. We will try to make a patch for that.
> > > > > > > > > > > > > >> However, if someone is familiar with the code and
> > > > > > > > > > > > > >> knows how to do it fast, that would also be very
> > > > > > > > > > > > > >> nice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This approach should work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have another idea (I won't call it a solution
> > > > > > > > > > > > > > yet). What if we drop the usage of the MAC_Binding
> > > > > > > > > > > > > > table altogether?
> > > > > > > > > > > > >
> > > > > > > > > > > > > This would be great!
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - When ovn-controller learns a mac binding, it
> > > > > > > > > > > > > > will not create a row in the SB MAC_Binding table.
> > > > > > > > > > > > > > - Instead it will maintain the learnt mac binding
> > > > > > > > > > > > > > in its memory.
> > > > > > > > > > > > > > - ovn-controller will still program table 66 with
> > > > > > > > > > > > > > the flow to set the eth.dst (for the get_arp()
> > > > > > > > > > > > > > action).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This has a couple of advantages:
> > > > > > > > > > > > > >   - Right now we never flush old/stale mac_binding
> > > > > > > > > > > > > > entries.
> > > > > > > > > > > > > >   - If the mac of an external IP has changed, but
> > > > > > > > > > > > > > OVN has an entry for that IP with the old mac in
> > > > > > > > > > > > > > the mac_binding table, we will use the old mac,
> > > > > > > > > > > > > > causing the packet to be sent to the wrong
> > > > > > > > > > > > > > destination, and the packet might get lost.
> > > > > > > > > > > > > >   - We would get rid of this problem.
> > > > > > > > > > > > > >   - We would also save SB DB space.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > There are a few disadvantages:
> > > > > > > > > > > > > >   - Other ovn-controllers will not add the flows
> > > > > > > > > > > > > > in table 66. I guess this should be fine, as each
> > > > > > > > > > > > > > ovn-controller can generate the ARP request and
> > > > > > > > > > > > > > learn the mac itself.
> > > > > > > > > > > > > >   - When ovn-controller restarts, we lose the
> > > > > > > > > > > > > > learnt macs and would need to learn them again.
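> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (For reference, the flows in question can be
> > > > > > > > > > > > > > inspected on a chassis with:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   $ ovs-ofctl dump-flows br-int table=66
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The proposal keeps these local flows; only the SB
> > > > > > > > > > > > > > rows behind them would go away.)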
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Any thoughts on this?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > It'd be great to have some sort of local ARP cache but 
> > > > > > > > > > > > I'm concerned
> > > > > > > > > > > > about the performance implications.
> > > > > > > > > > > >
> > > > > > > > > > > > - How are you going to determine when an entry is
> > > > > > > > > > > > stale? If you slow-path the packets to reset the
> > > > > > > > > > > > timeout every time a packet with the source mac is
> > > > > > > > > > > > received, that doesn't look good. Maybe you have
> > > > > > > > > > > > something else in mind.
> > > > > > > > > > >
> > > > > > > > > > > Right now we never expire any mac_binding entry. If I
> > > > > > > > > > > understand you correctly, your concern is about the
> > > > > > > > > > > scenario where a floating IP is updated with a different
> > > > > > > > > > > mac: how does the local cache get updated?
> > > > > > > > > > >
> > > > > > > > > > > Right now networking-ovn (in the case of OpenStack)
> > > > > > > > > > > updates the mac_binding entry in the South DB for such
> > > > > > > > > > > cases, right?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > FYI - I have started working on this approach as a PoC,
> > > > > > > > > > i.e., using a local mac_binding cache instead of the SB
> > > > > > > > > > mac_binding table.
> > > > > > > > > >
> > > > > > > > > > I will update this thread on the progress.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Numan
> > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Numan
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > There's another scenario that we need to take care
> > > > > > > > > > > > > of, and it doesn't seem too obvious to address
> > > > > > > > > > > > > without MAC_Bindings.
> > > > > > > > > > > > >
> > > > > > > > > > > > > GARPs were being injected into the L2 broadcast
> > > > > > > > > > > > > domain of an LS for NAT addresses in case FIPs are
> > > > > > > > > > > > > reused by the CMS, introduced by:
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Dumitru and I have been discussing the possibility of
> > > > > > > > > > > > reverting this patch and relying on CMSs to maintain
> > > > > > > > > > > > the MAC_Binding entries associated with the FIPs [0].
> > > > > > > > > > > > I'm against reverting this patch in OVN [1] for
> > > > > > > > > > > > multiple reasons, the most important being that if we
> > > > > > > > > > > > rely on workarounds on the CMS side, we'll be creating
> > > > > > > > > > > > a control plane dependency for something that is
> > > > > > > > > > > > purely dataplane (i.e. if the Neutron server is down -
> > > > > > > > > > > > outage, upgrades, etc. - traffic is going to be
> > > > > > > > > > > > disrupted). On the other hand, one could argue that
> > > > > > > > > > > > the same dependency now exists on ovn-controller being
> > > > > > > > > > > > up and running, but I believe that this is better than
> > > > > > > > > > > > a) relying on workarounds in CMSs or b) relying on CMS
> > > > > > > > > > > > availability.
> > > > > > > > > > > >
> > > > > > > > > > > > In the short term I think that moving the MAC_Binding
> > > > > > > > > > > > entries to the LS instead of the LRP, as suggested up
> > > > > > > > > > > > thread, would be a good idea, and in the long haul the
> > > > > > > > > > > > *local* ARP cache seems to be the right solution.
> > > > > > > > > > > > Brainstorming with Dumitru, he suggested inspecting
> > > > > > > > > > > > the flows regularly to see whether the packet count on
> > > > > > > > > > > > the flows that check src_mac == X has not increased in
> > > > > > > > > > > > a while, and then removing the ARP responder flows
> > > > > > > > > > > > locally.
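> > > > > > > > > > > >
> > > > > > > > > > > > (Roughly: periodically sample the n_packets counters,
> > > > > > > > > > > > e.g. with something like
> > > > > > > > > > > >
> > > > > > > > > > > >   $ ovs-ofctl dump-flows br-int | grep dl_src
> > > > > > > > > > > >
> > > > > > > > > > > > and age out any binding whose counter hasn't moved
> > > > > > > > > > > > between two samples; the exact table and match are
> > > > > > > > > > > > details to figure out.)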
> > > > > > > > > > > >
> > > > > > > > > > > > [0] https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Recently, due to the dataplane scaling issue (the
> > > > > > > > > > > > > 4K resubmit limit being hit), we don't flood these
> > > > > > > > > > > > > packets on non-router ports and instead create the
> > > > > > > > > > > > > MAC Bindings directly from ovn-controller:
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> > > > > > > > > > > > >
> > > > > > > > > > > > > Without the MAC_Binding table we'd need to find a
> > > > > > > > > > > > > way to update or flush stale bindings when an IP is
> > > > > > > > > > > > > used for a VIF or FIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Dumitru
> > > > > > > > > > > > >



-- 
Thanks
Anil
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
