On Sat, Nov 28, 2020 at 12:31 PM Tony Liu <[email protected]> wrote:
>
> Hi Renat,
>
> What's this "logical datapath patches that Ilya Maximets submitted"?
> Could you share some links?
>
> There were a couple of discussions about a similar issue.
> [1] raised the issue and resulted in a new option,
> always_learn_from_arp_request, being added [2].
> [3] resulted in a patch to the OVN ML2 driver [4] to set the option added in [2].
>
> It seems that it helps to optimize the Logical_Flow table.
> I am not sure if it helps with MAC_Binding as well.
>
> Is it the same issue we are trying to address here, by either
> Numan's local cache or the solution proposed by Dumitru?
>
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> [2] https://github.com/ovn-org/ovn/commit/61ccc6b5fc7c49b512e26347cfa12b86f0ec2fd9#diff-05b24a3133733fb7b0f979698083b8128e8f1f18c3c2bd09002ae788d34a32f5
> [3] http://osdir.com/openstack-discuss/msg16002.html
> [4] https://review.opendev.org/c/openstack/neutron/+/752678
>
>
> Thanks!
> Tony

Thanks Tony for pointing to the old discussion [1]. I thought setting the
option always_learn_from_arp_request to "false" on the logical routers
should have solved this scaling problem in the MAC_Binding table in this
scenario.

However, it seems that commit a2b88dc513 ("pinctrl: Directly update
MAC_Bindings created by self originated GARPs.") has overridden the
option. (I haven't tested, but maybe @Dumitru Ceara <[email protected]> can
confirm.)

Similarly, the Logical_Flow explosion should have been solved by setting
the option dynamic_neigh_routers to "true".

I think these two options are exactly for the scenario Renat is
reporting. @Renat, could you try setting these options as suggested above,
using an OVN version from before commit a2b88dc513, to see if it solves
your problem?
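
For reference, something like this should set both options on a given
router (a sketch; "lr1" is just a placeholder name, and you would repeat
this for every logical router created by the CMS):

    # Don't learn MAC bindings from ARP requests that are not addressed
    # to one of the router's own IPs.
    ovn-nbctl set Logical_Router lr1 options:always_learn_from_arp_request=false

    # Resolve neighbor routers dynamically via ARP instead of
    # pre-populating ARP-resolution logical flows for all router ports.
    ovn-nbctl set Logical_Router lr1 options:dynamic_neigh_routers=true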

Regarding the proposals in this thread:
- Move MAC_Binding to LS (by Dumitru)
    This sounds good to me, although I am not sure about all the
implications yet; I also wonder why it was associated with the LRP in the
first place.

- Remove MAC_Binding from SB (by Numan)
    I am a little concerned about this. The MAC_Binding table in the SB is
required for a distributed LR to perform dynamic ARP resolution. Consider a
general use case: A - LS1 - LR1 - LS2 - B, where A is on HV1 and B is on
HV2. Now A sends a packet to B's IP, and assume B's IP is unknown to OVN.
The packet is routed by LR1, and the LRP facing LS2 sends out an ARP
request over the LS2 logical network. These steps happen on HV1. The ARP
request then reaches HV2 and is received by B, so B sends an ARP response.
With the current implementation, ovn-controller on HV2 learns the MAC-IP
binding from the ARP response and updates the SB DB, and HV1 gets the SB
update and installs the MAC binding flow, completing the ARP resolution.
The next time A sends a packet to B, HV1 resolves the destination MAC
directly from the local MAC binding flows and sends the IP packet to HV2.
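
(As an aside, the learned entries can be inspected directly in the SB,
e.g. with something like:

    # Show which logical router port each learned MAC-IP binding belongs to.
    ovn-sbctl --columns=logical_port,ip,mac list MAC_Binding

which also makes it easy to see the per-LRP duplication Renat reported.)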
The SB MAC_Binding table thus works as a distributed ARP/neighbor cache:
it is the mechanism that syncs an ARP cache entry from the place where it
is learned to the place where the request was initiated, and all HVs
benefit from this without needing to send ARPs themselves for the same
LRP. In other words, the LRP is distributed, so ARP resolution happens in
a distributed fashion. Without this, each HV would initiate ARP requests
on behalf of the same LRP, which would unnecessarily increase ARP traffic,
even beyond a traditional network (where one physical router only needs to
perform one ARP resolution per neighbor and maintain one copy of the ARP
cache). And I am not sure whether there are other side effects when an
endpoint sees unexpectedly frequent ARP requests from the same LRP; would
there be any rate limit that even discards repeated ARP requests from the
same source? Numan, maybe you have already considered these points. Would
you share your thoughts?

Thanks,
Han

> > -----Original Message-----
> > From: dev <[email protected]> On Behalf Of Numan Siddique
> > Sent: Thursday, November 26, 2020 11:36 AM
> > To: Daniel Alvarez Sanchez <[email protected]>
> > Cc: ovs-dev <[email protected]>
> > Subject: Re: [ovs-dev] Scaling of Logical_Flows and MAC_Binding tables
> >
> > On Thu, Nov 26, 2020 at 4:32 PM Numan Siddique <[email protected]> wrote:
> > >
> > > On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <[email protected]> wrote:
> > > >
> > > > > On 11/25/20 7:06 PM, Numan Siddique wrote:
> > > > > > On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev
> > > > > > <[email protected]> wrote:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On 25.11.20 16:14, Dumitru Ceara wrote:
> > > > > >>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > > > > >>>> Hello folks,
> > > > > >>>>
> > > > > >>> Hi Renat,
> > > > > >>>
> > > > > >>>> we run a lab where we try to evaluate the scalability
> > > > > >>>> potential of OVN with OpenStack as the CMS.
> > > > > >>>> The current lab setup is the following:
> > > > > >>>>
> > > > > >>>> 500 networks
> > > > > >>>> 500 routers
> > > > > >>>> 1500 VM ports (3 per network/router)
> > > > > >>>> 1500 Floating IPs (one per VM port)
> > > > > >>>>
> > > > > >>>> There is an external network, which is bridged to br-provider
> > > > > >>>> on the gateway nodes. There are 2000 ports connected to this
> > > > > >>>> external network (1500 Floating IPs + 500 SNAT router ports).
> > > > > >>>> So the setup is not very big, we'd say, but after applying this
> > > > > >>>> configuration via the ML2/OVN plugin, northd kicks in and does
> > > > > >>>> its job, and after it's done, the Logical_Flow table gets
> > > > > >>>> 645877 entries, which is way too much. But OK, we move on and
> > > > > >>>> start one controller on the gateway chassis, and here things
> > > > > >>>> get really messy. The MAC_Binding table grows from 0 to 999088
> > > > > >>>> entries in one moment, and after it's done, the sizes of the
> > > > > >>>> biggest SB tables look like this:
> > > > > >>>>
> > > > > >>>> 999088 MAC_Binding
> > > > > >>>> 645877 Logical_Flow
> > > > > >>>> 4726 Port_Binding
> > > > > >>>> 1117 Multicast_Group
> > > > > >>>> 1068 Datapath_Binding
> > > > > >>>> 1046 Port_Group
> > > > > >>>> 551 IP_Multicast
> > > > > >>>> 519 DNS
> > > > > >>>> 517 HA_Chassis_Group
> > > > > >>>> 517 HA_Chassis
> > > > > >>>> ...
> > > > > >>>>
> > > > > >>>> The MAC_Binding table gets huge; basically it now has an entry
> > > > > >>>> for every port connected to the external network times the
> > > > > >>>> number of datapaths, which makes it roughly one million
> > > > > >>>> entries. This table by itself increases the size of the SB by
> > > > > >>>> 200 megabytes. The Logical_Flow table also gets very heavy; we
> > > > > >>>> have already played a bit with the logical datapath patches
> > > > > >>>> that Ilya Maximets submitted, and it looks much better, but the
> > > > > >>>> size of the MAC_Binding table still feels inadequate.
> > > > > >>>>
> > > > > >>>> We would like to start working at least on MAC_Binding table
> > > > > >>>> optimisation, but it is a bit difficult to start from
> > > > > >>>> scratch. Can someone help us with ideas on how this could be
> > > > > >>>> optimised?
> > > > > >>>>
> > > > > >>>> Maybe it would also make sense to group entries in the
> > > > > >>>> MAC_Binding table in the same way as is proposed for logical
> > > > > >>>> flows in Ilya's patch?
> > > > > >>>>
> > > > > >>> Maybe it would work, but I'm not really sure how right now.
> > > > > >>> However, what if we change the way MAC_Bindings are created?
> > > > > >>>
> > > > > >>> Right now a MAC Binding is created for each logical router
> > > > > >>> port, but in your case there are a lot of logical router ports
> > > > > >>> connected to a single provider logical switch and they all
> > > > > >>> learn the same ARPs.
> > > > > >>>
> > > > > >>> What if we instead store MAC_Bindings per logical switch?
> > > > > >>> Basically sharing all these MAC_Bindings between all router
> > > > > >>> ports connected to the same LS.
> > > > > >>>
> > > > > >>> Do you see any problem with this approach?
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Dumitru
> > > > > >>>
> > > > > >>>
> > > > > >> I believe that this approach is the way to go; at least nothing
> > > > > >> comes to my mind that could go wrong here. We will try to make a
> > > > > >> patch for that. However, if someone who is familiar with the code
> > > > > >> knows how to do it quickly, that would also be very nice.
> > > > > >
> > > > > > This approach should work.
> > > > > >
> > > > > > I've another idea (I won't call it a solution yet). What if we
> > > > > > drop the usage of MAC_Binding altogether?
> > > > >
> > > > > This would be great!
> > > > >
> > > > > >
> > > > > > - When ovn-controller learns a mac_binding, it will not create a
> > > > > > row in the SB MAC_Binding table
> > > > > > - Instead it will maintain the learnt mac binding in its memory.
> > > > > > - ovn-controller will still program table 66 with the flow
> > > > > > to set eth.dst (for the get_arp() action)
> > > > > >
> > > > > > This has a couple of advantages
> > > > > >   - Right now we never flush the old/stale mac_binding entries.
> > > > > >   - Suppose the mac of an external IP has changed, but OVN
> > > > > > has an entry for that IP with the old mac in the mac_binding table;
> > > > > >     we will use the old mac, causing the packet to be sent out
> > > > > > to the wrong destination, and the packet might get lost.
> > > > > >   - So we will get rid of this problem
> > > > > >   - We will also save SB DB space.
> > > > > >
> > > > > > There are a few disadvantages:
> > > > > >   -  Other ovn-controllers will not add the flows in table 66. I
> > > > > > guess this should be fine as each ovn-controller can generate
> > > > > > the ARP request and learn the mac.
> > > > > >   - When ovn-controller restarts we lose the learnt macs and
> > > > > > would need to learn them again.
> > > > > >
> > > > > > Any thoughts on this ?
> > > > >
> > > >
> > > > It'd be great to have some sort of local ARP cache but I'm concerned
> > > > about the performance implications.
> > > >
> > > > - How are you going to determine when an entry is stale?
> > > > If you slow-path the packets to reset the timeout every time a
> > > > packet with the source mac is received, it doesn't look good. Maybe
> > > > you have something else in mind.
> > >
> > > Right now we never expire any mac_binding entry. If I understand you
> > > correctly, your concern is the scenario where a floating IP is
> > > updated with a different mac: how does the local cache get updated?
> > >
> > > Right now networking-ovn (in the case of OpenStack) updates the
> > > mac_binding entry in the southbound DB for such cases, right?
> > >
> >
> > FYI - I have started working on this approach as a PoC, i.e. using a
> > local mac_binding cache instead of the SB mac_binding table.
> >
> > I will update this thread about the progress.
> >
> > Thanks
> > Numan
> >
> > > Thanks
> > > Numan
> > >
> > > >
> > > > -
> > > >
> > > > > >
> > > > > There's another scenario that we need to take care of, and it
> > > > > doesn't seem obvious to address without MAC_Bindings.
> > > > >
> > > > > GARPs were being injected into the L2 broadcast domain of an LS for
> > > > > NAT addresses in case FIPs are reused by the CMS, introduced by:
> > > > >
> > > > > https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > >
> > > >
> > > > Dumitru and I have been discussing the possibility of reverting this
> > > > patch and relying on CMSs to maintain the MAC_Binding entries
> > > > associated with the FIPs [0].
> > > > I'm against reverting this patch in OVN [1] for multiple reasons, the
> > > > most important one being that if we rely on workarounds on the CMS
> > > > side, we'll be creating a control plane dependency for something that
> > > > is pure dataplane (i.e. if the Neutron server is down due to an
> > > > outage, upgrades, etc., traffic is going to be disrupted). On the
> > > > other hand, one could argue that the same dependency now exists on
> > > > ovn-controller being up and running, but I believe that this is
> > > > better than a) relying on workarounds on CMSs or b) relying on CMS
> > > > availability.
> > > >
> > > > In the short term I think that moving the MAC_Binding entries to the
> > > > LS instead of the LRP, as was suggested up-thread, would be a good
> > > > idea, and in the long haul the *local* ARP cache seems to be the
> > > > right solution. Brainstorming with Dumitru, he suggested regularly
> > > > inspecting the flows to see if the packet count on flows that check
> > > > src_mac == X has not increased in a while, and then removing the ARP
> > > > responder flows locally.
> > > >
> > > > [0] https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
> > > >
> > > > [1] https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
> > > >
> > > > >
> > > > >
> > > > > Recently, due to the dataplane scaling issue (the 4K resubmit limit
> > > > > being hit), we don't flood these packets on non-router ports and
> > > > > instead create the MAC Bindings directly from ovn-controller:
> > > > >
> > > > > https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> > > > >
> > > > > Without the MAC_Binding table we'd need to find a way to update or
> > > > > flush stale bindings when an IP is reused for a VIF or FIP.
> > > > >
> > > > > Thanks,
> > > > > Dumitru
> > > > >