On Thu, Mar 27, 2025 at 10:16 AM Numan Siddique <num...@ovn.org> wrote:
>
> On Tue, Mar 25, 2025 at 9:06 AM Han Zhou <hz...@ovn.org> wrote:
> >
> >
> >
> > On Fri, Mar 14, 2025 at 12:35 AM Numan Siddique <num...@ovn.org> wrote:
> > >
> > > On Thu, Mar 13, 2025 at 12:21 PM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > >
> > > >
> > > > On Sat, Mar 1, 2025 at 6:38 AM <num...@ovn.org> wrote:
> > > >>
> > > >> From: Numan Siddique <num...@ovn.org>
> > > >>
> > > >> This patch series optimizes ovn-controller to add only relevant
> > > >> datapaths to its "local datapaths" map if the logical topology
> > > >> has a single flat provider logical switch connected to multiple
> > > >> routers with a distributed gateway port.
> > > >>
> > > >> (Patch 3 has a detailed commit message.)
> > > >>
> > > >> v1 -> v2
> > > >> -----
> > > >>   * Rebased to resolve the conflicts.
> > > >>
> > > >> Numan Siddique (3):
> > > >>   controller: Store local binding lports in local_datapath.
> > > >>   northd: Add a flag 'only_dgp_peer_ports' to the SB Datapath.
> > > >>   controller: Optimize adding 'dps' to the local datapaths.
> > > >>
> > > >>  controller/binding.c        | 324 +++++++++++---
> > > >>  controller/binding.h        |   2 +
> > > >>  controller/local_data.c     |  98 ++++-
> > > >>  controller/local_data.h     |  10 +-
> > > >>  controller/lport.c          |  12 +
> > > >>  controller/lport.h          |   4 +
> > > >>  controller/ovn-controller.c |  38 ++
> > > >>  northd/northd.c             |  31 +-
> > > >>  northd/northd.h             |   4 +
> > > >>  tests/multinode.at          | 185 +++++++-
> > > >>  tests/ovn-northd.at         |  59 +++
> > > >>  tests/ovn-performance.at    |   6 +-
> > > >>  tests/ovn.at                | 853 ++++++++++++++++++++++++++++++++++++
> > > >>  13 files changed, 1545 insertions(+), 81 deletions(-)
> > > >>
> > > >> --
> > > >> 2.48.1
> > > >>
> > > >> _______________________________________________
> > > >> dev mailing list
> > > >> d...@openvswitch.org
> > > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > > >
> > > > Hi Numan,
> > > >
> > > > Sorry for the slow review. I have a general comment for this
> > > > series. I believe the exact same performance optimization is
> > > > supposed to be achieved by the below commit:
> > > >
> > > > 22298fd37 ovn-controller: Don't flood fill local datapaths beyond
> > > > DGP boundary.
> > > >
> > > > The scenarios are mentioned in detail in the commit message of that
> > > > patch, which is very similar to what you described here.
> > > >
> > > > The only exception was when there are distributed NATs or when the
> > > > redirection type is set to "bridged". In these cases it will fall
> > > > back to adding the peer datapath as local. See
> > > > SB:Port_Binding:options:always-redirect. Is this something that your
> > > > patch is targeting? Otherwise I wonder why it didn't work without
> > > > your patch.
> > >
> > > Hi Han,
> > >
> > > I wanted to mention in the commit message and in the cover letter
> > > that this patch series is an extension of the commit 22298fd37. I
> > > somehow missed it.
> > >
> > > Yes, I'm targeting the scenario of distributed NAT.
> > >
> >
> > I'd request to update the commit message and focus on how the patch
> > addresses the distributed NAT scenario. Otherwise it would be
> > misleading, since the problem doesn't exist for the scenario described
> > in the current commit message.
>
> Hi Han,
>
> Thanks for the comments.  Sure, I'll update the commit message.
>
> >
> > In addition, could you explain a little more about the solution? How
> > does the has_only_dgp_peer_ports address the distributed NAT scenario?
> > For my understanding, when there is distributed NAT associated with the
> > LR, the chassis needs to know the provider LS so that the NAT pipeline
> > can work on the chassis. If the peer LS (provider LS) is not added just
> > because of has_only_dgp_peer_ports, how would the distributed NAT work?
>
> [Numan] The provider LS is added to the local datapaths, but the
> traversal doesn't go beyond it.
>
> Let's take the topology in the commit message as an example. Let's say
> all the routers lr0, lr1, lr2, ... have distributed NATs.
> And let's say ovn-controller running on chassis 'A' claims the logical
> port of sw0.  It starts by adding 'sw0' to the local_datapaths.
> Since add_local_datapath() traverses the topology through the
> connected peers, it will add lr0 to its local_datapaths.
> Since lr0 is connected to the provider LS "public", it will also add
> "public" to its local_datapaths.
> The main thing added by this patch is that it stops at the "public" LS
> if the option "pure_provider_switch" is set on that datapath.
>
> Note:  I'll be renaming the option to "pure_provider_switch" in the
> next version.
>
> So after claiming sw0's port, the local_datapaths map will have [sw0,
> lr0, public].
> Having the logical flows of these datapaths processed into openflows
> is enough for the distributed NATs associated with sw0's logical
> ports to work.
>
> If this option were not set, then by the time add_local_datapath()
> finally returns to its original caller, the local_datapaths map would
> have [sw0, lr0, public, lr1, sw1, lr2, sw2, ....]
>
> Now let's say ovn-controller of chassis 'A' claims sw3's logical port.
> It will first call add_local_datapath(sw3), and this will add
> [sw3, lr3] to the local_datapaths map.
> Note that "public" is already present in the local_datapaths.  After
> this the local_datapaths map will look like [sw0, lr0, public, sw3,
> lr3].
> If there are no more port bindings of sw3, then the patch also makes
> sure to remove sw3 and lr3 from the local_datapaths map.
>
> If you look at the topology, I'm visualising the "public" LS as a
> root node: ovn-controller starts traversing from a leaf and stops at
> the root. It will not traverse the other branches of the root when it
> reaches it.
>
> Suppose there is a logical port of "public" which ovn-controller
> claims: it adds "public" to the local datapaths and stops there.
> If the ovn-controller has not claimed any other datapath's logical
> ports, then there is no need to add them to the local datapaths.
> If the logical port of "public" wants to reach other logical ports,
> the packet can traverse through the localnet port to the fabric and
> enter the other chassis where the DNAT IP resides.
>
> The "pure_provider_switch" option indicates to ovn-controller that it
> is a "root" node.
>
>
>
> > I didn't see any handling in your change related to distributed NAT. I
> > believe I missed something,
> [Numan] It doesn't require any special handling for distributed NATs
> in ovn-controller.  Your commit 22298fd37 will add the option
> "always-redirect" for non-distributed NATs for the logical router
> datapath, and add_local_datapath() will NOT add "public" to its local
> datapaths map.
> However, it will add "public" to its local datapaths map if other
> logical routers connected to "public" have distributed NATs.
>
> The main difference between the two scenarios (i.e. distributed NATs
> vs non-distributed NATs) is that with distributed NATs, this patch
> adds the "public" LS to the local datapaths map and doesn't traverse
> further, whereas with the other scenario, the "public" LS is not added
> at all.
>

Hi Numan, Thanks for the explanation. Now I see the difference between the
two scenarios.
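
Just to double-check my reading of the walkthrough, below is a toy sketch
of the new stop condition at the provider LS. It is not the patch code and
not the real ovn-controller data structures: all names (toy_dp,
toy_add_local, the pure_provider_switch struct field) are made up, and the
per-DGP "always-redirect" check from 22298fd37 is deliberately left out.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of a datapath -- not the real SB schema, for illustration only. */
struct toy_dp {
    const char *name;
    bool pure_provider_switch;  /* Provider LS: localnet + DGP peers only. */
    struct toy_dp *peers[8];    /* Patch-port peers.                       */
    size_t n_peers;
    bool local;                 /* Stands in for "in local_datapaths".     */
};

static void
toy_add_local(struct toy_dp *dp)
{
    if (dp->local) {
        return;
    }
    dp->local = true;
    printf("added %s\n", dp->name);

    if (dp->pure_provider_switch) {
        /* "Root" reached: the provider LS itself stays local, but we do
         * not flood-fill into the other routers attached to it. */
        return;
    }
    for (size_t i = 0; i < dp->n_peers; i++) {
        toy_add_local(dp->peers[i]);
    }
}

int
main(void)
{
    struct toy_dp sw0 = { .name = "sw0" };
    struct toy_dp lr0 = { .name = "lr0" };
    struct toy_dp lr1 = { .name = "lr1" };
    struct toy_dp sw1 = { .name = "sw1" };
    struct toy_dp pub = { .name = "public", .pure_provider_switch = true };

    /* sw0 - lr0 - public - lr1 - sw1, both routers with distributed NAT. */
    sw0.peers[sw0.n_peers++] = &lr0;
    lr0.peers[lr0.n_peers++] = &sw0;
    lr0.peers[lr0.n_peers++] = &pub;
    pub.peers[pub.n_peers++] = &lr0;
    pub.peers[pub.n_peers++] = &lr1;
    lr1.peers[lr1.n_peers++] = &pub;
    lr1.peers[lr1.n_peers++] = &sw1;
    sw1.peers[sw1.n_peers++] = &lr1;

    toy_add_local(&sw0);    /* Prints sw0, lr0, public -- and stops. */
    return 0;
}

With that toy topology, claiming sw0's port marks only sw0, lr0 and public
as local, which matches your description.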

>
> > because I saw your test case purposely used two LRs, one with
> > distributed NAT and one without, and the one with distributed NAT did
> > have the provider LS added to the chassis, although I didn't find how
> > that worked. It would be helpful for code review if you could highlight
> > the related logic, and probably also helpful for maintaining the code
> > to add related comments in the code for this specific consideration.
>
> Sure.  I'll add more comments.
>
>
> >
> > On the other hand, I also have doubts about the criteria for deciding
> > not to add peers according to has_only_dgp_peer_ports. When the peer is
> > a provider LS connected by a DGP, we shouldn't add the peer, regardless
> > of whether there is another LR connected to the provider LS with a
> > distributed LRP, right? The provider LS (and the other LR, say, LR-X)
> > should be added to the chassis if there is a port-binding on the
> > chassis that is connected (directly or indirectly) to that LR-X, right?
> > But before this, I think I first need to understand how the distributed
> > NAT scenario is addressed by your patch.
>
> Sorry.  I didn't get this question.  Can you please see if my above
> comments addressed your questions?  If not, can you please elaborate a
> bit further?
>

My question was wrong because I didn't distinguish the two scenarios until
I saw your explanation above. But I still have a related question about the
criteria for adding peers of the root (provider LS). In your implementation
the peer will be excluded (not flood-filled) only if the LS satisfies the
pure_provider_switch requirement:

    criteria:
      - has one or more localnet ports.
      - has one or more router ports and all its peers are distributed
        gateway ports.

I think these criteria are not very generic and hard to reason about. Let's
say in your example topology one of the LRPs connected to the public LS is
fully distributed instead of a DGP, but all the other LRPs are still DGPs;
then according to your criteria, every chassis will add all the DPs.
However, in theory only the DPs connected by the distributed LR should be
added, and the flood-fill should stop at all the DGPs when considering
peers of the "public" LS, right? Can we change it to be more generic, so
that when considering a peer DP, if we are traversing from an LS to an LR,
we handle it just like "always_redirect == true", while when traversing
from an LR to an LS we need to honestly examine "always_redirect" for
distributed NAT? Should this work?
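
Roughly, the rule I have in mind looks like the sketch below. It is just
pseudo-code with made-up names (toy_should_cross and its parameters), not a
concrete proposal for the actual data structures or function signatures.

#include <stdbool.h>

enum toy_dp_type { TOY_DP_SWITCH, TOY_DP_ROUTER };

/* Decide whether the flood-fill should cross from 'from' into its peer
 * 'to'.  'peer_port_is_dgp' says whether the router port on this edge is
 * a distributed gateway port; 'always_redirect' is the existing option
 * that reflects the absence of distributed NAT on the router. */
bool
toy_should_cross(enum toy_dp_type from, enum toy_dp_type to,
                 bool peer_port_is_dgp, bool always_redirect)
{
    if (from == TOY_DP_SWITCH && to == TOY_DP_ROUTER) {
        /* LS -> LR: treat a DGP boundary as if always_redirect were
         * true, i.e. stop at the DGP. */
        return !peer_port_is_dgp;
    }
    if (from == TOY_DP_ROUTER && to == TOY_DP_SWITCH) {
        /* LR -> LS: honour the real always_redirect, which already
         * accounts for distributed NAT on the router. */
        return !always_redirect;
    }
    return true;
}

With such a rule, the chassis claiming sw0's port would still end up with
[sw0, lr0, public], the flood-fill would stop at the DGPs of the other
routers attached to "public", and it would still continue through a fully
distributed LRP, without needing a per-LS pure_provider_switch flag.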

Thanks,
Han

>
> Thanks
> Numan
>
> >
> > Thanks,
> > Han
> >
> > > Numan
> > >
> > >
> > > > Could you help clarify?
> > > >
> > > > Thanks,
> > > > Han
> > > >
> > > >>
