On Tue, Mar 25, 2025 at 9:06 AM Han Zhou <hz...@ovn.org> wrote:
>
>
>
> On Fri, Mar 14, 2025 at 12:35 AM Numan Siddique <num...@ovn.org> wrote:
> >
> > On Thu, Mar 13, 2025 at 12:21 PM Han Zhou <hz...@ovn.org> wrote:
> > >
> > >
> > >
> > > On Sat, Mar 1, 2025 at 6:38 AM <num...@ovn.org> wrote:
> > >>
> > >> From: Numan Siddique <num...@ovn.org>
> > >>
> > >> This patch series optimizes ovn-controller to add only relevant
> > >> datapaths to its "local datapaths" map if the logical topology
> > >> has a single flat provider logical switch connected to multiple
> > >> routers with a distributed gateway port.
> > >>
> > >> (Patch 3 has a detailed commit message.)
> > >>
> > >> v1 -> v2
> > >> -----
> > >>   * Rebased to resolve the conflicts.
> > >>
> > >> Numan Siddique (3):
> > >>   controller: Store local binding lports in local_datapath.
> > >>   northd: Add a flag 'only_dgp_peer_ports' to the SB Datapath.
> > >>   controller: Optimize adding 'dps' to the local datapaths.
> > >>
> > >>  controller/binding.c        | 324 +++++++++++---
> > >>  controller/binding.h        |   2 +
> > >>  controller/local_data.c     |  98 ++++-
> > >>  controller/local_data.h     |  10 +-
> > >>  controller/lport.c          |  12 +
> > >>  controller/lport.h          |   4 +
> > >>  controller/ovn-controller.c |  38 ++
> > >>  northd/northd.c             |  31 +-
> > >>  northd/northd.h             |   4 +
> > >>  tests/multinode.at          | 185 +++++++-
> > >>  tests/ovn-northd.at         |  59 +++
> > >>  tests/ovn-performance.at    |   6 +-
> > >>  tests/ovn.at                | 853 ++++++++++++++++++++++++++++++++++++
> > >>  13 files changed, 1545 insertions(+), 81 deletions(-)
> > >>
> > >> --
> > >> 2.48.1
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> d...@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > >
> > > Hi Numan,
> > >
> > > Sorry for the slow review. I have a general comment for this series. I 
> > > believe the exact same performance optimization is supposed to be 
> > > achieved by the below commit:
> > >
> > > 22298fd37 ovn-controller: Don't flood fill local datapaths beyond DGP 
> > > boundary.
> > >
> > > The scenarios are mentioned in detail in the commit message of that 
> > > patch, which is very similar to what you described here.
> > >
> > > The only exception was when there are distributed NAT or when the 
> > > redirection type is set to "bridged". In these cases it will fall back to 
> > > add peer datapath as local. See SB:Port_Binding:options:always-redirect. 
> > > Is this something that your patch is targeting? Otherwise I wonder why it 
> > > didn't work without your patch.
> >
> > Hi Han,
> >
> > I wanted to mention in the commit message and in the cover letter that
> > this patch series is an extension of  the commit 22298fd37. I somehow
> > missed it.
> >
> > Yes,  I'm targeting the scenario of distributed NAT,
> >
>
> I'd request to update the commit message and focus on how the patch addresses 
> the distributed NAT scenario. Otherwise it would be misleading, since the 
> problem doesn't exist for the scenario described in the current commit 
> message.

Hi Han,

Thanks for the comments.  Sure I'll update the commit message.

>
> In addition, could you explain a little more about the solution. How does the 
> has_only_dgp_peer_ports address the distributed NAT scenario? For my 
> understanding, when there is distributed NAT associated to the LR, the 
> chassis needs to know the
> provider LS so that the NAT pipeline can work on the chassis. If the
> peer LS (provider LS) is not added just because of
> has_only_dgp_peer_ports, how would the distributed NAT work?

[Numan] The provider LS is added to the local datapaths map, but the
traversal doesn't go beyond it.

Let's take the topology in the commit message as an example, and say
all the routers lr0, lr1, lr2, ... have distributed NATs.
Say the ovn-controller running on chassis 'A' claims a logical
port of sw0.  It starts by adding sw0 to the local_datapaths map.
Since add_local_datapath() traverses the topology through the
connected peers, it will also add lr0 to the map.
And since lr0 is connected to the provider LS "public", it will add
"public" as well.
The main thing added by this patch is that the traversal stops at the
"public" LS if the option "pure_provider_switch" is set on that datapath.

Note:  I'll be renaming the option to "pure_provider_switch" in the
next version.

So after claiming sw0's port, the local_datapaths map will have [sw0,
lr0, public].
Having the logical flows of these datapaths processed into OpenFlow
rules is enough for the distributed NATs associated with sw0's logical
ports to work.

If this option were not set, then by the time add_local_datapath()
finally returns to its original caller, the local_datapaths map would
have [sw0, lr0, public, lr1, sw1, lr2, sw2, ...].

Now let's say the ovn-controller of chassis 'A' claims sw3's logical
port.  It will first call add_local_datapath(sw3), which adds
[sw3, lr3] to the local_datapaths map.
Note that "public" is already present in the local_datapaths.  After
this the local_datapaths map will look like [sw0, lr0, public, sw3,
lr3].
If later there are no more port bindings of sw3 on the chassis, the
patch also makes sure to remove sw3 and lr3 from the local_datapaths map.

If you look at the topology, I'm visualising the "public" LS as a
root node: ovn-controller starts traversing from a leaf and stops at
the root.  It will not traverse the other branches of the root when
it reaches it.

Suppose there is a logical port of "public" which ovn-controller
claims; it adds "public" to the local datapaths and stops there.
If the ovn-controller has not claimed logical ports of any other
datapath, then there is no need to add those datapaths to the local
datapaths.
If the logical port of "public" wants to reach other logical ports,
the packet can traverse through the localnet port to the fabric and
enter the other chassis where the DNAT IP resides.

The "pure_provider_switch" option indicates to ovn-controller that the
datapath is such a "root" node.



> I didn't see any handling in your change related to distributed NAT. I 
> believe I missed something,
[Numan]  It doesn't require any special handling for distributed NATs
in ovn-controller.  Your commit 22298fd37 adds the option
"always-redirect" for non-distributed NATs on the logical router
datapath, and in that case add_local_datapath() will NOT add "public"
to the local datapaths map.
However, it will still add "public" to the map if other logical
routers connected to "public" have distributed NATs.

The main difference between the two scenarios (i.e. distributed NATs
vs non-distributed NATs) is that with distributed NATs, this patch
adds the "public" LS to the local datapaths map and doesn't traverse
further, whereas in the other scenario the "public" LS is not added
at all.
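To summarize the two mechanisms side by side, the decision when following a
router's peer LS could be sketched as below.  The enum, the function name,
and the two boolean inputs are my illustrative assumptions, not the actual
code; the real checks live in ovn-controller and use SB Port_Binding
options.

```c
#include <assert.h>
#include <stdbool.h>

/* What to do with the peer LS reached through a gateway port. */
enum peer_action {
    PEER_SKIP,             /* Don't add the peer LS at all. */
    PEER_ADD_AND_STOP,     /* Add it, but stop traversing there. */
    PEER_ADD_AND_RECURSE,  /* Add it and keep flood filling. */
};

static enum peer_action
peer_action(bool lr_always_redirect, bool peer_is_pure_provider_switch)
{
    if (lr_always_redirect) {
        /* Commit 22298fd37: no distributed NAT, traffic is always
         * redirected to the gateway chassis, so the provider LS is
         * not needed locally. */
        return PEER_SKIP;
    }
    if (peer_is_pure_provider_switch) {
        /* This series: distributed NAT needs the provider LS flows
         * locally, but nothing beyond it. */
        return PEER_ADD_AND_STOP;
    }
    return PEER_ADD_AND_RECURSE;
}
```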


> because I saw your test case purposely used two LRs, one with
> distributed NAT and one without, and the one with distributed NAT did
> have provider LS added to the chassis, although I didn't find how that
> worked. It would be helpful for code review if you could highlight the
> related logic, probably also helpful for maintaining the code by
> adding related comments in the code for this specific consideration.

Sure.  I'll add more comments.


>
> On the other hand, I also have doubts about the criteria of deciding not to 
> add peers according to has_only_dgp_peer_ports. When the peer is a provider 
> LS and connected by DGP, we shouldn't add the peer, regardless of whether 
> there is another LR connected to the provider LS with a distributed LRP, 
> right? The provider LS (and the other LR, say, LR-X) should be added to the 
> chassis if there is a port-binding on the chassis that is connected (directly 
> or indirectly) to that LR-X, right? But before this, I think I need to 
> understand firstly how the distributed NAT scenario is addressed by your 
> patch.

Sorry, I didn't get this question.  Can you please see if my above
comments address your questions?  If not, can you please elaborate a
bit further?


Thanks
Numan

>
> Thanks,
> Han
>
> > Numan
> >
> >
> > > Could you help clarify?
> > >
> > > Thanks,
> > > Han
> > >
> > >>