On Tue, Jun 24, 2025 at 5:14 PM Mark Michelson <mmich...@redhat.com> wrote:

> On 6/24/25 1:31 PM, Tim Rozet wrote:
> > Thanks Mark for the detailed response and taking the time to review the
> > proposal. See inline.
> >
> > Tim Rozet
> > Red Hat OpenShift Networking Team
> >
> >
> > On Tue, Jun 24, 2025 at 12:04 PM Mark Michelson <mmich...@redhat.com> wrote:
> >
> >     On 6/18/25 8:26 PM, Numan Siddique wrote:
> >      > On Fri, Jun 13, 2025 at 8:44 AM Tim Rozet via dev
> >      > <ovs-dev@openvswitch.org> wrote:
> >      >>
> >      >> Hello,
> >      >> In the OVN-Kubernetes we have been discussing and designing a
> way to
> >      >> implement Service Function Chaining (SFC) for various use cases.
> >     Some of
> >      >> these use cases are fairly complicated, involving a DPU and
> multiple
> >      >> clusters. However, we have tried to abstract the OVN design and
> >     use case
> >      >> into a generic implementation that is not specific to our
> >     particular use
> >      >> cases. It follows SFC designs previously done within other
> >     projects like
> >      >> OpenStack Neutron and OpenDaylight. Please see:
> >      >>
> >      >> https://docs.google.com/document/d/1dLdpx_9ZCnjHHldbNZABIpJF_GXd69qb/edit#bookmark=id.a7vfofkk8rj5
> >      >>
> >      >> tl;dr the design includes new tables to declare chains and
> >     classifiers to
> >      >> get traffic into that chain. There needs to be a new stage in
> >     the datapath
> >      >> pipeline to evaluate this behavior upon port ingress. We also
> >     need these
> >      >> flows to be hardware offloadable.
> >      >>
> >      >> For more details on the specific use cases we are targeting in
> the
> >      >> OVN-Kubernetes project, please see:
> >      >>
> >      >> https://docs.google.com/document/d/1MDZlu4oHL3RCWndbSgC-IGLgs1QfnB1l47nPqtM5iNo/edit?tab=t.0#heading=h.g8u53k9ds9s5
> >      >>
> >      >> Would appreciate feedback (either on the mailing list or in the
> >     design doc)
> >      >> and thoughts from the OVN experts on how we can accommodate this
> >     feature.
> >      >>
> >      >
> >      > Hi Tim,
> >      >
> >      > There is a very similar proposal from @Sragdhara Datta Chaudhuri
> to
> >      > add Network Functions support in OVN.
> >      > Can you please take a look at it ?  Looks like there are many
> >      > similarities in the requirements.
> >      >
> >      > https://mail.openvswitch.org/pipermail/ovs-dev/2025-May/423586.html
> >      > https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424102.html
> >      >
> >      >
> >      > Thanks
> >      > Numan
> >
> >     Hi Tim and Numan,
> >
> >     I've looked at both the ovn-k proposal and the Nutanix patch series.
> I
> >     think the biggest differences between the proposals (aside from small
> >     things, like naming) are the following:
> >
> >     1) Nutanix amends the ACL table to include a network function group
> to
> >     send the packet to if the packet matches. The ovn-k proposal
> suggests a
> >     new SFC_Classifier table that includes an ACL-like match.
> >
> >     2) ovn-k wants load balancing of the service functions. The Nutanix
> >     patch series has no load balancing.
> >
> >     3) ovn-k wants a Service_Function_Chain table that allows for
> multiple
> >     services to be chained. The Nutanix patch series provides a
> >     Network_Function_Group table that allows a single network function
> >     to be
> >     the active one. There is no concept of chaining in the patch series.
> >
> >     4) ovn-k wants NSH-awareness. I don't 100% know what this entails,
> but
> >     there is no NSH in the Nutanix patch series.
> >
> >
> > We don't necessarily require NSH. Some limited Cisco products support
> > NSH, but I'm not aware of other vendors. So for now the majority of the
> > CNF use case would be proxied. However, we do need some mechanism to
> > store metadata to know what chain the packet is currently on, especially
> > as packets go between nodes. This could be Geneve TLV metadata. I'm
> > looking for feedback on this kind of stuff in the doc, as I'm not sure
> > what is best suited for this and if it is offloadable.
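For concreteness, here is one way that chain state could be packed into a
single Geneve option TLV (RFC 8926 layout: 16-bit option class, 8-bit type,
then a 5-bit length counted in 4-byte words). This is only a sketch to anchor
the discussion; the option class and type values are placeholders, not
assigned codepoints:

```python
import struct

# Hypothetical Geneve option TLV carrying SFC state: a 24-bit chain id
# plus an 8-bit chain index in one 4-byte word of option data.
SFC_OPT_CLASS = 0x0102   # placeholder, not an IANA-assigned option class
SFC_OPT_TYPE = 0x01      # placeholder option type

def pack_sfc_option(chain_id, index):
    """Build the 4-byte option header plus one word of option data."""
    data = struct.pack("!I", (chain_id << 8) | (index & 0xFF))
    length_words = len(data) // 4  # length excludes the option header
    return struct.pack("!HBB", SFC_OPT_CLASS, SFC_OPT_TYPE, length_words) + data

def unpack_sfc_option(opt):
    """Return (chain_id, index) from a packed option."""
    (word,) = struct.unpack("!I", opt[4:8])
    return word >> 8, word & 0xFF
```

Whether a TLV like this survives NIC hardware offload is exactly the open
question above.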
> >
> >     IMO, items 2, 3, and 4 can be made as add-ons to the Nutanix patch
> >     series.
> >
> >
> > How do you envision it being added on? Would it be a separate feature,
> > or an extension of the Nutanix effort?
>
> These are great questions. My thought had been that it would be an
> extension of the Nutanix feature.
>
> > I'm a bit concerned if it is the
> > latter, because I worry we will have boxed ourselves into a certain
> > paradigm and be less flexible to accommodate the full SFC RFC. For
> > example, in the Nutanix proposal it looks like the functionality relies
> > on standard networking principles. The client ports are connected to the
> > same subnet as the network function. In my proposal, there is no concept
> > of this network connectivity. The new stage simply takes the packet and
> > delivers it to the port, without any requirement of layer 2 or layer 3
> > connectivity.
>
> I'm not 100% sure I understand what you mean about the Nutanix proposal
> relying on standard network principles. For instance, my reading of the
> Nutanix patches is that if the ACL matches, then the packet is sent to
> the configured switch outport.


What I mean is the case where the NF does not exist on the same switch as the
client traffic. Looking at the proposal again, I think the relevant section is
"NF insertion across logical switches". In my proposal there is no definition
of needing a link between the switches. My definition might be wrong in the
OVN context; that's where I need feedback and we need to discuss how it would
work. To explain it in simple terms: if I have a client on switch LS1 that
sends traffic that is classified to a chain with one NF (analogous to the
Nutanix NF/NFG) on switch SVC LS, then in Nutanix a child port is created by
the CMS to connect the two switches together, while in my proposal there is no
concept of that link:

Nutanix proposal:

                                --------
                               | NF VM  |
                                --------
                                |      |
      -----                   nfp1   nfp2                    -----
     | VM1 |                 --------------                 | VM2 |
      -----                 |    SVC LS    |                 -----
        |                    --------------                    |
      p1|  nfp1_ch1  nfp2_ch1                p3|  nfp1_ch2  nfp2_ch2
      ------------------------               ------------------------
     |          LS1           |             |          LS2           |
      ------------------------               ------------------------


nfp1_ch1 is created by the CMS to get the packet from LS1 to SVC LS. I'm
guessing it doesn't matter in this case whether or not the SVC LS is on the
same OVN node? How would this work in IC? Would we need a transit switch?
How does nfp1_ch1 map to SVC LS? It isn't clear in the proposal how the CMS
configures these parts; examples would help. Should it really be on the user
to create these nfp1_ch1 ports? Or can they be automatically inferred?

In the OVNK proposal, I do not define that there need to be links between
switches. From the SFC perspective, it transcends standard networking, so
there is no reason to need a link between switches. Once it classifies the
packet as needing to go to nfp1, and if nfp1 is on the same host, it just
sends the packet to its port. If it's on a remote node, it adds header data
and tunnels the packet to the next node. This is how it can be implemented in
raw OpenFlow or in other SDN controllers. That perspective may not be grounded
in reality when it comes to how OVN ingress/egress pipelines and traffic
forwarding work. That's where I need your feedback and we need to figure out
how those pieces should work. IMHO it would be imperative to figure that out
before the Nutanix patches are merged.

> Then when the packet re-arrives on the
> configured switch inport, it uses conntrack information to put the
> packet back on track to go to its intended destination. The service
> function does not appear to require any sort of L2 switching based
> solely on that.
>
> Even the final patch that introduces the health monitoring doesn't rely
> on the switch subnet but instead uses NB_Global options to determine the
> destination MAC to check. It doesn't seem to be necessary to be on the
> same subnet as the switch on which the service is configured.
>
> I may be misinterpreting, though.
>
> > Furthermore in the Nutanix proposal there are requirements
> > around the packet not being modified, while in SFC it is totally OK for
> > the packet to be modified. Once classified, the packet is identified by
> > its chain id and position in the chain (aforementioned NSH/Geneve
> metadata).
>
> Can you refresh me on how the chain ID is determined in the SFC
> proposal? In the patch series, the function group ID is stored in
> conntrack, so when the packet rearrives into OVN, we use conntrack to
> identify that the packet has come from a network function and needs to
> be "resumed" as it were. Because the patches use conntrack, the packet's
> identifying info (src IP, dst IP, src port, dst port, l4 protocol) can't
> be changed, since it means that we won't be able to find the packet in
> conntrack any longer.
>

Sure. So in the SFC world, when the packet is going to be sent to the NF, the
SFF (the OVS switch) determines whether the NF needs to be proxied or not. If
it does not need proxying, then the SFF sends the packet with the NSH header.
This header describes to the NF the chain ID and the current position in the
chain (the index). Note, this requires the NF to be NSH-aware so that it can
read the NSH header. At this point the NF will process the packet, decrement
the chain index, and send the packet back to the SFF. When the packet arrives
back in OVS, OVS can read the NSH header and know where to send the packet
next. This way a single NF can actually be part of multiple chains. It can
even reclassify packets to a different chain by itself. However, this all
relies on NSH, which only a few NFs actually support.
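For reference, this maps onto the NSH Service Path header from RFC 8300: a
24-bit Service Path Identifier (the chain ID) and an 8-bit Service Index. A
minimal sketch of the decrement an NSH-aware NF performs:

```python
import struct

# Sketch of the NSH Service Path header (RFC 8300): 24-bit Service Path
# Identifier (SPI, the chain id) and 8-bit Service Index (SI).
def parse_service_path(hdr):
    (word,) = struct.unpack("!I", hdr)
    return word >> 8, word & 0xFF  # (SPI, SI)

def decrement_index(hdr):
    """What an NSH-aware NF does before returning the packet to the SFF."""
    spi, si = parse_service_path(hdr)
    if si == 0:
        raise ValueError("service index underflow; packet must be dropped")
    return struct.pack("!I", (spi << 8) | (si - 1))
```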

Now consider NFs that do not support NSH and need proxying. In this case the
SFF "proxies" by stripping the chain information and sending the packet
without any additional information to the NF. In this model the NF can only be
part of a single chain, because when the packet is sent and comes back there
would be no way to distinguish packets being on one chain or another. What I
have seen in past implementations is setting OF registers to track the chain
internally in OVS. Let's take an example with a 2-NF chain, where the NFs are
split across nodes, and let's assume that we use Geneve with TLVs to hold our
chain/index information. Let's define the chain as an ordered list of NF1,NF2:


                          NF1                          NF2
                           |                            |
                           |                            |
                     +-----------+                +-----------+
                     |           |                |           |
     client ---------| OVS node1 |----------------| OVS node2 |
                     |           |----------------|           |
                     +-----------+                +-----------+
                           |
                           |
                        server



1. client sends packet (let's assume to Google, 8.8.8.8); it gets
classified, OF registers are stored with chain id and index 255, and
it is punted to the chain-processing OF table

2. OVS node 1 - SFC stage/table matches chain id, index 255, send to NF1

3. NF1 receives raw packet, modifies dest IP address to be *server*,
sends packet back to OVS node1

4. OVS node1 - Receives packet from in_port NF1, restores OF register
for chain id, stores register for index, now decremented to 254

5. OVS node 1 - SFC stage/table matches chain id, index 254, send to
remote SFF OVS node 2. Encapsulate in Geneve, set Geneve metadata
chain id and index at 254.

6. OVS node 2 receives packet - SFC stage/table matches chain id,
index 254, send to NF2

7. NF2 receives raw packet, modifies something else in the packet,
sends back to OVS node2

8. OVS node 2 receives the packet from in_port NF2, restores OF
register for chain id, stores register for index, now decremented to
253

9. OVS node 2 - SFC stage/table matches on chain ID, index 253, has
reached the end of chain. Send packet back to original SFF to resume
datapath pipeline processing. Encapsulate in Geneve, set chain id and
index at 253.

10. OVS node 1 receives packet. Processes chain id and determines 253
is end of chain. Continue to next stage of ingress datapath pipeline.

11. Regular OVN datapath pipeline finishes, routes packet towards
server due to dest IP in packet.


The chain has effectively rerouted the destination of the packet to
another server, without needing conntrack to store anything.
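The walkthrough above can be modeled as a per-node lookup keyed on
(chain id, index). The sketch below is purely illustrative (node and NF names
are invented), but it shows how the decrementing index alone drives the packet
through the chain and back:

```python
# Toy model of the 11-step walkthrough: each node's SFC stage is a lookup
# keyed on (node, chain_id, index). Names and values are illustrative.
END_OF_CHAIN = "resume-pipeline"

# Chain 1 = [NF1 on node1, NF2 on node2]; index starts at 255 and is
# decremented each time a packet returns from an NF.
sfc_table = {
    ("node1", 1, 255): ("local-nf", "NF1"),
    ("node1", 1, 254): ("tunnel", "node2"),   # Geneve, carrying (1, 254)
    ("node2", 1, 254): ("local-nf", "NF2"),
    ("node2", 1, 253): ("tunnel", "node1"),   # end of chain, return to SFF
    ("node1", 1, 253): (END_OF_CHAIN, None),
}

def walk_chain(chain_id):
    """Return the list of (node, index, action, target) hops for a packet."""
    node, index, hops = "node1", 255, []
    while True:
        action, target = sfc_table[(node, chain_id, index)]
        hops.append((node, index, action, target))
        if action == END_OF_CHAIN:
            return hops
        if action == "local-nf":
            index -= 1       # NF returned the packet; index is decremented
        else:
            node = target    # tunneled; chain/index ride in Geneve TLVs
```

Note that nothing here consults conntrack; the (chain id, index) state alone
is enough to resume the pipeline on node1.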



> In the SFC proposal, if the packet is modified, then that means we would
> need to use something other than conntrack to track the chain ID. Would
> we require NSH in order to track the chain ID properly? Or is there some
> other way?
>
> >
> >
> >     Item 1 is the biggest sticking point. From my point of view, I prefer
> >     the Nutanix approach of modifying the ACL table since,
> >     * ACLs can be applied to switches or port groups. The proposed
> >     SFC_Classifier only applies to port groups.
> >     * ACLs have things like logging and sampling that can be useful in
> this
> >     scenario.
> >     * ACLs can be tiered.
> >     However, if there's a good reason why this will not work for ovn-k's
> >     scenario, then that would be good to know.
> >
> >
> > Using ACLs I think would be fine for the OVNK use case as well. The
> > reasons I didn't propose using ACLs were twofold:
> > 1. Trying to create a clear boundary for SFC. Since SFC does not behave
> > like normal networking, I thought it would make sense to make it its own
> > entity.
>
> This is where I really wish we had something like composable services in
> place, because it sounds like SFC is only being added to logical
> switches because that's the current best fit for them. They would really
> be better suited to their own datapath type.
>
> But for now, putting them on a logical switch is the best choice.
>
> The nice thing about ACL stages is that they are very early in the
> logical switch pipelines. We perform FDB and mirror actions before the
> ACL, but that's it.
>
> > 2. I didn't think OVN would be amenable to modifying ACL to have a new
> > column to send to a chain.
>
> > In the Nutanix proposal it looks like the column is added to send to a
> > NFG. Would we also add the ability to send to a SFC?
>
> The way I had thought about it, we could expand NFGs to contain SFCs.
> Currently, an NFG has a list of network functions. But we could create a
> new column in the NFG table that could be one or more SFCs. The idea
> would be that if you configure the network_functions column, we use
> those. If you configure the service_function_chains column, we use those
> instead. It would be a misconfiguration to use both at the same time.
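Assuming that shape, the selection between the two columns would be a simple
either/or; a sketch (column names follow this thread, not a committed schema):

```python
# Hypothetical NFG semantics from the discussion above: an NFG may populate
# either network_functions or service_function_chains, never both.
def select_nfg_members(nfg):
    nfs = nfg.get("network_functions", [])
    sfcs = nfg.get("service_function_chains", [])
    if nfs and sfcs:
        raise ValueError("misconfiguration: both columns set on one NFG")
    return sfcs or nfs
```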
>
> >
> >
> >     Currently, I would prefer to review and accept the Nutanix patch
> series
> >     (for ovn25.09) and then add on the ovn-k features that are not
> present
> >     in the series (for ovn26.03).
> >
> >     Tim, what do you think?
> >
> >
> > I think first we should have a solid plan for how we will add on the SFC
> > part. For example will we expand NFG so that we can load balance across
> > it or only have 1 active at a time? If so, then it would maybe make
> > sense now to add a new field to the NFG to indicate this mode. Those
> > types of detail I would like to iron out and have a plan for so we don't
> > find ourselves cornered when we try to add SFC later. wdyt?
>
> Yes, this is how my thought process was as well. The current NFG
> configuration allows for multiple network functions to be configured,
> choosing a single one as the active one based on health checks.
>
> We have to consider that we want to:
> 1) Allow for multiple functions to be chained.
> 2) Allow for multiple functions/chains to be load balanced.
>
> There are many possibilities for how to implement these based on the
> current patch series.
>
> For chaining, I think the best plan is to create a new
> Service_Function_Chain (or Network_Function_Chain if we want to keep the
> same nomenclature) table. Then the NFG's network_function column could
> allow for either singular functions or chains in the list of
> network_functions.
>
> Alternatively, we could get rid of the current Network_Function table in
> favor of replacing it with the Service_Function_Chain table. A
> Network_Function is nothing more than a Service_Function_Chain with a
> single function, after all.
>

+1

>
> For load balancing, we could either:
> a) Add a boolean to the NFG table, called load_balance. If set to false,
> then a single active network function or service function chain is
> chosen from the list. If set to true, then all network functions or
> service function chains are viable, and we use load balancing to
> determine which to use. We can still use health checks to ensure we only
> try to load balance between live functions.
>

+1. I think this is probably true, but I just want to also highlight that
health checks should be optional as well.
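To make option (a) concrete, the selection could behave roughly like this
sketch (field names are illustrative; members with no health record count as
live, which keeps health checks optional):

```python
import hashlib

# Sketch of option (a): a load_balance boolean on the NFG decides between
# a single active member and per-flow hashing across all live members.
def pick_target(nfg, flow_key):
    live = [m for m in nfg["members"]
            if nfg.get("health", {}).get(m, True)]  # no record => live
    if not live:
        raise RuntimeError("no live network functions/chains in group")
    if not nfg.get("load_balance", False):
        return live[0]  # single active member, as in the current patches
    # Hash the flow key so one flow always lands on the same member.
    digest = int(hashlib.sha256(flow_key).hexdigest(), 16)
    return live[digest % len(live)]
```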


> b) Create a new Load_Balanced_Service_Function_Chain table that
> specifies lists of load balanced service function chains. Then the NFG
> could place these in the network_functions as well.
> c) The same as (b), but instead of adding a new table, add a new column to
> the existing Load_Balancer table that allows a list of network_functions
> (or chains) to be listed. Then these load balancers could be applied to
> the NFG the same way as a network function.
>
> >
> >
> >     Thanks,
> >     Mark Michelson
> >      >
> >      >> Thanks
> >      >>
> >      >> Tim Rozet
> >      >> Red Hat OpenShift Networking Team
> >      >>
> >      >> _______________________________________________
> >      >> dev mailing list
> >      >> d...@openvswitch.org
> >      >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
>