On Thu, Jun 26, 2025 at 2:28 PM Tim Rozet <tro...@redhat.com> wrote:
>
> On Thu, Jun 26, 2025 at 11:49 AM Mark Michelson <mmich...@redhat.com> wrote:
>>
>> On 6/25/25 6:25 PM, Tim Rozet wrote:
>> >
>> > On Wed, Jun 25, 2025 at 5:42 PM Mark Michelson <mmich...@redhat.com> wrote:
>> >
>> > On 6/25/25 11:38 AM, Tim Rozet wrote:
>> > >
>> > > On Tue, Jun 24, 2025 at 5:14 PM Mark Michelson <mmich...@redhat.com> wrote:
>> > >
>> > > On 6/24/25 1:31 PM, Tim Rozet wrote:
>> > > > Thanks Mark for the detailed response and taking the time to review the proposal. See inline.
>> > > >
>> > > > Tim Rozet
>> > > > Red Hat OpenShift Networking Team
>> > > >
>> > > > On Tue, Jun 24, 2025 at 12:04 PM Mark Michelson <mmich...@redhat.com> wrote:
>> > > >
>> > > > On 6/18/25 8:26 PM, Numan Siddique wrote:
>> > > > > On Fri, Jun 13, 2025 at 8:44 AM Tim Rozet via dev <ovs-dev@openvswitch.org> wrote:
>> > > > >>
>> > > > >> Hello,
>> > > > >> In the OVN-Kubernetes project we have been discussing and designing a way to implement Service Function Chaining (SFC) for various use cases. Some of these use cases are fairly complicated, involving a DPU and multiple clusters. However, we have tried to abstract the OVN design and use case into a generic implementation that is not specific to our particular use cases. It follows SFC designs previously done within other projects like OpenStack Neutron and OpenDaylight. Please see:
>> > > > >>
>> > > > >> https://docs.google.com/document/d/1dLdpx_9ZCnjHHldbNZABIpJF_GXd69qb/edit#bookmark=id.a7vfofkk8rj5
>> > > > >>
>> > > > >> tl;dr the design includes new tables to declare chains and classifiers to get traffic into that chain. There needs to be a new stage in the datapath pipeline to evaluate this behavior upon port ingress. We also need these flows to be hardware offloadable.
>> > > > >>
>> > > > >> For more details on the specific use cases we are targeting in the OVN-Kubernetes project, please see:
>> > > > >>
>> > > > >> https://docs.google.com/document/d/1MDZlu4oHL3RCWndbSgC-IGLgs1QfnB1l47nPqtM5iNo/edit?tab=t.0#heading=h.g8u53k9ds9s5
>> > > > >>
>> > > > >> Would appreciate feedback (either on the mailing list or in the design doc) and thoughts from the OVN experts on how we can accommodate this feature.
>> > > > >>
>> > > > >
>> > > > > Hi Tim,
>> > > > >
>> > > > > There is a very similar proposal from @Sragdhara Datta Chaudhuri to add Network Functions support in OVN.
>> > > > > Can you please take a look at it? Looks like there are many similarities in the requirements.
>> > > > >
>> > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-May/423586.html
>> > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424102.html
>> > > > >
>> > > > > Thanks
>> > > > > Numan
>> > > >
>> > > > Hi Tim and Numan,
>> > > >
>> > > > I've looked at both the ovn-k proposal and the Nutanix patch series. I think the biggest differences between the proposals (aside from small things, like naming) are the following:
>> > > >
>> > > > 1) Nutanix amends the ACL table to include a network function group to send the packet to if the packet matches. The ovn-k proposal suggests a new SFC_Classifier table that includes an ACL-like match.
>> > > >
>> > > > 2) ovn-k wants load balancing of the service functions. The Nutanix patch series has no load balancing.
>> > > >
>> > > > 3) ovn-k wants a Service_Function_Chain table that allows for multiple services to be chained. The Nutanix patch series provides a Network_Function_Group table that allows a single network function to be the active one. There is no concept of chaining in the patch series.
>> > > >
>> > > > 4) ovn-k wants NSH-awareness. I don't 100% know what this entails, but there is no NSH in the Nutanix patch series.
>> > > >
>> > > > We don't necessarily require NSH. Some limited Cisco products support NSH, but I'm not aware of other vendors. So for now the majority of the CNF use case would be proxied. However, we do need some mechanism to store metadata to know what chain the packet is currently on, especially as packets go between nodes. This could be Geneve TLV metadata. I'm looking for feedback on this kind of stuff in the doc, as I'm not sure what is best suited for this and if it is offloadable.
>> > > >
>> > > > IMO, items 2, 3, and 4 can be made as add-ons to the Nutanix patch series.
>> > > >
>> > > > How do you envision it being added on? Would it be a separate feature, or an extension of the Nutanix effort?
>> > >
>> > > These are great questions. My thought had been that it would be an extension of the Nutanix feature.
>> > >
>> > > > I'm a bit concerned if it is the latter, because I worry we will have boxed ourselves into a certain paradigm and be less flexible to accommodate the full SFC RFC. For example, in the Nutanix proposal it looks like the functionality relies on standard networking principles. The client ports are connected to the same subnet as the network function. In my proposal, there is no concept of this network connectivity. The new stage simply takes the packet and delivers it to the port, without any requirement of layer 2 or layer 3 connectivity.
>> > >
>> > > I'm not 100% sure I understand what you mean about the Nutanix proposal relying on standard network principles. For instance, my reading of the Nutanix patches is that if the ACL matches, then the packet is sent to the configured switch outport.
>> > >
>> > > What I mean is when the NF does not exist on the same switch as the client traffic. Looking at the proposal again I think the relevant section is "NF insertion across logical switches". In my proposal there is no definition of needing a link between the switches. My definition might be wrong in the OVN context, that's where I need feedback and we need to discuss how it would work. To try to explain it in simple terms: if I have a client on switch LS1 that sends traffic that is classified to a chain with 1 NF (analogous to the Nutanix NF/NFG) on switch SVC LS...in Nutanix a child port is created by CMS to connect the 2 switches together, while in my proposal there is no concept of that link:
>> > >
>> > > Nutanix proposal:
>> > >
>> > >                       ---------
>> > >                       | NF VM |
>> > >                       ---------
>> > >                         |   |
>> > >    -----              nfp1 nfp2              -----
>> > >   | VM1 |               |   |               | VM2 |
>> > >    -----           ---------------           -----
>> > >      |             |    SVC LS   |             |
>> > >    p1|             ---------------           p3|
>> > >      |  nfp1_ch1  nfp2_ch1        nfp1_ch2  nfp2_ch2
>> > >   --------------------        --------------------
>> > >   |        LS1       |        |        LS2       |
>> > >   --------------------        --------------------
>> > >
>> > > nfp1_ch1 is created by CMS to get the packet from LS1 to SVC LS. I'm guessing it doesn't matter in this case whether or not the SVC LS is on the same OVN node? How would this work in IC? Would we need a transit switch? How does nfp1_ch1 map to SVC LS? It isn't clear in the proposal how the CMS configures these parts with examples. Should it really be on the user to create these nfp1_ch1 ports? Or can they automatically be inferred?
>> > >
>> > > In the OVNK proposal, I do not define that there needs to be links between switches. From the SFC perspective, it transcends standard networking. Therefore there is no reason to need a link between switches. Once it classifies the packet as needing to go to nfp1, and if nfp1 is on the same host, it just sends the packet to its port. If it's on a remote node, it adds header data and tunnels the packet to the next node. This is how it can be implemented in raw openflow or in other SDN controllers. That perspective may not be grounded in reality when it comes to how OVN ingress/egress pipelines and traffic forwarding work. That's where I need your feedback and we need to figure out how those pieces should work. IMHO it would be imperative to figure that out before the Nutanix stuff is merged.
>> >
>> > Thanks for elucidating, Tim.
>> >
>> > I think this has really made the fundamental differences between the SFC proposal and the Nutanix series much clearer.
>> >
>> > The Nutanix series introduces a hook mechanism for packets that arrive on a particular switch that is configured to send those packets out to a network function.
>> >
>> > The ovn-k proposal essentially wants a new overlay on top of the existing logical network for service function chaining. The SFC proposal is only using logical switches because that's the only thing that OVN provides that is close enough to where an SFF should live in an OVN topology.
>> >
>> > > Then when the packet re-arrives on the configured switch inport, it uses conntrack information to put the packet back on track to go to its intended destination. The service function does not appear to require any sort of L2 switching based solely on that.
>> > >
>> > > Even the final patch that introduces the health monitoring doesn't rely on the switch subnet but instead uses NB_Global options to determine the destination MAC to check. It doesn't seem to be necessary to be on the same subnet as the switch on which the service is configured.
>> > >
>> > > I may be misinterpreting, though.
>> > >
>> > > > Furthermore in the Nutanix proposal there are requirements around the packet not being modified, while in SFC it is totally OK for the packet to be modified. Once classified, the packet is identified by its chain id and position in the chain (aforementioned NSH/Geneve metadata).
>> > >
>> > > Can you refresh me on how the chain ID is determined in the SFC proposal? In the patch series, the function group ID is stored in conntrack, so when the packet re-arrives into OVN, we use conntrack to identify that the packet has come from a network function and needs to be "resumed" as it were. Because the patches use conntrack, the packet's identifying info (src IP, dst IP, src port, dst port, l4 protocol) can't be changed, since it means that we won't be able to find the packet in conntrack any longer.
>> > >
>> > > Sure. So in the SFC world when the packet is going to be sent to the NF, the SFF (OVS switch) determines if the NF needs to be proxied or not. If it does not need proxying, then the SFF sends the packet with the NSH header. This header describes to the NF the chain ID and the current position in the chain (index). Note, this requires the NF to be NSH aware so that it can read the NSH header. At this point the NF will process the packet, decrement the chain index, and send the packet back to the SFF. Now when the packet arrives back in OVS, it can read the NSH header and know where to send the packet to next. This way a single NF can actually be part of multiple chains. It can even reclassify packets to a different chain by itself. However, this all relies on NSH, which only a few NFs actually support.
>> > >
>> > > Now, when we look at NFs that do not support NSH and need proxying: in this case the SFF "proxies" by "stripping the chain information" and sending the packet without any additional information to the NF. In this model the NF can only be part of a single chain, because when the packet is sent and comes back there would be no way to distinguish packets being on one chain or another. So what I have seen in past implementations is you set OF registers to track the chain internally in OVS. Let's take an example with a 2 NF chain, where the NFs are split across nodes, and let's assume that we use Geneve with TLV to hold our chain/index information. Let's define the chain as an ordered list of NF1,NF2:
>> > >
>> > >               NF1                              NF2
>> > >                |                                |
>> > >                |                                |
>> > >                |                                |
>> > >          +-----------+                    +-----------+
>> > >          |           |                    |           |
>> > >  client--| OVS node1 |--------------------| OVS node2 |
>> > >          |           |--------------------|           |
>> > >          |           |                    |           |
>> > >          +-----------+                    +-----------+
>> > >                |
>> > >                |
>> > >             server
>> > >
>> > > 1. client sends packet (let's assume to google, 8.8.8.8), it gets classified and OF registers are stored with chain id, and index 255, punted to chain processing OF table
>> > >
>> > > 2. OVS node 1 - SFC stage/table matches chain id, index 255, send to NF1
>> > >
>> > > 3. NF1 receives raw packet, modifies dest IP address to be *server*, sends packet back to OVS node1
>> > >
>> > > 4. OVS node1 - Receives packet from in_port NF1, restores OF register for chain id, stores register for index, now decremented to 254
>> >
>> > At this point, I have questions about some implementation details, and I have bad news with regards to how OVN currently works.
>> >
>> > You've stored the chain id and index in OF registers. For simplicity, we'll assume two 8-bit registers, reg0 and reg1.
>> >
>> > So packet0 arrives at OVS node 1 from the client. The chain id and index are stored in reg0 and reg1 respectively. packet0 is then sent to NF1 and we await the return of packet0 so that we can use the registers to know what to do with it next.
>> > >> > Now at this point, let's say packet1 arrives at OVS node 1 from the >> > client. packet0 is still out at NF1. packet1 also needs to traverse the >> > NFs, but reg0 and reg1 are being used to track packet0. If we overwrite >> > the values, then when packet0 arrives back, reg0 and reg1 will have >> > packet1's chain state, not packet0's. How do you handle the collision? >> > >> > >> > AFAIK registers in openflow act in a per packet context. The registers >> > used to provide metadata to packet0 are per packet metadata that are >> > totally isolated from the registers used to handle packet1 context. >> >> I checked with some knowledgeable folks, and you are correct that the >> registers are scoped to a particular packet. But the flipside to this is >> that the registers can't be preserved if that packet leaves OVS. In the >> scenario that you outlined, did the packets leave OVS, or were the NFs >> implemented on separate bridges within the same OVS instance? > > > The NFs on the local node are all connected to the same OVS bridge. When the > packet leaves the bridge to go to an NF (assuming we are talking no NSH > here), the context is lost. But as you mentioned in your points below, we > know that the outport of the NF belongs to only this one chain (a limitation > without something like NSH), and we know its the output port of the first NF > in the chain. Therefore once the packet comes back to us from the NF output > port, we can match its in_port, then load the registers back with the chain > id, and decrement the index to 254. Once the flows determine the next NF is > now across node, we can load the registers into the geneve TLV as metadata > there to send it across. Although Dumitru simplified it for me earlier where > we do not need to store that information in Geneve and could just use the > remote port instead, more on that below. >> >> >> When it comes to identifying the chain state, there are essentially >> three options: >> >> 1) Use packet data. This is the NSH case, and based on what you stated >> earlier, it sounds like we should not plan for this since most NFs don't >> support NSH anyway. >> >> 2) Base our decisions based on the port on which the packet is received. >> Assuming the chains are not dynamic, then we can set up each SF to use a >> dedicated set of ports. This way, we can know that if, say, a packet >> arrives on port 1, it means that it's a packet that hasn't gone through >> any SFs yet, so we send to NF1 via port 2. Then when NF1 completes its >> processing, it sends the packet to port 3. Receiving on this port means >> we know to send to NF2 on port 4, e.g. We would still need to encode >> information via Geneve in the case where the packet needs to reach an NF >> on a separate node. This is essentially how the Nutanix series works, >> but it does not support chaining multiple NFs together. >> >> 3) Daisy-chain the NFs. Instead of having OVN send the packet to each >> NF, have OVN send the packet to NF1, then have NF1 send the packet to >> NF2, etc. This likely is not a viable option since the NFs will have no >> knowledge of how to reach each other, but I figured I'd throw it out >> there anyway. >> >> It sounds like (2) could be a reasonable way to go, assuming the chains >> are static. > > > I agree with you on 2. 
If later down the road we have NSH support then an NF can be part of multiple chains (because of passing context along with the packet), but until then we are proxying and that means an NF can only belong to a single chain, and that chain has a static ordering of ports (besides the notion of load balancing, but it's the same concept). So you can make the assumption that when you know a packet comes out of an SF it is on 1 chain, and you know the next port or openflow group to send to. Dumitru and I did some diagramming today and this is what we came up with:
> https://docs.google.com/drawings/d/1xD7n4IqMAWktbpBRY3p5F0FACmZS9odV415Q_YZeytQ/edit?usp=sharing
>
> Now the big problem here, which was also a problem in OpenDaylight, was at the end of the chain...how can you know where to send the packet back to for resuming the normal datapath pipeline? With NSH it is easy, because we can pass metadata in NSH to store the original switch/port. However without it, the only way to do it would be to store context within something like conntrack. Then when the packet comes back from SF2 in the diagram, match conntrack and now you have the context to know where to send the packet back to. However, that is not practical because the SF may modify the packet, and conntrack will not match. I've been trying to dig through old OpenDaylight code and ask around to see what we did for this problem and I'm not finding an answer. Assuming there is no good solution, then we need to compromise the design a bit. Here are a couple of options:
>
> 1. Make a rule that SFs should not change the source mac of the packets. This way we have something static that we can map back to our original node. This would be problematic for SFs acting as a router or some other intermediary hop in the chain that sends from its own mac.
> 2. Restrict service function chains to be confined within a node.
>
> My initial thought is that option 2 is worse than 1. Forcing users to be per node would mean they need to run their CNFs potentially on every node. That won't scale very well. Perhaps 1 is an OK tradeoff for opting not to use NSH with your CNF. In the future if/once we support NSH as well, we could tell NF vendors: if you want to change the source mac, then support NSH.
>
>> >
>> > The same questions go for restoration of chain information at stage 8.
>> >
>> > This becomes moot when dealing with OVN datapaths because a transition out of a pipeline involves clearing all registers. So we can't rely on registers to hold values once a packet has either moved from one pipeline to another or has exited the OVS bridge. This is why conntrack is relied on heavily for stateful data. But of course, if the packet data is being changed by an NF, we can't use conntrack either.
>> >
>> > The ingress/egress pipeline semantics in OVN are something we need to figure out. Another way to look at this is when a packet is classified to a chain, its ingress pipeline is essentially paused and the packet is diverted to a special SFC pipeline. The packet is later returned to the ingress pipeline to resume ingress and egress processing. Not sure if we could introduce a new type of SFC pipeline into OVN?
>>
>> This is essentially what I'm describing with the composable service idea. The packet would, prior to hitting the switch, hit an alternate SFC datapath.
Once all NFs have executed, then the new datapath would >> send the packet to the switch and it would be processed like normal. >> >> Introducing a new SFC pipeline would be interesting, to say the least >> :). Let's just say for now that if we could avoid this, I'd prefer that. >> If it becomes a requirement, then we can try to work out exactly how >> that would be implemented, and what the implications would be on all >> levels of OVN. > > > I think from what Dumitru told me today what is in the diagram is very close > to what Nutanix is trying to do. Maybe we can view this as a superset of that > effort and consider their use case as a chain with a single NF. One thing I > don't get is why they need to use conntrack. Since they do not allow the > packet to change and there is only one NF, I think it would be enough to just > create a reverse chain (or just classifier in their case) to match on traffic > in the opposite direction. In my proposal that is a field in the service > function chain object called "symmetric". I think avoiding conntrack would be > more performant, and potentially easier to make offloadable.
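To make the port-based ("option 2") bookkeeping discussed above a bit more concrete, here is a rough toy model in plain Python. This is not OVN or OVS code, and the chain id, index values, and port names are invented purely for illustration; it only mirrors the 255/254/253 walk-through from earlier in the thread, and it deliberately ignores the end-of-chain question of how the packet finds its way back to the original chassis:

# Toy model of the port-based ("option 2") chain tracking -- NOT OVN/OVS code.
# Chain id, start index, and port names below are invented for illustration.

CHAIN_ID = 7                 # hypothetical chain identifier
START_INDEX = 255            # index decrements per hop, as in the walk-through

# Static chain: ordered list of (NF name, port toward the NF, port back from the NF).
CHAIN = [
    ("NF1", "to-nf1", "from-nf1"),
    ("NF2", "to-nf2", "from-nf2"),
]

# Because the chain is static, each "return" port implies exactly one
# (chain id, index, next position); this is what lets the registers be
# reloaded purely from in_port when the packet comes back from an NF.
RETURN_PORT_CONTEXT = {
    ret_port: (CHAIN_ID, START_INDEX - pos - 1, pos + 1)
    for pos, (_, _, ret_port) in enumerate(CHAIN)
}

def classify(packet):
    """Classifier hit (real classifier would match packet fields): enter the
    chain at index 255 and send to the first NF."""
    return {"chain": CHAIN_ID, "index": START_INDEX, "send_to": CHAIN[0][1]}

def resume(in_port):
    """Packet came back from an NF: restore context from in_port alone."""
    chain, index, next_pos = RETURN_PORT_CONTEXT[in_port]
    if next_pos == len(CHAIN):
        # End of chain: hand the packet back to the normal pipeline
        # (or tunnel it back to the source chassis).
        return {"chain": chain, "index": index, "send_to": "resume-pipeline"}
    # Next NF may be remote; (chain, index) would ride in a Geneve TLV there.
    return {"chain": chain, "index": index, "send_to": CHAIN[next_pos][1]}

print(classify({"dst": "8.8.8.8"}))   # index 255 -> to-nf1
print(resume("from-nf1"))             # index 254 -> to-nf2
print(resume("from-nf2"))             # index 253 -> end of chain

The only point is that, as long as the chains are static, the port a packet returns on is enough to reload the per-packet registers, with the same chain id/index pair carried in a Geneve TLV for the hop to a remote node.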
The Nutanix series uses conntrack to store the OVS tunnel interface's ofport in the ct_label in order to remember the source chassis of the original sender if the network function ports are running on different chassis. Once the packet is received from the outport of the NF, the packet is tunneled back to the source chassis. Unfortunately I don't see any other way than using conntrack.

Numan

>> >
>> > If we don't use conntrack or OF registers, then we usually resort to storing state in the ovn-controller runtime. But having to slowpath every packet to send to ovn-controller multiple times to store and retrieve chain state would likely be *terrible* for performance.
>> >
>> > So ideally, the packet would always contain the chain id and index information. But as you mentioned, not many NFs actually support NSH, so it's unlikely that we can rely on the packet to retain the information. What do?
>> >
>> > IMO we should be using OF registers. That's what we did in OpenDaylight.
>>
>> Just to reiterate what I was saying above, were the packets ever leaving the switch in this scenario?
>> > >
>> > > 5. OVS node 1 - SFC stage/table matches chain id, index 254, send to remote SFF OVS node 2. Encapsulate in Geneve, set Geneve metadata chain id and index at 254.
>> > >
>> > > 6. OVS node 2 receives packet - SFC stage/table matches chain id, index 254, send to NF2
>> > >
>> > > 7. NF2 receives raw packet, modifies something else in the packet, sends back to OVS node2
>> > >
>> > > 8. OVS node 2 receives the packet from in_port NF2, restores OF register for chain id, stores register for index, now decremented to 253
>> > >
>> > > 9. OVS node 2 - SFC stage/table matches on chain ID, index 253, has reached the end of chain. Send packet back to original SFF to resume datapath pipeline processing. Encapsulate in Geneve, set chain id and index at 253.
>> > >
>> > > 10. OVS node 1 receives packet. Processes chain id and determines 253 is end of chain. Continue to next stage of ingress datapath pipeline.
>> > >
>> > > 11. Regular OVN datapath pipeline finishes, routes packet towards server due to dest IP in packet.
>> > >
>> > > The chain has effectively rerouted the destination of the packet to another server, without needing conntrack to store anything.
>> > >
>> > > In the SFC proposal, if the packet is modified, then that means we would need to use something other than conntrack to track the chain ID. Would we require NSH in order to track the chain ID properly? Or is there some other way?
>> > > >
>> > > > Item 1 is the biggest sticking point. From my point of view, I prefer the Nutanix approach of modifying the ACL table since:
>> > > > * ACLs can be applied to switches or port groups. The proposed SFC_Classifier only applies to port groups.
>> > > > * ACLs have things like logging and sampling that can be useful in this scenario.
>> > > > * ACLs can be tiered.
>> > > > However, if there's a good reason why this will not work for ovn-k's scenario, then that would be good to know.
>> > > >
>> > > > Using the ACLs I think would be fine for the OVNK use case as well. The reasons I didn't propose using ACLs were twofold:
>> > > > 1. Trying to create a clear boundary for SFC.
Since SFC >> > does not >> > > behave >> > > > like normal networking, I thought it would make sense to >> > make it >> > > its own >> > > > entity. >> > > >> > > This is where I really wish we had something like composable >> > > services in >> > > place, because it sounds like SFC is only being added to logical >> > > switches because that's the current best fit for them. They >> > would >> > > really >> > > be better suited to their own datapath type. >> > > >> > > But for now, putting them on a logical switch is the best >> > choice. >> > > >> > > The nice thing about ACL stages is that they are very early >> > in the >> > > logical switch pipelines. We perform FDB and mirror actions >> > before the >> > > ACL, but that's it. >> > > >> > > > 2. I didn't think OVN would be amenable to modifying ACL >> > to have >> > > a new >> > > > column to send to a chain. >> > > >> > > > In the Nutanix proposal it looks like the column is added >> > to send >> > > to a >> > > > NFG. Would we also add the ability to send to a SFC? >> > > >> > > The way I had thought about it, we could expand NFGs to >> > contain SFCs. >> > > Currently, an NFG has a list of network functions. But we could >> > > create a >> > > new column in the NFG table that could be one or more SFCs. >> > The idea >> > > would be that if you configure the network_functions column, >> > we use >> > > those. If you configure the service_function_chains column, >> > we use >> > > those >> > > instead. It would be a misconfiguration to use both at the >> > same time. >> > > >> > > > >> > > > >> > > > Currently, I would prefer to review and accept the >> > Nutanix >> > > patch series >> > > > (for ovn25.09) and then add on the ovn-k features that >> > are >> > > not present >> > > > in the series (for ovn26.03). >> > > > >> > > > Tim, what do you think? >> > > > >> > > > >> > > > I think first we should have a solid plan for how we will >> > add on >> > > the SFC >> > > > part. For example will we expand NFG so that we can load >> > balance >> > > across >> > > > it or only have 1 active at a time? If so, then it would >> > maybe make >> > > > sense now to add a new field to the NFG to indicate this >> > mode. Those >> > > > types of detail I would like to iron out and have a plan >> > for so >> > > we don't >> > > > find ourselves cornered when we try to add SFC later. wdyt? >> > > >> > > Yes, this is how my thought process was as well. The current NFG >> > > configuration allows for multiple network functions to be >> > configured, >> > > choosing a single one as the active one based on health checks. >> > > >> > > We have to consider that we want to: >> > > 1) Allow for multiple functions to be chained. >> > > 2) Allow for multiple functions/chains to be load balanced. >> > > >> > > There are many possibilities for how to implement these based >> > on the >> > > current patch series. >> > > >> > > For chaining, I think the best plan is to create a new >> > > Service_Function_Chain (or Network_Function_Chain if we want >> > to keep >> > > the >> > > same nomenclature) table. Then the NFG's network_function >> > column could >> > > allow for either singular functions or chains in the list of >> > > network_functions. >> > > >> > > Alternatively, we could get rid of the current Network_Function >> > > table in >> > > favor of replacing it with the Service_Function_Chain table. A >> > > Network_Function is nothing more than a >> > Service_Function_Chain with a >> > > single function, after all. 
>> > > >> > > >> > > +1 >> > > >> > > >> > > For load balancing, we could either: >> > > a) Add a boolean to the NFG table, called load_balance. If set >> > to >> > > false, >> > > then a single active network function or service function >> > chain is >> > > chosen from the list. If set to true, then all network >> > functions or >> > > service function chains are viable, and we use load balancing to >> > > determine which to use. We can still use health checks to >> > ensure we >> > > only >> > > try to load balance between live functions. >> > > >> > > >> > > +1 I think this is probably true but just want to also highlight >> > health >> > > checks should be optional as well. >> > >> > I believe in the current patch series health checks are optional. If >> > you >> > do not set the destination MAC for health checks then they do not >> > happen. I can double-check to be sure though. >> > >> > > >> > > b) Create a new Load_Balanced_Service_Function_Chain table that >> > > specifies lists of load balanced service function chains. >> > Then the NFG >> > > could place these in the network_functions as well. >> > > c) The same as B, but instead of adding a new table, add a new >> > > column to >> > > the existing Load_Balancer table that allows a list of >> > > network_functions >> > > (or chains) to be listed. Then these load balancers could be >> > applied to >> > > the NFG the same way as a network function. >> > > >> > >> > At this point, this is my summary of the situation: >> > >> > The patch series implements a hook system that allows packets to be >> > sent >> > out to non-mutating NFs. Packets may have traversed other switches and >> > routers in the network before arriving at the point that the hook is >> > invoked. Since the feature extends ACLs, packets can be sent to the NFs >> > during the ingress or egress pipeline. The NFs must be non-mutating >> > because we use conntrack to track the state of the packet. If the NFs >> > are chained, the chaining must be handled outside of OVN. NSH is never >> > involved. >> > >> > The ovn-k proposal seeks to encapsulate packets with NSH info before >> > the >> > packet arrives in OVN. >> > >> > >> > We don't seek to use NSH. It's just the only real standard out there as >> > an SFC transport. I'm fine with not supporting NSH (at least initially) >> > and just using Geneve to carry the chain/index information. >> >> OK great. I'll keep NSH out of discussions from this point since we can >> save it for a later add-on. >> >> > >> > Upon arrival in OVN, the packet should, as soon >> > as possible, be sent out to NFs. OVN may or may not need to proxy the >> > NSH information (though it likely will need to since most NFs are not >> > NSH-aware). Chaining may happen within the NFs or it can be handled by >> > OVN. The NFs may mutate the packet, meaning OVN cannot use conntrack to >> > track the chain id or index. Once all NFs have handled the packet, then >> > it is entered into the typical switch ingress pipeline and handled >> > as it >> > normally would be. AFAICT, sending the packets to NFs will *always* >> > happen ASAP, and cannot happen during the egress pipeline. >> > >> > >> > Yeah this is because we do not want the packet to be altered before it >> > is sent to the NF. For example, if the dest was a load balancer VIP, we >> > do not want the packet to get DNAT'ed by the LB and then sent to the >> > chain later. The classification and diversion of the packet should >> > happen as early as possible. 
>> >> This is another reason why the composable service would be a good idea, >> because the packet would have all SFC processing completed before ever >> ingressing a switch. >> >> If we're not using a composable service but instead using a logical >> switch, though, then I have come around to your POV that we should do >> SFC as the very first thing on ingress. Waiting until the ACL stage >> means performing (potentially pointless and incorrect) FDB lookups on >> packets that may be altered by NFs. Those FDB lookups should happen on >> the altered packet instead. >> >> > >> > >> > Generally speaking, these are very different with regards to >> > implementation details. However, I think Numan and Tim are correct that >> > we could tailor the new tables to be able to work with both use >> > cases. I >> > can think through this and try to propose something that will work for >> > both. I had previously thought that the ovn-k case could bend to the >> > Nutanix's use case, but I think that's incorrect. I think they are >> > distinct enough to exist as separate features in OVN. I don't think >> > either use case is invalid, and aside from ensuring the tables can >> > accommodate both use cases, I don't think anything should block merging >> > of the Nutanix patch series. >> > >> > Now more than ever, I think ovn-k SFC proposal would work best as a >> > composable service rather than in OVN logical switches. As a refresher, >> > the composable service idea is to essentially be able to insert new >> > hyper-specialized logical datapath types between VMs and switches, or >> > between switches and router distributed gateway ports. You could place >> > an "SFC classifier" datapath between the VM and the switch, allowing >> > for >> > the SFC processing to happen before the packet ever even enters the >> > logical switch, thereby not messing with the logical switch's >> > functionality at all. One thing I had always considered with the >> > composable services feature was that all composable services would >> > still >> > operate in OVN's br-int bridge. But if we want to be able to play fast >> > and loose with register behavior in composable services, it may be a >> > requirement to implement them within their own OVS bridges instead. >> > This >> > way they would have their own independent register space to use as they >> > see fit, including persisting register values after packets depart. >> > Depending on your answer to my questions above with regards to register >> > restoration, we may be able to implement identical logic to what you >> > illustrated within OVN. No composable services are actually implemented >> > in OVN yet, but ovn-northd refactoring efforts to allow for them to >> > exist are posted on patchwork currently. I had planned to try to >> > implement a simple NAT composable service as the first one, but SFC may >> > be a better way of proving their worth, especially if we need to be >> > able >> > to utilize secondary bridges. >> > >> > >> > Sounds interesting. Do you have a pointer to some links about composable >> > services and how to use them? >> >> Sure, I have this: >> https://docs.google.com/document/d/1hRdx9LTiquXoeKQNsTfq0LWGbGOHIqbhGTaxYVB4yUU/edit?tab=t.0 >> . From the document, everything up until the "Hook Services" section is >> relevant. The "External" service is something I had come up with before >> I had ever heard the term "SFC", but it kind of sought to have the same >> goal. 
It proposes OVN dumbly sending packets to an OVS bridge and having that OVS bridge do whatever it wants to the packet before sending the packet back into OVN.
>>
>> And there's also my presentation I gave last year at OVS+OVN conf: https://www.youtube.com/watch?v=Gf4M-ZSmTz4 . This definitely is not centered around SFC (I don't even recall if I mention the "external" service in the talk), but it goes into detail about the intent of how composable services will work and some use-cases they help with.
>>
>> > > > Thanks,
>> > > > Mark Michelson
>> > > > >
>> > > > >> Thanks
>> > > > >>
>> > > > >> Tim Rozet
>> > > > >> Red Hat OpenShift Networking Team