On Mon, Jul 7, 2025 at 11:10 AM Tim Rozet <tro...@redhat.com> wrote:
>
> Hey Numan,
> But the issue here is that with conntrack you cannot modify the packet with
> the SF. That seems very restrictive. Perhaps my proposal of not allowing
> source MAC modification would be a better/alternative option? Perhaps we
> could have multiple fields on an SF like:
>
> Network_Function
>     requires_proxy: true
>     allow_packet_modification: true
>
> If allow_packet_modification is false, we store the state in conntrack. If
> true, we either:
> 1. Restrict that the chain must be rendered on each node, and the chain is
>    never inter-node.
> 2. Restrict that the source MAC of the packet cannot be modified.
>
> wdyt?
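To make the two knobs above concrete, here is a minimal, purely hypothetical sketch of how they might be set. The Network_Function table exists only in the proposed Nutanix series, and the two columns are only suggestions from this thread, so none of this works against released OVN:

    # Hypothetical only: assumes the proposed Network_Function NB table plus
    # the two columns suggested above; neither exists in released OVN today.
    ovn-nbctl create Network_Function name=fw0 \
        requires_proxy=true allow_packet_modification=false
    # fw0 does not rewrite packets, so its chain state could be kept in conntrack.

    ovn-nbctl create Network_Function name=nat0 \
        requires_proxy=true allow_packet_modification=true
    # nat0 may rewrite packets, so per the two options above the chain would have
    # to stay node-local, or the function would have to preserve the source MAC.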
Sounds good to me. Thanks
Numan
>
> -Tim
>
> On Fri, Jul 4, 2025 at 11:32 AM Numan Siddique <num...@ovn.org> wrote:
>>
>> On Thu, Jun 26, 2025 at 2:28 PM Tim Rozet <tro...@redhat.com> wrote:
>> >
>> > On Thu, Jun 26, 2025 at 11:49 AM Mark Michelson <mmich...@redhat.com> wrote:
>> >>
>> >> On 6/25/25 6:25 PM, Tim Rozet wrote:
>> >> >
>> >> > On Wed, Jun 25, 2025 at 5:42 PM Mark Michelson <mmich...@redhat.com> wrote:
>> >> >
>> >> > On 6/25/25 11:38 AM, Tim Rozet wrote:
>> >> > >
>> >> > > On Tue, Jun 24, 2025 at 5:14 PM Mark Michelson <mmich...@redhat.com> wrote:
>> >> > >
>> >> > > On 6/24/25 1:31 PM, Tim Rozet wrote:
>> >> > > > Thanks Mark for the detailed response and taking the time to review
>> >> > > > the proposal. See inline.
>> >> > > >
>> >> > > > Tim Rozet
>> >> > > > Red Hat OpenShift Networking Team
>> >> > > >
>> >> > > > On Tue, Jun 24, 2025 at 12:04 PM Mark Michelson <mmich...@redhat.com> wrote:
>> >> > > >
>> >> > > > On 6/18/25 8:26 PM, Numan Siddique wrote:
>> >> > > > > On Fri, Jun 13, 2025 at 8:44 AM Tim Rozet via dev <ovs-dev@openvswitch.org> wrote:
>> >> > > > >>
>> >> > > > >> Hello,
>> >> > > > >> In the OVN-Kubernetes we have been discussing and designing a way to
>> >> > > > >> implement Service Function Chaining (SFC) for various use cases. Some of
>> >> > > > >> these use cases are fairly complicated, involving a DPU and multiple
>> >> > > > >> clusters. However, we have tried to abstract the OVN design and use case
>> >> > > > >> into a generic implementation that is not specific to our particular use
>> >> > > > >> cases. It follows SFC designs previously done within other projects like
>> >> > > > >> OpenStack Neutron and OpenDaylight. Please see:
>> >> > > > >>
>> >> > > > >> https://docs.google.com/document/d/1dLdpx_9ZCnjHHldbNZABIpJF_GXd69qb/edit#bookmark=id.a7vfofkk8rj5
>> >> > > > >>
>> >> > > > >> tl;dr the design includes new tables to declare chains and classifiers to
>> >> > > > >> get traffic into that chain. There needs to be a new stage in the datapath
>> >> > > > >> pipeline to evaluate this behavior upon port ingress. We also need these
>> >> > > > >> flows to be hardware offloadable.
>> >> > > > >>
>> >> > > > >> For more details on the specific use cases we are targeting in the
>> >> > > > >> OVN-Kubernetes project, please see:
>> >> > > > >>
>> >> > > > >> https://docs.google.com/document/d/1MDZlu4oHL3RCWndbSgC-IGLgs1QfnB1l47nPqtM5iNo/edit?tab=t.0#heading=h.g8u53k9ds9s5
>> >> > > > >>
>> >> > > > >> Would appreciate feedback (either on the mailing list or in the design doc)
>> >> > > > >> and thoughts from the OVN experts on how we can accommodate this feature.
>> >> > > > >
>> >> > > > > Hi Tim,
>> >> > > > >
>> >> > > > > There is a very similar proposal from @Sragdhara Datta Chaudhuri to
>> >> > > > > add Network Functions support in OVN.
>> >> > > > > Can you please take a look at it? Looks like there are many
>> >> > > > > similarities in the requirements.
>> >> > > > >
>> >> > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-May/423586.html
>> >> > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424102.html
>> >> > > > >
>> >> > > > > Thanks
>> >> > > > > Numan
>> >> > > >
>> >> > > > Hi Tim and Numan,
>> >> > > >
>> >> > > > I've looked at both the ovn-k proposal and the Nutanix patch series. I
>> >> > > > think the biggest differences between the proposals (aside from small
>> >> > > > things, like naming) are the following:
>> >> > > >
>> >> > > > 1) Nutanix amends the ACL table to include a network function group to
>> >> > > > send the packet to if the packet matches. The ovn-k proposal suggests a
>> >> > > > new SFC_Classifier table that includes an ACL-like match.
>> >> > > >
>> >> > > > 2) ovn-k wants load balancing of the service functions. The Nutanix
>> >> > > > patch series has no load balancing.
>> >> > > > >> >> > > > 3) ovn-k wants a Service_Function_Chain table, that >> >> > allows >> >> > > for multiple >> >> > > > services to be chained. The Nutanix patch series >> >> > provides a >> >> > > > Network_Function_Group table that allows a single >> >> > network >> >> > > function >> >> > > > to be >> >> > > > the active one. There is no concept of chaining in the >> >> > patch >> >> > > series. >> >> > > > >> >> > > > 4) ovn-k wants NSH-awareness. I don't 100% know what >> >> > this >> >> > > entails, but >> >> > > > there is no NSH in the Nutanix patch series. >> >> > > > >> >> > > > >> >> > > > We don't necessarily require NSH. Some limited Cisco >> >> > products >> >> > > support >> >> > > > NSH, but I'm not aware of other vendors. So for now the >> >> > majority >> >> > > of the >> >> > > > CNF use case would be proxied. However, we do need some >> >> > mechanism to >> >> > > > store metadata to know what chain the packet is currently >> >> > > on, especially >> >> > > > as packets go between nodes. This could be Geneve TLV >> >> > metadata. I'm >> >> > > > looking for feedback on this kind of stuff in the doc, as >> >> > I'm not >> >> > > sure >> >> > > > what is best suited for this and if it is offloadable. >> >> > > > > >> >> > > > IMO, items 2, 3, and 4 can be made as add-ons to the >> >> > Nutanix >> >> > > patch >> >> > > > series. >> >> > > > >> >> > > > >> >> > > > How do you envision it being added on? Would it be a >> >> > separate >> >> > > feature, >> >> > > > or an extension of the Nutanix effort? >> >> > > >> >> > > These are great questions. My thought had been that it would >> >> > be an >> >> > > extension of the Nutanix feature. >> >> > > >> >> > > > I'm a bit concerned if it is the >> >> > > > latter, because I worry we will have boxed ourselves into >> >> > a certain >> >> > > > paradigm and be less flexible to accomodate the full SFC >> >> > RFC. For >> >> > > > example, in the Nutanix proposal it looks like the >> >> > functionality >> >> > > relies >> >> > > > on standard networking principles. The client ports are >> >> > connected >> >> > > to the >> >> > > > same subnet as the network function. In my proposal, there >> >> > is no >> >> > > concept >> >> > > > of this network connectivity. The new stage simply takes >> >> > the >> >> > > packet and >> >> > > > delivers it to the port, without any requirement of layer >> >> > 2 or >> >> > > layer 3 >> >> > > > connectivity. >> >> > > >> >> > > I'm not 100% sure I understand what you mean about the >> >> > Nutanix proposal >> >> > > relying on standard network principles. For instance, my >> >> > reading of the >> >> > > Nutanix patches is that if the ACL matches, then the packet >> >> > is sent to >> >> > > the configured switch outport. >> >> > > >> >> > > >> >> > > What I mean is when the NF does not exist on the same switch as >> >> > the >> >> > > client traffic. Looking at the proposal again I think the >> >> > relevant >> >> > > section is "NF insertion across logical switches". In my >> >> > proposal there >> >> > > is no definition of needing a link between the switches. My >> >> > definition >> >> > > might be wrong in the OVN context, that's where I need feedback >> >> > and we >> >> > > need to discuss how it would work. To try to explain it in simple >> >> > terms. 
>> >> > > If I have a client on switch LS1 that sends traffic that is >> >> > classified >> >> > > to a chain with 1 NF (analogous to the Nutanix NF/NFG) on switch >> >> > SVC >> >> > > LS...in Nutanix a child port is created by CMS to connect to the >> >> > 2 >> >> > > switches together, while in my proposal there is no concept of >> >> > that link: >> >> > > >> >> > > Nutanix proposal: >> >> > > >> >> > > -------- >> >> > > | NF VM | >> >> > > -------- >> >> > > | | >> >> > > ----- | | ----- >> >> > > | VM1 | nfp1 nfp2 | VM2 >> >> > | >> >> > > ---- - | | -------------- ----- >> >> > | | >> >> > > | | | | SVC LS | | >> >> > | | >> >> > > p1| nfp1_ch1 nfp2_ch1 -------------- p3| >> >> > nfp1_ch2 nfp2_ch2 >> >> > > -------------------- >> >> > -------------------- >> >> > > | LS1 | | >> >> > LS2 | >> >> > > -------------------- >> >> > -------------------- >> >> > > >> >> > > nfp1_ch1 is created by CMS to get the packet from the LS1 to SVC >> >> > LS. I'm >> >> > > guessing it doesn't matter in this case whether or not the SVC LS >> >> > is on >> >> > > the same OVN node? How would this work in IC? Would we need a >> >> > transit >> >> > > switch? How does nfp1_ch1 map to SVC LS? It isn't clear in the >> >> > proposal >> >> > > how the CMS configures these parts with examples. Should it >> >> > really be on >> >> > > the user to create these nfp1_ch1 ports? Or can they >> >> > automatically be >> >> > > inferred? >> >> > > >> >> > > In the OVNK proposal, I do not define that there needs to be >> >> > links >> >> > > between switches. From the SFC perspective, it transcends >> >> > standard >> >> > > networking. Therefore there is no reason to need a link between >> >> > > switches. Once it classifies the packet as needing to go to nfp1, >> >> > and if >> >> > > nfp1 is on the same host, it just sends the packet to its port. >> >> > If it's >> >> > > on a remote node, it adds header data and tunnels the packet to >> >> > the next >> >> > > node. This is how it can be implemented in raw openflow or in >> >> > other SDN >> >> > > controllers. That perspective may not be grounded in reality >> >> > when it >> >> > > comes to how OVN ingress/egress pipelines and traffic forwarding >> >> > work. >> >> > > That's where I need your feedback and we need to figure out how >> >> > those >> >> > > pieces should work. IMHO it would be imperative to figure that >> >> > out >> >> > > before the Nutanix stuff is merged. >> >> > >> >> > Thanks for elucidating, Tim. >> >> > >> >> > I think this has really made it more clear about the fundamental >> >> > differences between the SFC proposal and the Nutanix series. >> >> > >> >> > The Nutanix series introduces a hook mechanism for packets that >> >> > arrive >> >> > on a particular switch that is configured to send those packets out >> >> > to a >> >> > network function. And the series is >> >> > >> >> > The ovn-k proposal essentially wants a new overlay on top of the >> >> > existing logical network for service function chaining. The SFC >> >> > proposal >> >> > is only using logical switches because that's the only thing that >> >> > OVN >> >> > provides that is close enough to where an SFF should live in an OVN >> >> > topology. >> >> > >> >> > > >> >> > > Then when the patch re-arrives on the >> >> > > configured switch inport, it uses conntrack information to >> >> > put the >> >> > > packet back on track to go to its intended destination. 
The >> >> > service >> >> > > function does not appear to require any sort of L2 switching >> >> > based >> >> > > solely on that. >> >> > > >> >> > > Even the final patch that introduces the health monitoring >> >> > doesn't rely >> >> > > on the switch subnet but instead uses NB_Global options to >> >> > determine >> >> > > the >> >> > > destination MAC to check. It doesn't seem to be necessary to >> >> > be on the >> >> > > same subnet as the switch on which the service is configured. >> >> > > >> >> > > I may be misinterpreting, though. >> >> > > >> >> > > > Furthermore in the Nutanix proposal there are requirements >> >> > > > around the packet not being modified, while in SFC it is >> >> > totally >> >> > > OK for >> >> > > > the packet to be modified. Once classified, the packet is >> >> > > identified by >> >> > > > its chain id and position in the chain (aforementioned >> >> > NSH/Geneve >> >> > > metadata). >> >> > > >> >> > > Can you refresh me on how the chain ID is determined in the >> >> > SFC >> >> > > proposal? In the patch series, the function group ID is >> >> > stored in >> >> > > conntrack, so when the packet rearrives into OVN, we use >> >> > conntrack to >> >> > > identify that the packet has come from a network function and >> >> > needs to >> >> > > be "resumed" as it were. Because the patches use conntrack, >> >> > the >> >> > > packet's >> >> > > identifying info (src IP, dst IP, src port, dst port, l4 >> >> > protocol) >> >> > > can't >> >> > > be changed, since it means that we won't be able to find the >> >> > packet in >> >> > > conntrack any longer. >> >> > > >> >> > > >> >> > > Sure. So in the SFC world when the packet is going to be sent to >> >> > the NF, >> >> > > the SFF (OVS switch) determines if the NF needs to be proxied or >> >> > not. If >> >> > > it does not need proxying, then the SFF sends the packet with the >> >> > > NSH header. This header describes to the NF the chain ID and the >> >> > current >> >> > > position in the chain (index). Note, this requires the NF to be >> >> > NSH >> >> > > aware so that it can read the NSH header. At this point the NF >> >> > will >> >> > > process the packet, and decrement the chain index, and send the >> >> > packet >> >> > > back to the SFF. Now when the packet arrives back in OVS, it can >> >> > read >> >> > > the NSH header and know where to send the packet to next. This >> >> > way a >> >> > > single NF can actually be part of multiple chains. It can even >> >> > > reclassify packets to a different chain by itself. However, this >> >> > all >> >> > > relies on NSH, which only a few NFs actually support. >> >> > > >> >> > > Now, when we look at NFs that do not support NSH and need >> >> > proxying. In >> >> > > this case the SFF "proxies" by "stripping the chain information" >> >> > and >> >> > > sending the packet without any additional information to the NF. >> >> > In this >> >> > > model the NF can only be part of a single chain, because when the >> >> > packet >> >> > > is sent and comes back there would be no way to distinguish >> >> > packets >> >> > > being one one chain or another. So what I have seen in the past >> >> > > implementations is you set OF registers to track the chain >> >> > internally in >> >> > > OVS. Let's take an example with a 2 NF chain, where the NFs are >> >> > split >> >> > > across nodes, and let's assume that we use Geneve with TLV to >> >> > hold our >> >> > > chain/index information. 
Let's define the chain as an ordered >> >> > list of >> >> > > NF1,NF2: >> >> > > >> >> > > >> >> > > NF1 >> >> > NF2 >> >> > > | >> >> > | >> >> > > | >> >> > | >> >> > > | >> >> > | >> >> > > +-----------+ >> >> > +-----------+ >> >> > > | | >> >> > | | >> >> > > client -----------|OVS node1 |--------------------------- >> >> > |OVS node2 | >> >> > > | |--------------------------- >> >> > | | >> >> > > | | >> >> > | | >> >> > > +-----------+ >> >> > +-----------+ + >> >> > > | >> >> > > | >> >> > > server >> >> > > >> >> > > >> >> > > >> >> > > 1. client sends packet (let's assume to google, 8.8.8.8), it gets >> >> > classified and OF registers are stored with chain id, and index 255, >> >> > punted to chain processing OF table >> >> > > >> >> > > 2. OVS node 1 - SFC stage/table matches chain id, index 255, send >> >> > to NF1 >> >> > > >> >> > > 3. NF1 receives raw packet, modifies dest IP address to be >> >> > *server*, sends packet back to OVS node1 >> >> > > >> >> > > 4. OVS node1 - Recieves packet from in_port NF1, restores OF >> >> > register for chain id, stores register for index, now decremented >> >> > to 254 >> >> > >> >> > At this point, I have questions about some implementation details, >> >> > and I >> >> > have bad news with regards to how OVN currently works. >> >> > >> >> > You've stored the chain id and index in OF registers. for >> >> > simplicity, >> >> > we'll assume two 8-bit registers, reg0 and reg1. >> >> > >> >> > So packet0 arrives at OVS node 1 from the client. The chain id and >> >> > index >> >> > are stored in reg0 and reg1 respectively. packet0 is then sent to >> >> > NF1 >> >> > and we await the return of packet0 so that we can use the registers >> >> > to >> >> > know what to do with it next. >> >> > >> >> > Now at this point, let's say packet1 arrives at OVS node 1 from the >> >> > client. packet0 is still out at NF1. packet1 also needs to traverse >> >> > the >> >> > NFs, but reg0 and reg1 are being used to track packet0. If we >> >> > overwrite >> >> > the values, then when packet0 arrives back, reg0 and reg1 will have >> >> > packet1's chain state, not packet0's. How do you handle the >> >> > collision? >> >> > >> >> > >> >> > AFAIK registers in openflow act in a per packet context. The registers >> >> > used to provide metadata to packet0 are per packet metadata that are >> >> > totally isolated from the registers used to handle packet1 context. >> >> >> >> I checked with some knowledgeable folks, and you are correct that the >> >> registers are scoped to a particular packet. But the flipside to this is >> >> that the registers can't be preserved if that packet leaves OVS. In the >> >> scenario that you outlined, did the packets leave OVS, or were the NFs >> >> implemented on separate bridges within the same OVS instance? >> > >> > >> > The NFs on the local node are all connected to the same OVS bridge. When >> > the packet leaves the bridge to go to an NF (assuming we are talking no >> > NSH here), the context is lost. But as you mentioned in your points below, >> > we know that the outport of the NF belongs to only this one chain (a >> > limitation without something like NSH), and we know its the output port of >> > the first NF in the chain. Therefore once the packet comes back to us from >> > the NF output port, we can match its in_port, then load the registers back >> > with the chain id, and decrement the index to 254. 
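As a rough illustration of the in_port-based restore just described, here is a minimal sketch in plain OVS flow syntax (not OVN logical flows; the br-sfc bridge name, port numbers, and table numbers are invented for the example):

    # Classification: traffic from the client port (1) is tagged with chain id 1
    # and index 255 in per-packet registers, then handed to an SFC table that
    # forwards to NF1 (port 10).
    ovs-ofctl add-flow br-sfc "table=0,priority=100,in_port=1,ip \
        actions=load:1->NXM_NX_REG0[],load:255->NXM_NX_REG1[],resubmit(,10)"
    ovs-ofctl add-flow br-sfc "table=10,priority=100,reg0=1,reg1=255 actions=output:10"

    # Return path: the registers were lost when the packet left the bridge for
    # NF1, so they are reloaded purely from the NF's return port (11), with the
    # index already decremented to 254, before resubmitting to the SFC table.
    ovs-ofctl add-flow br-sfc "table=0,priority=100,in_port=11,ip \
        actions=load:1->NXM_NX_REG0[],load:254->NXM_NX_REG1[],resubmit(,10)"

The cross-node step, which the discussion below continues with, would then carry the same two values in a Geneve option or rely on the remote port instead, as noted further down.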
Once the flows >> > determine the next NF is now across node, we can load the registers into >> > the geneve TLV as metadata there to send it across. Although Dumitru >> > simplified it for me earlier where we do not need to store that >> > information in Geneve and could just use the remote port instead, more on >> > that below. >> >> >> >> >> >> When it comes to identifying the chain state, there are essentially >> >> three options: >> >> >> >> 1) Use packet data. This is the NSH case, and based on what you stated >> >> earlier, it sounds like we should not plan for this since most NFs don't >> >> support NSH anyway. >> >> >> >> 2) Base our decisions based on the port on which the packet is received. >> >> Assuming the chains are not dynamic, then we can set up each SF to use a >> >> dedicated set of ports. This way, we can know that if, say, a packet >> >> arrives on port 1, it means that it's a packet that hasn't gone through >> >> any SFs yet, so we send to NF1 via port 2. Then when NF1 completes its >> >> processing, it sends the packet to port 3. Receiving on this port means >> >> we know to send to NF2 on port 4, e.g. We would still need to encode >> >> information via Geneve in the case where the packet needs to reach an NF >> >> on a separate node. This is essentially how the Nutanix series works, >> >> but it does not support chaining multiple NFs together. >> >> >> >> 3) Daisy-chain the NFs. Instead of having OVN send the packet to each >> >> NF, have OVN send the packet to NF1, then have NF1 send the packet to >> >> NF2, etc. This likely is not a viable option since the NFs will have no >> >> knowledge of how to reach each other, but I figured I'd throw it out >> >> there anyway. >> >> >> >> It sounds like (2) could be a reasonable way to go, assuming the chains >> >> are static. >> > >> > >> > I agree with you on 2. If later down the road we have NSH support than an >> > NF can be part of multiple chains (because of passing context along with >> > the packet), but until then we are proxying and that means an NF can only >> > belong to a single chain, and that chain has a static ordering of ports >> > (besides the notion of load balancing, but it's the same concept). So you >> > can make the assumption that when you know a packet comes out of an SF it >> > is on 1 chain, and you know the next port or openflow group to send to. >> > Dumitru and I did some diagraming today and this is what we came up with: >> > https://docs.google.com/drawings/d/1xD7n4IqMAWktbpBRY3p5F0FACmZS9odV415Q_YZeytQ/edit?usp=sharing >> > >> > Now the big problem here, which was also a problem in OpenDaylight, was at >> > the end of the chain...how can you know where to send the packet back to >> > for resuming normal datapath pipeline? With NSH it is easy, because we can >> > pass metadata in NSH to store the original switch/port. However without >> > it, the only way to do it would be to store context within something like >> > conntrack. Then when the packet comes back from the SF2 in the diagram, >> > match conntrack and now you have the context to know where to send the >> > packet back to. However, that is not practical because the SF may modify >> > the packet, and the conntrack will not match. I've been trying to dig >> > through old OpenDaylight code and ask around to see what we did for this >> > problem and I'm not finding an answer. Assuming there is no good solution, >> > then we need to compromise the design a bit. Here are a couple options: >> > >> > 1. 
Make a rule that SFs should not change the source mac of the packets. >> > This way we have something static that we can map back to our original >> > node. This would be problematic for SFs acting as a router or some other >> > intermediary hop in the chain that sends from its own mac. >> > 2. Restrict service function chains to be confined within a node. >> > >> > My initial thought is that option 2 is worse than 1. Forcing users to be >> > per node would mean they need to run their CNFs potentially on every node. >> > That won't scale very well. Perhaps 1 is an OK tradeoff for opting not to >> > use NSH with your CNF. In the future if/once we support NSH as well we >> > could tell NF vendors if you want to change the source mac, then support >> > NSH. >> > >> >> >> >> >> >> > >> >> > >> >> > The same questions go for restoration of chain information at stage >> >> > 8. >> >> > >> >> > This becomes moot when dealing with OVN datapaths because a >> >> > transition >> >> > out of a pipeline involves clearing all registers. So we can't rely >> >> > on >> >> > registers to hold values once a packet has either moved from one >> >> > pipeline to another or has exited the OVS bridge. This is why >> >> > conntrack >> >> > is relied on heavily for stateful data. But of course, if the packet >> >> > data is being changed by an NF, we can't use conntrack either. >> >> > >> >> > >> >> > The ingress/egress pipeline semantics in OVN is something we need to >> >> > figure out. Another way to look at this is when a packet is classified >> >> > to a chain, its ingress pipeline is essentially paused and the packet is >> >> > diverted to a special SFC pipeline. The packet is later returned to the >> >> > ingress pipeline to resume ingress and egress processing. Not sure if we >> >> > could introduce a new type of SFC pipeline into OVN? >> >> >> >> This is essentially what I'm describing with the composable service >> >> idea. The packet would, prior to hitting the switch, hit an alternate >> >> SFC datapath. Once all NFs have executed, then the new datapath would >> >> send the packet to the switch and it would be processed like normal. >> >> >> >> Introducing a new SFC pipeline would be interesting, to say the least >> >> :). Let's just say for now that if we could avoid this, I'd prefer that. >> >> If it becomes a requirement, then we can try to work out exactly how >> >> that would be implemented, and what the implications would be on all >> >> levels of OVN. >> > >> > >> > I think from what Dumitru told me today what is in the diagram is very >> > close to what Nutanix is trying to do. Maybe we can view this as a >> > superset of that effort and consider their use case as a chain with a >> > single NF. One thing I don't get is why they need to use conntrack. Since >> > they do not allow the packet to change and there is only one NF, I think >> > it would be enough to just create a reverse chain (or just classifier in >> > their case) to match on traffic in the opposite direction. In my proposal >> > that is a field in the service function chain object called "symmetric". I >> > think avoiding conntrack would be more performant, and potentially easier >> > to make offloadable. >> >> Nutanix series uses conntrack to store the OVS tunnel interface's >> ofport in the ct_label >> in order to remember the source chassis of the original sender if the >> network function ports >> are running in different chassis. 
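In rough raw-OpenFlow terms, the ct_label trick described here and continued just below looks something like the following (the actual series does this via OVN logical flows; the field layout, port numbers, and table numbers are invented for illustration):

    # Before the packet is sent out to the NF, commit the connection and stash
    # the ofport of the tunnel interface it arrived on in ct_label (a real
    # implementation would copy in_port rather than hard-coding 5).
    ovs-ofctl add-flow br-int "table=0,priority=100,in_port=5,ip \
        actions=ct(commit,exec(load:5->NXM_NX_CT_LABEL[0..15])),output:20"

    # When the packet comes back from the NF's outport (21), look up conntrack,
    # recover the stored ofport from ct_label, and send the packet back out the
    # tunnel toward the source chassis.
    ovs-ofctl add-flow br-int "table=0,priority=100,in_port=21,ip,ct_state=-trk \
        actions=ct(table=1)"
    ovs-ofctl add-flow br-int "table=1,priority=100,ip,ct_state=+trk+est \
        actions=move:NXM_NX_CT_LABEL[0..15]->NXM_NX_REG0[0..15],output:NXM_NX_REG0[0..15]"

As noted, this only works if the NF leaves the connection 5-tuple unchanged, since an altered packet no longer matches the committed conntrack entry.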
Once the packet is received from >> the outport of the NF function, >> the packet is tunneled back to the source chassis. >> >> Unfortunately I don't see any other way than using the conntrack. >> >> Numan >> >> >> >> >> >> >> > >> >> > >> >> > If we don't use conntrack or OF registers, then we usually resort to >> >> > storing state in the ovn-controller runtime. But having to slowpath >> >> > every packet to send to ovn-controller multiple times to store and >> >> > retrieve chain state would likely be *terrible* for performance. >> >> > >> >> > So ideally, the packet would always contain the chain id and index >> >> > information. But as you mentioned, not many NFs actually support >> >> > NSH, so >> >> > it's unlikely that we can rely on the packet to retain the >> >> > information. >> >> > What do? >> >> > >> >> > >> >> > IMO we should be using OF registers. That's what we did in OpenDaylight. >> >> >> >> Just to reiterate what I was saying above, were the packets ever leaving >> >> the switch in this scenario? >> >> >> >> > > >> >> > > >> >> > > 5. OVS node 1 - SFC stage/table matches chain id, index 254, send >> >> > to remote SFF OVS node 2. Enacapsulate in geneve, set Geneve >> >> > metadata chain id and index at 254. >> >> > > >> >> > > 6. OVS node 2 receives packet - SFC stage/table matches chain id, >> >> > index 254, send to NF2 >> >> > > >> >> > > 7. NF2 receives raw packet, modifies something else in the >> >> > packet, sends back to OVS node2 >> >> > > >> >> > > 8. OVS node 2 receives the packet from in_port NF2, restores OF >> >> > register for chain id, stores register for index, now decremented >> >> > to 253 >> >> > > >> >> > > 9. OVS node 2 - SFC stage/table matches on chain ID, index 253, >> >> > has reached the end of chain. Send packet back to original SFF to >> >> > resume datapath pipeline processing. Encapsulate in geneve, set >> >> > chain id and index at 253. >> >> > > >> >> > > 10. OVS node 1 receives packet. Processes chain id and determines >> >> > 253 is end of chain. Continue to next stage of ingress datapath >> >> > pipeline. >> >> > > >> >> > > 11. Regular OVN datapath pipeline finishes, routes packet towards >> >> > server due to dest IP in packet. >> >> > > >> >> > > >> >> > > The chain has effectively rerouted the destination of the packet >> >> > to another server, without needing conntrack to store anything. >> >> > > >> >> > > >> >> > > >> >> > > In the SFC proposal, if the packet is modified, then that >> >> > means we >> >> > > would >> >> > > need to use something other than conntrack to track the chain >> >> > ID. Would >> >> > > we require NSH in order to track the chain ID properly? Or is >> >> > there >> >> > > some >> >> > > other way? >> >> > > >> >> > > > >> >> > > > >> >> > > > Item 1 is the biggest sticking point. From my point of >> >> > view, >> >> > > I prefer >> >> > > > the Nutanix approach of modifying the ACL table since, >> >> > > > * ACLs can be applied to switches or port groups. The >> >> > proposed >> >> > > > SFC_Classifier only applies to port groups. >> >> > > > * ACLs have things like logging and sampling that can >> >> > be >> >> > > useful in this >> >> > > > scenario. >> >> > > > * ACLs can be tiered. >> >> > > > However, if there's a good reason why this will not >> >> > work for >> >> > > ovn-k's >> >> > > > scenario, then that would be good to know. >> >> > > > >> >> > > > >> >> > > > Using the ACLs I think would be fine for the OVNK use >> >> > case as >> >> > > well. 
The >> >> > > > reason I didn't propose using ACLs were 2 fold: >> >> > > > 1. Trying to create a clear boundary for SFC. Since SFC >> >> > does not >> >> > > behave >> >> > > > like normal networking, I thought it would make sense to >> >> > make it >> >> > > its own >> >> > > > entity. >> >> > > >> >> > > This is where I really wish we had something like composable >> >> > > services in >> >> > > place, because it sounds like SFC is only being added to >> >> > logical >> >> > > switches because that's the current best fit for them. They >> >> > would >> >> > > really >> >> > > be better suited to their own datapath type. >> >> > > >> >> > > But for now, putting them on a logical switch is the best >> >> > choice. >> >> > > >> >> > > The nice thing about ACL stages is that they are very early >> >> > in the >> >> > > logical switch pipelines. We perform FDB and mirror actions >> >> > before the >> >> > > ACL, but that's it. >> >> > > >> >> > > > 2. I didn't think OVN would be amenable to modifying ACL >> >> > to have >> >> > > a new >> >> > > > column to send to a chain. >> >> > > >> >> > > > In the Nutanix proposal it looks like the column is added >> >> > to send >> >> > > to a >> >> > > > NFG. Would we also add the ability to send to a SFC? >> >> > > >> >> > > The way I had thought about it, we could expand NFGs to >> >> > contain SFCs. >> >> > > Currently, an NFG has a list of network functions. But we >> >> > could >> >> > > create a >> >> > > new column in the NFG table that could be one or more SFCs. >> >> > The idea >> >> > > would be that if you configure the network_functions column, >> >> > we use >> >> > > those. If you configure the service_function_chains column, >> >> > we use >> >> > > those >> >> > > instead. It would be a misconfiguration to use both at the >> >> > same time. >> >> > > >> >> > > > >> >> > > > >> >> > > > Currently, I would prefer to review and accept the >> >> > Nutanix >> >> > > patch series >> >> > > > (for ovn25.09) and then add on the ovn-k features >> >> > that are >> >> > > not present >> >> > > > in the series (for ovn26.03). >> >> > > > >> >> > > > Tim, what do you think? >> >> > > > >> >> > > > >> >> > > > I think first we should have a solid plan for how we will >> >> > add on >> >> > > the SFC >> >> > > > part. For example will we expand NFG so that we can load >> >> > balance >> >> > > across >> >> > > > it or only have 1 active at a time? If so, then it would >> >> > maybe make >> >> > > > sense now to add a new field to the NFG to indicate this >> >> > mode. Those >> >> > > > types of detail I would like to iron out and have a plan >> >> > for so >> >> > > we don't >> >> > > > find ourselves cornered when we try to add SFC later. >> >> > wdyt? >> >> > > >> >> > > Yes, this is how my thought process was as well. The current >> >> > NFG >> >> > > configuration allows for multiple network functions to be >> >> > configured, >> >> > > choosing a single one as the active one based on health >> >> > checks. >> >> > > >> >> > > We have to consider that we want to: >> >> > > 1) Allow for multiple functions to be chained. >> >> > > 2) Allow for multiple functions/chains to be load balanced. >> >> > > >> >> > > There are many possibilities for how to implement these based >> >> > on the >> >> > > current patch series. >> >> > > >> >> > > For chaining, I think the best plan is to create a new >> >> > > Service_Function_Chain (or Network_Function_Chain if we want >> >> > to keep >> >> > > the >> >> > > same nomenclature) table. 
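Purely to visualize the table relationship being sketched here and in the next paragraph, a hypothetical ovn-nbctl invocation might look like the following (every table and column named below is speculative and follows the thread's naming; only the compound "-- --id=@..." syntax is real ovn-nbctl usage):

    # Hypothetical: none of these tables or columns exist in OVN today.
    ovn-nbctl \
        -- --id=@fw  create Network_Function name=fw0 \
        -- --id=@ids create Network_Function name=ids0 \
        -- --id=@ch  create Service_Function_Chain name=chain1 functions=@fw,@ids \
        -- create Network_Function_Group name=nfg1 service_function_chains=@ch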
Then the NFG's network_function >> >> > column could >> >> > > allow for either singular functions or chains in the list of >> >> > > network_functions. >> >> > > >> >> > > Alternatively, we could get rid of the current >> >> > Network_Function >> >> > > table in >> >> > > favor of replacing it with the Service_Function_Chain table. >> >> > A >> >> > > Network_Function is nothing more than a >> >> > Service_Function_Chain with a >> >> > > single function, after all. >> >> > > >> >> > > >> >> > > +1 >> >> > > >> >> > > >> >> > > For load balancing, we could either: >> >> > > a) Add a boolean to the NFG table, called load_balance. If >> >> > set to >> >> > > false, >> >> > > then a single active network function or service function >> >> > chain is >> >> > > chosen from the list. If set to true, then all network >> >> > functions or >> >> > > service function chains are viable, and we use load >> >> > balancing to >> >> > > determine which to use. We can still use health checks to >> >> > ensure we >> >> > > only >> >> > > try to load balance between live functions. >> >> > > >> >> > > >> >> > > +1 I think this is probably true but just want to also highlight >> >> > health >> >> > > checks should be optional as well. >> >> > >> >> > I believe in the current patch series health checks are optional. If >> >> > you >> >> > do not set the destination MAC for health checks then they do not >> >> > happen. I can double-check to be sure though. >> >> > >> >> > > >> >> > > b) Create a new Load_Balanced_Service_Function_Chain table >> >> > that >> >> > > specifies lists of load balanced service function chains. >> >> > Then the NFG >> >> > > could place these in the network_functions as well. >> >> > > c) The same as B, but instead of adding a new table, add a >> >> > new >> >> > > column to >> >> > > the existing Load_Balancer table that allows a list of >> >> > > network_functions >> >> > > (or chains) to be listed. Then these load balancers could be >> >> > applied to >> >> > > the NFG the same way as a network function. >> >> > > >> >> > >> >> > At this point, this is my summary of the situation: >> >> > >> >> > The patch series implements a hook system that allows packets to be >> >> > sent >> >> > out to non-mutating NFs. Packets may have traversed other switches >> >> > and >> >> > routers in the network before arriving at the point that the hook is >> >> > invoked. Since the feature extends ACLs, packets can be sent to the >> >> > NFs >> >> > during the ingress or egress pipeline. The NFs must be non-mutating >> >> > because we use conntrack to track the state of the packet. If the >> >> > NFs >> >> > are chained, the chaining must be handled outside of OVN. NSH is >> >> > never >> >> > involved. >> >> > >> >> > The ovn-k proposal seeks to encapsulate packets with NSH info before >> >> > the >> >> > packet arrives in OVN. >> >> > >> >> > >> >> > We don't seek to use NSH. It's just the only real standard out there as >> >> > an SFC transport. I'm fine with not supporting NSH (at least initially) >> >> > and just using Geneve to carry the chain/index information. >> >> >> >> OK great. I'll keep NSH out of discussions from this point since we can >> >> save it for a later add-on. >> >> >> >> > >> >> > Upon arrival in OVN, the packet should, as soon >> >> > as possible, be sent out to NFs. OVN may or may not need to proxy >> >> > the >> >> > NSH information (though it likely will need to since most NFs are >> >> > not >> >> > NSH-aware). 
Chaining may happen within the NFs or it can be handled >> >> > by >> >> > OVN. The NFs may mutate the packet, meaning OVN cannot use >> >> > conntrack to >> >> > track the chain id or index. Once all NFs have handled the packet, >> >> > then >> >> > it is entered into the typical switch ingress pipeline and handled >> >> > as it >> >> > normally would be. AFAICT, sending the packets to NFs will *always* >> >> > happen ASAP, and cannot happen during the egress pipeline. >> >> > >> >> > >> >> > Yeah this is because we do not want the packet to be altered before it >> >> > is sent to the NF. For example, if the dest was a load balancer VIP, we >> >> > do not want the packet to get DNAT'ed by the LB and then sent to the >> >> > chain later. The classification and diversion of the packet should >> >> > happen as early as possible. >> >> >> >> This is another reason why the composable service would be a good idea, >> >> because the packet would have all SFC processing completed before ever >> >> ingressing a switch. >> >> >> >> If we're not using a composable service but instead using a logical >> >> switch, though, then I have come around to your POV that we should do >> >> SFC as the very first thing on ingress. Waiting until the ACL stage >> >> means performing (potentially pointless and incorrect) FDB lookups on >> >> packets that may be altered by NFs. Those FDB lookups should happen on >> >> the altered packet instead. >> >> >> >> > >> >> > >> >> > Generally speaking, these are very different with regards to >> >> > implementation details. However, I think Numan and Tim are correct >> >> > that >> >> > we could tailor the new tables to be able to work with both use >> >> > cases. I >> >> > can think through this and try to propose something that will work >> >> > for >> >> > both. I had previously thought that the ovn-k case could bend to the >> >> > Nutanix's use case, but I think that's incorrect. I think they are >> >> > distinct enough to exist as separate features in OVN. I don't think >> >> > either use case is invalid, and aside from ensuring the tables can >> >> > accommodate both use cases, I don't think anything should block >> >> > merging >> >> > of the Nutanix patch series. >> >> > >> >> > Now more than ever, I think ovn-k SFC proposal would work best as a >> >> > composable service rather than in OVN logical switches. As a >> >> > refresher, >> >> > the composable service idea is to essentially be able to insert new >> >> > hyper-specialized logical datapath types between VMs and switches, >> >> > or >> >> > between switches and router distributed gateway ports. You could >> >> > place >> >> > an "SFC classifier" datapath between the VM and the switch, allowing >> >> > for >> >> > the SFC processing to happen before the packet ever even enters the >> >> > logical switch, thereby not messing with the logical switch's >> >> > functionality at all. One thing I had always considered with the >> >> > composable services feature was that all composable services would >> >> > still >> >> > operate in OVN's br-int bridge. But if we want to be able to play >> >> > fast >> >> > and loose with register behavior in composable services, it may be a >> >> > requirement to implement them within their own OVS bridges instead. >> >> > This >> >> > way they would have their own independent register space to use as >> >> > they >> >> > see fit, including persisting register values after packets depart. 
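For what it's worth, the "own OVS bridge" variant mentioned above could be wired up as a dedicated bridge hung off br-int with a patch-port pair; a sketch only, with bridge and port names invented:

    # One possible wiring for a dedicated SFC bridge attached to br-int.
    ovs-vsctl add-br br-sfc
    ovs-vsctl add-port br-int patch-int-to-sfc \
        -- set interface patch-int-to-sfc type=patch options:peer=patch-sfc-to-int
    ovs-vsctl add-port br-sfc patch-sfc-to-int \
        -- set interface patch-sfc-to-int type=patch options:peer=patch-int-to-sfc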
>> >> > Depending on your answer to my questions above with regards to register
>> >> > restoration, we may be able to implement identical logic to what you
>> >> > illustrated within OVN. No composable services are actually implemented
>> >> > in OVN yet, but ovn-northd refactoring efforts to allow for them to
>> >> > exist are posted on patchwork currently. I had planned to try to
>> >> > implement a simple NAT composable service as the first one, but SFC may
>> >> > be a better way of proving their worth, especially if we need to be able
>> >> > to utilize secondary bridges.
>> >> >
>> >> > Sounds interesting. Do you have a pointer to some links about composable
>> >> > services and how to use them?
>> >>
>> >> Sure, I have this:
>> >> https://docs.google.com/document/d/1hRdx9LTiquXoeKQNsTfq0LWGbGOHIqbhGTaxYVB4yUU/edit?tab=t.0 .
>> >> From the document, everything up until the "Hook Services" section is
>> >> relevant. The "External" service is something I had come up with before
>> >> I had ever heard the term "SFC", but it kind of sought to have the same
>> >> goal. It proposes OVN dumbly sending packets to an OVS bridge and having
>> >> that OVS bridge do whatever it wants to the packet before sending the
>> >> packet back into OVN.
>> >>
>> >> And there's also my presentation I gave last year at OVS+OVN conf:
>> >> https://www.youtube.com/watch?v=Gf4M-ZSmTz4 . This definitely is not
>> >> centered around SFC (I don't even recall if I mention the "external"
>> >> service in the talk), but it goes into detail about the intent of how
>> >> composable services will work and some use-cases they help with.
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Mark Michelson
>> >> > > > >>
>> >> > > > >> Thanks
>> >> > > > >> Tim Rozet
>> >> > > > >> Red Hat OpenShift Networking Team
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev