On 6/26/25 6:02 PM, Dumitru Ceara wrote:
> On 6/16/25 5:26 PM, Numan Siddique wrote:
>> On Sun, May 25, 2025 at 10:46 PM Sragdhara Datta Chaudhuri
>> <sragdha.chau...@nutanix.com> wrote:
>>>
>>> RFC: NETWORK FUNCTION INSERTION IN OVN
>>>
>>> 1. Introduction
>>> ================
>>> The objective is to insert a Network Function (NF) in the path of
>>> outbound/inbound traffic from/to a port-group. The use case is to
>>> integrate a 3rd party service in the path of traffic. An example of
>>> such a service would be a layer-7 firewall. The NF VM acts as a bump
>>> in the wire and should not modify the packet, i.e. the IP header, the
>>> MAC addresses, VLAN tag and sequence numbers remain unchanged.
>>>
>>> Here are some of the highlights:
>>> - A new entity, network-function (NF), has been introduced. It
>>> contains a pair of LSPs. The CMS would designate one as "inport" and
>>> the other as "outport".
>>> - For high availability, a network function group (NFG) entity
>>> consists of a group of NFs. Only one NF in an NFG has the active
>>> role, based on health monitoring.
>>> - An ACL would accept an NFG as a parameter, and traffic matching the
>>> ACL would be redirected to the associated active NF's port. An NFG is
>>> accepted for the stateful allow action only.
>>> - The ACL's port-group is the point of reference when defining the
>>> role of the NF ports. The "inport" is the port closer to the
>>> port-group and the "outport" is the one away from it. For from-lport
>>> ACLs, the request packets would be redirected to the NF "inport" and
>>> for to-lport ACLs, the request packets would be redirected to the NF
>>> "outport". When the same packet comes out of the other NF port, it is
>>> simply forwarded.
>>> - Statefulness will be maintained, i.e. the response traffic will
>>> also go through the same pair of NF ports but in reverse order.
>>> - For the NF ports we need to disable the port security check, fdb
>>> learning and multicast/broadcast forwarding.
>>> - Health monitoring involves ovn-controller periodically injecting
>>> ICMP probe packets into the NF inport and monitoring the same packets
>>> coming out of the NF outport.
>>> - If the traffic redirection involves cross-host traffic (e.g. for a
>>> from-lport ACL, if the source VM and the NF VM are on different
>>> hosts), packets would be tunneled to and from the NF VM's host.
>>> - If the port-group to which the ACL is being applied has members
>>> spread across multiple LSs, the CMS needs to create child ports for
>>> the NF ports on each of these LSs. The redirection rules in each LS
>>> will use the child ports on that LS.
>>>
>>> 2. NB tables
>>> =============
>>> New NB tables
>>> -------------
>>> Network_Function: Each row contains {inport, outport, health_check}.
>>> Network_Function_Group: Each row contains a list of Network_Function
>>> entities. It also contains a unique id (between 1 and 255, generated
>>> by northd) and a reference to the current active NF.
>>> Network_Function_Health_Check: Each row contains configuration for
>>> probes in the options field: {interval, timeout, success_count,
>>> failure_count}
>>>
>>> "Network_Function_Health_Check": {
>>>     "columns": {
>>>         "name": {"type": "string"},
>>>         "options": {
>>>             "type": {"key": "string",
>>>                      "value": "string",
>>>                      "min": 0,
>>>                      "max": "unlimited"}},
>>>         "external_ids": {
>>>             "type": {"key": "string", "value": "string",
>>>                      "min": 0, "max": "unlimited"}}},
>>>     "isRoot": true},
>>> "Network_Function": {
>>>     "columns": {
>>>         "name": {"type": "string"},
>>>         "outport": {"type": {"key": {"type": "uuid",
>>>                                      "refTable": "Logical_Switch_Port",
>>>                                      "refType": "strong"},
>>>                              "min": 1, "max": 1}},
>>>         "inport": {"type": {"key": {"type": "uuid",
>>>                                     "refTable": "Logical_Switch_Port",
>>>                                     "refType": "strong"},
>>>                             "min": 1, "max": 1}},
>>>         "health_check": {"type": {
>>>             "key": {"type": "uuid",
>>>                     "refTable": "Network_Function_Health_Check",
>>>                     "refType": "strong"},
>>>             "min": 0, "max": 1}},
>>>         "external_ids": {
>>>             "type": {"key": "string", "value": "string",
>>>                      "min": 0, "max": "unlimited"}}},
>>>     "isRoot": true},
>>> "Network_Function_Group": {
>>>     "columns": {
>>>         "name": {"type": "string"},
>>>         "network_function": {"type":
>>>             {"key": {"type": "uuid",
>>>                      "refTable": "Network_Function",
>>>                      "refType": "strong"},
>>>              "min": 0, "max": "unlimited"}},
>>>         "mode": {"type": {"key": {"type": "string",
>>>                                   "enum": ["set", ["inline"]]}}},
>>>         "network_function_active": {"type":
>>>             {"key": {"type": "uuid",
>>>                      "refTable": "Network_Function",
>>>                      "refType": "strong"},
>>>              "min": 0, "max": 1}},
>>>         "id": {
>>>             "type": {"key": {"type": "integer",
>>>                              "minInteger": 0,
>>>                              "maxInteger": 255}}},
>>>         "external_ids": {
>>>             "type": {"key": "string", "value": "string",
>>>                      "min": 0, "max": "unlimited"}}},
>>>     "isRoot": true},
>>>
>>>
>>> Modified NB table
>>> -----------------
>>> ACL: The ACL entity would have a new optional field that is a
>>> reference to a Network_Function_Group entity. This field can be
>>> present only for stateful allow ACLs.
>>>
>>> "ACL": {
>>>     "columns": {
>>>         "network_function_group": {"type": {"key": {"type": "uuid",
>>>                            "refTable": "Network_Function_Group",
>>>                            "refType": "strong"},
>>>                    "min": 0,
>>>                    "max": 1}},
>>>
>>> New options for Logical_Switch_Port
>>> -----------------------------------
>>> receive_multicast=<boolean>: Default true. If set to false, LS will
>>> not forward broadcast/multicast traffic to this port. This is to
>>> prevent looping of such packets.
>>>
>>> lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning
>>> will be skipped for packets coming out of this port. Redirected
>>> packets from the NF port would be carrying the originating VM's MAC
>>> in source, and so learning should not happen.
>>>
>>> CMS needs to set both the above options to false for NF ports, in
>>> addition to disabling port security.
>>>
>>> network-function-linked-port=<lsp-name>: Each NF port needs to have
>>> this set to the other NF port of the pair.
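>>>
>>> For illustration, a health check could be wired up to an NF with the
>>> generic ovn-nbctl database commands (a hypothetical sketch: hc1 and
>>> the option values are made up, seconds are assumed as the unit as for
>>> LB health checks, and it assumes nbctl can refer to the NF by name;
>>> the dedicated network-function commands in this series may provide a
>>> friendlier interface):
>>>
>>> # Create a health check probing every 5 seconds with a 2 second
>>> # timeout; 3 successes mark the NF online, 3 failures offline.
>>> # nf1 is the NF from the sample configuration below.
>>> ovn-nbctl -- --id=@hc create Network_Function_Health_Check name=hc1 \
>>>     options:interval=5 options:timeout=2 \
>>>     options:success_count=3 options:failure_count=3 \
>>>     -- set Network_Function nf1 health_check=@hc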
>>>
>>> New NB_global options
>>> ---------------------
>>> svc_monitor_mac_dst: destination MAC of probe packets
>>> (svc_monitor_mac is already there and will be used as source MAC)
>>> svc_monitor_ip4: source IP of probe packets
>>> svc_monitor_ip4_dst: destination IP of probe packets
>>>
>>> Sample configuration
>>> --------------------
>>> ovn-nbctl ls-add ls1
>>> ovn-nbctl lsp-add ls1 nfp1
>>> ovn-nbctl lsp-add ls1 nfp2
>>> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false
>>> options:lsp_learn_fdb=false options:network-function-linked-port=nfp2
>>> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false
>>> options:lsp_learn_fdb=false options:network-function-linked-port=nfp1
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 10.1.1.4"
>>> ovn-nbctl pg-add pg1 p1
>>> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
>>> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 10.1.1.5"
>>> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>>> allow-related nfg1
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2'
>>> allow-related nfg1
>>>
>>> 3. SB tables
>>> ============
>>> Service_Monitor:
>>> This is currently used by the load balancer. New fields are: "type" -
>>> to indicate LB or NF, "mac" - the destination MAC address for monitor
>>> packets, "logical_input_port" - the LSP to which the probe packet
>>> would be sent. Also, "icmp" has been added as a protocol type, used
>>> only for NF.
>>>
>>> "Service_Monitor": {
>>>     "columns": {
>>>         "type": {"type": {"key": {
>>>             "type": "string",
>>>             "enum": ["set", ["load-balancer", "network-function"]]}}},
>>>         "mac": {"type": "string"},
>>>         "protocol": {
>>>             "type": {"key": {"type": "string",
>>>                              "enum": ["set", ["tcp", "udp", "icmp"]]},
>>>                      "min": 0, "max": 1}},
>>>         "logical_input_port": {"type": "string"},
>>>
>>> northd would create one Service_Monitor entity for each NF. The
>>> logical_input_port and logical_port would be populated from the NF
>>> inport and outport fields respectively. The probe packets would be
>>> injected into the logical_input_port and would be monitored out of
>>> logical_port.
>>>
>>> 4. Logical Flows
>>> ================
>>> Logical Switch ingress pipeline:
>>> - in_network_function added after in_stateful.
>>> - Modifications to in_acl_eval, in_stateful and in_l2_lookup.
>>> Logical Switch egress pipeline:
>>> - out_network_function added after out_stateful.
>>> - Modifications to out_pre_acl, out_acl_eval and out_stateful.
>>>
>>> 4.1 from-lport ACL
>>> ------------------
>>> The diagram below shows the request path for packets from VM1 port
>>> p1, which is a member of the pg to which the ACL is applied. The
>>> response would follow the reverse path, i.e. the packet would be
>>> redirected to nfp2, come out of nfp1 and be forwarded to p1.
>>> Also, p2 does not need to be on the same LS. Only p1, nfp1 and nfp2
>>> need to be on the same LS.
>>>
>>>   -----          -------          -----
>>>  | VM1 |        | NF VM |        | VM2 |
>>>   -----          -------          -----
>>>     |            /\    |           / \
>>>     |            |     |            |
>>>    \ /           |    \ /           |
>>>  ------------------------------------------------------------
>>> |   p1          nfp1  nfp2                     p2            |
>>> |                                                            |
>>> |                      Logical Switch                        |
>>>  ------------------------------------------------------------
>>> pg1: [p1]   as2: [p2-ip]
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>>> allow-related nfg1
>>> Say the unique id northd assigned to this NFG is 123.
>>>
>>> The request packets from p1 matching a from-lport ACL with an NFG are
>>> redirected to nfp1 and the NFG id is committed to the ct label in
>>> p1's zone. When the same packet comes out of nfp2 it gets forwarded
>>> the normal way.
>>> Response packets have p1's MAC as destination. Ingress processing
>>> sets the outport to p1, the CT lookup in the egress pipeline (in p1's
>>> ct zone) yields the NFG id, and the packet is injected back into the
>>> ingress pipeline after setting the outport to nfp2.
>>>
>>> Below are the changes in detail.
>>>
>>> 4.1.1 Request processing
>>> ------------------------
>>>
>>> in_acl_eval: For from-lport ACLs with an NFG, the existing rule's
>>> action has been enhanced to set:
>>> - reg8[21] = 1: to indicate that the packet has matched a rule with
>>> an NFG
>>> - reg5[0..7] = <NFG-unique-id>
>>> - reg8[22] = <direction> (1: request, 0: response)
>>>
>>> table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 &&
>>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>> table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 &&
>>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1;
>>> reg8[22] = 1; reg5[0..7] = 123; next;)
>>>
>>> in_stateful: Priority 110: set the NFG id in the CT label if reg8[21]
>>> is set.
>>> - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF
>>> insertion.
>>> - bits 17 to 24 (ct_label.network_function_group_id): Stores the
>>> 8-bit NFG id.
>>>
>>> table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 &&
>>> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_label.network_function_group = 1; ct_label.network_function_group_id =
>>> reg5[0..7]; }; next;)
>>> table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 &&
>>> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1;
>>> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>>> table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 &&
>>> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_label.network_function_group = 0; ct_label.network_function_group_id =
>>> 0; }; next;)
>>> table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 &&
>>> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>>> ct_label.network_function_group_id = 0; }; next;)
>>> table=21(ls_in_stateful ), priority=0 , match=(1), action=(next;)
>>>
>>>
>>> For non-NFG cases, the existing priority 100 rules will be hit, where
>>> an additional action has been added to clear the NFG bits in the ct
>>> label.
>>>
>>> in_network_function: A new stage with priority 99 rules to redirect
>>> packets by setting the outport to the NF "inport" (or its child port)
>>> based on the NFG id set by the prior ACL stage.
>>> Priority 100 rules ensure that when the same packets come out of the
>>> NF ports, they are not redirected again (the setting of reg5 here
>>> relates to the cross-host packet tunneling and will be explained
>>> later).
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port)
>>> is not present on this LS, drop packets.
>>>
>>> table=22(ls_in_network_function), priority=100 , match=(inport ==
>>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>> table=22(ls_in_network_function), priority=100 , match=(inport ==
>>> "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>> table=22(ls_in_network_function), priority=100 , match=(reg8[21] == 1 &&
>>> eth.mcast), action=(next;)
>>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 &&
>>> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1"; output;)
>>> table=22(ls_in_network_function), priority=1 , match=(reg8[21] == 1),
>>> action=(drop;)
>>> table=22(ls_in_network_function), priority=0 , match=(1),
>>> action=(next;)
>>>
>>>
>>> 4.1.2 Response processing
>>> -------------------------
>>> out_acl_eval: High priority rules that allow response and related
>>> packets to go through have been enhanced to also copy the CT label
>>> NFG bit into reg8[21].
>>>
>>> table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel &&
>>> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] =
>>> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>> table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel &&
>>> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] =
>>> ct_label.network_function_group; reg8[16] = 1; next;)
>>>
>>> out_network_function: The priority 99 rule matches on the nfg_id in
>>> ct_label and sets the outport to the NF "outport". It also sets
>>> reg8[23]=1 and injects the packet into the ingress pipeline
>>> (in_l2_lookup).
>>> The priority 100 rules forward all packets destined to NF ports to
>>> the next table.
>>>
>>> table=11 (ls_out_network_function), priority=100 , match=(outport ==
>>> "nfp1"), action=(next;)
>>> table=11 (ls_out_network_function), priority=100 , match=(outport ==
>>> "nfp2"), action=(next;)
>>> table=11(ls_out_network_function), priority=100 , match=(reg8[21] == 1
>>> && eth.mcast), action=(next;)
>>> table=11 (ls_out_network_function), priority=99 , match=(reg8[21] == 1
>>> && reg8[22] == 0 && ct_label.network_function_group_id == 123),
>>> action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
>>> table=11 (ls_out_network_function), priority=1 , match=(reg8[21] ==
>>> 1), action=(drop;)
>>> table=11 (ls_out_network_function), priority=0 , match=(1),
>>> action=(next;)
>>>
>>> in_l2_lkup: if reg8[23] == 1 (the packet has come back from egress),
>>> simply forward such packets, as the outport is already set.
>>>
>>> table=29(ls_in_l2_lkup), priority=100 , match=(reg8[23] == 1),
>>> action=(output;)
>>>
>>> The above set of rules ensures that the response packet is sent to
>>> nfp2. When the same packet comes out of nfp1, the ingress pipeline
>>> would set the outport to p1 and it enters the egress pipeline.
>>>
>>> out_pre_acl: If the packet is coming from the NF inport, skip the
>>> egress pipeline up to the out_nf stage, as the packet has already
>>> gone through it and we don't want the same packet to be processed by
>>> CT twice.
>>> table=2 (ls_out_pre_acl ), priority=110 , match=(inport == "nfp1"),
>>> action=(next(pipeline=egress, table=12);)
>>>
>>>
>>> 4.2 to-lport ACL
>>> ----------------
>>>   -----          --------          -----
>>>  | VM1 |        | NF VM  |        | VM2 |
>>>   -----          --------          -----
>>>    / \            |    / \           |
>>>     |             |     |            |
>>>     |            \ /    |           \ /
>>>  -------------------------------------------------------------
>>> |   p1          nfp1  nfp2                     p2             |
>>> |                                                             |
>>> |                      Logical Switch                         |
>>>  -------------------------------------------------------------
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2'
>>> allow-related nfg1
>>> The diagram shows the request traffic path. The response will follow
>>> the reverse path.
>>>
>>> The ingress pipeline sets the outport to p1 based on destination MAC
>>> lookup. The packet enters the egress pipeline. There the to-lport ACL
>>> with an NFG gets evaluated and the NFG id gets committed to the CT
>>> label. Then the outport is set to nfp2 and the packet is injected
>>> back into ingress. When the same packet comes out of nfp1, it gets
>>> forwarded to p1 the normal way.
>>> For the response packet from p1, the ingress pipeline gets the NFG id
>>> from the CT label and accordingly redirects it to nfp1. When it comes
>>> out of nfp2 it is forwarded the normal way.
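>>>
>>> For reference, the redirection state used by the flows above and
>>> below, collected in one place (all taken from the rules in this
>>> proposal):
>>>
>>> reg8[21]             - packet matched an ACL with an NFG
>>> reg8[22]             - direction (1: request, 0: response)
>>> reg8[23]             - packet re-injected from egress into in_l2_lkup
>>> reg5[0..7]           - NFG unique id assigned by northd
>>> reg5[16..31]         - tunnel interface id (cross-host VLAN case,
>>>                        section 5)
>>> ct_label bit 7       - network_function_group flag
>>> ct_label bits 17..24 - network_function_group_id
>>> ct_label.tun_if_id   - tunnel interface id committed in the NF
>>>                        port's zone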
>>>
>>> 4.2.1 Request processing
>>> ------------------------
>>> out_acl_eval: For to-lport ACLs with an NFG, the existing rule's
>>> action has been enhanced to set:
>>> - reg8[21] = 1: to indicate that the packet has matched a rule with
>>> an NFG
>>> - reg5[0..7] = <NFG-unique-id>
>>> - reg8[22] = <direction> (1: request, 0: response)
>>>
>>> table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[7] == 1 &&
>>> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>> table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[8] == 1 &&
>>> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>>
>>>
>>> out_stateful: Priority 110: set the NFG id in the CT label if
>>> reg8[21] is set.
>>>
>>> table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 &&
>>> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_label.network_function_group = 1; ct_label.network_function_group_id =
>>> reg5[0..7]; }; next;)
>>> table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 &&
>>> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1;
>>> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>>> table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 &&
>>> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_label.network_function_group = 0; ct_label.network_function_group_id =
>>> 0; }; next;)
>>> table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 &&
>>> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>>> ct_label.network_function_group_id = 0; }; next;)
>>> table=10(ls_out_stateful ), priority=0 , match=(1), action=(next;)
>>>
>>> out_network_function: A new stage with priority 99 rules to redirect
>>> packets by setting the outport to the NF "outport" (or its child
>>> port) based on the NFG id set by the prior ACL stage, and then
>>> injecting them back into ingress.
>>> Priority 100 rules ensure that when the packets are going to NF
>>> ports, they are not redirected again.
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port)
>>> is not present on this LS, drop packets.
>>>
>>> table=11(ls_out_network_function), priority=100 , match=(outport ==
>>> "nfp1"), action=(next;)
>>> table=11(ls_out_network_function), priority=100 , match=(outport ==
>>> "nfp2"), action=(next;)
>>> table=11(ls_out_network_function), priority=100 , match=(reg8[21] == 1
>>> && eth.mcast), action=(next;)
>>> table=11(ls_out_network_function), priority=99 , match=(reg8[21] == 1
>>> && reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2";
>>> reg8[23] = 1; next(pipeline=ingress, table=29);)
>>> table=11(ls_out_network_function), priority=1 , match=(reg8[21] == 1),
>>> action=(drop;)
>>> table=11(ls_out_network_function), priority=0 , match=(1),
>>> action=(next;)
>>>
>>>
>>> in_l2_lkup: As described earlier, the priority 100 rule will forward
>>> these packets.
>>>
>>> Then the same packet comes out from nfp1 and goes through the ingress
>>> processing, where the outport gets set to p1. The egress pipeline
>>> out_pre_acl priority 110 rule described earlier matches against
>>> inport nfp1 and directly jumps to the stage after
>>> out_network_function. Thus the packet is not redirected again.
>>>
>>> 4.2.2 Response processing
>>> -------------------------
>>> in_acl_eval: High priority rules that allow response and related
>>> packets to go through have been enhanced to also copy the CT label
>>> NFG bit into reg8[21].
>>>
>>> table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel &&
>>> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1;
>>> reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>> table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel &&
>>> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0;
>>> reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group;
>>> reg8[16] = 1; next;)
>>>
>>> in_network_function: The priority 99 rule matches on the nfg_id in
>>> ct_label and sets the outport to the NF "inport".
>>> The priority 100 rule forwards all packets to NF ports to the next
>>> table.
>>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 &&
>>> reg8[22] == 0 && ct_label.network_function_group_id == 123),
>>> action=(outport = "nfp1"; output;)
>>>
>>>
>>> 5. Cross-host Traffic for VLAN Network
>>> ======================================
>>> For overlay subnets, all cross-host traffic exchanges are tunneled.
>>> In the case of VLAN subnets, there needs to be special handling to
>>> selectively tunnel only the traffic to or from the NF ports.
>>> Take the example of a from-lport ACL. Packets from p1 to p2 get
>>> redirected to nfp1 on host1. If such a packet is simply sent out from
>>> host1, the physical network will directly forward it to host2, where
>>> VM2 is. So we need to tunnel the redirected packets from host1 to
>>> host3. Now, once the packets come out of nfp2, if host3 sends the
>>> packets out, the physical network would learn p1's MAC as coming from
>>> host3. So these packets need to be tunneled back to host1. From there
>>> the packet would be forwarded to VM2 via the physical network.
>>>
>>>   -----            -----            --------
>>>  | VM2 |          | VM1 |          | NF VM  |
>>>   -----            -----            --------
>>>    / \               |              / \   |
>>>     | (7)            | (1)       (3) |    | (4)
>>>     |               \ /              |   \ /
>>>  --------------        --------------   (2)   ---------------
>>> |      p2      |  (6)  |      p1      |______\| nfp1   nfp2  |
>>> |              |/_____ |              |------/|              |
>>> |    host2     |\      |    host1     |/______|    host3     |
>>> |              |       |              |\------|              |
>>>  --------------        --------------    (5)   ---------------
>>>
>>> The above figure shows the request packet path for a from-lport ACL.
>>> The response would follow the same path in the reverse direction.
>>>
>>> To achieve this, the following would be done:
>>>
>>> On the host where the ACL port group members are present (host1)
>>> ----------------------------------------------------------------
>>> REMOTE_OUTPUT (table 42):
>>> Currently, it tunnels traffic destined to all non-local overlay ports
>>> to their associated hosts. The same rule is now also added for
>>> traffic to non-local NF ports. Thus the packets from p1 get tunneled
>>> to host3.
>>>
>>> On the host with the NF (host3), forward the packet to nfp1
>>> -----------------------------------------------------------
>>> Upon reaching host3, the following rules come into play:
>>> PHY_TO_LOG (table 0):
>>> Priority 100: Existing rule - for each geneve tunnel interface on the
>>> chassis, it copies info from the header to the inport, outport and
>>> metadata registers. Now the same rule also stores the tunnel
>>> interface id in a register (reg5[16..31]).
>>>
>>> CHECK_LOOPBACK (table 44)
>>> This table has a rule that clears all the registers. The change is to
>>> skip the clearing of reg5[16..31].
>>>
>>> Logical egress pipeline:
>>>
>>> ls_out_stateful priority 120: If the outport is an NF port, copy
>>> reg5[16..31] (table 0 had set it) to ct_label.tun_if_id.
>>>
>>> table=10(ls_out_stateful ), priority=120 , match=(outport == "nfp1"
>>> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_label.tun_if_id = reg5[16..31]; }; next;)
>>> table=10(ls_out_stateful ), priority=120 , match=(outport == "nfp1"
>>> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>>> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15];
>>> ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)
>>>
>>> The above sequence of flows ensures that if a packet is received via
>>> tunnel on host3, with outport as nfp1, the tunnel interface id is
>>> committed to the ct entry in nfp1's zone.
>>>
>>> On the host with the NF (host3), tunnel packets from nfp2 back to host1
>>> -----------------------------------------------------------------------
>>> When the same packet comes out of nfp2 on host3:
>>>
>>> LOCAL_OUTPUT (table 43)
>>> When the packet comes out of the other NF port (nfp2), the following
>>> two rules send it back to the host that it originally came from:
>>>
>>> Priority 110: For each NF port local to this host, the following rule
>>> processes the packet through the CT of the linked port (for nfp2, it
>>> is nfp1):
>>> match: inport==nfp2 && RECIRC_BIT==0
>>> action: RECIRC_BIT = 1, ct(zone=nfp1's zone, table=LOCAL), resubmit
>>> to table 43
>>>
>>> Priority 109: For each {tunnel_id, nf port} on this host, if the
>>> tun_if_id in ct_label matches the tunnel_id, send the recirculated
>>> packet using tunnel_id:
>>> match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>>> action: tunnel packet using tun-id
>>>
>>> If p1 and nfp1 happen to be on the same host, the tun_if_id would not
>>> be set and thus none of the priority 109 rules would match. The
>>> packet would be forwarded the usual way, matching the existing
>>> priority 100 rules in LOCAL_OUTPUT.
>>>
>>> Special handling of the case where the NF responds back on nfp1,
>>> instead of forwarding the packet out of nfp2:
>>> For example, a SYN packet from p1 got redirected to nfp1. Then the
>>> NF, which is a firewall VM, drops the SYN and sends an RST back on
>>> port nfp1.
>>> In this case, looking up the linked port's (nfp2) ct zone will not
>>> give anything. ct.inv is used to identify such scenarios and nfp1's
>>> CT zone is used to send the packet back. To achieve this, the
>>> following two rules are installed:
>>>
>>> in_network_function:
>>> The priority 100 rule that allows packets incoming from NF type ports
>>> is enhanced with an additional action to store the tun_if_id from
>>> ct_label into reg5[16..31].
>>> table=22(ls_in_network_function), priority=100 , match=(inport ==
>>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>
>>> LOCAL_OUTPUT (table 43)
>>> Priority 110 rule: for recirculated packets, if the ct (of the linked
>>> port) is invalid, use the tun id from reg5[16..31] to tunnel the
>>> packet back to host1 (as the CT zone info has been overwritten by the
>>> priority 110 rule in table 43 above).
>>> match: inport==nfp1 && RECIRC_BIT==1 && ct.inv &&
>>> MFF_LOG_TUN_OFPORT==<tun-id>
>>> action: tunnel packet using tun-id
>>>
>>>
>>> 6. NF insertion across logical switches
>>> =======================================
>>> If the port-group where the ACL is being applied has members across
>>> multiple logical switches, there needs to be an NF port pair on each
>>> of these switches.
>>> The NF VM will have only one inport and one outport. The CMS is
>>> expected to create child ports linked to these ports on each logical
>>> switch where port-group members are present.
>>> The network-function entity would be configured with the parent ports
>>> only. When the CMS creates the child ports, it does not need to
>>> change any of the NF, NFG or ACL config tables.
>>> When northd configures the redirection rules for a specific LS, it
>>> will use the parent or child port depending on what it finds on that
>>> LS.
>>>
>>>                           --------
>>>                          | NF VM  |
>>>                           --------
>>>                            |    |
>>>   -----                    |    |                     -----
>>>  | VM1 |                  nfp1 nfp2                  | VM2 |
>>>   -----                    |    |                     -----
>>>    |  |                  --------------                |  |
>>>    |  |                 |    SVC LS    |                |  |
>>>  p1| nfp1_ch1 nfp2_ch1   --------------   p3| nfp1_ch2 nfp2_ch2
>>>  ----------------------                 ----------------------
>>> |         LS1          |               |         LS2          |
>>>  ----------------------                 ----------------------
>>>
>>> In this example, the CMS created the parent ports for the NF VM on
>>> the LS named SVC LS. The ports are nfp1 and nfp2. The CMS configures
>>> the NF using these ports:
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
>>> allow-related nfg1
>>>
>>> The port group to which the ACL is applied is pg1, and pg1 has two
>>> ports: p1 on LS1 and p3 on LS2.
>>> The CMS needs to create child ports for the NF ports on LS1 and LS2.
>>> On LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2.
>>>
>>> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1.
>>>
>>> table=22(ls_in_network_function), priority=100 , match=(inport ==
>>> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 &&
>>> reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch1"; output;)
>>>
>>> When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2.
>>> table=22(ls_in_network_function), priority=100 , match=(inport ==
>>> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>> table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 &&
>>> reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch2"; output;)
>>>
>>>
>>> 7. Health Monitoring
>>> ====================
>>> The LB health monitoring functionality has been extended to support
>>> NFs. Network_Function_Group has a list of Network_Functions, each of
>>> which has a reference to a Network_Function_Health_Check that holds
>>> the monitoring config. There is a corresponding SB service_monitor
>>> row maintaining the online/offline status. When the status changes,
>>> northd picks one of the "online" NFs and sets it in the
>>> network_function_active field of the NFG. The redirection rule in the
>>> LS uses the ports from this NF.
>>>
>>> ovn-controller performs the health monitoring by sending ICMP echo
>>> requests with the source IP and MAC from the NB global options
>>> "svc_monitor_ip4" and "svc_monitor_mac", and the destination IP and
>>> MAC from the new NB global options "svc_monitor_ip4_dst" and
>>> "svc_monitor_mac_dst". The sequence number and id are randomly
>>> generated and stored in the service_monitor record. The NF VM
>>> forwards the same packet out of the other port. When it comes out,
>>> ovn-controller matches the sequence number and id with the stored
>>> values and marks the NF online if they match.
>>>
>>> V1:
>>> - First patch.
>>>
>>> V2:
>>> - Rebased code.
>>> - Added "mode" field in the Network_Function_Group table, with the
>>>   only allowed value being "inline". This is for future expansion to
>>>   include "mirror" mode.
>>> - Added a flow in the in_network_function and out_network_function
>>>   tables to skip redirection of multicast traffic.
>>>
>>> V3:
>>> - Rebased code.
>>>
>>> Sragdhara Datta Chaudhuri (5):
>>>   ovn-nb: Network Function insertion OVN-NB schema changes
>>>   ovn-nbctl: Network Function insertion commands.
>>>   northd, tests: Network Function insertion logical flow programming.
>>>   controller, tests: Network Function insertion tunneling of cross-host
>>>     VLAN traffic.
>>>   northd, controller: Network Function Health monitoring.
>>>
>>>  controller/physical.c        | 271 +++++++++++-
>>>  controller/pinctrl.c         | 252 +++++++++--
>>>  include/ovn/logical-fields.h |  14 +
>>>  lib/logical-fields.c         |  26 ++
>>>  lib/ovn-util.h               |   4 +-
>>>  northd/en-global-config.c    |  75 ++++
>>>  northd/en-global-config.h    |  12 +-
>>>  northd/en-multicast.c        |   2 +-
>>>  northd/en-northd.c           |   8 +
>>>  northd/en-sync-sb.c          |  16 +-
>>>  northd/inc-proc-northd.c     |   6 +-
>>>  northd/northd.c              | 796 +++++++++++++++++++++++++++++++++--
>>>  northd/northd.h              |  41 +-
>>>  ovn-nb.ovsschema             |  64 ++-
>>>  ovn-nb.xml                   | 123 ++++++
>>>  ovn-sb.ovsschema             |  12 +-
>>>  ovn-sb.xml                   |  22 +-
>>>  tests/ovn-controller.at      |   6 +-
>>>  tests/ovn-nbctl.at           |  83 ++++
>>>  tests/ovn-northd.at          | 548 ++++++++++++++++++------
>>>  tests/ovn.at                 | 143 +++++++
>>>  utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
>>>  22 files changed, 2792 insertions(+), 265 deletions(-)
>>
>>
>
> Hi Sragdhara, Numan,
>
>> Hi Sragdhara,
>>
>> I want to test out this patch series and it looks like the patches
>> don't apply cleanly. Can you either rebase the patch series again or
>> share a link to your cloned github branch which has these commits
>> pushed?
>>
>
> In order to help move things along I went ahead and rebased this series
> in my fork on top of current main:
>
> https://github.com/dceara/ovn/tree/refs/heads/review-pws458355-network-function-insertion-v3
>
It might be because of the way I rebased the patches, but it seems some
of the tests fail with latest main and this series applied:

https://github.com/dceara/ovn/actions/runs/15904875892/job/44857195972#step:12:5349

>> Also, please take a look at this proposal -
>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424080.html
>>
>
> The discussion is still going on there but it seems to me we might have
> to treat the two proposals (Network Function from Nutanix and Service
> Function Chaining from Red Hat) as different features (there's still a
> chance we can extend the Nutanix one in order to implement the
> ovn-kubernetes requirements).
>
>> It looks to me like both your patch series and this proposal are
>> trying to solve the same use case. And it makes sense to have a
>> solution which works for both proposals, or which can be extended
>> easily later, without having to have 2 features.
>>
>> Does your proposal support having multiple Network Functions chained?
>>
>
> I think we can probably add that as a follow-up feature.
>
> I didn't manage to properly review the code yet but one thing that
> would be really great to have is some system tests (system-ovn.at) that
> better illustrate what happens (I guess we could simulate the network
> functions with network namespaces that just forward packets between two
> veths; see the sketch after this message).
>
> Maybe something to add in v4.
>
> Regards,
> Dumitru
>
>> Thanks
>> Numan
>>
>>>
>>> --
>>> 2.39.3
>>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
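A rough sketch of the namespace-based NF simulation suggested above, for
such a system test (illustrative only: the interface and namespace names
are made up, and bridging inside the namespace is just one way to get
bump-in-the-wire forwarding; it is not part of the series):

    # Create a "bump in the wire" NF: a namespace that transparently
    # bridges two veth pairs, forwarding packets between them unchanged.
    ip netns add nf
    ip link add nf-in type veth peer name nf-in-p netns nf
    ip link add nf-out type veth peer name nf-out-p netns nf
    ip netns exec nf ip link add br-nf type bridge
    ip netns exec nf ip link set nf-in-p master br-nf
    ip netns exec nf ip link set nf-out-p master br-nf
    ip netns exec nf ip link set nf-in-p up
    ip netns exec nf ip link set nf-out-p up
    ip netns exec nf ip link set br-nf up
    ip link set nf-in up
    ip link set nf-out up
    # Bind the host ends to the NF LSPs so that packets redirected into
    # nfp1 re-appear at nfp2 (and vice versa).
    ovs-vsctl add-port br-int nf-in -- set Interface nf-in \
        external_ids:iface-id=nfp1
    ovs-vsctl add-port br-int nf-out -- set Interface nf-out \
        external_ids:iface-id=nfp2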