On 6/26/25 6:02 PM, Dumitru Ceara wrote:
> On 6/16/25 5:26 PM, Numan Siddique wrote:
>> On Sun, May 25, 2025 at 10:46 PM Sragdhara Datta Chaudhuri
>> <sragdha.chau...@nutanix.com> wrote:
>>>
>>> RFC: NETWORK FUNCTION INSERTION IN OVN
>>>
>>> 1. Introduction
>>> ================
>>> The objective is to insert a Network Function (NF) in the path of 
>>> outbound/inbound traffic from/to a port-group. The use case is to integrate 
>>> a 3rd-party service, e.g. a layer-7 firewall, in the path of traffic. The 
>>> NF VM acts as a bump in the wire and must not modify the packet, i.e. the 
>>> IP header, MAC addresses, VLAN tag and sequence numbers remain unchanged.
>>>
>>> Here are some of the highlights:
>>> - A new entity network-function (NF) has been introduced. It contains a 
>>> pair of LSPs. The CMS would designate one as “inport” and the other as 
>>> “outport”.
>>> - For high availability, a network function group (NFG) entity consists of 
>>> a group of NFs. Only one NF in an NFG is active at a time, selected based 
>>> on health monitoring.
>>> - An ACL would accept an NFG as a parameter, and traffic matching the ACL 
>>> would be redirected to the active NF’s port. An NFG is accepted for the 
>>> stateful allow action only.
>>> - The ACL’s port-group is the point of reference when defining the role of 
>>> the NF ports. The “inport” is the port closer to the port-group and 
>>> “outport” is the one away from it. For from-lport ACLs, the request packets 
>>> would be redirected to the NF “inport” and for to-lport ACLs, the request 
>>> packets would be redirected to the NF “outport”. When the same packet comes 
>>> out of the other NF port, it is simply forwarded.
>>> - Statefulness will be maintained, i.e. the response traffic will also go 
>>> through the same pair of NF ports but in reverse order.
>>> - For the NF ports we need to disable port security check, fdb learning and 
>>> multicast/broadcast forwarding.
>>> - Health monitoring involves ovn-controller periodically injecting an ICMP 
>>> probe packet into the NF inport and monitoring for the same packet coming 
>>> out of the NF outport.
>>> - If the traffic redirection involves cross-host traffic (e.g. for a 
>>> from-lport ACL, if the source VM and NF VM are on different hosts), packets 
>>> would be tunneled to and from the NF VM's host.
>>> - If the port-group to which the ACL is being applied has members spread 
>>> across multiple LSs, the CMS needs to create child ports for the NF ports on 
>>> each of these LSs. The redirection rules in each LS will use the child 
>>> ports on that LS.
>>>
>>> 2. NB tables
>>> =============
>>> New NB tables
>>> —------------
>>> Network_Function: Each row contains {inport, outport, health_check}
>>> Network_Function_Group: Each row contains a list of Network_Function 
>>> entities. It also contains a unique id (between 1 and 255, generated by 
>>> northd) and a reference to the current active NF.
>>> Network_Function_Health_Check: Each row contains configuration for probes 
>>> in options field: {interval, timeout, success_count, failure_count}
>>>
>>>         "Network_Function_Health_Check": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "options": {
>>>                      "type": {"key": "string",
>>>                               "value": "string",
>>>                               "min": 0,
>>>                               "max": "unlimited"}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
>>>         "Network_Function": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "outport": {"type": {"key": {"type": "uuid",
>>>                                              "refTable": "Logical_Switch_Port",
>>>                                              "refType": "strong"},
>>>                                      "min": 1, "max": 1}},
>>>                 "inport": {"type": {"key": {"type": "uuid",
>>>                                             "refTable": "Logical_Switch_Port",
>>>                                             "refType": "strong"},
>>>                                     "min": 1, "max": 1}},
>>>                 "health_check": {"type": {
>>>                     "key": {"type": "uuid",
>>>                             "refTable": "Network_Function_Health_Check",
>>>                             "refType": "strong"},
>>>                     "min": 0, "max": 1}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
>>>         "Network_Function_Group": {
>>>             "columns": {
>>>                 "name": {"type": "string"},
>>>                 "network_function": {"type":
>>>                                   {"key": {"type": "uuid",
>>>                                            "refTable": "Network_Function",
>>>                                            "refType": "strong"},
>>>                                            "min": 0, "max": "unlimited"}},
>>>                 "mode": {"type": {"key": {"type": "string",
>>>                                           "enum": ["set", ["inline"]]}}},
>>>                 "network_function_active": {"type":
>>>                                   {"key": {"type": "uuid",
>>>                                            "refTable": "Network_Function",
>>>                                            "refType": "strong"},
>>>                                            "min": 0, "max": 1}},
>>>                 "id": {
>>>                      "type": {"key": {"type": "integer",
>>>                                       "minInteger": 0,
>>>                                       "maxInteger": 255}}},
>>>                 "external_ids": {
>>>                     "type": {"key": "string", "value": "string",
>>>                              "min": 0, "max": "unlimited"}}},
>>>             "isRoot": true},
>>>
>>>
>>> Modified NB table
>>> —----------------
>>> ACL: The ACL entity would have a new optional field that is a reference to 
>>> a Network_Function_Group entity. This field can be present only for 
>>> stateful allow ACLs.
>>>
>>>         "ACL": {
>>>             "columns": {
>>>                 "network_function_group": {"type": {"key": {"type": "uuid",
>>>                                            "refTable": "Network_Function_Group",
>>>                                            "refType": "strong"},
>>>                                            "min": 0, "max": 1}},
>>>
>>> New options for Logical_Switch_Port
>>> —----------------------------------
>>> receive_multicast=<boolean>: Default true. If set to false, the LS will not 
>>> forward broadcast/multicast traffic to this port. This is to prevent 
>>> looping of such packets.
>>>
>>> lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning will 
>>> be skipped for packets coming out of this port. Redirected packets from the 
>>> NF port carry the originating VM’s MAC as source, so learning should not 
>>> happen for them.
>>>
>>> The CMS needs to set both of the above options to false for NF ports, in 
>>> addition to disabling port security.
>>>
>>> network-function-linked-port=<lsp-name>: Each NF port needs to have this 
>>> set to the other NF port of the pair.
>>>
>>> New NB_global options
>>> —--------------------
>>> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is 
>>> already there and will be used as source MAC)
>>> svc_monitor_ip4: source IP of probe packets
>>> svc_monitor_ip4_dst: destination IP of probe packets
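>>>
>>> For example, the CMS might configure these as follows (the MAC and IP 
>>> values below are purely illustrative):
>>>
>>> ovn-nbctl set NB_Global . \
>>>     options:svc_monitor_mac="0e:00:00:00:00:01" \
>>>     options:svc_monitor_mac_dst="0e:00:00:00:00:02" \
>>>     options:svc_monitor_ip4=169.254.10.1 \
>>>     options:svc_monitor_ip4_dst=169.254.10.2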
>>>
>>> Sample configuration
>>> —-------------------
>>> ovn-nbctl ls-add ls1
>>> ovn-nbctl lsp-add ls1 nfp1
>>> ovn-nbctl lsp-add ls1 nfp2
>>> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false 
>>> options:lsp_learn_fdb=false options:network-function-linked-port=nfp2
>>> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false 
>>> options:lsp_learn_fdb=false options:network-function-linked-port=nfp1
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 
>>> 10.1.1.4"
>>> ovn-nbctl pg-add pg1 p1
>>> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
>>> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 
>>> 10.1.1.5"
>>> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
>>> allow-related nfg1
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
>>> allow-related nfg1
>>>
>>> 3. SB tables
>>> ============
>>> Service_Monitor:
>>> This is currently used by Load balancer. New fields are: “type” - to 
>>> indicate LB or NF, “mac” - the destination MAC address for monitor packets, 
>>> “logical_input_port” - the LSP to which the probe packet would be sent. 
>>> Also, “icmp” has been added as a protocol type, used only for NF.
>>>
>>>          "Service_Monitor": {
>>>              "columns": {
>>>                  "type": {"type": {"key": {
>>>                      "type": "string",
>>>                      "enum": ["set", ["load-balancer", "network-function"]]}}},
>>>                  "mac": {"type": "string"},
>>>                  "protocol": {
>>>                      "type": {"key": {"type": "string",
>>>                                       "enum": ["set", ["tcp", "udp", "icmp"]]},
>>>                               "min": 0, "max": 1}},
>>>                  "logical_input_port": {"type": "string"},
>>>
>>> northd would create one Service_Monitor entity for each NF. The 
>>> logical_input_port and logical_port would be populated from the NF inport 
>>> and outport fields respectively. The probe packets would be injected into 
>>> the logical_input_port and would be monitored out of logical_port.
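>>>
>>> As a hypothetical illustration (abridged, with made-up values for the nf1 
>>> example above), the resulting record might look like:
>>>
>>> $ ovn-sbctl list Service_Monitor
>>> type                : network-function
>>> protocol            : icmp
>>> logical_input_port  : nfp1
>>> logical_port        : nfp2
>>> mac                 : "0e:00:00:00:00:02"
>>> status              : online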
>>>
>>> 4. Logical Flows
>>> ================
>>> Logical Switch ingress pipeline:
>>> - in_network_function added after in_stateful.
>>> - Modifications to in_acl_eval, in_stateful and in_l2_lkup.
>>> Logical Switch egress pipeline:
>>> - out_network_function added after out_stateful.
>>> - Modifications to out_pre_acl, out_acl_eval and out_stateful.
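>>>
>>> Once a configuration like the one in section 2 is in place, the new stages 
>>> can be inspected with, e.g. (a sketch):
>>>
>>> ovn-sbctl lflow-list ls1 | grep network_function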
>>>
>>> 4.1 from-lport ACL
>>> ------------------
>>> The diagram shows the request path for packets from VM1's port p1, which is 
>>> a member of the pg to which the ACL is applied. The response follows the 
>>> reverse path, i.e. the packet is redirected to nfp2, comes out of nfp1 and 
>>> is forwarded to p1.
>>> Also, p2 does not need to be on the same LS; only p1, nfp1 and nfp2 need to 
>>> be on the same LS.
>>>
>>>       -----                  -------                  -----
>>>      | VM1 |                | NF VM |                | VM2 |
>>>       -----                  -------                  -----
>>>         |                    /\    |                   / \
>>>         |                    |     |                    |
>>>        \ /                   |    \ /                   |
>>>    ------------------------------------------------------------
>>>   |     p1                 nfp1  nfp2                   p2     |
>>>   |                                                            |
>>>   |                      Logical Switch                        |
>>>    -------------------------------------------------------------
>>> pg1: [p1]         as2: [p2-ip]
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
>>> allow-related nfg1
>>> Say the unique id that northd assigned to this NFG is 123.
>>>
>>> The request packets from p1 matching a from-lport ACL with an NFG are 
>>> redirected to nfp1, and the NFG id is committed to the CT label in p1's 
>>> zone. When the same packet comes out of nfp2 it gets forwarded the normal 
>>> way.
>>> Response packets carry p1's MAC as destination. Ingress processing sets the 
>>> outport to p1, and the CT lookup in the egress pipeline (in p1's CT zone) 
>>> yields the NFG id; the packet is injected back into the ingress pipeline 
>>> after setting the outport to nfp2.
>>>
>>> Below are the changes in detail.
>>>
>>> 4.1.1 Request processing
>>> ------------------------
>>>
>>> in_acl_eval: For from-lport ACLs with NFG, the existing rule's action has 
>>> been enhanced to set:
>>>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>>>  - reg5[0..7] = <NFG-unique-id>
>>>  - reg8[22] = <direction> (1: request, 0: response)
>>>
>>>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && 
>>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && 
>>> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; 
>>> reg8[22] = 1; reg5[0..7] = 123; next;)
>>>
>>> in_stateful: Priority 110: set the NFG id in the CT label if reg8[21] is 
>>> set.
>>>  - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF 
>>> insertion.
>>>  - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8-bit 
>>> NFG id.
>>>
>>>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
>>> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
>>> reg5[0..7]; }; next;)
>>>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
>>> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
>>> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>>>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
>>> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_label.network_function_group = 0; ct_label.network_function_group_id = 
>>> 0; }; next;)
>>>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
>>> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
>>> ct_label.network_function_group_id = 0; }; next;)
>>>   table=21(ls_in_stateful     ), priority=0    , match=(1), action=(next;)
>>>
>>>
>>> For non-NFG cases, the existing priority 100 rules will be hit. An 
>>> additional action has been added there to clear the NFG bits in the CT 
>>> label.
>>>
>>> in_network_function: A new stage with priority 99 rules to redirect packets 
>>> by setting outport to the NF “inport” (or its child port) based on the NFG 
>>> id set by the prior ACL stage.
>>> Priority 100 rules ensure that when the same packets come out of the NF 
>>> ports, they are not redirected again (the setting of reg5 here relates to 
>>> the cross-host packet tunneling and will be explained later).
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
>>> present on this LS, drop packets.
>>>
>>>   table=22(ls_in_network_function), priority=100  , match=(inport == 
>>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=100  , match=(inport == 
>>> "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=100  , match=(reg8[21] == 1 && 
>>> eth.mcast), action=(next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
>>> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1"; output;)
>>>   table=22(ls_in_network_function), priority=1    , match=(reg8[21] == 1), 
>>> action=(drop;)
>>>   table=22(ls_in_network_function), priority=0    , match=(1), 
>>> action=(next;)
>>>
>>>
>>> 4.1.2 Response processing
>>> -------------------------
>>> out_acl_eval: High priority rules that allow response and related packets 
>>> to go through have been enhanced to also copy the CT label NFG bit into 
>>> reg8[21].
>>>
>>>   table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
>>> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = 
>>> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>>   table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
>>> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] = 
>>> ct_label.network_function_group; reg8[16] = 1; next;)
>>>
>>> out_network_function: Priority 99 rule matches on the nfg_id in ct_label 
>>> and sets the outport to the NF “outport”. It also sets reg8[23] = 1 and 
>>> injects the packet into the ingress pipeline (in_l2_lkup).
>>> Priority 100 rules forward packets destined to the NF ports to the next 
>>> table.
>>>
>>>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
>>> "nfp1"), action=(next;)
>>>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
>>> "nfp2"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 
>>> && eth.mcast), action=(next;)
>>>   table=11 (ls_out_network_function), priority=99   , match=(reg8[21] == 1 
>>> && reg8[22] == 0 && ct_label.network_function_group_id == 123), 
>>> action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
>>>   table=11 (ls_out_network_function), priority=1    , match=(reg8[21] == 
>>> 1), action=(drop;)
>>>   table=11 (ls_out_network_function), priority=0    , match=(1), 
>>> action=(next;)
>>>
>>> in_l2_lkup: if reg8[23] == 1 (packet has come back from egress), simply 
>>> forward such packets as outport is already set.
>>>
>>>   table=29(ls_in_l2_lkup), priority=100  , match=(reg8[23] == 1), 
>>> action=(output;)
>>>
>>> The above set of rules ensures that the response packet is sent to nfp2. 
>>> When the same packet comes out of nfp1, the ingress pipeline sets the 
>>> outport to p1 and the packet enters the egress pipeline.
>>>
>>> out_pre_acl: If the packet is coming from the NF inport, skip the egress 
>>> pipeline up to the out_network_function stage, as the packet has already 
>>> gone through it and we don't want the same packet to be processed by CT 
>>> twice.
>>>   table=2 (ls_out_pre_acl     ), priority=110  , match=(inport == "nfp1"), 
>>> action=(next(pipeline=egress, table=12);)
>>>
>>>
>>> 4.2 to-lport ACL
>>> ----------------
>>>       -----                  --------                  -----
>>>      | VM1 |                |  NF VM |                | VM2 |
>>>       -----                  --------                  -----
>>>        / \                    |   / \                    |
>>>         |                     |    |                     |
>>>         |                    \ /   |                    \ /
>>>    -------------------------------------------------------------
>>>   |     p1                  nfp1   nfp2                  p2     |
>>>   |                                                             |
>>>   |                      Logical Switch                         |
>>>    -------------------------------------------------------------
>>> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
>>> allow-related nfg1
>>> The diagram shows the request traffic path. The response follows the 
>>> reverse path.
>>>
>>> Ingress pipeline sets the outport to p1 based on a destination MAC lookup. 
>>> The packet enters the egress pipeline. There the to-lport ACL with an NFG 
>>> is evaluated and the NFG id is committed to the CT label. Then the outport 
>>> is set to nfp2 and the packet is injected back into ingress. When the same 
>>> packet comes out of nfp1, it gets forwarded to p1 the normal way.
>>> For the response packet from p1, the ingress pipeline gets the NFG id from 
>>> the CT label and accordingly redirects the packet to nfp1. When it comes 
>>> out of nfp2, it is forwarded the normal way.
>>>
>>> 4.2.1 Request processing
>>> ------------------------
>>> out_acl_eval: For to-lport ACLs with NFG, the existing rule's action has 
>>> been enhanced to set:
>>>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>>>  - reg5[0..7] = <NFG-unique-id>
>>>  - reg8[22] = <direction> (1: request, 0: response)
>>>
>>>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[7] == 1 && 
>>> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[8] == 1 && 
>>> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
>>> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>>>
>>>
>>>
>>> out_stateful: Priority 110: set the NFG id in the CT label if reg8[21] is 
>>> set.
>>>
>>>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
>>> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
>>> reg5[0..7]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
>>> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
>>> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
>>> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_label.network_function_group = 0; ct_label.network_function_group_id = 
>>> 0; }; next;)
>>>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
>>> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
>>> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
>>> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
>>> ct_label.network_function_group_id = 0; }; next;)
>>>   table=10(ls_out_stateful    ), priority=0    , match=(1), action=(next;)
>>>
>>> out_network_function: A new stage with priority 99 rules to redirect 
>>> packets by setting the outport to the NF “outport” (or its child port) 
>>> based on the NFG id set by the prior ACL stage, and then injecting them 
>>> back into ingress.
>>> Priority 100 rules ensure that packets going to the NF ports are not 
>>> redirected again.
>>> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
>>> present on this LS, drop packets.
>>>
>>>   table=11(ls_out_network_function), priority=100  , match=(outport == 
>>> "nfp1"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(outport == 
>>> "nfp2"), action=(next;)
>>>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 
>>> && eth.mcast), action=(next;)
>>>   table=11(ls_out_network_function), priority=99   , match=(reg8[21] == 1 
>>> && reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2"; reg8[23] 
>>> = 1; next(pipeline=ingress, table=29);)
>>>   table=11(ls_out_network_function), priority=1    , match=(reg8[21] == 1), 
>>> action=(drop;)
>>>   table=11(ls_out_network_function), priority=0    , match=(1), 
>>> action=(next;)
>>>
>>>
>>> in_l2_lkup: As described earlier, the priority 100 rule will forward these 
>>> packets.
>>>
>>> Then the same packet comes out of nfp1 and goes through ingress processing, 
>>> where the outport gets set to p1. The egress pipeline's out_pre_acl 
>>> priority 110 rule described earlier matches on inport nfp1 and jumps 
>>> directly to the stage after out_network_function. Thus the packet is not 
>>> redirected again.
>>>
>>> 4.2.2 Response processing
>>> -------------------------
>>> in_acl_eval: High priority rules that allow response and related packets 
>>> to go through have been enhanced to also copy the CT label NFG bit into 
>>> reg8[21].
>>>
>>>   table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
>>> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; reg8[21] 
>>> = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>>>   table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
>>> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0; 
>>> reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group; 
>>> reg8[16] = 1; next;)
>>>
>>> in_network_function: Priority 99 rule matches on the nfg_id in ct_label 
>>> and sets the outport to the NF “inport”.
>>> Priority 100 rules forward packets coming from the NF ports to the next 
>>> table.
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
>>> reg8[22] == 0 && ct_label.network_function_group_id == 123), 
>>> action=(outport = "nfp1"; output;)
>>>
>>>
>>> 5. Cross-host Traffic for VLAN Network
>>> ======================================
>>> For overlay subnets, all cross-host traffic exchanges are tunneled. In the 
>>> case of VLAN subnets, there needs to be special handling to selectively 
>>> tunnel only the traffic to or from the NF ports.
>>> Take the example of a from-lport ACL. Packets from p1 to p2 get redirected 
>>> to nfp1 on host1. If this packet is simply sent out from host1, the 
>>> physical network will forward it directly to host2, where VM2 is. So we 
>>> need to tunnel the redirected packets from host1 to host3. Then, once the 
>>> packets come out of nfp2, if host3 sent them out directly, the physical 
>>> network would learn p1's MAC as coming from host3. So these packets need 
>>> to be tunneled back to host1. From there the packet is forwarded to VM2 
>>> via the physical network.
>>>
>>>       -----                  -----                  --------
>>>      | VM2 |                | VM1 |                | NF VM  |
>>>       -----                  -----                  --------
>>>        / \                     |                    / \   |
>>>         | (7)                  |  (1)             (3)|    |(4)
>>>         |                     \ /                    |   \ /
>>>   --------------        --------------   (2)    ---------------
>>>  |      p2      |  (6) |      p1      |______\ |   nfp1  nfp2  |
>>>  |              |/____ |              |------/ |               |
>>>  |    host2     |\     |     host1    |/______ |     host3     |
>>>  |              |      |              |\------ |               |
>>>   --------------        --------------   (5)    --------------
>>>
>>> The above figure shows the request packet path for a from-lport ACL. The 
>>> response follows the same path in the reverse direction.
>>>
>>> To achieve this, the following would be done:
>>>
>>> On the host where the ACL port-group members are present (host1)
>>> —-----------------------------------------------------------
>>> REMOTE_OUTPUT (table 42):
>>> Currently, it tunnels traffic destined to all non-local overlay ports to 
>>> their associated hosts. The same rule is now also added for traffic to 
>>> non-local NF ports. Thus the packets from p1 get tunneled to host3.
>>>
>>> On host with NF (host3) forward packet to nfp1
>>> —----------------------------------------------
>>> Upon reaching host3, the following rules come into play:
>>> PHY_TO_LOG (table 0):
>>> Priority 100: Existing rule - for each Geneve tunnel interface on the 
>>> chassis, it copies info from the tunnel header into the inport, outport 
>>> and metadata registers. The same rule now also stores the tunnel interface 
>>> id in a register (reg5[16..31]).
>>>
>>> CHECK_LOOPBACK (table 44)
>>> This table has a rule that clears all the registers. The change is to skip 
>>> the clearing of reg5[16..31].
>>>
>>> Logical egress pipeline:
>>>
>>> ls_out_stateful priority 120: If the outport is an NF port, copy 
>>> reg5[16..31] (which table 0 set) to ct_label.tun_if_id.
>>>
>>>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" 
>>> && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_label.tun_if_id = reg5[16..31]; }; next;)
>>>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" 
>>> && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
>>> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
>>> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; 
>>> ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)
>>>
>>> The above sequence of flows ensures that if a packet is received via tunnel 
>>> on host3 with outport as nfp1, the tunnel interface id is committed to the 
>>> CT entry in nfp1's zone.
>>>
>>> On host with NF (host3) tunnel packets from nfp2 back to host1
>>> —--------------------------------------------------------------
>>> When the same packet comes out of nfp2 on host3:
>>>
>>> LOCAL_OUTPUT (table 43)
>>> When the packet comes out of the other NF port (nfp2), the following two 
>>> rules send it back to the host that it originally came from:
>>>
>>> Priority 110: For each NF port local to this host, the following rule 
>>> processes the packet through the CT of the linked port (for nfp2, it is 
>>> nfp1):
>>>   match: inport==nfp2 && RECIRC_BIT==0
>>>   action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit to 
>>> table 43
>>>
>>> Priority 109: For each {tunnel_id, nf port} on this host, if the tun_if_id 
>>> in ct_label matches the tunnel_id, send the recirculated packet using that 
>>> tunnel_id:
>>>   match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>>>   action: tunnel packet using tun-id
>>>
>>> If p1 and nfp1 happen to be on the same host, the tun_if_id would not be 
>>> set and thus none of the priority 109 rules would match. The packet would 
>>> be forwarded the usual way, matching the existing priority 100 rules in 
>>> LOCAL_OUTPUT.
>>>
>>> Special handling is needed for the case where the NF responds back on nfp1 
>>> instead of forwarding the packet out of nfp2. For example, a SYN packet 
>>> from p1 got redirected to nfp1. Then the NF, which is a firewall VM, drops 
>>> the SYN and sends a RST back on port nfp1. In this case, looking up the 
>>> linked port's (nfp2's) CT zone will not yield anything. ct.inv is used to 
>>> identify such scenarios, and nfp1’s CT zone is used to send the packet 
>>> back. To achieve this, the following two rules are installed:
>>>
>>> in_network_function:
>>> The priority 100 rule that allows packets incoming from NF ports is 
>>> enhanced with an additional action to store the tun_if_id from ct_label 
>>> into reg5[16..31].
>>>   table=22(ls_in_network_function), priority=100  , match=(inport == 
>>> "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>
>>> LOCAL_OUTPUT (table 43)
>>> Priority 110 rule: for recirculated packets, if the CT lookup (in the 
>>> linked port's zone) is invalid, use the tunnel id from reg5[16..31] to 
>>> tunnel the packet back to host1 (as the CT zone info has been overwritten 
>>> by the priority 110 rule above in table 43).
>>>       match: inport==nfp1 && RECIRC_BIT==1 && ct.inv && 
>>> MFF_LOG_TUN_OFPORT==<tun-id>
>>>       action: tunnel packet using tun-id
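>>>
>>> These physical-table rules are easiest to observe directly in OVS on the 
>>> NF host, e.g. (a debugging sketch, assuming the default integration bridge 
>>> br-int):
>>>
>>>   ovs-ofctl dump-flows br-int table=43 | grep -E 'priority=(110|109)'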
>>>
>>>
>>> 6. NF insertion across logical switches
>>> =======================================
>>> If the port-group to which the ACL is applied has members across multiple 
>>> logical switches, there needs to be an NF port pair on each of these 
>>> switches.
>>> The NF VM has only one inport and one outport. The CMS is expected to 
>>> create child ports linked to these ports on each logical switch where 
>>> port-group members are present.
>>> The network-function entity would be configured with the parent ports only. 
>>> When the CMS creates the child ports, it does not need to change any of 
>>> the NF, NFG or ACL config tables.
>>> When northd configures the redirection rules for a specific LS, it will use 
>>> the parent or child port depending on what it finds on that LS.
>>>                                      --------
>>>                                     | NF VM  |
>>>                                      --------
>>>                                      |      |
>>>           -----                     nfp1   nfp2             -----
>>>          | VM1 |                    --------------          | VM2 |
>>>           -----    |     |         |    SVC LS    |          -----    |      |
>>>             |      |     |          --------------             |      |      |
>>>           p1|  nfp1_ch1  nfp2_ch1                            p3|  nfp1_ch2  nfp2_ch2
>>>           --------------------                               --------------------
>>>          |         LS1        |                             |         LS2        |
>>>           --------------------                               --------------------
>>>
>>> In this example, the CMS created the parent ports for the NF VM on the LS 
>>> named SVC LS. The ports are nfp1 and nfp2. The CMS configures the NF using 
>>> these ports:
>>> ovn-nbctl network-function-add nf1 nfp1 nfp2
>>> ovn-nbctl network-function-group-add nfg1 nf1
>>> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
>>> allow-related nfg1
>>>
>>> The port group to which the ACL is applied is pg1, and pg1 has two ports: 
>>> p1 on LS1 and p3 on LS2.
>>> The CMS needs to create child ports for the NF ports on LS1 and LS2. On 
>>> LS1: nfp1_ch1 and nfp2_ch1; on LS2: nfp1_ch2 and nfp2_ch2. A sketch of how 
>>> this might look follows.
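>>>
>>> One possible way to create them, reusing ovn-nbctl lsp-add's existing 
>>> parent/tag syntax (the tag values 10 and 20 are purely illustrative, and 
>>> each child port would presumably also need the NF port options from 
>>> section 2):
>>>
>>> ovn-nbctl lsp-add LS1 nfp1_ch1 nfp1 10
>>> ovn-nbctl lsp-add LS1 nfp2_ch1 nfp2 10
>>> ovn-nbctl lsp-add LS2 nfp1_ch2 nfp1 20
>>> ovn-nbctl lsp-add LS2 nfp2_ch2 nfp2 20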
>>>
>>> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1.
>>>
>>>   table=22(ls_in_network_function), priority=100  , match=(inport == 
>>> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
>>> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1_ch1"; output;)
>>>
>>> When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2.
>>>   table=22(ls_in_network_function), priority=100  , match=(inport == 
>>> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>>>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
>>> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1_ch2"; output;)
>>>
>>>
>>> 7. Health Monitoring
>>> ====================
>>> The LB health monitoring functionality has been extended to support NFs. A 
>>> Network_Function_Group has a list of Network_Functions, each of which has 
>>> a reference to a Network_Function_Health_Check that holds the monitoring 
>>> config. There is a corresponding SB Service_Monitor record maintaining the 
>>> online/offline status. When the status changes, northd picks one of the 
>>> “online” NFs and sets it in the network_function_active field of the NFG. 
>>> The redirection rules in the LS use the ports from this NF.
>>>
>>> ovn-controller performs the health monitoring by sending an ICMP echo 
>>> request with the source IP and MAC taken from the NB global options 
>>> “svc_monitor_ip4” and “svc_monitor_mac”, and the destination IP and MAC 
>>> from the new NB global options “svc_monitor_ip4_dst” and 
>>> “svc_monitor_mac_dst”. The sequence number and id are randomly generated 
>>> and stored in the Service_Monitor record. The NF VM forwards the same 
>>> packet out of its other port. When the packet comes out, ovn-controller 
>>> matches the sequence number and id against the stored values and marks the 
>>> NF online if they match.
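>>>
>>> A configuration sketch, following the pattern used today for 
>>> Load_Balancer_Health_Check (the option values are illustrative):
>>>
>>> ovn-nbctl -- --id=@hc create Network_Function_Health_Check name=hc1 \
>>>     options:interval=5 options:timeout=3 \
>>>     options:success_count=3 options:failure_count=3 \
>>>     -- set Network_Function nf1 health_check=@hc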
>>>
>>> V1:
>>>   - First patch.
>>>
>>> V2:
>>>   - Rebased code.
>>>   - Added "mode" field in Network_function_group table, with only allowed
>>>     value as "inline". This is for future expansion to include "mirror" 
>>> mode.
>>>   - Added a flow in the in_network_function and out_network_function tables
>>>     to skip redirection of multicast traffic.
>>>
>>> V3:
>>>  - Rebased code.
>>>
>>> Sragdhara Datta Chaudhuri (5):
>>>   ovn-nb: Network Function insertion OVN-NB schema changes
>>>   ovn-nbctl: Network Function insertion commands.
>>>   northd, tests: Network Function insertion logical flow programming.
>>>   controller, tests: Network Function insertion tunneling of cross-host
>>>     VLAN traffic.
>>>   northd, controller: Network Function Health monitoring.
>>>
>>>  controller/physical.c        | 271 +++++++++++-
>>>  controller/pinctrl.c         | 252 +++++++++--
>>>  include/ovn/logical-fields.h |  14 +
>>>  lib/logical-fields.c         |  26 ++
>>>  lib/ovn-util.h               |   4 +-
>>>  northd/en-global-config.c    |  75 ++++
>>>  northd/en-global-config.h    |  12 +-
>>>  northd/en-multicast.c        |   2 +-
>>>  northd/en-northd.c           |   8 +
>>>  northd/en-sync-sb.c          |  16 +-
>>>  northd/inc-proc-northd.c     |   6 +-
>>>  northd/northd.c              | 796 +++++++++++++++++++++++++++++++++--
>>>  northd/northd.h              |  41 +-
>>>  ovn-nb.ovsschema             |  64 ++-
>>>  ovn-nb.xml                   | 123 ++++++
>>>  ovn-sb.ovsschema             |  12 +-
>>>  ovn-sb.xml                   |  22 +-
>>>  tests/ovn-controller.at      |   6 +-
>>>  tests/ovn-nbctl.at           |  83 ++++
>>>  tests/ovn-northd.at          | 548 ++++++++++++++++++------
>>>  tests/ovn.at                 | 143 +++++++
>>>  utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
>>>  22 files changed, 2792 insertions(+), 265 deletions(-)
>>
>>
> 
> Hi Sragdhara, Numan,
> 
>> Hi Sragdhara,
>>
>> I want to test out this patch series, but it looks like the patches don't
>> apply cleanly.  Can you either rebase the patch series again or
>> share a link to your cloned GitHub branch which has these commits
>> pushed?
>>
> 
> In order to help move things along I went ahead and rebased this series
> in my fork on top of current main:
> 
> https://github.com/dceara/ovn/tree/refs/heads/review-pws458355-network-function-insertion-v3
> 

It might be because of the way I rebased the patches but it seems some
of the tests fail with latest main and this series applied:

https://github.com/dceara/ovn/actions/runs/15904875892/job/44857195972#step:12:5349

>> Also, please take a look at this proposal -
>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424080.html
>>
> 
> The discussion is still going on there but it seems to me we might have
> to treat the two proposals (Network Function from Nutanix and Service
> Function Chaining from Red Hat) as different features (there's still a
> chance we can extend the Nutanix one in order to implement the
> ovn-kubernetes requirements).
> 
>> Looks to me like both your patch series and this proposal are trying to
>> solve the same use case.  And it makes sense to have
>> a solution which works for both proposals, or which can be extended
>> easily later, without having to maintain 2 features.
>>
>> Does your proposal support having multiple Network functions chained?
>>
> 
> I think we can probably add that as a follow-up feature.
> 
> I didn't manage to properly review the code yet but one thing that would
> be really great to have is some system tests (system-ovn.at) that better
> illustrate what happens (I guess we could simulate the network functions
> with network namespaces that just forward packets between two veths).
> 
> Maybe something to add in v4.
> 
> Regards,
> Dumitru
> 
>> Thanks
>> Numan
>>
>>>
>>> --
>>> 2.39.3
>>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
