Re: [ovs-dev] [PATCH OVN v4 0/5] Network Function Insertion.

Numan Siddique Fri, 11 Jul 2025 20:45:31 -0700

On Wed, Jul 9, 2025 at 11:17 AM Numan Siddique <num...@ovn.org> wrote:
>
> On Wed, Jul 9, 2025 at 5:06 AM Sragdhara Datta Chaudhuri
> <sragdha.chau...@nutanix.com> wrote:
> >
> > Hi Numan,
> >
> >
> >
> > Thanks for taking a look and trying out the patch. Our multi-chassis 
> > testing was done mainly with VLAN network. Will add testcases like you 
> > suggested to cover overlay network cases and will address the issue you 
> > brought up.
>
> Please cover the VLAN scenario too.  It should be possible with the
> multi-chassis setup.
>
> Numan
>
> >
> >
> >
> > Thanks,
> >
> > Sragdhara
> >
> >
> >
> > From: Numan Siddique <num...@ovn.org>
> > Date: Monday, July 7, 2025 at 8:19 AM
> > To: Sragdhara Datta Chaudhuri <sragdha.chau...@nutanix.com>
> > Cc: ovs-dev@openvswitch.org <ovs-dev@openvswitch.org>
> > Subject: Re: [ovs-dev] [PATCH OVN v4 0/5] Network Function Insertion.
> >
> > !-------------------------------------------------------------------|
> >   CAUTION: External Email
> >
> > |-------------------------------------------------------------------!
> >
> > On Fri, Jun 27, 2025 at 6:03 AM Sragdhara Datta Chaudhuri
> > <sragdha.chau...@nutanix.com> wrote:
> > >
> > > RFC: NETWORK FUNCTION INSERTION IN OVN
> > >
> > > 1. Introduction
> > > ================
> > > The objective is to insert a Network Function (NF) in the path of 
> > > outbound/inbound traffic from/to a port-group. The use case is to 
> > > integrate a 3rd party service in the path of traffic. An example of such 
> > > a service would be layer7 firewall. The NF VM will be like a bump in the 
> > > wire and should not modify the packet, i.e. the IP header, the MAC 
> > > addresses, VLAN tag, sequence numbers remain unchanged.
> > >
> > > Here are some of the highlights:
> > > - A new entity network-function (NF) has been introduced. It contains a 
> > > pair of LSPs. The CMS would designate one as “inport” and the other as 
> > > “outport”.
> > > - For high-availability, a network function group (NFG) entity consists 
> > > of a group of NFs. Only one NF in a NFG has an active role based on 
> > > health monitoring.
> > > - ACL would accept NFG as a parameter and traffic matching the ACL would 
> > > be redirected to the associated active NF’s port. NFG is accepted for 
> > > stateful allow action only.
> > > - The ACL’s port-group is the point of reference when defining the role 
> > > of the NF ports. The “inport” is the port closer to the port-group and 
> > > “outport” is the one away from it. For from-lport ACLs, the request 
> > > packets would be redirected to the NF “inport” and for to-lport ACLs, the 
> > > request packets would be redirected to NF “outport”. When the same packet 
> > > comes out of the other NF port, it gets simply forwarded.
> > > - Statefulness will be maintained, i.e. the response traffic will also go 
> > > through the same pair of NF ports but in reverse order.
> > > - For the NF ports we need to disable port security check, fdb learning 
> > > and multicast/broadcast forwarding.
> > > - Health monitoring involves ovn-controller periodically injecting ICMP 
> > > probe packets into the NF inport and monitor the same packet coming out 
> > > of the NF outport.
> > > - If the traffic redirection involves cross-host traffic (e.g. for a 
> > > from-lport ACL, if the source VM and NF VM are on different hosts), 
> > > packets would be tunneled to and from the NF VM's host.
> > > - If the port-group to which the ACL is being applied has members spread 
> > > across multiple LSs, CMS needs to create child ports for the NF ports on 
> > > each of these LSs. The redirection rules in each LS will use the child 
> > > ports on that LS.
> > >
> > > 2. NB tables
> > > =============
> > > New NB tables
> > > —------------
> > > Network_Function: Each row contains {inport, outport, health_check}
> > > Network_Function_Group: Each row contains a list of Network_Function 
> > > entities. It also contains a unique id (between 1 and 255, generated by 
> > > northd) and a reference to the current active NF.
> > > Network_Function_Health_Check: Each row contains configuration for probes 
> > > in options field: {interval, timeout, success_count, failure_count}
> > >
> > >         "Network_Function_Health_Check": {
> > >             "columns": {
> > >                 "name": {"type": "string"},
> > >                 "options": {
> > >                      "type": {"key": "string",
> > >                               "value": "string",
> > >                               "min": 0,
> > >                               "max": "unlimited"}},
> > >                 "external_ids": {
> > >                     "type": {"key": "string", "value": "string",
> > >                              "min": 0, "max": "unlimited"}}},
> > >             "isRoot": true},
> > >         "Network_Function": {
> > >             "columns": {
> > >                 "name": {"type": "string"},
> > >                 "outport": {"type": {"key": {"type": "uuid",
> > >                                              "refTable": 
> > > "Logical_Switch_Port",
> > >                                              "refType": "strong"},
> > >                                      "min": 1, "max": 1}},
> > >                 "inport": {"type": {"key": {"type": "uuid",
> > >                                             "refTable": 
> > > "Logical_Switch_Port",
> > >                                             "refType": "strong"},
> > >                                     "min": 1, "max": 1}},
> > >                 "health_check": {"type": {
> > >                     "key": {"type": "uuid",
> > >                             "refTable": "Network_Function_Health_Check",
> > >                             "refType": "strong"},
> > >                     "min": 0, "max": 1}},
> > >                 "external_ids": {
> > >                     "type": {"key": "string", "value": "string",
> > >                              "min": 0, "max": "unlimited"}}},
> > >             "isRoot": true},
> > >         "Network_Function_Group": {
> > >             "columns": {
> > >                 "name": {"type": "string"},
> > >                 "network_function": {"type":
> > >                                   {"key": {"type": "uuid",
> > >                                            "refTable": "Network_Function",
> > >                                            "refType": "strong"},
> > >                                            "min": 0, "max": "unlimited"}},
> > >                 "mode": {"type": {"key": {"type": "string",
> > >                                           "enum": ["set", ["inline"]]}}},
> > >                 "network_function_active": {"type":
> > >                                   {"key": {"type": "uuid",
> > >                                            "refTable": "Network_Function",
> > >                                            "refType": "strong"},
> > >                                            "min": 0, "max": 1}},
> > >                 "id": {
> > >                      "type": {"key": {"type": "integer",
> > >                                       "minInteger": 0,
> > >                                       "maxInteger": 255}}},
> > >                 "external_ids": {
> > >                     "type": {"key": "string", "value": "string",
> > >                              "min": 0, "max": "unlimited"}}},
> > >             "isRoot": true},
> > >
> > >
> > > Modified NB table
> > > —----------------
> > > ACL: The ACL entity would have a new optional field that is a reference 
> > > to a Network_Function_Group entity. This field can be present only for 
> > > stateful allow ACLs.
> > >
> > >         "ACL": {
> > >             "columns": {
> > >                 "network_function_group": {"type": {"key": {"type": 
> > > "uuid",
> > >                                            "refTable": 
> > > "Network_Function_Group",
> > >                                            "refType": "strong"},
> > >                                            "min": 0,
> > >                                            "max": 1}},
> > >
> > > New options for Logical_Switch_Port
> > > —----------------------------------
> > > receive_multicast=<boolean>: Default true. If set to false, LS will not 
> > > forward broadcast/multicast traffic to this port. This is to prevent 
> > > looping of such packets.
> > >
> > > lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning will 
> > > be skipped for packets coming out of this port. Redirected packets from 
> > > the NF port would be carrying the originating VM’s MAC in source, and so 
> > > learning should not happen.
> > >
> > > CMS needs to set both the above options to false for NF ports, in 
> > > addition to disabling port security.
> > >
> > > network-function-linked-port=<lsp-name>: Each NF port needs to have this 
> > > set to the other NF port of the pair.
> > >
> > > New NB_global options
> > > —--------------------
> > > svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is 
> > > already there and will be used as source MAC)
> > > svc_monitor_ip4: source IP of probe packets
> > > svc_monitor_ip4_dst: destination IP of probe packets
> > >
> > > Sample configuration
> > > —-------------------
> > > ovn-nbctl ls-add ls1
> > > ovn-nbctl lsp-add ls1 nfp1
> > > ovn-nbctl lsp-add ls1 nfp2
> > > ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false 
> > > options:lsp_learn_fdb=false options:network-function-linked-port=nfp2
> > > ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false 
> > > options:lsp_learn_fdb=false options:network-function-linked-port=nfp1
> > > ovn-nbctl network-function-add nf1 nfp1 nfp2
> > > ovn-nbctl network-function-group-add nfg1 nf1
> > > ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 
> > > 10.1.1.4"
> > > ovn-nbctl pg-add pg1 p1
> > > ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
> > > ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 
> > > 10.1.1.5"
> > > ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
> > > ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> > > allow-related nfg1
> > > ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
> > > allow-related nfg1
> > >
> > > 3. SB tables
> > > ============
> > > Service_Monitor:
> > > This is currently used by Load balancer. New fields are: “type” - to 
> > > indicate LB or NF, “mac” - the destination MAC address for monitor 
> > > packets, “logical_input_port” - the LSP to which the probe packet would 
> > > be sent. Also, “icmp” has been added as a protocol type, used only for NF.
> > >
> > >          "Service_Monitor": {
> > >              "columns": {
> > >                "type": {"type": {"key": {
> > >                           "type": "string",
> > >                           "enum": ["set", ["load-balancer", 
> > > "network-function"]]}}},
> > >                "mac": {"type": "string"},
> > >                  "protocol": {
> > >                      "type": {"key": {"type": "string",
> > >                             "enum": ["set", ["tcp", "udp", "icmp"]]},
> > >                               "min": 0, "max": 1}},
> > >                "logical_input_port": {"type": "string"},
> > >
> > > northd would create one Service_Monitor entity for each NF. The 
> > > logical_input_port and logical_port would be populated from the NF inport 
> > > and outport fields respectively. The probe packets would be injected into 
> > > the logical_input_port and would be monitored out of logical_port.
> > >
> > > 4. Logical Flows
> > > ================
> > > Logical Switch ingress pipeline:
> > > - in_network_function added after in_stateful.
> > > - Modifications to in_acl_eval, in_stateful and in_l2_lookup.
> > > Logical Switch egress pipeline:
> > > - out_network_function added after out_stateful.
> > > - Modifications to out_pre_acl, out_acl_eval and out_stateful.
> > >
> > > 4.1 from-lport ACL
> > > ------------------
> > > The diagram shows the request path for packets from VM1 port p1, which is 
> > > a member of the pg to which ACL is applied. The response would follow the 
> > > reverse path, i.e. packet would be redirected to nfp2 and come out of 
> > > nfp1 and be forwarded to p1.
> > > Also, p2 does not need to be on the same LS. Only the p1, nfp1, nfp2 are 
> > > on the same LS.
> > >
> > >       -----                  -------                  -----
> > >      | VM1 |                | NF VM |                | VM2 |
> > >       -----                  -------                  -----
> > >         |                    /\    |                   / \
> > >         |                    |     |                    |
> > >        \ /                   |    \ /                   |
> > >    ------------------------------------------------------------
> > >   |     p1                 nfp1  nfp2                   p2     |
> > >   |                                                            |
> > >   |                      Logical Switch                        |
> > >    -------------------------------------------------------------
> > > pg1: [p1]         as2: [p2-ip]
> > > ovn-nbctl network-function-add nf1 nfp1 nfp2
> > > ovn-nbctl network-function-group-add nfg1 nf1
> > > ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> > > allow-related nfg1
> > > Say, the unique id northd assigned to this NFG, is 123
> > >
> > > The request packets from p1 matching a from-lport ACL with NFG, are 
> > > redirected to nfp1 and the NFG id is committed to the ct label in p1's 
> > > zone. When the same packet comes out of nfp2 it gets forwarded the normal 
> > > way.
> > > Response packets have destination as p1's MAC. Ingress processing sets 
> > > the outport to p1 and the CT lookup in egress pipeline (in p1's ct zone) 
> > > yields the NFG id and the packet injected back to ingress pipeline after 
> > > setting the outport to nfp2.
> > >
> > > Below are the changes in detail.
> > >
> > > 4.1.1 Request processing
> > > ------------------------
> > >
> > > in_acl_eval: For from-lport ACLs with NFG, the existing rule's action has 
> > > been enhanced to set:
> > >  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
> > >  - reg5[0..7] = <NFG-unique-id>
> > >  - reg8[22] = <direction> (1: request, 0: response)
> > >
> > >   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && 
> > > (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> > > reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
> > >   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && 
> > > (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; 
> > > reg8[22] = 1; reg5[0..7] = 123; next;)
> > >
> > > in_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
> > >  - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF 
> > > insertion.
> > >  - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8 bit 
> > > NFG id
> > >
> > >   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> > > reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> > > reg5[0..7]; }; next;)
> > >   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> > > reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> > > ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> > > ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> > > ct_label.network_function_group_id = reg5[0..7]; }; next;)
> > >   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> > > reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_label.network_function_group = 0; ct_label.network_function_group_id = 
> > > 0; }; next;)
> > >   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> > > reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> > > ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> > > ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> > > ct_label.network_function_group_id = 0; }; next;)
> > >   table=21(ls_in_stateful     ), priority=0    , match=(1), action=(next;)
> > >
> > >
> > > For non-NFG cases, the existing priority 100 rules will be hit. There 
> > > additional action has been added to clear the NFG bits in ct label.
> > >
> > > in_network_function: A new stage with priority 99 rules to redirect 
> > > packets by setting outport to the NF “inport” (or its child port) based 
> > > on the NFG id set by the prior ACL stage.
> > > Priority 100 rules ensure that when the same packets come out of the NF 
> > > ports, they are not redirected again (the setting of reg5 here relates to 
> > > the cross-host packet tunneling and will be explained later).
> > > Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is 
> > > not present on this LS, drop packets.
> > >
> > >   table=22(ls_in_network_function), priority=100  , match=(inport == 
> > > "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> > >   table=22(ls_in_network_function), priority=100  , match=(inport == 
> > > "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> > >   table=22(ls_in_network_function), priority=100  , match=(reg8[21] == 1 
> > > && eth.mcast), action=(next;)
> > >   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 
> > > && reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1"; output;)
> > >   table=22(ls_in_network_function), priority=1    , match=(reg8[21] == 
> > > 1), action=(drop;)
> > >   table=22(ls_in_network_function), priority=0    , match=(1), 
> > > action=(next;)
> > >
> > >
> > > 4.1.2 Response processing
> > > -------------------------
> > > out_acl_eval: High priority rules that allow response and related packets 
> > > to go through have been enhanced to also copy CT label NFG bit into 
> > > reg8[21].
> > >
> > >   table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> > > !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = 
> > > ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
> > >   table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> > > !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] = 
> > > ct_label.network_function_group; reg8[16] = 1; next;)
> > >
> > > out_network_function: Priority 99 rule matches on the nfg_id in ct_label 
> > > and sets the outport to the NF “outport”. It also sets reg8[23]=1 and 
> > > injects the packet to ingress pipeline (in_l2_lookup).
> > > Priority 100 rule forwards all packets to NF ports to the next table.
> > >
> > >   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> > > "nfp1"), action=(next;)
> > >   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> > > "nfp2"), action=(next;)
> > >   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 
> > > && eth.mcast), action=(next;)
> > >   table=11 (ls_out_network_function), priority=99   , match=(reg8[21] == 
> > > 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123), 
> > > action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
> > >   table=11 (ls_out_network_function), priority=1    , match=(reg8[21] == 
> > > 1), action=(drop;)
> > >   table=11 (ls_out_network_function), priority=0    , match=(1), 
> > > action=(next;)
> > >
> > > in_l2_lkup: if reg8[23] == 1 (packet has come back from egress), simply 
> > > forward such packets as outport is already set.
> > >
> > >   table=29(ls_in_l2_lkup), priority=100  , match=(reg8[23] == 1), 
> > > action=(output;)
> > >
> > > The above set of rules ensure that the response packet is sent to nfp2. 
> > > When the same packet comes out of nfp1, the ingress pipeline would set 
> > > the outport to p1 and it enters the egress pipeline.
> > >
> > > out_pre_acl: If the packet is coming from the NF inport, skip the egress 
> > > pipeline upto the out_nf stage, as the packet has already gone through it 
> > > and we don't want the same packet to be processed by CT twice.
> > >   table=2 (ls_out_pre_acl     ), priority=110  , match=(inport == 
> > > "nfp1"), action=(next(pipeline=egress, table=12);)
> > >
> > >
> > > 4.2 to-lport ACL
> > > ----------------
> > >       -----                  --------                  -----
> > >      | VM1 |                |  NF VM |                | VM2 |
> > >       -----                  --------                  -----
> > >        / \                    |   / \                    |
> > >         |                     |    |                     |
> > >         |                    \ /   |                    \ /
> > >    -------------------------------------------------------------
> > >   |     p1                  nfp1   nfp2                  p2     |
> > >   |                                                             |
> > >   |                      Logical Switch                         |
> > >    -------------------------------------------------------------
> > > ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1&& ip4.src == $as2' 
> > > allow-related nfg1
> > > Diagram shows request traffic path. The response will follow a reverse 
> > > path.
> > >
> > > Ingress pipeline sets the outport to p1 based on destination MAC lookup. 
> > > The packet enters the egress pipeline. There the to-lport ACL with NFG 
> > > gets evaluated and the NFG id gets committed to the CT label. Then the 
> > > outport is set to nfp2 and then the packet is injected back to ingress. 
> > > When the same packet comes out of nfp1, it gets forwarded to p1 the 
> > > normal way.
> > > >From the response packet from p1, ingress pipeline gets the NFG id from 
> > > >CT label and accordingly redirects it to nfp1. When it comes out of nfp2 
> > > >it is forwarded the normal way.
> > >
> > > 4.2.1 Request processing
> > > ------------------------
> > > out_acl_eval: For to-lport ACLs with NFG, the existing rule's action has 
> > > been enhanced to set:
> > >  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
> > >  - reg5[0..7] = <NFG-unique-id>
> > >  - reg8[22] = <direction> (1: request, 0: response)
> > >
> > >   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[7] == 1 && 
> > > (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> > > reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 1; next;)
> > >   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[8] == 1 && 
> > > (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> > > reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 1; next;)
> > >
> > >
> > >
> > > Out_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
> > >
> > >   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> > > reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> > > reg5[0..7]; }; next;)
> > >   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> > > reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> > > ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> > > ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> > > ct_label.network_function_group_id = reg5[0..7]; }; next;)
> > >   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> > > reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_label.network_function_group = 0; ct_label.network_function_group_id = 
> > > 0; }; next;)
> > >   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> > > reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> > > ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> > > ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> > > ct_label.network_function_group_id = 0; }; next;)
> > >   table=10(ls_out_stateful    ), priority=0    , match=(1), action=(next;)
> > >
> > > out_network_function: A new stage that has priority 99 rules to redirect 
> > > packet by setting outport to the NF “outport” (or its child port) based 
> > > on the NFG id set by the prior ACL stage, and then injecting back to 
> > > ingress. Priority 100 rules ensure that when the packets are going to NF 
> > > ports, they are not redirected again.
> > > Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is 
> > > not present on this LS, drop packets.
> > >
> > >   table=11(ls_out_network_function), priority=100  , match=(outport == 
> > > "nfp1"), action=(next;)
> > >   table=11(ls_out_network_function), priority=100  , match=(outport == 
> > > "nfp2"), action=(next;)
> > >   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 
> > > && eth.mcast), action=(next;)
> > >   table=11(ls_out_network_function), priority=99   , match=(reg8[21] == 1 
> > > && reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2"; 
> > > reg8[23] = 1; next(pipeline=ingress, table=29);)
> > >   table=11(ls_out_network_function), priority=1    , match=(reg8[21] == 
> > > 1), action=(drop;)
> > >   table=11(ls_out_network_function), priority=0    , match=(1), 
> > > action=(next;)
> > >
> > >
> > > in_l2_lkup: As described earlier, the priority 100 rule will forward 
> > > these packets.
> > >
> > > Then the same packet comes out from nfp1 and goes through the ingress 
> > > processing where the outport gets set to p1. The egress pipeline 
> > > out_pre_acl priority 110 rule described earlier, matches against inport 
> > > as nfp1 and directly jumps to the stage after out_network_function. Thus 
> > > the packet is not redirected again.
> > >
> > > 4.2.2 Response processing
> > > -------------------------
> > > in_acl_eval: High priority rules that allow response and related packets 
> > > to go through have been enhanced to also copy CT label NFG bit into 
> > > reg8[21].
> > >
> > >   table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> > > !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; 
> > > reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
> > >   table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> > > !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 
> > > 0; reg0[10] = 0; reg0[17] = 1; reg8[21] = 
> > > ct_label.network_function_group; reg8[16] = 1; next;)
> > >
> > > in_network_function: Priority 99 rule matches on the nfg_id in ct_label 
> > > and sets the outport to the NF “inport”.
> > > Priority 100 rule forwards all packets to NF ports to the next table.
> > >   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 
> > > && reg8[22] == 0 && ct_label.network_function_group_id == 123), 
> > > action=(outport = "nfp1"; output;)
> > >
> > >
> > > 5. Cross-host Traffic for VLAN Network
> > > ======================================
> > > For overlay subnets, all cross-host traffic exchanges are tunneled. In 
> > > the case of VLAN subnets, there needs to be special handling to 
> > > selectively tunnel only the traffic to or from the NF ports.
> > > Take the example of a from-lport ACL. Packets from p1 to p2, gets 
> > > redirected to nfp1 in host1. If this packet is simply sent out from 
> > > host1, the physical network will directly forward it to host2 where VM2 
> > > is. So, we need to tunnel the redirected packets from host1 to host3. 
> > > Now, once the packets come out of nfp2, if host3 sends the packets out, 
> > > the physical network would learn p1's MAC coming from host3. So, these 
> > > packets need to be tunneled back to host1. From there the packet would be 
> > > forwarded to VM2 via the physical network.
> > >
> > >       -----                  -----                  --------
> > >      | VM2 |                | VM1 |                | NF VM  |
> > >       -----                  -----                  --------
> > >        / \                     |                    / \   |
> > >         | (7)                  |  (1)             (3)|    |(4)
> > >         |                     \ /                    |   \ /
> > >   --------------        --------------   (2)    ---------------
> > >  |      p2      |  (6) |      p1      |______\ |   nfp1  nfp2  |
> > >  |              |/____ |              |------/ |               |
> > >  |    host2     |\     |     host1    |/______ |     host3     |
> > >  |              |      |              |\------ |               |
> > >   --------------        --------------   (5)    --------------
> > >
> > > The above figure shows the request packet path for a from-lport ACL. 
> > > Response would follow the same path in reverse direction.
> > >
> > > To achieve this, the following would be done:
> > >
> > > On host where the ACL port group members are present (host1)
> > > —-----------------------------------------------------------
> > > REMOTE_OUTPUT (table 42):
> > > Currently, it tunnels traffic destined to all non-local overlay ports to 
> > > their associated hosts. The same rule is now also added for traffic to 
> > > non-local NF ports. Thus the packets from p1 get tunneled to host 3.
> > >
> > > On host with NF (host3) forward packet to nfp1
> > > —----------------------------------------------
> > > Upon reaching host3, the following rules come into play:
> > > PHY_TO_LOG (table 0):
> > > Ppriority 100: Existing rule - for each geneve tunnel interface on the 
> > > chassis, copies info from header to inport, outport, metadata registers. 
> > > Now the same rule also stores the tun intf id in a register 
> > > (reg5[16..31]).
> > >
> > > CHECK_LOOPBACK (table 44)
> > > This table has a rule that clears all the registers. The change is to 
> > > skip the clearing of reg5[16..31].
> > >
> > > Logical egress pipeline:
> > >
> > > ls_out_stateful priority 120: If the outport is an NF port, copy 
> > > reg5[16..31] (table0 had set it) to ct_label.tun_if_id.)
> > >
> > >   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" 
> > > && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_label.tun_if_id = reg5[16..31]; }; next;)
> > >   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" 
> > > && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> > > ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> > > ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; 
> > > ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)
> > >
> > > The above sequence of flows ensure that if a packet is received via 
> > > tunnel on host3, with outport as nfp1, the tunnel interface id is 
> > > committed to the ct entry in nfp1's zone.
> > >
> > > On host with NF (host3) tunnel packets from nfp2 back to host1
> > > —--------------------------------------------------------------
> > > When the same packet comes out of nfp2 on host3:
> > >
> > > LOCAL_OUTPUT (table 43)
> > > When the packet comes out of the other NF port (nfp2), following two 
> > > rules send it back to the host that it originally came from:
> > >
> > > Priority 110: For each NF port local to this host, following rule 
> > > processes the
> > > packet through CT of linked port (for nfp2, it is nfp1):
> > >   match: inport==nfp2 && RECIRC_BIT==0
> > >   action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit to 
> > > table 43
> > >
> > > Priority 109: For each {tunnel_id, nf port} on this host, if the 
> > > tun_if_id in ct_label matches the tunnel_id, send the recirculated packet 
> > > using tunnetl_id:
> > >   match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
> > >   action: tunnel packet using tun-id
> > >
> > > If p1 and nfp1 happen to be on the same host, the tun_if_id would not be 
> > > set and thus none of the priority 109 rules would match. It would be 
> > > forwarded the usual way matching the existing priority 100 rules in 
> > > LOCAL_TABLE.
> > >
> > > Special handling of the case where NF responds back on nfp1, instead of 
> > > forwarding packet out of nfp2:
> > > For example, a SYN packet from p1 got redirected to nfp1. Then the NF, 
> > > which is a firewall VM, drops the SYN and sends RST back on port nfp1. In 
> > > this case, looking up in the linked port (nfp2) ct zone will not give 
> > > anything. The following rule uses ct.inv to identify such scenarios and 
> > > uses nfp1’s CT zone to send the packet back. To achieve this, following 2 
> > > rules are installed:
> > >
> > > in_network_function:
> > > Priority 100 rule that allows packets incoming from NF type ports, is 
> > > enhanced with additional action to store the tun_if_id from ct_label into 
> > > reg5[16..31].
> > >   table=22(ls_in_network_function), priority=100  , match=(inport == 
> > > "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> > >
> > > LOCAL_OUTPUT (table 43)
> > > Priority 110 rule: for recirculated packets, if ct (of the linked port) 
> > > is invalid, use the tun id from reg5[16..31] to tunnel the packet back to 
> > > host1 (as CT zone info has been overwritten in the above 110 priority 
> > > rule in table 42).
> > >       match: inport==nf1 && RECIRC_BIT==1 && ct.inv && 
> > > MFF_LOG_TUN_OFPORT==<tun-id>
> > >       action: tunnel packet using tun-id
> > >
> > >
> > > 6. NF insertion across logical switches
> > > =======================================
> > > If the port-group where the ACL is being applied has members across 
> > > multiple logical switches, there needs to be a NF port pair on each of 
> > > these switches.
> > > The NF VM will have only one inport and one outport. The CMS is expected 
> > > to create child ports linked to these ports on each logical switch where 
> > > port-group members are present.
> > > The network-function entity would be configured with the parent ports 
> > > only. When CMS creates the child ports, it does not need to change any of 
> > > the NF, NFG or ACL config tables.
> > > When northd configures the redirection rules for a specific LS, it will 
> > > use the parent or child port depending on what it finds on that LS.
> > >                                      --------
> > >                                     | NF VM  |
> > >                                      --------
> > >                                      |      |
> > >           -----                      |      |              -----
> > >          | VM1 |                    nfp1   nfp2           | VM2 |
> > >           ---- -   |     |         --------------          -----    |     
> > >  |
> > >             |      |     |        |    SVC LS    |          |       |     
> > >  |
> > >           p1|  nfp1_ch1  nfp2_ch1  --------------         p3|  nfp1_ch2  
> > > nfp2_ch2
> > >           --------------------                             
> > > --------------------
> > >          |         LS1        |                           |         LS2   
> > >      |
> > >           --------------------                             
> > > --------------------
> > >
> > > In this example, the CMS created the parent ports for the NF VM on LS 
> > > named SVC LS. The ports are nfp1 and nfp2. The CMS configures the NF 
> > > using these ports:
> > > ovn-nbctl network-function-add nf1 nfp1 nfp2
> > > ovn-nbctl network-function-group-add nfg1 nf1
> > > ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> > > allow-related nfg1
> > >
> > > The port group to which the ACL is applied is pg1 and pg1 has two ports: 
> > > p1 on LS1 and p3 on LS2.
> > > The CMS needs to create child ports for the NF ports on LS1 and LS2. On 
> > > LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2
> > >
> > > When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1.
> > >
> > >   table=22(ls_in_network_function), priority=100  , match=(inport == 
> > > "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> > >   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 
> > > && reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch1"; 
> > > output;)
> > >
> > > When northd is creating rules on LS2, it would use nfp1_ch2 and nfp2_ch2.
> > >   table=22(ls_in_network_function), priority=100  , match=(inport == 
> > > "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> > >   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 
> > > && reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch2"; 
> > > output;)
> > >
> > >
> > > 7. Health Monitoring
> > > ====================
> > > The LB health monitoring functionality has been extended to support NFs. 
> > > Network_Function_Group has a list of Network_Functions, each of which has 
> > > a reference to network_Function_Health_Check that has the monitoring 
> > > config. There is a corresponding SB service_monitor maintaining the 
> > > online/offline status. When status changes, northd picks one of the 
> > > “online” NFs and sets it in the network_function_active field of NFG. The 
> > > redirection rule in LS uses the ports from this NF.
> > >
> > > Ovn-controller performs the health monitoring by sending ICMP echo 
> > > request with source IP and MAC from NB global options “svc_monitor_ip4” 
> > > and “svc_monitor_mac”, and destination IP and MAC from new NB global 
> > > options “svc_monitor_ip4_dst” and “svc_monitor_mac_dst”. The sequence 
> > > number and id are randomly generated and stored in service_mon. The NF VM 
> > > forwards the same packet out of the other port. When it comes out, 
> > > ovn-controller matches the sequence number and id with stored values and 
> > > marks online if matched.
> > >
> >
> > Hi Sragdhara,
> >
> > Thanks for adding this feature to OVN and sorry for the delay in
> > providing review comments.
> >
> > I tested this patch series (patches 1 - 4) out and I found one issue.
> >
> > This is how the topology looks like:
> >
> > ------------
> > [root@ovn-central-az1 ovn]# ovn-nbctl show sw01
> > switch e93f16c7-e61e-4531-9367-89ca2b1bbddc (sw01)
> >     port nf-p2
> >     port nf-p1
> >     port sw01-lr1
> >         type: router
> >         router-port: lr1-sw01
> >     port sw01-port3
> >         addresses: ["50:51:00:00:00:05 11.0.0.5"]
> >     port sw01-port1
> >         addresses: ["50:51:00:00:00:03 11.0.0.3 1001::3"]
> >
> > ovn-sbctl show
> > Chassis ovn-chassis-2
> >     hostname: ovn-chassis-2
> >     Encap geneve
> >         ip: "170.168.0.6"
> >         options: {csum="true"}
> >     Port_Binding sw11-port1
> >     Port_Binding nf-p1
> >     Port_Binding sw01-port4
> >     Port_Binding nf-p2
> > Chassis ovn-gw-1
> >     hostname: ovn-gw-1
> >     Encap geneve
> >         ip: "170.168.0.3"
> >         options: {csum="true"}
> >     Port_Binding cr-lr1-public1
> > Chassis ovn-chassis-1
> >     hostname: ovn-chassis-1
> >     Encap geneve
> >         ip: "170.168.0.5"
> >         options: {csum="true"}
> >     Port_Binding sw01-port3
> >     Port_Binding sw01-port1
> > ------
> >
> > I created a network function with nf-ports - nf-p1 and nf-p2 and added
> > the below ACL
> >
> > --[root@ovn-central-az1 ovn]# ovn-nbctl acl-list pg0
> > from-lport  1002 (inport == @pg0) allow-related network-function-group=nfg0
> > ---
> >
> > pg0 port group has only sw0-port1.  So essentially all the traffic
> > from sw01-port1 will be sent to the network function.
> >
> >
> > When I send an icmp packet from sw01-port1 to sw01-port3, I notice
> > that both sw01-port1 and sw01-port3 are receiving duplicate packets.
> > ----
> > [root@ovn-chassis-1 /]# ip netns exec sw01p1 ping 11.0.0.5
> > PING 11.0.0.5 (11.0.0.5) 56(84) bytes of data.
> > 64 bytes from 11.0.0.5: icmp_seq=1 ttl=64 time=0.620 ms
> > 64 bytes from 11.0.0.5: icmp_seq=1 ttl=64 time=0.621 ms (DUP!)
> > 64 bytes from 11.0.0.5: icmp_seq=1 ttl=64 time=0.654 ms (DUP!)
> > 64 bytes from 11.0.0.5: icmp_seq=1 ttl=64 time=0.655 ms (DUP!)
> > 64 bytes from 11.0.0.5: icmp_seq=2 ttl=64 time=1.36 ms
> > 64 bytes from 11.0.0.5: icmp_seq=2 ttl=64 time=1.36 ms (DUP!)
> > 64 bytes from 11.0.0.5: icmp_seq=2 ttl=64 time=1.37 ms (DUP!)
> > 64 bytes from 11.0.0.5: icmp_seq=2 ttl=64 time=1.36 ms (DUP!)
> > ^C
> > --- 11.0.0.5 ping statistics ---
> > 2 packets transmitted, 2 received, +6 duplicates, 0% packet loss, time 
> > 1055ms
> > rtt min/avg/max/mdev = 0.620/1.000/1.366/0.362 ms
> >
> >
> > [root@ovn-chassis-1 ~]# ip netns exec sw01p3 tcpdump -i sw01p3 -vvneee
> > dropped privs to tcpdump
> >
> > 15:10:25.980838 50:51:00:00:00:03 > 50:51:00:00:00:05, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 42082, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     11.0.0.3 > 11.0.0.5: ICMP echo request, id 2242, seq 1, length 64
> > 15:10:25.980854 50:51:00:00:00:05 > 50:51:00:00:00:03, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 39531, offset 0, flags
> > [none], proto ICMP (1), length 84)
> >     11.0.0.5 > 11.0.0.3: ICMP echo reply, id 2242, seq 1, length 64
> > 15:10:25.980840 50:51:00:00:00:03 > 50:51:00:00:00:05, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 42082, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     11.0.0.3 > 11.0.0.5: ICMP echo request, id 2242, seq 1, length 64
> > 15:10:25.980860 50:51:00:00:00:05 > 50:51:00:00:00:03, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 39532, offset 0, flags
> > [none], proto ICMP (1), length 84)
> >     11.0.0.5 > 11.0.0.3: ICMP echo reply, id 2242, seq 1, length 64
> > 15:10:27.032587 50:51:00:00:00:03 > 50:51:00:00:00:05, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 42804, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     11.0.0.3 > 11.0.0.5: ICMP echo request, id 2242, seq 2, length 64
> > 15:10:27.032613 50:51:00:00:00:05 > 50:51:00:00:00:03, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 39775, offset 0, flags
> > [none], proto ICMP (1), length 84)
> >     11.0.0.5 > 11.0.0.3: ICMP echo reply, id 2242, seq 2, length 64
> > 15:10:27.032591 50:51:00:00:00:03 > 50:51:00:00:00:05, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 42804, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     11.0.0.3 > 11.0.0.5: ICMP echo request, id 2242, seq 2, length 64
> > 15:10:27.032624 50:51:00:00:00:05 > 50:51:00:00:00:03, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 39776, offset 0, flags
> > [none], proto ICMP (1), length 84)
> >     11.0.0.5 > 11.0.0.3: ICMP echo reply, id 2242, seq 2, length 64
> >
> > ------
> >
> >
> > Please take a look into that.  I think it's because of the below
> > openflow rules getting hit in ovn-chassis-2 (where network function
> > ports are claimed)
> >
> > ------
> > cookie=0xd2bdabbe, duration=27348.683s, table=43, n_packets=333,
> > n_bytes=32018, idle_age=6,
> > priority=100,reg13=0/0xffff0000,reg15=0x1,metadata=0x1
> > actions=load:0x1->NXM_NX_TUN_ID[0..23],set_field:0x1->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:12,resubmit(,44)
> >  cookie=0x80ec4f65, duration=27348.683s, table=43, n_packets=10,
> > n_bytes=980, idle_age=6,
> > priority=100,reg13=0/0xffff0000,reg15=0x3,metadata=0x1
> > actions=load:0x1->NXM_NX_TUN_ID[0..23],set_field:0x3->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:12,resubmit(,44)
> >
> >  cookie=0x9df527bf, duration=27138.219s, table=44, n_packets=124,
> > n_bytes=12152, idle_age=0,
> > priority=109,ct_label=0xc00000000000000000000/0xffff00000000000000000000,reg10=0x800000/0x800000,reg14=0x7,metadata=0x1
> > actions=load:0x1->NXM_NX_TUN_ID[0..23],move:NXM_NX_REG15[]->NXM_NX_TUN_METADATA0[0..31],set_field:0x70000/0x7fff0000->tun_metadata0,output:12
> >  cookie=0x46f88876, duration=27138.219s, table=44, n_packets=128,
> > n_bytes=12544, idle_age=0,
> > priority=109,ct_label=0xc00000000000000000000/0xffff00000000000000000000,reg10=0x800000/0x800000,reg14=0x6,metadata=0x1
> > actions=load:0x1->NXM_NX_TUN_ID[0..23],move:NXM_NX_REG15[]->NXM_NX_TUN_METADATA0[0..31],set_field:0x60000/0x7fff0000->tun_metadata0,output:12
> >
> >
> > [root@ovn-central-az1 ovn]# ovn-sbctl --columns tunnel_key list
> > port_Binding sw01-port1
> > tunnel_key          : 1
> > [root@ovn-central-az1 ovn]# ovn-sbctl --columns tunnel_key list
> > port_Binding sw01-port3
> > tunnel_key          : 3
> > [root@ovn-central-az1 ovn]# ovn-sbctl --columns tunnel_key list
> > port_Binding nf-p1
> > tunnel_key          : 6
> > [root@ovn-central-az1 ovn]# ovn-sbctl --columns tunnel_key list
> > port_Binding nf-p2
> > tunnel_key          : 7
> >
> > ---
> >
> >
> > Note that I tested using ovn-fake-multinode and attached both nf-p1
> > and nf-p2 OVS ports to a namespace and inside that namespace I ran an
> > instance of OVS and added the below openflow rules
> > to mimic the network functions.
> >
> >
> > -----------------------
> > [root@ovn-chassis-2 ~]# ip netns exec nf-vm ip a
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> > group default qlen 1000
> >     link/ether 86:6c:21:ca:20:5e brd ff:ff:ff:ff:ff:ff
> > 3: br-tmp: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> > default qlen 1000
> >     link/ether f6:fa:27:9e:f1:44 brd ff:ff:ff:ff:ff:ff
> > 11: nf1-p1@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> > noqueue master ovs-system state UP group default qlen 1000
> >     link/ether 6a:c6:4c:6b:f5:e9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
> >     inet6 fe80::68c6:4cff:fe6b:f5e9/64 scope link proto kernel_ll
> >        valid_lft forever preferred_lft forever
> > 13: nf-p2@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> > noqueue master ovs-system state UP group default qlen 1000
> >     link/ether 66:ba:e2:2d:36:57 brd ff:ff:ff:ff:ff:ff link-netnsid 0
> >     inet6 fe80::64ba:e2ff:fe2d:3657/64 scope link proto kernel_ll
> >        valid_lft forever preferred_lft forever
> >
> >
> > [root@ovn-chassis-2 ~]# ip netns exec nf-vm bash
> > [root@ovn-chassis-2 ~]# ovs-vsctl show
> > 9d3886ff-d425-4cda-a532-3351bf7c7a8f
> >     Bridge br-tmp
> >         Port nf-p2
> >             Interface nf-p2
> >         Port br-tmp
> >             Interface br-tmp
> >                 type: internal
> >         Port nf1-p1
> >             Interface nf1-p1
> >     ovs_version: "3.4.90-1.fc41"
> > [root@ovn-chassis-2 ~]#
> > [root@ovn-chassis-2 ~]#
> > [root@ovn-chassis-2 ~]# ovs-ofctl dump-flows br-tmp
> >  cookie=0x0, duration=35795.242s, table=0, n_packets=722,
> > n_bytes=67734, priority=100,in_port="nf1-p1" actions=output:"nf-p2"
> >  cookie=0x0, duration=35795.242s, table=0, n_packets=701,
> > n_bytes=68147, priority=100,in_port="nf-p2" actions=output:"nf1-p1"
> >  cookie=0x0, duration=35915.590s, table=0, n_packets=1, n_bytes=70,
> > priority=0 actions=NORMAL
> >
> > -------------------------------
> >
> > I have a few comments which I'll reply to separately.


Hi Sragdhara,

I've a few comments about the patch series:

1.  I think it's better to separate out  the Network functions from the ACLs.
     I'd suggest that
          -  you add a match column in the Network_Function group table
          - Add a new stage ls_in_pre_nf just after ls_in_pre_lb and
set the register bit reg0[0] to indicate that the packet needs to be
sent to conntrack very similar to how we handle load balancers.
          - Add another stage - ls_in_nf_eval after ls_in_acl_eval
which matches on the "match" condition set in the each
Network_Function_Group table and set the appropriate registers so that
the packet is committed to conntrack in ls_in_stateful.
         - You may have to add similar stages in the egress pipeline.

This approach according to me is cleaner and we can keep ACL
evaluation separate from Network function evaluation.

2.  In your patch series,  I see the below logical flow in "ls_in_acl_eval"
    table=9 (ls_in_acl_eval     ), priority=65532, match=(!ct.est &&
ct.rel && !ct.new && ct_mark.blocked == 0), action=(reg0[17] = 1;
reg8[21] = ct_label.network_function_group; reg8[16] = 1;
ct_commit_nat;)

    Why is this logical flow added ?
    This seems wrong.  We don't commit to conntrack in this stage.
Rather its committed in ls_in_stateful state.  I think you should
remove this and maybe find a better way

3.  In the patch 4,   you're storing the tunnel interface id from
where the packet was received in the conntrack and then using it to
tunnel back to the same port for the reply traffic.
    Instead of storing the tunnel interface id, why can't we store the
value of "inport"  (i,e reg14) in the ct_label ?  For the reply
traffic, you can load the value from ct_label to reg15.
    Since OVN knows in which hypervisor a logical port resides and
since it already installs flows in table 43, the reply packet should
be tunnelled to the appropriate tunnel port.
    Let me know if this works or not and if not why ?

4.  If I understand correctly,   for VLAN tenant networks,  the
traffic will be tunnelled to the hypervisor where the network function
logical ports reside ?
    If so, will there not be any MTU issues if the packet size +
Geneve header is greater than the interface MTU ?


Thanks
Numan



> >
> > I think this patch series doesn't have test cases to cover the
> > multiple chassis scenario (like the one I tested above).
> > Please add some more tests covering this.  I think you can add multi
> > node system tests - see tests/ovn-multinode.at
> >
> > I think you can mimic a network function running another ovs instance
> > inside a namespace like I did.  Let me know if you need
> > any pointers here.
> >
> > Thanks
> > Numan
> >
> >
> >
> >
> >
> > > V1:
> > >   - First patch.
> > >
> > > V2:
> > >   - Rebased code.
> > >   - Added "mode" field in Network_function_group table, with only allowed
> > >     value as "inline". This is for future expansion to include "mirror" 
> > > mode.
> > >   - Added a flow in the in_network_function and out_network_function 
> > > table to
> > >     skip redirection of multicast traffic.
> > >
> > > V3:
> > >  - Rebased code.
> > >
> > > V4:
> > >  - Rebased code.
> > >
> > > Sragdhara Datta Chaudhuri (5):
> > >   ovn-nb: Network Function insertion OVN-NB schema changes
> > >   ovn-nbctl: Network Function insertion commands.
> > >   northd, tests: Network Function insertion logical flow programming.
> > >   controller, tests: Network Function insertion tunneling of cross-host
> > >     VLAN traffic.
> > >   northd, controller: Network Function Health monitoring.
> > >
> > >  controller/physical.c        | 271 +++++++++++-
> > >  controller/pinctrl.c         | 252 +++++++++--
> > >  include/ovn/logical-fields.h |  14 +
> > >  lib/logical-fields.c         |  26 ++
> > >  lib/ovn-util.c               |   2 +-
> > >  lib/ovn-util.h               |   4 +-
> > >  northd/en-global-config.c    |  75 ++++
> > >  northd/en-global-config.h    |  12 +-
> > >  northd/en-multicast.c        |   2 +-
> > >  northd/en-northd.c           |   8 +
> > >  northd/en-sync-sb.c          |  16 +-
> > >  northd/inc-proc-northd.c     |   6 +-
> > >  northd/northd.c              | 804 +++++++++++++++++++++++++++++++++--
> > >  northd/northd.h              |  41 +-
> > >  ovn-nb.ovsschema             |  64 ++-
> > >  ovn-nb.xml                   | 123 ++++++
> > >  ovn-sb.ovsschema             |  12 +-
> > >  ovn-sb.xml                   |  22 +-
> > >  tests/ovn-controller.at      |   6 +-
> > >  tests/ovn-nbctl.at           |  83 ++++
> > >  tests/ovn-northd.at          | 560 +++++++++++++++++-------
> > >  tests/ovn.at                 | 143 +++++++
> > >  utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
> > >  23 files changed, 2807 insertions(+), 272 deletions(-)
> > >
> > > --
> > > 2.39.3
> > >
> > > _______________________________________________
> > > dev mailing list
> > > d...@openvswitch.org
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=uXnTjPDrt8WYa8nbZqANTqL0TyzFTTKpPHphGFPgvBw&m=WMlHw4tvl9h1HxlfXZGmd-QBD4R2NBPS4sjPIi7deJ9xmE4nGPuckQfRwTypVMq6&s=KdbegtAeld5Zv2Q-j6bnheMC9yAD_JMoE4AKBcQVJ2c&e=
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH OVN v4 0/5] Network Function Insertion.

Reply via email to