On Thu, Mar 13, 2025 at 4:46 AM Sragdhara Datta Chaudhuri
<sragdha.chau...@nutanix.com> wrote:
>
> RFC: NETWORK FUNCTION INSERTION IN OVN
>
> 1. Introduction
> ================
> The objective is to insert a Network Function (NF) in the path of 
> outbound/inbound traffic from/to a port-group. The use case is to integrate a 
> 3rd party service in the path of traffic. An example of such a service would 
> be a layer-7 firewall. The NF VM acts as a bump in the wire and must not 
> modify the packet, i.e. the IP header, MAC addresses, VLAN tag and TCP 
> sequence numbers remain unchanged.
>
> Here are some of the highlights:
> - A new entity network-function (NF) has been introduced. It contains a pair 
> of LSPs. The CMS would designate one as “inport” and the other as “outport”.
> - For high availability, a network function group (NFG) entity consists of a 
> group of NFs. Only one NF in an NFG has the active role at a time, based on 
> health monitoring.
> - An ACL would accept an NFG as a parameter, and traffic matching the ACL 
> would be redirected to the associated active NF’s port. An NFG is accepted 
> for the stateful allow action only.
> - The ACL’s port-group is the point of reference when defining the role of 
> the NF ports. The “inport” is the port closer to the port-group and “outport” 
> is the one away from it. For from-lport ACLs, the request packets would be 
> redirected to the NF “inport” and for to-lport ACLs, the request packets 
> would be redirected to NF “outport”. When the same packet comes out of the 
> other NF port, it gets simply forwarded.
> - Statefulness will be maintained, i.e. the response traffic will also go 
> through the same pair of NF ports but in reverse order.
> - For the NF ports we need to disable port security check, fdb learning and 
> multicast/broadcast forwarding.
> - Health monitoring involves ovn-controller periodically injecting ICMP probe 
> packets into the NF inport and monitoring the same packet coming out of the 
> NF outport.
> - If the traffic redirection involves cross-host traffic (e.g. for a 
> from-lport ACL, if the source VM and NF VM are on different hosts), packets 
> would be tunneled to and from the NF VM's host.
> - If the port-group to which the ACL is being applied has members spread 
> across multiple LSs, CMS needs to create child ports for the NF ports on each 
> of these LSs. The redirection rules in each LS will use the child ports on 
> that LS.
>
> 2. NB tables
> =============
> New NB tables
> -------------
> Network_Function: Each row contains {inport, outport, health_check}
> Network_Function_Group: Each row contains a list of Network_Function 
> entities. It also contains a unique id (between 1 and 255, generated by 
> northd) and a reference to the current active NF.
> Network_Function_Health_Check: Each row contains configuration for probes in 
> options field: {interval, timeout, success_count, failure_count}
>
>         "Network_Function_Health_Check": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "options": {
>                      "type": {"key": "string",
>                               "value": "string",
>                               "min": 0,
>                               "max": "unlimited"}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
>         "Network_Function": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "outport": {"type": {"key": {"type": "uuid",
>                                              "refTable": 
> "Logical_Switch_Port",
>                                              "refType": "strong"},
>                                      "min": 1, "max": 1}},
>                 "inport": {"type": {"key": {"type": "uuid",
>                                             "refTable": "Logical_Switch_Port",
>                                             "refType": "strong"},
>                                     "min": 1, "max": 1}},
>                 "health_check": {"type": {
>                     "key": {"type": "uuid",
>                             "refTable": "Network_Function_Health_Check",
>                             "refType": "strong"},
>                     "min": 0, "max": 1}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
>         "Network_Function_Group": {
>             "columns": {
>                 "name": {"type": "string"},
>                 "network_function": {"type":
>                                   {"key": {"type": "uuid",
>                                            "refTable": "Network_Function",
>                                            "refType": "strong"},
>                                            "min": 0, "max": "unlimited"}},
>                 "mode": {"type": {"key": {"type": "string",
>                                           "enum": ["set", ["inline"]]}}},
>                 "network_function_active": {"type":
>                                   {"key": {"type": "uuid",
>                                            "refTable": "Network_Function",
>                                            "refType": "strong"},
>                                            "min": 0, "max": 1}},
>                 "id": {
>                      "type": {"key": {"type": "integer",
>                                       "minInteger": 0,
>                                       "maxInteger": 255}}},
>                 "external_ids": {
>                     "type": {"key": "string", "value": "string",
>                              "min": 0, "max": "unlimited"}}},
>             "isRoot": true},
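> The probe semantics implied by the health-check options can be sketched as a 
> small state machine (an illustrative sketch only, not ovn-controller code; 
> the class and method names are hypothetical): an NF goes online after 
> success_count consecutive probe replies and offline after failure_count 
> consecutive misses.

```python
class NFHealthMonitor:
    """Hypothetical sketch of per-NF probe accounting driven by the
    Network_Function_Health_Check options success_count/failure_count."""

    def __init__(self, success_count, failure_count):
        self.success_count = success_count  # consecutive replies to go online
        self.failure_count = failure_count  # consecutive misses to go offline
        self.successes = 0
        self.failures = 0
        self.online = False

    def record_probe(self, reply_seen):
        """Feed one probe result; return the (possibly updated) online state."""
        if reply_seen:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_count:
                self.online = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_count:
                self.online = False
        return self.online
```

> The interval and timeout options would drive how often a probe result is 
> recorded and how long ovn-controller waits for the packet on the outport.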
>
>
> Modified NB table
> -----------------
> ACL: The ACL entity would have a new optional field that is a reference to a 
> Network_Function_Group entity. This field can be present only for stateful 
> allow ACLs.
>
>         "ACL": {
>             "columns": {
>                 "network_function_group": {"type": {"key": {"type": "uuid",
>                                            "refTable": 
> "Network_Function_Group",
>                                            "refType": "strong"},
>                                            "min": 0,
>                                            "max": 1}},
>
> New options for Logical_Switch_Port
> -----------------------------------
> receive_multicast=<boolean>: Default true. If set to false, LS will not 
> forward broadcast/multicast traffic to this port. This is to prevent looping 
> of such packets.
>
> lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning will be 
> skipped for packets coming out of this port. Redirected packets from the NF 
> port would be carrying the originating VM’s MAC in source, and so learning 
> should not happen.
>
> CMS needs to set both the above options to false for NF ports, in addition to 
> disabling port security.
>
> network-function-linked-port=<lsp-name>: Each NF port needs to have this set 
> to the other NF port of the pair.
>
> New NB_global options
> ---------------------
> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is 
> already there and will be used as source MAC)
> svc_monitor_ip4: source IP of probe packets
> svc_monitor_ip4_dst: destination IP of probe packets
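> As a rough illustration of what such a probe looks like on the wire, here is 
> a minimal ICMP echo request builder (a sketch with hypothetical helper 
> names; the actual probe construction lives in ovn-controller, and the 
> MAC/IP options above are applied when the full Ethernet/IP frame is 
> assembled):

```python
import struct

def internet_checksum(data):
    """RFC 1071 Internet checksum over the ICMP message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)
    total += total >> 16
    return ~total & 0xFFFF

def build_icmp_echo(ident, seq, payload=b""):
    """ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

> A receiver can validate the message by checksumming it whole, which yields 
> zero for an intact echo request.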
>
> Sample configuration
> --------------------
> ovn-nbctl ls-add ls1
> ovn-nbctl lsp-add ls1 nfp1
> ovn-nbctl lsp-add ls1 nfp2
> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false 
> options:lsp_learn_fdb=false options:network-function-linked-port=nfp2
> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false 
> options:lsp_learn_fdb=false options:network-function-linked-port=nfp1
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 10.1.1.4"
> ovn-nbctl pg-add pg1 p1
> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 10.1.1.5"
> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
> allow-related nfg1
>
> 3. SB tables
> ============
> Service_Monitor:
> This table is currently used by the load balancer. New fields are: “type” - 
> to indicate LB or NF, “mac” - the destination MAC address for monitor 
> packets, and “logical_input_port” - the LSP into which the probe packet is 
> injected. Also, “icmp” has been added as a protocol type, used only for NF.
>
>          "Service_Monitor": {
>              "columns": {
>                "type": {"type": {"key": {
>                           "type": "string",
>                           "enum": ["set", ["load-balancer", 
> "network-function"]]}}},
>                "mac": {"type": "string"},
>                  "protocol": {
>                      "type": {"key": {"type": "string",
>                             "enum": ["set", ["tcp", "udp", "icmp"]]},
>                               "min": 0, "max": 1}},
>                "logical_input_port": {"type": "string"},
>
> northd would create one Service_Monitor entity for each NF. The 
> logical_input_port and logical_port would be populated from the NF inport and 
> outport fields respectively. The probe packets would be injected into the 
> logical_input_port and would be monitored out of logical_port.
>
> 4. Logical Flows
> ================
> Logical Switch ingress pipeline:
> - in_network_function added after in_stateful.
> - Modifications to in_acl_eval, in_stateful and in_l2_lookup.
> Logical Switch egress pipeline:
> - out_network_function added after out_stateful.
> - Modifications to out_pre_acl, out_acl_eval and out_stateful.
>
> 4.1 from-lport ACL
> ------------------
> The diagram shows the request path for packets from VM1 port p1, which is a 
> member of the pg to which ACL is applied. The response would follow the 
> reverse path, i.e. packet would be redirected to nfp2 and come out of nfp1 
> and be forwarded to p1.
> Also, p2 does not need to be on the same LS; only p1, nfp1 and nfp2 must be 
> on the same LS.
>
>       -----                  -------                  -----
>      | VM1 |                | NF VM |                | VM2 |
>       -----                  -------                  -----
>         |                    /\    |                   / \
>         |                    |     |                    |
>        \ /                   |    \ /                   |
>    ------------------------------------------------------------
>   |     p1                 nfp1  nfp2                   p2     |
>   |                                                            |
>   |                      Logical Switch                        |
>    -------------------------------------------------------------
> pg1: [p1]         as2: [p2-ip]
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
> Say the unique id that northd assigned to this NFG is 123.
>
> The request packets from p1 matching a from-lport ACL with an NFG are 
> redirected to nfp1, and the NFG id is committed to the ct label in p1's zone. 
> When the same packet comes out of nfp2, it gets forwarded the normal way.
> Response packets have p1's MAC as destination. Ingress processing sets the 
> outport to p1, and the CT lookup in the egress pipeline (in p1's ct zone) 
> yields the NFG id; the packet is injected back into the ingress pipeline 
> after setting the outport to nfp2.
>
> Below are the changes in detail.
>
> 4.1.1 Request processing
> ------------------------
>
> in_acl_eval: For from-lport ACLs with NFG, the existing rule's action has 
> been enhanced to set:
>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>  - reg5[0..7] = <NFG-unique-id>
>  - reg8[22] = <direction> (1: request, 0: response)
>
>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && 
> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>   table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && 
> (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; 
> reg8[22] = 1; reg5[0..7] = 123; next;)
>
> in_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
>  - bit 7 (ct_label.network_function_group): Set to 1 to indicate NF insertion.
>  - bits 17 to 24 (ct_label.network_function_group_id): Stores the 8 bit NFG id
>
>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> reg5[0..7]; }; next;)
>   table=21(ls_in_stateful     ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; 
> }; next;)
>   table=21(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> ct_label.network_function_group_id = 0; }; next;)
>   table=21(ls_in_stateful     ), priority=0    , match=(1), action=(next;)
>
>
> For non-NFG cases, the existing priority 100 rules will be hit. An 
> additional action has been added there to clear the NFG bits in the ct label.
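> The ct_label layout above can be summarized with a small helper (an 
> illustrative sketch, not OVN code; bit positions are as stated in the text: 
> bit 7 for the flag, bits 17..24 for the 8-bit NFG id):

```python
NF_FLAG_BIT = 7     # ct_label.network_function_group
NF_ID_SHIFT = 17    # ct_label.network_function_group_id, bits 17..24
NF_ID_MASK = 0xFF

def commit_nfg(ct_label, nfg_id):
    """Set the NF flag and store the NFG id (1..255) in a ct_label value."""
    assert 1 <= nfg_id <= 255
    ct_label |= 1 << NF_FLAG_BIT
    ct_label = (ct_label & ~(NF_ID_MASK << NF_ID_SHIFT)) | (nfg_id << NF_ID_SHIFT)
    return ct_label

def read_nfg(ct_label):
    """Return (flag, nfg_id) as the response-path rules read them back."""
    return (ct_label >> NF_FLAG_BIT) & 1, (ct_label >> NF_ID_SHIFT) & NF_ID_MASK
```

> This mirrors what the priority 110 ct_commit actions write and what the 
> response-path matches on ct_label.network_function_group_id read back.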
>
> in_network_function: A new stage with priority 99 rules to redirect packets 
> by setting outport to the NF “inport” (or its child port) based on the NFG id 
> set by the prior ACL stage.
> Priority 100 rules ensure that when the same packets come out of the NF 
> ports, they are not redirected again (the setting of reg5 here relates to the 
> cross-host packet tunneling and will be explained later).
> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
> present on this LS, drop packets.
>
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp1"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp2"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1"; output;)
>   table=22(ls_in_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=22(ls_in_network_function), priority=0    , match=(1), action=(next;)
>
>
> 4.1.2 Response processing
> -------------------------
> out_acl_eval: High priority rules that allow response and related packets to 
> go through have been enhanced to also copy CT label NFG bit into reg8[21].
>
>   table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>   table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; next;)
>
> out_network_function: Priority 99 rule matches on the nfg_id in ct_label and 
> sets the outport to the NF “outport”. It also sets reg8[23]=1 and injects the 
> packet to ingress pipeline (in_l2_lookup).
> Priority 100 rule forwards all packets to NF ports to the next table.
>
>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> "nfp1"), action=(next;)
>   table=11 (ls_out_network_function), priority=100  , match=(outport == 
> "nfp2"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=11 (ls_out_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport 
> = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
>   table=11 (ls_out_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=11 (ls_out_network_function), priority=0    , match=(1), 
> action=(next;)
>
> in_l2_lkup: if reg8[23] == 1 (packet has come back from egress), simply 
> forward such packets as outport is already set.
>
>   table=29(ls_in_l2_lkup), priority=100  , match=(reg8[23] == 1), 
> action=(output;)
>
> The above set of rules ensures that the response packet is sent to nfp2. 
> When the same packet comes out of nfp1, the ingress pipeline sets the 
> outport to p1 and the packet enters the egress pipeline.
>
> out_pre_acl: If the packet is coming from the NF inport, skip the egress 
> pipeline stages up to the out_network_function stage, as the packet has 
> already gone through them and we don't want the same packet to be processed 
> by CT twice.
>   table=2 (ls_out_pre_acl     ), priority=110  , match=(inport == "nfp1"), 
> action=(next(pipeline=egress, table=12);)
>
>
> 4.2 to-lport ACL
> ----------------
>       -----                  --------                  -----
>      | VM1 |                |  NF VM |                | VM2 |
>       -----                  --------                  -----
>        / \                    |   / \                    |
>         |                     |    |                     |
>         |                    \ /   |                    \ /
>    -------------------------------------------------------------
>   |     p1                  nfp1   nfp2                  p2     |
>   |                                                             |
>   |                      Logical Switch                         |
>    -------------------------------------------------------------
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' 
> allow-related nfg1
> Diagram shows request traffic path. The response will follow a reverse path.
>
> Ingress pipeline sets the outport to p1 based on destination MAC lookup. The 
> packet enters the egress pipeline. There the to-lport ACL with NFG gets 
> evaluated and the NFG id gets committed to the CT label. Then the outport is 
> set to nfp2 and then the packet is injected back to ingress. When the same 
> packet comes out of nfp1, it gets forwarded to p1 the normal way.
> For the response packet from p1, the ingress pipeline gets the NFG id from 
> the CT label and accordingly redirects it to nfp1. When it comes out of nfp2 
> it is forwarded the normal way.
>
> 4.2.1 Request processing
> ------------------------
> out_acl_eval: For to-lport ACLs with NFG, the existing rule's action has been 
> enhanced to set:
>  - reg8[21] = 1: to indicate that packet has matched a rule with NFG
>  - reg5[0..7] = <NFG-unique-id>
>  - reg8[22] = <direction> (1: request, 0: response)
>
>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[7] == 1 && 
> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>   table=6 (ls_out_acl_eval    ), priority=1100 , match=(reg0[8] == 1 && 
> (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; 
> reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>
>
>
> out_stateful: Priority 110: set NFG id in CT label if reg8[21] is set.
>
>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 1; ct_label.network_function_group_id = 
> reg5[0..7]; }; next;)
>   table=10(ls_out_stateful    ), priority=110  , match=(reg0[1] == 1 && 
> reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; 
> ct_label.network_function_group_id = reg5[0..7]; }; next;)
>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; 
> }; next;)
>   table=10(ls_out_stateful    ), priority=100  , match=(reg0[1] == 1 && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; 
> ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; 
> ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; 
> ct_label.network_function_group_id = 0; }; next;)
>   table=10(ls_out_stateful    ), priority=0    , match=(1), action=(next;)
>
> out_network_function: A new stage with priority 99 rules to redirect 
> packets by setting the outport to the NF “outport” (or its child port) based 
> on the NFG id set by the prior ACL stage, and then injecting the packet back 
> into ingress. Priority 100 rules ensure that packets going to NF ports are 
> not redirected again.
> Priority 1 rule: if reg8[21] is set, but the NF port (or child port) is not 
> present on this LS, drop packets.
>
>   table=11(ls_out_network_function), priority=100  , match=(outport == 
> "nfp1"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(outport == 
> "nfp2"), action=(next;)
>   table=11(ls_out_network_function), priority=100  , match=(reg8[21] == 1 && 
> eth.mcast), action=(next;)
>   table=11(ls_out_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2"; reg8[23] = 1; 
> next(pipeline=ingress, table=29);)
>   table=11(ls_out_network_function), priority=1    , match=(reg8[21] == 1), 
> action=(drop;)
>   table=11(ls_out_network_function), priority=0    , match=(1), action=(next;)
>
>
> in_l2_lkup: As described earlier, the priority 100 rule will forward these 
> packets.
>
> Then the same packet comes out from nfp1 and goes through the ingress 
> processing where the outport gets set to p1. The egress pipeline out_pre_acl 
> priority 110 rule described earlier, matches against inport as nfp1 and 
> directly jumps to the stage after out_network_function. Thus the packet is 
> not redirected again.
>
> 4.2.2 Response processing
> -------------------------
> in_acl_eval: High priority rules that allow response and related packets to 
> go through have been enhanced to also copy CT label NFG bit into reg8[21].
>
>   table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel && 
> !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; reg8[21] = 
> ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
>   table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel && 
> !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0; 
> reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group; 
> reg8[16] = 1; next;)
>
> in_network_function: Priority 99 rule matches on the nfg_id in ct_label and 
> sets the outport to the NF “inport”.
> Priority 100 rule forwards all packets to NF ports to the next table.
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport 
> = "nfp1"; output;)
>
>
> 5. Cross-host Traffic for VLAN Network
> ======================================
> For overlay subnets, all cross-host traffic exchanges are tunneled. In the 
> case of VLAN subnets, there needs to be special handling to selectively 
> tunnel only the traffic to or from the NF ports.
> Take the example of a from-lport ACL. Packets from p1 to p2 get redirected 
> to nfp1 on host1. If this packet is simply sent out from host1, the physical 
> network will directly forward it to host2, where VM2 is. So we need to 
> tunnel the redirected packets from host1 to host3. Once the packets come out 
> of nfp2, if host3 sends them out, the physical network would learn p1's MAC 
> as coming from host3. So these packets need to be tunneled back to host1, 
> and from there the packet is forwarded to VM2 via the physical network.
>
>       -----                  -----                  --------
>      | VM2 |                | VM1 |                | NF VM  |
>       -----                  -----                  --------
>        / \                     |                    / \   |
>         | (7)                  |  (1)             (3)|    |(4)
>         |                     \ /                    |   \ /
>   --------------        --------------   (2)    ---------------
>  |      p2      |  (6) |      p1      |______\ |   nfp1  nfp2  |
>  |              |/____ |              |------/ |               |
>  |    host2     |\     |     host1    |/______ |     host3     |
>  |              |      |              |\------ |               |
>   --------------        --------------   (5)    --------------
>
> The above figure shows the request packet path for a from-lport ACL. Response 
> would follow the same path in reverse direction.
>
> To achieve this, the following would be done:
>
> On host where the ACL port group members are present (host1)
> ------------------------------------------------------------
> REMOTE_OUTPUT (table 42):
> Currently, it tunnels traffic destined to all non-local overlay ports to 
> their associated hosts. The same rule is now also added for traffic to 
> non-local NF ports. Thus the packets from p1 get tunneled to host 3.
>
> On host with NF (host3) forward packet to nfp1
> -----------------------------------------------
> Upon reaching host3, the following rules come into play:
> PHY_TO_LOG (table 0):
> Priority 100: Existing rule - for each Geneve tunnel interface on the 
> chassis, copies info from the header into the inport, outport and metadata 
> registers. Now the same rule also stores the tunnel interface id in a 
> register (reg5[16..31]).
>
> CHECK_LOOPBACK (table 44)
> This table has a rule that clears all the registers. The change is to skip 
> the clearing of reg5[16..31].
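> A sketch of that change (illustrative only; in ovn-controller this is 
> OpenFlow actions, not Python): clear every logical register except the 
> tunnel interface id stashed in reg5[16..31].

```python
def check_loopback_clear(regs):
    """Clear all logical registers, preserving reg5[16..31] (tun intf id)."""
    tun_if_id = (regs.get(5, 0) >> 16) & 0xFFFF
    cleared = {r: 0 for r in regs}
    cleared[5] = tun_if_id << 16
    return cleared
```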
>
> Logical egress pipeline:
>
> ls_out_stateful priority 120: If the outport is an NF port, copy reg5[16..31] 
> (set by table 0) to ct_label.tun_if_id.
>
>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" && 
> reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_label.tun_if_id = reg5[16..31]; }; next;)
>   table=10(ls_out_stateful    ), priority=120  , match=(outport == "nfp1" && 
> reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; 
> ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; 
> ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; 
> ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)
>
> The above sequence of flows ensures that if a packet is received via tunnel on 
> host3, with outport as nfp1, the tunnel interface id is committed to the ct 
> entry in nfp1's zone.
>
> On host with NF (host3) tunnel packets from nfp2 back to host1
> ---------------------------------------------------------------
> When the same packet comes out of nfp2 on host3:
>
> LOCAL_OUTPUT (table 43)
> When the packet comes out of the other NF port (nfp2), following two rules 
> send it back to the host that it originally came from:
>
> Priority 110: For each NF port local to this host, the following rule 
> processes the packet through the CT of the linked port (for nfp2, that is 
> nfp1):
>   match: inport==nfp2 && RECIRC_BIT==0
>   action: RECIRC_BIT = 1, ct(zone=nfp1’s zone, table=LOCAL), resubmit to 
> table 43
>
> Priority 109: For each {tunnel_id, nf port} on this host, if the tun_if_id 
> in ct_label matches the tunnel_id, send the recirculated packet using that 
> tunnel_id:
>   match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>   action: tunnel packet using tun-id
>
> If p1 and nfp1 happen to be on the same host, the tun_if_id would not be 
> set, and thus none of the priority 109 rules would match. The packet would 
> be forwarded the usual way, matching the existing priority 100 rules in 
> LOCAL_OUTPUT.
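> The decision those two rules implement can be sketched as (a hypothetical 
> helper; local_tunnels maps tunnel interface ids to tunnel ofports on this 
> chassis):

```python
def choose_output(ct_tun_if_id, local_tunnels):
    """Tunnel the packet back (priority 109) when a tunnel interface id was
    committed on the request path; otherwise fall through to the existing
    priority 100 local delivery."""
    if ct_tun_if_id and ct_tun_if_id in local_tunnels:
        return ("tunnel", local_tunnels[ct_tun_if_id])
    return ("local", None)
```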
>
> Special handling is needed for the case where the NF responds back on nfp1 
> instead of forwarding the packet out of nfp2. For example, a SYN packet from 
> p1 got redirected to nfp1. The NF, which is a firewall VM, drops the SYN and 
> sends an RST back on nfp1. In this case, looking up the linked port's (nfp2) 
> CT zone will not yield anything. ct.inv is used to identify such scenarios, 
> and nfp1's CT zone is used to send the packet back. To achieve this, the 
> following two rules are installed:
>
> in_network_function:
> Priority 100 rule that allows packets incoming from NF type ports, is 
> enhanced with additional action to store the tun_if_id from ct_label into 
> reg5[16..31].
>   table=22(ls_in_network_function), priority=100  , match=(inport == "nfp1"), 
> action=(reg5[16..31] = ct_label.tun_if_id; next;)
>
> LOCAL_OUTPUT (table 43)
> Priority 110 rule: for recirculated packets, if the CT (of the linked port) 
> is invalid, use the tunnel id from reg5[16..31] to tunnel the packet back to 
> host1 (as the CT zone info has been overwritten by the priority 110 rule 
> above in table 43).
>       match: inport==nfp1 && RECIRC_BIT==1 && ct.inv && 
> MFF_LOG_TUN_OFPORT==<tun-id>
>       action: tunnel packet using tun-id
>
>
> 6. NF insertion across logical switches
> =======================================
> If the port-group where the ACL is being applied has members across multiple 
> logical switches, there needs to be an NF port pair on each of these 
> switches.
> The NF VM will have only one inport and one outport. The CMS is expected to 
> create child ports linked to these ports on each logical switch where 
> port-group members are present.
> The network-function entity would be configured with the parent ports only. 
> When CMS creates the child ports, it does not need to change any of the NF, 
> NFG or ACL config tables.
> When northd configures the redirection rules for a specific LS, it will use 
> the parent or child port depending on what it finds on that LS.
>                                      --------
>                                     | NF VM  |
>                                      --------
>                                      |      |
>                                    nfp1   nfp2
>                                   --------------
>                                  |    SVC LS    |
>                                   --------------
>
>       -----                                             -----
>      | VM1 |                                           | VM2 |
>       -----                                             -----
>        |      |         |                                |      |         |
>       p1  nfp1_ch1  nfp2_ch1                            p3  nfp1_ch2  nfp2_ch2
>      -------------------------                         -------------------------
>     |           LS1           |                       |           LS2           |
>      -------------------------                         -------------------------
>
> In this example, the CMS created the parent ports for the NF VM on the LS 
> named SVC LS. The ports are nfp1 and nfp2. The CMS configures the NF using 
> these ports:
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' 
> allow-related nfg1
>
> The port group to which the ACL is applied is pg1 and pg1 has two ports: p1 
> on LS1 and p3 on LS2.
> The CMS needs to create child ports for the NF ports on LS1 and LS2. On LS1: 
> nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2.
>
> When northd creates rules on LS1, it uses nfp1_ch1 and nfp2_ch1.
>
>   table=22(ls_in_network_function), priority=100  , match=(inport == 
> "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch1"; output;)
>
> When northd creates rules on LS2, it uses nfp1_ch2 and nfp2_ch2.
>   table=22(ls_in_network_function), priority=100  , match=(inport == 
> "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
>   table=22(ls_in_network_function), priority=99   , match=(reg8[21] == 1 && 
> reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch2"; output;)
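The per-LS port selection described above (parent ports on SVC LS, child ports on LS1 and LS2) can be sketched as a small lookup. This is a hypothetical helper, not the actual northd code; `ls_ports` maps each LSP name on one switch to its parent-port name (or None for a regular port):

```python
from typing import Optional

def resolve_nf_port(nf_port: str, ls_ports: dict) -> Optional[str]:
    """Pick the port to use in redirection rules on one logical switch:
    prefer a child LSP whose parent is the configured NF port; fall back
    to the NF port itself if it lives on this switch; otherwise None
    (no port-group members to redirect through here)."""
    for name, parent in ls_ports.items():
        if parent == nf_port:
            return name                     # child port present on this LS
    return nf_port if nf_port in ls_ports else None

ls1 = {"p1": None, "nfp1_ch1": "nfp1", "nfp2_ch1": "nfp2"}
svc = {"nfp1": None, "nfp2": None}
print(resolve_nf_port("nfp1", ls1))   # nfp1_ch1
print(resolve_nf_port("nfp1", svc))   # nfp1
```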
>
>

Hi Sragdhara,

Sorry for the late reviews on this patch series.  I haven't looked
into the series yet.  I plan to take a look this week.  Is it possible
to rebase and submit a v3, since the series has conflicts?

Thanks
Numan

> 7. Health Monitoring
> ====================
> The LB health monitoring functionality has been extended to support NFs. 
> A Network_Function_Group has a list of Network_Functions, each of which 
> references a Network_Function_Health_Check row holding the monitoring config. 
> A corresponding SB service_monitor row maintains the online/offline status. 
> When the status changes, northd picks one of the “online” NFs and sets it in 
> the network_function_active field of the NFG. The redirection rules in the LS 
> use the ports from this NF.
>
> ovn-controller performs the health monitoring by sending an ICMP echo request 
> whose source IP and MAC come from the NB global options “svc_monitor_ip4” 
> and “svc_monitor_mac”, and whose destination IP and MAC come from the new NB 
> global options “svc_monitor_ip4_dst” and “svc_monitor_mac_dst”. The sequence 
> number and id are randomly generated and stored in service_monitor. The NF VM 
> forwards the same packet out of its other port; when the packet emerges, 
> ovn-controller matches its sequence number and id against the stored values 
> and marks the NF online if they match.
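The probe build-and-match cycle can be sketched with a bare ICMP header. This is a simplified model (no Ethernet/IP encapsulation, and the layout is illustrative rather than the exact packet pinctrl builds):

```python
import random
import struct

def icmp_checksum(data: bytes) -> int:
    # Standard one's-complement checksum over 16-bit big-endian words.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)
    total += total >> 16
    return ~total & 0xFFFF

def build_probe():
    """Build an ICMP echo request with random id/seq; return the packet
    and the (id, seq) pair that would be stored in service_monitor."""
    ident = random.randint(0, 0xFFFF)
    seq = random.randint(0, 0xFFFF)
    hdr = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # type 8 = echo request
    pkt = struct.pack("!BBHHH", 8, 0, icmp_checksum(hdr), ident, seq)
    return pkt, (ident, seq)

def probe_matches(pkt: bytes, stored) -> bool:
    """Compare the id/seq of a packet seen on the NF outport with the
    stored values; the NF is a bump in the wire, so they must be equal."""
    _type, _code, _csum, ident, seq = struct.unpack("!BBHHH", pkt[:8])
    return (ident, seq) == stored

pkt, stored = build_probe()
print(probe_matches(pkt, stored))   # True: the packet came back unchanged
```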
>
> V1:
>   - First patch.
>
> V2:
>   - Rebased code.
>   - Added "mode" field in Network_function_group table, with only allowed
>     value as "inline". This is for future expansion to include "mirror" mode.
>   - Added a flow in the in_network_function and out_network_function table to
>     skip redirection of multicast traffic.
>
> Sragdhara Datta Chaudhuri (5):
>   ovn-nb: Network Function insertion OVN-NB schema changes
>   ovn-nbctl: Network Function insertion commands.
>   northd, tests: Network Function insertion logical flow programming.
>   controller, tests: Network Function insertion tunneling of cross-host
>     VLAN traffic.
>   northd, controller: Network Function Health monitoring.
>
>  controller/physical.c        | 249 ++++++++++-
>  controller/pinctrl.c         | 252 +++++++++--
>  include/ovn/logical-fields.h |  16 +-
>  lib/logical-fields.c         |  26 ++
>  lib/ovn-util.h               |   2 +-
>  northd/en-global-config.c    |  75 ++++
>  northd/en-global-config.h    |  12 +-
>  northd/en-multicast.c        |   2 +-
>  northd/en-northd.c           |   8 +
>  northd/en-sync-sb.c          |  16 +-
>  northd/inc-proc-northd.c     |   6 +-
>  northd/northd.c              | 789 +++++++++++++++++++++++++++++++++--
>  northd/northd.h              |  39 +-
>  ovn-nb.ovsschema             |  64 ++-
>  ovn-nb.xml                   | 123 ++++++
>  ovn-sb.ovsschema             |  12 +-
>  ovn-sb.xml                   |  22 +-
>  tests/ovn-controller.at      |   6 +-
>  tests/ovn-nbctl.at           |  83 ++++
>  tests/ovn-northd.at          | 508 ++++++++++++++++------
>  tests/ovn.at                 | 137 ++++++
>  utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
>  22 files changed, 2747 insertions(+), 233 deletions(-)
>
> --
> 2.39.3
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev