On Thu, Mar 13, 2025 at 4:46 AM Sragdhara Datta Chaudhuri
<sragdha.chau...@nutanix.com> wrote:
>
> RFC: NETWORK FUNCTION INSERTION IN OVN
>
> 1. Introduction
> ================
> The objective is to insert a Network Function (NF) in the path of
> outbound/inbound traffic from/to a port-group. The use case is to
> integrate a 3rd party service in the path of traffic; an example of such a
> service would be a layer-7 firewall. The NF VM acts as a bump in the wire
> and should not modify the packet, i.e. the IP header, the MAC addresses,
> the VLAN tag and the sequence numbers remain unchanged.
>
> Here are some of the highlights:
> - A new entity, network function (NF), has been introduced. It contains a
>   pair of LSPs. The CMS would designate one as "inport" and the other as
>   "outport".
> - For high availability, a network function group (NFG) entity consists of
>   a group of NFs. Only one NF in an NFG has the active role, based on
>   health monitoring.
> - An ACL would accept an NFG as a parameter, and traffic matching the ACL
>   would be redirected to the associated active NF's port. An NFG is
>   accepted for the stateful allow action only.
> - The ACL's port-group is the point of reference when defining the role of
>   the NF ports. The "inport" is the port closer to the port-group and the
>   "outport" is the one away from it. For from-lport ACLs, request packets
>   would be redirected to the NF "inport"; for to-lport ACLs, request
>   packets would be redirected to the NF "outport". When the same packet
>   comes out of the other NF port, it simply gets forwarded.
> - Statefulness will be maintained, i.e. the response traffic will also go
>   through the same pair of NF ports, but in reverse order.
> - For the NF ports we need to disable the port security check, fdb
>   learning, and multicast/broadcast forwarding.
> - Health monitoring involves ovn-controller periodically injecting ICMP
>   probe packets into the NF inport and monitoring the same packets coming
>   out of the NF outport.
> - If the traffic redirection involves cross-host traffic (e.g. for a
>   from-lport ACL, if the source VM and NF VM are on different hosts),
>   packets would be tunneled to and from the NF VM's host.
> - If the port-group to which the ACL is being applied has members spread
>   across multiple LSs, the CMS needs to create child ports for the NF
>   ports on each of these LSs. The redirection rules in each LS will use
>   the child ports on that LS.
>
> 2. NB tables
> =============
> New NB tables
> -------------
> Network_Function: Each row contains {inport, outport, health_check}.
> Network_Function_Group: Each row contains a list of Network_Function
> entities. It also contains a unique id (between 1 and 255, generated by
> northd) and a reference to the current active NF.
> Network_Function_Health_Check: Each row contains configuration for probes
> in the options field: {interval, timeout, success_count, failure_count}.
>
>     "Network_Function_Health_Check": {
>         "columns": {
>             "name": {"type": "string"},
>             "options": {
>                 "type": {"key": "string",
>                          "value": "string",
>                          "min": 0,
>                          "max": "unlimited"}},
>             "external_ids": {
>                 "type": {"key": "string", "value": "string",
>                          "min": 0, "max": "unlimited"}}},
>         "isRoot": true},
>     "Network_Function": {
>         "columns": {
>             "name": {"type": "string"},
>             "outport": {"type": {"key": {"type": "uuid",
>                                          "refTable": "Logical_Switch_Port",
>                                          "refType": "strong"},
>                                  "min": 1, "max": 1}},
>             "inport": {"type": {"key": {"type": "uuid",
>                                         "refTable": "Logical_Switch_Port",
>                                         "refType": "strong"},
>                                 "min": 1, "max": 1}},
>             "health_check": {"type": {
>                 "key": {"type": "uuid",
>                         "refTable": "Network_Function_Health_Check",
>                         "refType": "strong"},
>                 "min": 0, "max": 1}},
>             "external_ids": {
>                 "type": {"key": "string", "value": "string",
>                          "min": 0, "max": "unlimited"}}},
>         "isRoot": true},
>     "Network_Function_Group": {
>         "columns": {
>             "name": {"type": "string"},
>             "network_function": {"type":
>                 {"key": {"type": "uuid",
>                          "refTable": "Network_Function",
>                          "refType": "strong"},
>                  "min": 0, "max": "unlimited"}},
>             "mode": {"type": {"key": {"type": "string",
>                                       "enum": ["set", ["inline"]]}}},
>             "network_function_active": {"type":
>                 {"key": {"type": "uuid",
>                          "refTable": "Network_Function",
>                          "refType": "strong"},
>                  "min": 0, "max": 1}},
>             "id": {
>                 "type": {"key": {"type": "integer",
>                                  "minInteger": 0,
>                                  "maxInteger": 255}}},
>             "external_ids": {
>                 "type": {"key": "string", "value": "string",
>                          "min": 0, "max": "unlimited"}}},
>         "isRoot": true},
>
> Modified NB table
> -----------------
> ACL: The ACL entity would have a new optional field that is a reference to
> a Network_Function_Group entity. This field can be present only for
> stateful allow ACLs.
>
>     "ACL": {
>         "columns": {
>             "network_function_group": {"type": {"key": {"type": "uuid",
>                                        "refTable": "Network_Function_Group",
>                                        "refType": "strong"},
>                                        "min": 0, "max": 1}},
>
> New options for Logical_Switch_Port
> -----------------------------------
> receive_multicast=<boolean>: Default true. If set to false, the LS will
> not forward broadcast/multicast traffic to this port. This is to prevent
> looping of such packets.
>
> lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning will
> be skipped for packets coming out of this port. Redirected packets from
> the NF port would be carrying the originating VM's MAC in source, and so
> learning should not happen.
>
> The CMS needs to set both of the above options to false for NF ports, in
> addition to disabling port security.
>
> network-function-linked-port=<lsp-name>: Each NF port needs to have this
> set to the other NF port of the pair.
>
> New NB_global options
> ---------------------
> svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is
> already there and will be used as the source MAC)
> svc_monitor_ip4: source IP of probe packets
> svc_monitor_ip4_dst: destination IP of probe packets
>
> Sample configuration
> --------------------
> ovn-nbctl ls-add ls1
> ovn-nbctl lsp-add ls1 nfp1
> ovn-nbctl lsp-add ls1 nfp2
> ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false options:lsp_learn_fdb=false options:network-function-linked-port=nfp2
> ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false options:lsp_learn_fdb=false options:network-function-linked-port=nfp1
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 10.1.1.4"
> ovn-nbctl pg-add pg1 p1
> ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4
> ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 10.1.1.5"
> ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' allow-related nfg1
>
> 3. SB tables
> ============
> Service_Monitor:
> This table is currently used by the load balancer. New fields are: "type" -
> to indicate LB or NF, "mac" - the destination MAC address for monitor
> packets, and "logical_input_port" - the LSP to which the probe packet would
> be sent. Also, "icmp" has been added as a protocol type, used only for NF.
> > "Service_Monitor": { > "columns": { > "type": {"type": {"key": { > "type": "string", > "enum": ["set", ["load-balancer", > "network-function"]]}}}, > "mac": {"type": "string"}, > "protocol": { > "type": {"key": {"type": "string", > "enum": ["set", ["tcp", "udp", "icmp"]]}, > "min": 0, "max": 1}}, > "logical_input_port": {"type": "string"}, > > northd would create one Service_Monitor entity for each NF. The > logical_input_port and logical_port would be populated from the NF inport and > outport fields respectively. The probe packets would be injected into the > logical_input_port and would be monitored out of logical_port. > > 4. Logical Flows > ================ > Logical Switch ingress pipeline: > - in_network_function added after in_stateful. > - Modifications to in_acl_eval, in_stateful and in_l2_lookup. > Logical Switch egress pipeline: > - out_network_function added after out_stateful. > - Modifications to out_pre_acl, out_acl_eval and out_stateful. > > 4.1 from-lport ACL > ------------------ > The diagram shows the request path for packets from VM1 port p1, which is a > member of the pg to which ACL is applied. The response would follow the > reverse path, i.e. packet would be redirected to nfp2 and come out of nfp1 > and be forwarded to p1. > Also, p2 does not need to be on the same LS. Only the p1, nfp1, nfp2 are on > the same LS. 
>
>      -----          -------          -----
>     | VM1 |        | NF VM |        | VM2 |
>      -----          -------          -----
>      |  /\          |  /\
>      |  |           |  |
>     \ / |          \ / |
>  ------------------------------------------------------------
>  | p1            nfp1  nfp2                              p2  |
>  |                                                           |
>  |                     Logical Switch                        |
>  ------------------------------------------------------------
>
> pg1: [p1]    as2: [p2-ip]
>
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1
>
> Say the unique id northd assigned to this NFG is 123.
>
> Request packets from p1 matching a from-lport ACL with an NFG are
> redirected to nfp1, and the NFG id is committed to the ct label in p1's
> zone. When the same packet comes out of nfp2, it gets forwarded the normal
> way. Response packets have p1's MAC as destination. Ingress processing
> sets the outport to p1, and the CT lookup in the egress pipeline (in p1's
> ct zone) yields the NFG id; the packet is injected back into the ingress
> pipeline after setting the outport to nfp2.
>
> Below are the changes in detail.
>
> 4.1.1 Request processing
> ------------------------
>
> in_acl_eval: For from-lport ACLs with an NFG, the existing rule's action
> has been enhanced to set:
> - reg8[21] = 1: to indicate that the packet has matched a rule with an NFG
> - reg5[0..7] = <NFG-unique-id>
> - reg8[22] = <direction> (1: request, 0: response)
>
> table=8 (ls_in_acl_eval), priority=1200, match=(reg0[7] == 1 &&
>     (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>     reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
> table=8 (ls_in_acl_eval), priority=1200, match=(reg0[8] == 1 &&
>     (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1;
>     reg8[22] = 1; reg5[0..7] = 123; next;)
>
> in_stateful: Priority 110: set the NFG id in the CT label if reg8[21] is
> set.
> - bit 7 (ct_label.network_function_group): set to 1 to indicate NF
>   insertion.
> - bits 17 to 24 (ct_label.network_function_group_id): stores the 8-bit
>   NFG id.
>
> table=21(ls_in_stateful), priority=110, match=(reg0[1] == 1 &&
>     reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_label.network_function_group = 1; ct_label.network_function_group_id =
>     reg5[0..7]; }; next;)
> table=21(ls_in_stateful), priority=110, match=(reg0[1] == 1 &&
>     reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>     ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>     ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1;
>     ct_label.network_function_group_id = reg5[0..7]; }; next;)
> table=21(ls_in_stateful), priority=100, match=(reg0[1] == 1 &&
>     reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_label.network_function_group = 0; ct_label.network_function_group_id =
>     0; }; next;)
> table=21(ls_in_stateful), priority=100, match=(reg0[1] == 1 &&
>     reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>     ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>     ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>     ct_label.network_function_group_id = 0; }; next;)
> table=21(ls_in_stateful), priority=0, match=(1), action=(next;)
>
> For non-NFG cases, the existing priority 100 rules will be hit. An
> additional action has been added there to clear the NFG bits in the ct
> label.
>
> in_network_function: A new stage with priority 99 rules to redirect
> packets by setting the outport to the NF "inport" (or its child port)
> based on the NFG id set by the prior ACL stage.
> Priority 100 rules ensure that when the same packets come out of the NF
> ports, they are not redirected again (the setting of reg5 here relates to
> cross-host packet tunneling and will be explained later).
> Priority 1 rule: if reg8[21] is set but the NF port (or child port) is not
> present on this LS, drop the packets.
>
> table=22(ls_in_network_function), priority=100, match=(inport == "nfp1"),
>     action=(reg5[16..31] = ct_label.tun_if_id; next;)
> table=22(ls_in_network_function), priority=100, match=(inport == "nfp2"),
>     action=(reg5[16..31] = ct_label.tun_if_id; next;)
> table=22(ls_in_network_function), priority=100, match=(reg8[21] == 1 &&
>     eth.mcast), action=(next;)
> table=22(ls_in_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1"; output;)
> table=22(ls_in_network_function), priority=1, match=(reg8[21] == 1),
>     action=(drop;)
> table=22(ls_in_network_function), priority=0, match=(1), action=(next;)
>
> 4.1.2 Response processing
> -------------------------
> out_acl_eval: The high priority rules that allow response and related
> packets through have been enhanced to also copy the CT label NFG bit into
> reg8[21].
>
> table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel &&
>     !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] =
>     ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
> table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel &&
>     !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] =
>     ct_label.network_function_group; reg8[16] = 1; next;)
>
> out_network_function: A priority 99 rule matches on the nfg_id in ct_label
> and sets the outport to the NF "outport". It also sets reg8[23] = 1 and
> injects the packet into the ingress pipeline (in_l2_lookup).
> A priority 100 rule forwards all packets destined to NF ports to the next
> table.
>
> table=11(ls_out_network_function), priority=100, match=(outport ==
>     "nfp1"), action=(next;)
> table=11(ls_out_network_function), priority=100, match=(outport ==
>     "nfp2"), action=(next;)
> table=11(ls_out_network_function), priority=100, match=(reg8[21] == 1 &&
>     eth.mcast), action=(next;)
> table=11(ls_out_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 0 && ct_label.network_function_group_id == 123),
>     action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
> table=11(ls_out_network_function), priority=1, match=(reg8[21] == 1),
>     action=(drop;)
> table=11(ls_out_network_function), priority=0, match=(1), action=(next;)
>
> in_l2_lkup: If reg8[23] == 1 (the packet has come back from egress),
> simply forward such packets, as the outport is already set.
>
> table=29(ls_in_l2_lkup), priority=100, match=(reg8[23] == 1),
>     action=(output;)
>
> The above set of rules ensures that the response packet is sent to nfp2.
> When the same packet comes out of nfp1, the ingress pipeline would set the
> outport to p1 and the packet enters the egress pipeline.
>
> out_pre_acl: If the packet is coming from the NF inport, skip the egress
> pipeline up to the out_network_function stage, as the packet has already
> gone through it and we don't want the same packet to be processed by CT
> twice.
>
> table=2(ls_out_pre_acl), priority=110, match=(inport == "nfp1"),
>     action=(next(pipeline=egress, table=12);)
>
> 4.2 to-lport ACL
> ----------------
>
>      -----          -------          -----
>     | VM1 |        | NF VM |        | VM2 |
>      -----          -------          -----
>       /\            |  /\             |
>       |             |  |              |
>       |            \ / |             \ /
>  ------------------------------------------------------------
>  | p1            nfp1  nfp2                              p2  |
>  |                                                           |
>  |                     Logical Switch                        |
>  ------------------------------------------------------------
>
> ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' allow-related nfg1
>
> The diagram shows the request traffic path. The response will follow the
> reverse path.
>
> The ingress pipeline sets the outport to p1 based on the destination MAC
> lookup, and the packet enters the egress pipeline. There the to-lport ACL
> with an NFG gets evaluated and the NFG id gets committed to the CT label.
> Then the outport is set to nfp2 and the packet is injected back to
> ingress. When the same packet comes out of nfp1, it gets forwarded to p1
> the normal way.
> For the response packet from p1, the ingress pipeline gets the NFG id from
> the CT label and accordingly redirects it to nfp1. When it comes out of
> nfp2, it is forwarded the normal way.
>
> 4.2.1 Request processing
> ------------------------
> out_acl_eval: For to-lport ACLs with an NFG, the existing rule's action
> has been enhanced to set:
> - reg8[21] = 1: to indicate that the packet has matched a rule with an NFG
> - reg5[0..7] = <NFG-unique-id>
> - reg8[22] = <direction> (1: request, 0: response)
>
> table=6(ls_out_acl_eval), priority=1100, match=(reg0[7] == 1 &&
>     (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>     reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
> table=6(ls_out_acl_eval), priority=1100, match=(reg0[8] == 1 &&
>     (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
>     reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
>
> out_stateful: Priority 110: set the NFG id in the CT label if reg8[21] is
> set.
>
> table=10(ls_out_stateful), priority=110, match=(reg0[1] == 1 &&
>     reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_label.network_function_group = 1; ct_label.network_function_group_id =
>     reg5[0..7]; }; next;)
> table=10(ls_out_stateful), priority=110, match=(reg0[1] == 1 &&
>     reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>     ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>     ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1;
>     ct_label.network_function_group_id = reg5[0..7]; }; next;)
> table=10(ls_out_stateful), priority=100, match=(reg0[1] == 1 &&
>     reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_label.network_function_group = 0; ct_label.network_function_group_id =
>     0; }; next;)
> table=10(ls_out_stateful), priority=100, match=(reg0[1] == 1 &&
>     reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20];
>     ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9;
>     ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0;
>     ct_label.network_function_group_id = 0; }; next;)
> table=10(ls_out_stateful), priority=0, match=(1), action=(next;)
>
> out_network_function: A new stage with priority 99 rules to redirect
> packets by setting the outport to the NF "outport" (or its child port)
> based on the NFG id set by the prior ACL stage, and then injecting them
> back to ingress.
> Priority 100 rules ensure that packets going to NF ports are not
> redirected again.
> Priority 1 rule: if reg8[21] is set but the NF port (or child port) is not
> present on this LS, drop the packets.
>
> table=11(ls_out_network_function), priority=100, match=(outport ==
>     "nfp1"), action=(next;)
> table=11(ls_out_network_function), priority=100, match=(outport ==
>     "nfp2"), action=(next;)
> table=11(ls_out_network_function), priority=100, match=(reg8[21] == 1 &&
>     eth.mcast), action=(next;)
> table=11(ls_out_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2";
>     reg8[23] = 1; next(pipeline=ingress, table=29);)
> table=11(ls_out_network_function), priority=1, match=(reg8[21] == 1),
>     action=(drop;)
> table=11(ls_out_network_function), priority=0, match=(1), action=(next;)
>
> in_l2_lkup: As described earlier, the priority 100 rule forwards these
> packets.
>
> Then the same packet comes out of nfp1 and goes through ingress
> processing, where the outport gets set to p1. The egress pipeline's
> out_pre_acl priority 110 rule described earlier matches on nfp1 as the
> inport and directly jumps to the stage after out_network_function. Thus
> the packet is not redirected again.
>
> 4.2.2 Response processing
> -------------------------
> in_acl_eval: The high priority rules that allow response and related
> packets through have been enhanced to also copy the CT label NFG bit into
> reg8[21].
>
> table=8(ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel &&
>     !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1;
>     reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
> table=8(ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel &&
>     !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0;
>     reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group;
>     reg8[16] = 1; next;)
>
> in_network_function: A priority 99 rule matches on the nfg_id in ct_label
> and sets the outport to the NF "inport".
> A priority 100 rule forwards all packets coming from NF ports to the next
> table.
> table=22(ls_in_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 0 && ct_label.network_function_group_id == 123),
>     action=(outport = "nfp1"; output;)
>
> 5. Cross-host Traffic for VLAN Networks
> =======================================
> For overlay subnets, all cross-host traffic exchanges are tunneled. In the
> case of VLAN subnets, special handling is needed to selectively tunnel
> only the traffic to or from the NF ports.
> Take the example of a from-lport ACL. Packets from p1 to p2 get redirected
> to nfp1 on host1. If such a packet were simply sent out from host1, the
> physical network would directly forward it to host2, where VM2 is. So we
> need to tunnel the redirected packets from host1 to host3. Then, once the
> packets come out of nfp2, if host3 sent them out directly, the physical
> network would learn p1's MAC coming from host3. So these packets need to
> be tunneled back to host1. From there the packet would be forwarded to VM2
> via the physical network.
>
>      -----            -----             --------
>     | VM2 |          | VM1 |           | NF VM  |
>      -----            -----             --------
>      / \               |                / \   |
>       | (7)            | (1)        (3) |    | (4)
>       |               \ /               |   \ /
>  --------------   --------------  (2)   ---------------
>  |            |   |            |______\ |             |
>  | p2         |   | p1         |------/ | nfp1  nfp2  |
>  |            |/____           |        |             |
>  |   host2    |\    (6)        |/______ |    host3    |
>  |            |   |   host1    |\------ |             |
>  --------------   -------------- (5)    ---------------
>
> The above figure shows the request packet path for a from-lport ACL. The
> response would follow the same path in the reverse direction.
>
> To achieve this, the following would be done:
>
> On hosts where the ACL port-group members are present (host1)
> -------------------------------------------------------------
> REMOTE_OUTPUT (table 42):
> Currently, this table tunnels traffic destined to all non-local overlay
> ports to their associated hosts. The same rule is now also added for
> traffic to non-local NF ports. Thus the packets from p1 get tunneled to
> host3.
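The host1-side REMOTE_OUTPUT decision can be sketched as a small dispatch function. This is a sketch only: the port-to-chassis mapping and the `nf_ports` set are hypothetical inputs standing in for what ovn-controller actually derives from the SB Port_Binding table.

```python
def remote_output_action(outport, chassis_of, local_chassis, nf_ports):
    """Pick a forwarding path for a packet whose logical outport is known.

    With this RFC, traffic on a VLAN network that is redirected to a
    non-local NF port must be tunneled to the NF's chassis rather than
    sent via the physical fabric (which would misdeliver or mislearn it).
    """
    dest = chassis_of[outport]
    if dest == local_chassis:
        return ("local", outport)     # deliver on this chassis
    if outport in nf_ports:
        return ("tunnel", dest)       # new behavior from section 5
    return ("fabric", outport)        # ordinary VLAN traffic stays on the fabric

# Placement from the section 5 example (hypothetical bindings).
chassis_of = {"p1": "host1", "p2": "host2", "nfp1": "host3", "nfp2": "host3"}
nf_ports = {"nfp1", "nfp2"}

# Redirected packet (p1 -> nfp1) gets tunneled to host3 ...
assert remote_output_action("nfp1", chassis_of, "host1", nf_ports) == ("tunnel", "host3")
# ... while a non-redirected packet to p2 uses the physical network.
assert remote_output_action("p2", chassis_of, "host1", nf_ports) == ("fabric", "p2")
```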
>
> On the host with the NF (host3), forward the packet to nfp1
> -----------------------------------------------------------
> Upon reaching host3, the following rules come into play:
>
> PHY_TO_LOG (table 0):
> Priority 100: Existing rule - for each geneve tunnel interface on the
> chassis, copies info from the header into the inport, outport and metadata
> registers. Now the same rule also stores the tunnel interface id in a
> register (reg5[16..31]).
>
> CHECK_LOOPBACK (table 44):
> This table has a rule that clears all the registers. The change is to skip
> the clearing of reg5[16..31].
>
> Logical egress pipeline:
>
> ls_out_stateful priority 120: If the outport is an NF port, copy
> reg5[16..31] (table 0 had set it) to ct_label.tun_if_id.
>
> table=10(ls_out_stateful), priority=120, match=(outport == "nfp1" &&
>     reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_label.tun_if_id = reg5[16..31]; }; next;)
> table=10(ls_out_stateful), priority=120, match=(outport == "nfp1" &&
>     reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0;
>     ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31];
>     ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15];
>     ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; };
>     next;)
>
> The above sequence of flows ensures that if a packet is received via
> tunnel on host3 with the outport as nfp1, the tunnel interface id is
> committed to the ct entry in nfp1's zone.
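The ct_label fields this design relies on (the NFG flag at bit 7, the 8-bit NFG id at bits 17..24, and tun_if_id) can be modeled with plain shifts and masks. Note one assumption: the RFC does not state where tun_if_id lives inside the 128-bit label, so the offset below is chosen arbitrarily for this sketch.

```python
# Bit offsets for the flag and id come from section 4.1.1 of this RFC.
# TUN_IF_ID_SHIFT/BITS are assumptions made purely for illustration.
NFG_FLAG_BIT = 7
NFG_ID_SHIFT, NFG_ID_BITS = 17, 8
TUN_IF_ID_SHIFT, TUN_IF_ID_BITS = 32, 16   # hypothetical placement

def set_field(label: int, shift: int, width: int, value: int) -> int:
    """Write `value` into label[shift .. shift+width-1], masking overflow."""
    mask = ((1 << width) - 1) << shift
    return (label & ~mask) | ((value << shift) & mask)

def get_field(label: int, shift: int, width: int) -> int:
    """Read label[shift .. shift+width-1]."""
    return (label >> shift) & ((1 << width) - 1)

label = 0
label = set_field(label, NFG_FLAG_BIT, 1, 1)              # NF insertion active
label = set_field(label, NFG_ID_SHIFT, NFG_ID_BITS, 123)  # NFG id from the example
label = set_field(label, TUN_IF_ID_SHIFT, TUN_IF_ID_BITS, 42)

assert get_field(label, NFG_FLAG_BIT, 1) == 1
assert get_field(label, NFG_ID_SHIFT, NFG_ID_BITS) == 123
assert get_field(label, TUN_IF_ID_SHIFT, TUN_IF_ID_BITS) == 42
```

The same two helpers cover both the commit side (ct_commit actions in in_stateful/out_stateful) and the lookup side (matches on ct_label.network_function_group_id and ct_label.tun_if_id).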
>
> On the host with the NF (host3), tunnel packets from nfp2 back to host1
> -----------------------------------------------------------------------
> When the same packet comes out of nfp2 on host3:
>
> LOCAL_OUTPUT (table 43):
> When the packet comes out of the other NF port (nfp2), the following two
> rules send it back to the host that it originally came from:
>
> Priority 110: For each NF port local to this host, the following rule
> processes the packet through the CT of the linked port (for nfp2, that is
> nfp1):
>     match: inport==nfp2 && RECIRC_BIT==0
>     action: RECIRC_BIT = 1, ct(zone=nfp1's zone, table=LOCAL_OUTPUT),
>         resubmit to table 43
>
> Priority 109: For each {tunnel_id, NF port} on this host, if the tun_if_id
> in the ct_label matches the tunnel_id, send the recirculated packet using
> tunnel_id:
>     match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
>     action: tunnel the packet using tun-id
>
> If p1 and nfp1 happen to be on the same host, the tun_if_id would not be
> set and thus none of the priority 109 rules would match. The packet would
> be forwarded the usual way, matching the existing priority 100 rules in
> LOCAL_OUTPUT.
>
> Special handling of the case where the NF responds back on nfp1 instead of
> forwarding the packet out of nfp2:
> For example, a SYN packet from p1 got redirected to nfp1. Then the NF,
> which is a firewall VM, drops the SYN and sends an RST back on port nfp1.
> In this case, looking up the linked port's (nfp2's) ct zone will not yield
> anything. The following rule uses ct.inv to identify such scenarios and
> uses nfp1's CT zone to send the packet back. To achieve this, the
> following two rules are installed:
>
> in_network_function:
> The priority 100 rule that allows packets incoming from NF-type ports is
> enhanced with an additional action to store the tun_if_id from the
> ct_label into reg5[16..31].
> table=22(ls_in_network_function), priority=100, match=(inport == "nfp1"),
>     action=(reg5[16..31] = ct_label.tun_if_id; next;)
>
> LOCAL_OUTPUT (table 43):
> Priority 110 rule: for recirculated packets, if the ct (of the linked
> port) is invalid, use the tunnel id from reg5[16..31] to tunnel the packet
> back to host1 (as the CT zone info has been overwritten by the priority
> 110 rule in table 43 above):
>     match: inport==nfp1 && RECIRC_BIT==1 && ct.inv &&
>         MFF_LOG_TUN_OFPORT==<tun-id>
>     action: tunnel the packet using tun-id
>
> 6. NF insertion across logical switches
> =======================================
> If the port-group to which the ACL is applied has members across multiple
> logical switches, there needs to be an NF port pair on each of these
> switches. The NF VM has only one inport and one outport. The CMS is
> expected to create child ports linked to these ports on each logical
> switch where port-group members are present.
> The network function entity is configured with the parent ports only. When
> the CMS creates the child ports, it does not need to change any of the NF,
> NFG or ACL config tables.
> When northd configures the redirection rules for a specific LS, it will
> use the parent or child port depending on what it finds on that LS.
>
>                          --------
>                         | NF VM  |
>                          --------
>                           |    |
>      -----              nfp1  nfp2               -----
>     | VM1 |          --------------             | VM2 |
>      -----          |    SVC LS    |             -----
>       |              --------------                |
>  -------------------------                -------------------------
>  | p1 nfp1_ch1  nfp2_ch1  |              | p3 nfp1_ch2  nfp2_ch2  |
>  |          LS1           |              |          LS2           |
>  -------------------------                -------------------------
>
> In this example, the CMS created the parent ports for the NF VM on the LS
> named "SVC LS". The ports are nfp1 and nfp2.
> The CMS configures the NF using these ports:
> ovn-nbctl network-function-add nf1 nfp1 nfp2
> ovn-nbctl network-function-group-add nfg1 nf1
> ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1
>
> The port-group to which the ACL is applied is pg1, and pg1 has two ports:
> p1 on LS1 and p3 on LS2.
> The CMS needs to create child ports for the NF ports on LS1 and LS2. On
> LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2.
>
> When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1:
>
> table=22(ls_in_network_function), priority=100, match=(inport ==
>     "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> table=22(ls_in_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch1";
>     output;)
>
> When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2:
>
> table=22(ls_in_network_function), priority=100, match=(inport ==
>     "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
> table=22(ls_in_network_function), priority=99, match=(reg8[21] == 1 &&
>     reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch2";
>     output;)
>
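northd's per-LS port choice described above can be sketched as a small lookup. The `ports_on_ls` set and `parent_of` mapping are hypothetical stand-ins for what northd actually derives from the NB database; the drop behavior mirrors the priority-1 rules in the network-function stages.

```python
def nf_port_for_ls(nf_port, ports_on_ls, parent_of):
    """Return the LSP name to use in redirection rules on one LS.

    Use the NF parent port itself if it is on this LS; otherwise look
    for a child port on this LS whose parent is the NF port; otherwise
    return None, in which case the priority-1 rule drops packets that
    matched an NFG ACL (reg8[21] set) on this LS.
    """
    if nf_port in ports_on_ls:
        return nf_port
    for lsp in sorted(ports_on_ls):
        if parent_of.get(lsp) == nf_port:
            return lsp
    return None

# Topology from the section 6 example (child -> parent mapping).
parent_of = {"nfp1_ch1": "nfp1", "nfp2_ch1": "nfp2",
             "nfp1_ch2": "nfp1", "nfp2_ch2": "nfp2"}

# On LS1 the child port is used ...
assert nf_port_for_ls("nfp1", {"p1", "nfp1_ch1", "nfp2_ch1"}, parent_of) == "nfp1_ch1"
# ... on SVC LS the parent port itself is used ...
assert nf_port_for_ls("nfp1", {"nfp1", "nfp2"}, parent_of) == "nfp1"
# ... and on a LS with neither, redirection is impossible (drop).
assert nf_port_for_ls("nfp1", {"p9"}, parent_of) is None
```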
Hi Sragdhara,

Sorry for the late reviews on this patch series.  I haven't looked into
the series yet.  I plan to take a look this week.

Is it possible to rebase and submit v3?  It has conflicts.

Thanks
Numan

> 7. Health Monitoring
> ====================
> The LB health monitoring functionality has been extended to support NFs.
> Network_Function_Group has a list of Network_Functions, each of which has
> a reference to a Network_Function_Health_Check that holds the monitoring
> config. There is a corresponding SB Service_Monitor row maintaining the
> online/offline status. When the status changes, northd picks one of the
> "online" NFs and sets it in the network_function_active field of the NFG.
> The redirection rules in the LS use the ports from this NF.
>
> ovn-controller performs the health monitoring by sending an ICMP echo
> request with source IP and MAC from the NB global options
> "svc_monitor_ip4" and "svc_monitor_mac", and destination IP and MAC from
> the new NB global options "svc_monitor_ip4_dst" and "svc_monitor_mac_dst".
> The sequence number and id are randomly generated and stored in the
> service monitor. The NF VM forwards the same packet out of the other port.
> When it comes out, ovn-controller matches the sequence number and id
> against the stored values and marks the NF online if they match.
>
> V1:
> - First patch.
>
> V2:
> - Rebased code.
> - Added a "mode" field in the Network_Function_Group table, with the only
>   allowed value being "inline". This is for future expansion to include a
>   "mirror" mode.
> - Added a flow in the in_network_function and out_network_function tables
>   to skip redirection of multicast traffic.
>
> Sragdhara Datta Chaudhuri (5):
>   ovn-nb: Network Function insertion OVN-NB schema changes
>   ovn-nbctl: Network Function insertion commands.
>   northd, tests: Network Function insertion logical flow programming.
>   controller, tests: Network Function insertion tunneling of cross-host
>     VLAN traffic.
>   northd, controller: Network Function Health monitoring.
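The probe described in section 7 is an ICMP echo request with a randomly generated id and sequence number that must reappear unchanged on the other NF port. A minimal sketch of building and matching such a probe follows; it is pure-Python ICMP header handling per the standard echo format (type 8, code 0, checksum, id, seq), not OVN's actual pinctrl implementation.

```python
import random
import struct

def icmp_checksum(data: bytes) -> int:
    """Standard ones'-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)
    total += total >> 16
    return ~total & 0xFFFF

def build_probe(ident: int, seq: int) -> bytes:
    """Build an ICMP echo request header: type 8, code 0, csum, id, seq."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq)

def probe_matches(packet: bytes, ident: int, seq: int) -> bool:
    """Check that a packet seen on the NF outport is the stored probe."""
    ptype, code, _csum, pid, pseq = struct.unpack("!BBHHH", packet[:8])
    return ptype == 8 and code == 0 and pid == ident and pseq == seq

ident, seq = random.randint(0, 0xFFFF), random.randint(0, 0xFFFF)
probe = build_probe(ident, seq)
# The NF is a bump in the wire, so the same bytes should come back out.
assert probe_matches(probe, ident, seq)
assert icmp_checksum(probe) == 0  # a valid checksum verifies to zero
```

The id/seq pair plays the role of the values stored in the Service_Monitor row: a packet that reappears with a different pair, or not at all within the configured timeout/failure_count, drives the NF offline.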
>
>  controller/physical.c        | 249 ++++++++++-
>  controller/pinctrl.c         | 252 +++++++++--
>  include/ovn/logical-fields.h |  16 +-
>  lib/logical-fields.c         |  26 ++
>  lib/ovn-util.h               |   2 +-
>  northd/en-global-config.c    |  75 ++++
>  northd/en-global-config.h    |  12 +-
>  northd/en-multicast.c        |   2 +-
>  northd/en-northd.c           |   8 +
>  northd/en-sync-sb.c          |  16 +-
>  northd/inc-proc-northd.c     |   6 +-
>  northd/northd.c              | 789 +++++++++++++++++++++++++++++++++--
>  northd/northd.h              |  39 +-
>  ovn-nb.ovsschema             |  64 ++-
>  ovn-nb.xml                   | 123 ++++++
>  ovn-sb.ovsschema             |  12 +-
>  ovn-sb.xml                   |  22 +-
>  tests/ovn-controller.at      |   6 +-
>  tests/ovn-nbctl.at           |  83 ++++
>  tests/ovn-northd.at          | 508 ++++++++++++++++------
>  tests/ovn.at                 | 137 ++++++
>  utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
>  22 files changed, 2747 insertions(+), 233 deletions(-)
>
> --
> 2.39.3
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev