RFC: NETWORK FUNCTION INSERTION IN OVN

1. Introduction
================
The objective is to insert a Network Function (NF) in the path of
outbound/inbound traffic from/to a port-group. The use case is to
integrate a 3rd party service in the path of traffic. An example of
such a service would be a layer-7 firewall. The NF VM will be like a
bump in the wire and should not modify the packet, i.e. the IP header,
the MAC addresses, VLAN tag and sequence numbers remain unchanged.
Here are some of the highlights:

- A new entity network-function (NF) has been introduced. It contains
  a pair of LSPs. The CMS would designate one as "inport" and the
  other as "outport".
- For high availability, a network function group (NFG) entity
  consists of a group of NFs. Only one NF in an NFG has an active
  role, based on health monitoring.
- ACL would accept NFG as a parameter, and traffic matching the ACL
  would be redirected to the associated active NF's port. NFG is
  accepted for the stateful allow action only.
- The ACL's port-group is the point of reference when defining the
  role of the NF ports. The "inport" is the port closer to the
  port-group and "outport" is the one away from it. For from-lport
  ACLs, the request packets would be redirected to the NF "inport",
  and for to-lport ACLs, the request packets would be redirected to
  the NF "outport". When the same packet comes out of the other NF
  port, it simply gets forwarded.
- Statefulness will be maintained, i.e. the response traffic will
  also go through the same pair of NF ports but in reverse order.
- For the NF ports we need to disable the port security check, fdb
  learning and multicast/broadcast forwarding.
- Health monitoring involves ovn-controller periodically injecting
  ICMP probe packets into the NF inport and monitoring the same
  packet coming out of the NF outport.
- If the traffic redirection involves cross-host traffic (e.g. for a
  from-lport ACL, if the source VM and the NF VM are on different
  hosts), packets would be tunneled to and from the NF VM's host.
- If the port-group to which the ACL is being applied has members
  spread across multiple LSs, the CMS needs to create child ports for
  the NF ports on each of these LSs. The redirection rules in each LS
  will use the child ports on that LS.

2. NB tables
=============

New NB tables
-------------

Network_Function: Each row contains {inport, outport, health_check}

Network_Function_Group: Each row contains a list of Network_Function
entities.
It also contains a unique id (between 1 and 255, generated by northd)
and a reference to the current active NF.

Network_Function_Health_Check: Each row contains configuration for
probes in the options field: {interval, timeout, success_count,
failure_count}

        "Network_Function_Health_Check": {
            "columns": {
                "name": {"type": "string"},
                "options": {
                    "type": {"key": "string", "value": "string",
                             "min": 0, "max": "unlimited"}},
                "external_ids": {
                    "type": {"key": "string", "value": "string",
                             "min": 0, "max": "unlimited"}}},
            "isRoot": true},
        "Network_Function": {
            "columns": {
                "name": {"type": "string"},
                "outport": {"type": {"key": {"type": "uuid",
                                             "refTable": "Logical_Switch_Port",
                                             "refType": "strong"},
                                     "min": 1, "max": 1}},
                "inport": {"type": {"key": {"type": "uuid",
                                            "refTable": "Logical_Switch_Port",
                                            "refType": "strong"},
                                    "min": 1, "max": 1}},
                "health_check": {"type": {
                    "key": {"type": "uuid",
                            "refTable": "Network_Function_Health_Check",
                            "refType": "strong"},
                    "min": 0, "max": 1}},
                "external_ids": {
                    "type": {"key": "string", "value": "string",
                             "min": 0, "max": "unlimited"}}},
            "isRoot": true},
        "Network_Function_Group": {
            "columns": {
                "name": {"type": "string"},
                "network_function": {"type": {"key": {"type": "uuid",
                                                      "refTable": "Network_Function",
                                                      "refType": "strong"},
                                              "min": 0, "max": "unlimited"}},
                "mode": {"type": {"key": {"type": "string",
                                          "enum": ["set", ["inline"]]}}},
                "network_function_active": {"type": {"key": {"type": "uuid",
                                                             "refTable": "Network_Function",
                                                             "refType": "strong"},
                                                     "min": 0, "max": 1}},
                "id": {
                    "type": {"key": {"type": "integer",
                                     "minInteger": 0, "maxInteger": 255}}},
                "external_ids": {
                    "type": {"key": "string", "value": "string",
                             "min": 0, "max": "unlimited"}}},
            "isRoot": true},

Modified NB table
-----------------

ACL: The ACL entity would have a new optional field that is a
reference to a Network_Function_Group entity. This field can be
present only for stateful allow ACLs.
"ACL": { "columns": { "network_function_group": {"type": {"key": {"type": "uuid", "refTable": "Network_Function_Group", "refType": "strong"}, "min": 0, "max": 1}}, New options for Logical_Switch_Port —---------------------------------- receive_multicast=<boolean>: Default true. If set to false, LS will not forward broadcast/multicast traffic to this port. This is to prevent looping of such packets. lsp_learn_fdb=<boolean>: Default true. If set to false, fdb learning will be skipped for packets coming out of this port. Redirected packets from the NF port would be carrying the originating VM’s MAC in source, and so learning should not happen. CMS needs to set both the above options to false for NF ports, in addition to disabling port security. network-function-linked-port=<lsp-name>: Each NF port needs to have this set to the other NF port of the pair. New NB_global options —-------------------- svc_monitor_mac_dst: destination MAC of probe packets (svc_monitor_mac is already there and will be used as source MAC) svc_monitor_ip4: source IP of probe packets svc_monitor_ip4_dst: destination IP of probe packets Sample configuration —------------------- ovn-nbctl ls-add ls1 ovn-nbctl lsp-add ls1 nfp1 ovn-nbctl lsp-add ls1 nfp2 ovn-nbctl set logical_switch_port nfp1 options:receive_multicast=false options:lsp_learn_fdb=false options:network-function-linked-port=nfp2 ovn-nbctl set logical_switch_port nfp2 options:receive_multicast=false options:lsp_learn_fdb=false options:network-function-linked-port=nfp1 ovn-nbctl network-function-add nf1 nfp1 nfp2 ovn-nbctl network-function-group-add nfg1 nf1 ovn-nbctl lsp-add ls1 p1 -- lsp-set-addresses p1 "50:6b:8d:3e:ed:c4 10.1.1.4" ovn-nbctl pg-add pg1 p1 ovn-nbctl create Address_Set name=as1 addresses=10.1.1.4 ovn-nbctl lsp-add ls1 p2 -- lsp-set-addresses p2 "50:6b:8d:3e:ed:c5 10.1.1.5" ovn-nbctl create Address_Set name=as2 addresses=10.1.1.5 ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1 
ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' allow-related nfg1

3. SB tables
============

Service_Monitor: This is currently used by the load balancer. New
fields are: "type" - to indicate LB or NF, "mac" - the destination
MAC address for monitor packets, "logical_input_port" - the LSP to
which the probe packet would be sent. Also, "icmp" has been added as
a protocol type, used only for NF.

        "Service_Monitor": {
            "columns": {
                "type": {"type": {"key": {
                    "type": "string",
                    "enum": ["set", ["load-balancer", "network-function"]]}}},
                "mac": {"type": "string"},
                "protocol": {
                    "type": {"key": {"type": "string",
                                     "enum": ["set", ["tcp", "udp", "icmp"]]},
                             "min": 0, "max": 1}},
                "logical_input_port": {"type": "string"},

northd would create one Service_Monitor entity for each NF. The
logical_input_port and logical_port would be populated from the NF
inport and outport fields respectively. The probe packets would be
injected into the logical_input_port and would be monitored out of
logical_port.

4. Logical Flows
================

Logical Switch ingress pipeline:
- in_network_function added after in_stateful.
- Modifications to in_acl_eval, in_stateful and in_l2_lookup.

Logical Switch egress pipeline:
- out_network_function added after out_stateful.
- Modifications to out_pre_acl, out_acl_eval and out_stateful.

4.1 from-lport ACL
------------------

The diagram shows the request path for packets from VM1 port p1,
which is a member of the pg to which the ACL is applied. The response
would follow the reverse path, i.e. the packet would be redirected to
nfp2, come out of nfp1 and be forwarded to p1. Also, p2 does not need
to be on the same LS. Only p1, nfp1 and nfp2 are on the same LS.
   -----          -------          -----
  | VM1 |        | NF VM |        | VM2 |
   -----          -------          -----
     |            /\   |           / \
     |            |    |            |
    \ /           |   \ /           |
 ------------------------------------------------------------
|   p1           nfp1 nfp2          p2                       |
|                                                            |
|                     Logical Switch                         |
 ------------------------------------------------------------

pg1: [p1]
as2: [p2-ip]

ovn-nbctl network-function-add nf1 nfp1 nfp2
ovn-nbctl network-function-group-add nfg1 nf1
ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1

Say the unique id northd assigned to this NFG is 123.

The request packets from p1 matching a from-lport ACL with NFG are
redirected to nfp1, and the NFG id is committed to the ct label in
p1's zone. When the same packet comes out of nfp2, it gets forwarded
the normal way. Response packets have p1's MAC as destination.
Ingress processing sets the outport to p1, the CT lookup in the
egress pipeline (in p1's ct zone) yields the NFG id, and the packet
is injected back into the ingress pipeline after setting the outport
to nfp2. Below are the changes in detail.

4.1.1 Request processing
------------------------

in_acl_eval: For from-lport ACLs with NFG, the existing rule's action
has been enhanced to set:
- reg8[21] = 1: to indicate that the packet has matched a rule with NFG
- reg5[0..7] = <NFG-unique-id>
- reg8[22] = <direction> (1: request, 0: response)

table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[7] == 1 && (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
table=8 (ls_in_acl_eval), priority=1200 , match=(reg0[8] == 1 && (inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)

in_stateful: Priority 110: set the NFG id in the CT label if reg8[21]
is set.
- bit 7 (ct_label.network_function_group): Set to 1 to indicate NF
  insertion.
- bits 17 to 24 (ct_label.network_function_group_id): Stores the
  8-bit NFG id.

table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; ct_label.network_function_group_id = reg5[0..7]; }; next;)
table=21(ls_in_stateful ), priority=110 , match=(reg0[1] == 1 && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; ct_label.network_function_group_id = reg5[0..7]; }; next;)
table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; }; next;)
table=21(ls_in_stateful ), priority=100 , match=(reg0[1] == 1 && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; }; next;)
table=21(ls_in_stateful ), priority=0 , match=(1), action=(next;)

For non-NFG cases, the existing priority 100 rules will be hit.
There, an additional action has been added to clear the NFG bits in
the ct label.

in_network_function: A new stage with priority 99 rules to redirect
packets by setting the outport to the NF "inport" (or its child port)
based on the NFG id set by the prior ACL stage.
Priority 100 rules ensure that when the same packets come out of the
NF ports, they are not redirected again (the setting of reg5 here
relates to the cross-host packet tunneling and will be explained
later). Priority 1 rule: if reg8[21] is set but the NF port (or child
port) is not present on this LS, drop the packets.

table=22(ls_in_network_function), priority=100 , match=(inport == "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
table=22(ls_in_network_function), priority=100 , match=(inport == "nfp2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
table=22(ls_in_network_function), priority=100 , match=(reg8[21] == 1 && eth.mcast), action=(next;)
table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp1"; output;)
table=22(ls_in_network_function), priority=1 , match=(reg8[21] == 1), action=(drop;)
table=22(ls_in_network_function), priority=0 , match=(1), action=(next;)

4.1.2 Response processing
-------------------------

out_acl_eval: High priority rules that allow response and related
packets to go through have been enhanced to also copy the CT label
NFG bit into reg8[21].

table=6(ls_out_acl_eval), priority=65532, match=(!ct.est && ct.rel && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
table=6(ls_out_acl_eval), priority=65532, match=(ct.est && !ct.rel && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg8[21] = ct_label.network_function_group; reg8[16] = 1; next;)

out_network_function: Priority 99 rule matches on the nfg_id in
ct_label and sets the outport to the NF "outport". It also sets
reg8[23] = 1 and injects the packet into the ingress pipeline
(in_l2_lookup). Priority 100 rules forward all packets to NF ports to
the next table.
table=11(ls_out_network_function), priority=100 , match=(outport == "nfp1"), action=(next;)
table=11(ls_out_network_function), priority=100 , match=(outport == "nfp2"), action=(next;)
table=11(ls_out_network_function), priority=100 , match=(reg8[21] == 1 && eth.mcast), action=(next;)
table=11(ls_out_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
table=11(ls_out_network_function), priority=1 , match=(reg8[21] == 1), action=(drop;)
table=11(ls_out_network_function), priority=0 , match=(1), action=(next;)

in_l2_lkup: if reg8[23] == 1 (the packet has come back from egress),
simply forward such packets, as the outport is already set.

table=29(ls_in_l2_lkup), priority=100 , match=(reg8[23] == 1), action=(output;)

The above set of rules ensures that the response packet is sent to
nfp2. When the same packet comes out of nfp1, the ingress pipeline
sets the outport to p1 and the packet enters the egress pipeline.

out_pre_acl: If the packet is coming from the NF inport, skip the
egress pipeline up to the out_network_function stage, as the packet
has already gone through it and we don't want the same packet to be
processed by CT twice.

table=2 (ls_out_pre_acl ), priority=110 , match=(inport == "nfp1"), action=(next(pipeline=egress, table=12);)

4.2 to-lport ACL
----------------

   -----          --------         -----
  | VM1 |        | NF VM  |       | VM2 |
   -----          --------         -----
    / \            |   / \           |
     |             |    |            |
     |            \ /   |           \ /
 -------------------------------------------------------------
|   p1           nfp1  nfp2         p2                        |
|                                                             |
|                     Logical Switch                          |
 -------------------------------------------------------------

ovn-nbctl acl-add pg1 to-lport 100 'outport==@pg1 && ip4.src == $as2' allow-related nfg1

The diagram shows the request traffic path. The response will follow
the reverse path. The ingress pipeline sets the outport to p1 based
on a destination MAC lookup. The packet then enters the egress
pipeline.
There the to-lport ACL with NFG gets evaluated and the NFG id gets
committed to the CT label. Then the outport is set to nfp2 and the
packet is injected back into ingress. When the same packet comes out
of nfp1, it gets forwarded to p1 the normal way.

For the response packet from p1, the ingress pipeline gets the NFG id
from the CT label and accordingly redirects it to nfp1. When it comes
out of nfp2, it is forwarded the normal way.

4.2.1 Request processing
------------------------

out_acl_eval: For to-lport ACLs with NFG, the existing rule's action
has been enhanced to set:
- reg8[21] = 1: to indicate that the packet has matched a rule with NFG
- reg5[0..7] = <NFG-unique-id>
- reg8[22] = <direction> (1: request, 0: response)

table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[7] == 1 && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)
table=6 (ls_out_acl_eval ), priority=1100 , match=(reg0[8] == 1 && (outport==@pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1; reg8[21] = 1; reg8[22] = 1; reg5[0..7] = 123; next;)

out_stateful: Priority 110: set the NFG id in the CT label if
reg8[21] is set.
table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 && reg0[13] == 0 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; ct_label.network_function_group_id = reg5[0..7]; }; next;)
table=10(ls_out_stateful ), priority=110 , match=(reg0[1] == 1 && reg0[13] == 1 && reg8[21] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 1; ct_label.network_function_group_id = reg5[0..7]; }; next;)
table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; }; next;)
table=10(ls_out_stateful ), priority=100 , match=(reg0[1] == 1 && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.acl_id = reg2[16..31]; ct_label.network_function_group = 0; ct_label.network_function_group_id = 0; }; next;)
table=10(ls_out_stateful ), priority=0 , match=(1), action=(next;)

out_network_function: A new stage with priority 99 rules to redirect
packets by setting the outport to the NF "outport" (or its child
port) based on the NFG id set by the prior ACL stage, and then
injecting back into ingress. Priority 100 rules ensure that packets
going to NF ports are not redirected again. Priority 1 rule: if
reg8[21] is set but the NF port (or child port) is not present on
this LS, drop the packets.
table=11(ls_out_network_function), priority=100 , match=(outport == "nfp1"), action=(next;)
table=11(ls_out_network_function), priority=100 , match=(outport == "nfp2"), action=(next;)
table=11(ls_out_network_function), priority=100 , match=(reg8[21] == 1 && eth.mcast), action=(next;)
table=11(ls_out_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 1 && reg5[0..7] == 123), action=(outport = "nfp2"; reg8[23] = 1; next(pipeline=ingress, table=29);)
table=11(ls_out_network_function), priority=1 , match=(reg8[21] == 1), action=(drop;)
table=11(ls_out_network_function), priority=0 , match=(1), action=(next;)

in_l2_lkup: As described earlier, the priority 100 rule will forward
these packets. Then the same packet comes out of nfp1 and goes
through ingress processing, where the outport gets set to p1. The
egress pipeline out_pre_acl priority 110 rule described earlier
matches against inport nfp1 and directly jumps to the stage after
out_network_function. Thus the packet is not redirected again.

4.2.2 Response processing
-------------------------

in_acl_eval: High priority rules that allow response and related
packets to go through have been enhanced to also copy the CT label
NFG bit into reg8[21].

table=8 (ls_in_acl_eval), priority=65532, match=(!ct.est && ct.rel && !ct.new && !ct.inv && ct_mark.blocked == 0), action=(reg0[17] = 1; reg8[21] = ct_label.network_function_group; reg8[16] = 1; ct_commit_nat;)
table=8 (ls_in_acl_eval), priority=65532, match=(ct.est && !ct.rel && !ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0), action=(reg0[9] = 0; reg0[10] = 0; reg0[17] = 1; reg8[21] = ct_label.network_function_group; reg8[16] = 1; next;)

in_network_function: Priority 99 rule matches on the nfg_id in
ct_label and sets the outport to the NF "inport". Priority 100 rules
forward all packets to NF ports to the next table.
table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 0 && ct_label.network_function_group_id == 123), action=(outport = "nfp1"; output;)

5. Cross-host Traffic for VLAN Network
======================================

For overlay subnets, all cross-host traffic exchanges are tunneled.
In the case of VLAN subnets, there needs to be special handling to
selectively tunnel only the traffic to or from the NF ports.

Take the example of a from-lport ACL. Packets from p1 to p2 get
redirected to nfp1 on host1. If this packet is simply sent out from
host1, the physical network will directly forward it to host2, where
VM2 is. So, we need to tunnel the redirected packets from host1 to
host3. Now, once the packets come out of nfp2, if host3 sends the
packets out, the physical network would learn p1's MAC coming from
host3. So, these packets need to be tunneled back to host1. From
there the packet would be forwarded to VM2 via the physical network.

   -----             -----             --------
  | VM2 |           | VM1 |           | NF VM  |
   -----             -----             --------
   / \                 |               / \    |
    | (7)          (1) |           (3) |      | (4)
    |                 \ /              |     \ /
 --------------     --------------   (2)   ---------------
|      p2      |(6) |     p1      |______\ |  nfp1  nfp2  |
|              |/___|             |------/ |              |
|    host2     |\---|    host1    |/______ |    host3     |
|              |    |             |\------ |              |
 --------------     --------------   (5)   ---------------

The above figure shows the request packet path for a from-lport ACL.
The response would follow the same path in the reverse direction.

To achieve this, the following would be done:

On host where the ACL port group members are present (host1)
------------------------------------------------------------

REMOTE_OUTPUT (table 42): Currently, it tunnels traffic destined to
all non-local overlay ports to their associated hosts. The same rule
is now also added for traffic to non-local NF ports. Thus the packets
from p1 get tunneled to host3.
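The per-leg forwarding decision described above can be sketched as a
small model (illustrative Python only, not OVN code; the function and
chassis names are invented for this sketch):

```python
# Minimal model of the cross-host redirection decision for a VLAN
# network. Chassis are identified by name; the tunnel interface id
# plays the role of ct_label.tun_if_id (0 means "did not arrive via
# tunnel").

def next_hop(current_chassis, nf_port_chassis, returning, tun_if_id):
    """Decide how a redirected packet leaves the current chassis.

    current_chassis: chassis handling the packet now.
    nf_port_chassis: chassis where the target NF port is bound.
    returning:       True once the packet has come back out of the
                     other NF port.
    tun_if_id:       tunnel interface id committed to ct_label when
                     the packet arrived via tunnel (0 otherwise).
    """
    if not returning:
        # Request leg: REMOTE_OUTPUT tunnels to the NF host if the
        # NF port is not local; otherwise deliver locally.
        if nf_port_chassis != current_chassis:
            return ("tunnel", nf_port_chassis)
        return ("local", current_chassis)
    # Return leg: if the packet originally arrived via tunnel,
    # LOCAL_OUTPUT sends it back through the recorded tunnel so the
    # physical network never learns the VM's MAC from the NF host.
    if tun_if_id:
        return ("tunnel", tun_if_id)
    return ("local", current_chassis)

# Request from host1 is tunneled to host3, where the NF port lives.
print(next_hop("host1", "host3", returning=False, tun_if_id=0))
# After the NF, host3 tunnels the packet back via the recorded tunnel.
print(next_hop("host3", "host3", returning=True, tun_if_id=7))
# If the VM and the NF share a host, everything stays local.
print(next_hop("host1", "host1", returning=False, tun_if_id=0))
```

The key point the model captures is that the return leg keys off the
remembered tunnel id rather than the destination, matching the
priority 109 rules described in the next subsection.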
On host with NF (host3) forward packet to nfp1
----------------------------------------------

Upon reaching host3, the following rules come into play:

PHY_TO_LOG (table 0):
Priority 100: Existing rule - for each geneve tunnel interface on the
chassis, copies info from the header to the inport, outport and
metadata registers. Now the same rule also stores the tunnel
interface id in a register (reg5[16..31]).

CHECK_LOOPBACK (table 44):
This table has a rule that clears all the registers. The change is to
skip the clearing of reg5[16..31].

Logical egress pipeline:
ls_out_stateful priority 120: If the outport is an NF port, copy
reg5[16..31] (table 0 had set it) to ct_label.tun_if_id.

table=10(ls_out_stateful ), priority=120 , match=(outport == "nfp1" && reg0[13] == 0), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_label.tun_if_id = reg5[16..31]; }; next;)
table=10(ls_out_stateful ), priority=120 , match=(outport == "nfp1" && reg0[13] == 1), action=(ct_commit { ct_mark.blocked = 0; ct_mark.allow_established = reg0[20]; ct_label.acl_id = reg2[16..31]; ct_mark.obs_stage = reg8[19..20]; ct_mark.obs_collector_id = reg8[8..15]; ct_label.obs_point_id = reg9; ct_label.tun_if_id = reg5[16..31]; }; next;)

The above sequence of flows ensures that if a packet is received via
tunnel on host3 with outport nfp1, the tunnel interface id is
committed to the ct entry in nfp1's zone.
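The way the tunnel interface id survives the pipeline - stored in
reg5[16..31] by PHY_TO_LOG, preserved across CHECK_LOOPBACK's
register clearing, then committed to ct_label by ls_out_stateful -
can be modeled in a few lines (a Python sketch; the dict-based state
and function names are illustrative, not OVN internals):

```python
# Sketch of how the tunnel interface id flows from the geneve header
# into ct_label. Bit positions follow the text (reg5[16..31]).

REG5_TUN_SHIFT = 16
REG5_TUN_MASK = 0xffff << REG5_TUN_SHIFT

def phy_to_log(state, tun_ofport):
    # PHY_TO_LOG: record the receiving tunnel interface in reg5[16..31].
    state["reg5"] = (state["reg5"] & ~REG5_TUN_MASK) | \
                    ((tun_ofport & 0xffff) << REG5_TUN_SHIFT)

def check_loopback(state):
    # CHECK_LOOPBACK: clear all registers but preserve reg5[16..31].
    saved = state["reg5"] & REG5_TUN_MASK
    for reg in state:
        state[reg] = 0
    state["reg5"] = saved

def out_stateful_commit(state, ct_label):
    # ls_out_stateful priority 120: for packets destined to an NF
    # port, commit reg5[16..31] into ct_label.tun_if_id.
    ct_label["tun_if_id"] = (state["reg5"] & REG5_TUN_MASK) >> REG5_TUN_SHIFT

state = {"reg0": 0, "reg5": 0, "reg8": 0}
ct_label = {}
phy_to_log(state, tun_ofport=7)
check_loopback(state)
out_stateful_commit(state, ct_label)
print(ct_label["tun_if_id"])
```

With this in place, the return-leg rules in the next subsection can
recover the tunnel id purely from the conntrack entry, even though
every register was cleared in between.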
On host with NF (host3) tunnel packets from nfp2 back to host1
--------------------------------------------------------------

When the same packet comes out of nfp2 on host3:

LOCAL_OUTPUT (table 43)
When the packet comes out of the other NF port (nfp2), the following
two rules send it back to the host that it originally came from:

Priority 110: For each NF port local to this host, the following rule
processes the packet through the CT of the linked port (for nfp2, it
is nfp1):
    match: inport==nfp2 && RECIRC_BIT==0
    action: RECIRC_BIT = 1, ct(zone=nfp1's zone, table=LOCAL),
            resubmit to table 43

Priority 109: For each {tunnel_id, nf port} on this host, if the
tun_if_id in ct_label matches the tunnel_id, send the recirculated
packet using tunnel_id:
    match: inport==nfp1 && RECIRC_BIT==1 && ct_label.tun_if_id==<tun-id>
    action: tunnel packet using tun-id

If p1 and nfp1 happen to be on the same host, the tun_if_id would not
be set and thus none of the priority 109 rules would match. The
packet would be forwarded the usual way, matching the existing
priority 100 rules in LOCAL_OUTPUT.

Special handling of the case where the NF responds back on nfp1,
instead of forwarding the packet out of nfp2: For example, a SYN
packet from p1 got redirected to nfp1. Then the NF, which is a
firewall VM, drops the SYN and sends an RST back on port nfp1. In
this case, looking up the linked port's (nfp2) ct zone will not give
anything. The following rule uses ct.inv to identify such scenarios
and uses nfp1's CT zone to send the packet back. To achieve this, the
following two rules are installed:

in_network_function: The priority 100 rule that allows packets
incoming from NF-type ports is enhanced with an additional action to
store the tun_if_id from ct_label into reg5[16..31].
table=22(ls_in_network_function), priority=100 , match=(inport == "nfp1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)

LOCAL_OUTPUT (table 43)
Priority 110 rule: for recirculated packets, if the ct (of the linked
port) is invalid, use the tunnel id from reg5[16..31] to tunnel the
packet back to host1 (as the CT zone info has been overwritten by the
above priority 110 rule in table 43):
    match: inport==nfp1 && RECIRC_BIT==1 && ct.inv && MFF_LOG_TUN_OFPORT==<tun-id>
    action: tunnel packet using tun-id

6. NF insertion across logical switches
=======================================

If the port-group where the ACL is being applied has members across
multiple logical switches, there needs to be an NF port pair on each
of these switches. The NF VM will have only one inport and one
outport. The CMS is expected to create child ports linked to these
ports on each logical switch where port-group members are present.
The network-function entity would be configured with the parent ports
only. When the CMS creates the child ports, it does not need to
change any of the NF, NFG or ACL config tables. When northd
configures the redirection rules for a specific LS, it will use the
parent or child port depending on what it finds on that LS.

                          --------
                         | NF VM  |
                          --------
                           |    |
   -----                 nfp1  nfp2                 -----
  | VM1 |                  |    |                  | VM2 |
   -----               --------------               -----
     |                |    SVC LS    |                |
     |                 --------------                 |
  p1 |  nfp1_ch1  nfp2_ch1             p3 |  nfp1_ch2  nfp2_ch2
 --------------------                   --------------------
|        LS1         |                 |        LS2         |
 --------------------                   --------------------

In this example, the CMS created the parent ports for the NF VM on
the LS named SVC LS. The ports are nfp1 and nfp2. The CMS configures
the NF using these ports:

ovn-nbctl network-function-add nf1 nfp1 nfp2
ovn-nbctl network-function-group-add nfg1 nf1
ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2' allow-related nfg1

The port group to which the ACL is applied is pg1, and pg1 has two
ports: p1 on LS1 and p3 on LS2.
The CMS needs to create child ports for the NF ports on LS1 and LS2.
On LS1: nfp1_ch1 and nfp2_ch1. On LS2: nfp1_ch2 and nfp2_ch2.

When northd creates rules on LS1, it would use nfp1_ch1 and nfp2_ch1:

table=22(ls_in_network_function), priority=100 , match=(inport == "nfp2_ch1"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch1"; output;)

When northd creates rules on LS2, it would use nfp1_ch2 and nfp2_ch2:

table=22(ls_in_network_function), priority=100 , match=(inport == "nfp2_ch2"), action=(reg5[16..31] = ct_label.tun_if_id; next;)
table=22(ls_in_network_function), priority=99 , match=(reg8[21] == 1 && reg8[22] == 1 && reg5[0..7] == 1), action=(outport = "nfp1_ch2"; output;)

7. Health Monitoring
====================

The LB health monitoring functionality has been extended to support
NFs. Network_Function_Group has a list of Network_Functions, each of
which has a reference to a Network_Function_Health_Check that holds
the monitoring config. There is a corresponding SB Service_Monitor
row maintaining the online/offline status. When the status changes,
northd picks one of the "online" NFs and sets it in the
network_function_active field of the NFG. The redirection rule in the
LS uses the ports from this NF.

ovn-controller performs the health monitoring by sending an ICMP echo
request with source IP and MAC from the NB global options
"svc_monitor_ip4" and "svc_monitor_mac", and destination IP and MAC
from the new NB global options "svc_monitor_ip4_dst" and
"svc_monitor_mac_dst". The sequence number and id are randomly
generated and stored in the Service_Monitor row. The NF VM forwards
the same packet out of the other port. When it comes out,
ovn-controller matches the sequence number and id with the stored
values and marks the NF online if they match.

V1:
- First patch.

V2:
- Rebased code.
- Added "mode" field in the Network_Function_Group table, with the
  only allowed value being "inline".
  This is for future expansion to include "mirror" mode.
- Added a flow in the in_network_function and out_network_function
  tables to skip redirection of multicast traffic.

Sragdhara Datta Chaudhuri (5):
  ovn-nb: Network Function insertion OVN-NB schema changes
  ovn-nbctl: Network Function insertion commands.
  northd, tests: Network Function insertion logical flow programming.
  controller, tests: Network Function insertion tunneling of
    cross-host VLAN traffic.
  northd, controller: Network Function Health monitoring.

 controller/physical.c        | 249 ++++++++++-
 controller/pinctrl.c         | 252 +++++++++--
 include/ovn/logical-fields.h |  16 +-
 lib/logical-fields.c         |  26 ++
 lib/ovn-util.h               |   2 +-
 northd/en-global-config.c    |  75 ++++
 northd/en-global-config.h    |  12 +-
 northd/en-multicast.c        |   2 +-
 northd/en-northd.c           |   8 +
 northd/en-sync-sb.c          |  16 +-
 northd/inc-proc-northd.c     |   6 +-
 northd/northd.c              | 789 +++++++++++++++++++++++++++++++++--
 northd/northd.h              |  39 +-
 ovn-nb.ovsschema             |  64 ++-
 ovn-nb.xml                   | 123 ++++++
 ovn-sb.ovsschema             |  12 +-
 ovn-sb.xml                   |  22 +-
 tests/ovn-controller.at      |   6 +-
 tests/ovn-nbctl.at           |  83 ++++
 tests/ovn-northd.at          | 508 ++++++++++++++++------
 tests/ovn.at                 | 137 ++++++
 utilities/ovn-nbctl.c        | 533 ++++++++++++++++++++++-
 22 files changed, 2747 insertions(+), 233 deletions(-)

--
2.39.3

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev