This is a proposal to add service insertion support in OVN.
(The terms “Service” and “Network Function” have been used interchangeably here)
1. Introduction
================
The objective is to insert a service in the path of outbound/inbound traffic
from/to a port-group (set of logical switch ports).
Here are some of the highlights (these are described in detail later):
- A new entity network-function (NF) would be introduced. It contains a pair of
LSPs. The CMS would designate one as inport and the other as outport.
- Insertion of a service in the traffic path requires that the request packet
(client to server) be redirected to the inport and, once the same packet comes
out of the outport, forwarded to the destination.
- The service VM should not modify the packet, i.e. the IP header, the source
and destination MAC addresses and VLAN tag, remain unchanged. The service VM
will be like a bump in the wire.
- Statefulness is to be maintained, i.e. the response traffic also needs to go
through the same pair of LSPs but in reverse: it will enter through the
outport and come out of the inport.
- For future load balancing support across multiple NFs, a new entity
network-function-group (NFG) will be added as well. It contains a list of NFs.
- The access-list would accept an NFG as a parameter, and traffic matching the
ACL would be redirected to the associated NF ports. This parameter is accepted
for the stateful allow action only.
- Service insertion would be supported for both from-lport and to-lport ACLs.
A new stage would be introduced in the logical switch ingress and egress
pipelines for this.
- For the service ports we need to disable the port security check, fdb
learning and multicast/broadcast forwarding. Port security can already be
disabled; for the remaining two, new boolean options would be added.
- A new entity network_function_health_check will be introduced. This would
store the configuration parameters related to health monitoring. A packet
passthrough type of health check would be supported where ovn-controller would
periodically inject probe packets into the inport and monitor packets coming
out of the outport.
- If the traffic redirection involves cross-host traffic (e.g. source VM and
service VM are on different hosts), packets would be tunneled to and from the
service VM's host. This needs change in the tables maintained by ovn-controller
locally on each host.
- If the port-group to which the ACL is being applied has members spread across
multiple LSs, CMS needs to create network function child ports on each of these
LSs. The redirection rules in each LS will use the child ports on that LS.
2. NB tables
=============
New NB tables
-------------
Network_Function: Each row contains {inport, outport, health_check}
Network_Function_Group: Each row contains a list of Network_Function entities.
Initially this list will contain exactly one NF; the schema allows more for
future load balancing support. It also contains an id (a unique value between
1 and 255).
Network_Function_Health_Check: Each row contains configuration for probes in
options field: {interval, timeout, success_count, failure_count}
"Network_Function_Health_Check": {
"columns": {
"options": {
"type": {"key": "string",
"value": "string",
"min": 0,
"max": "unlimited"}},
"external_ids": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
"isRoot": false},
"Network_Function": {
"columns": {
"name": {"type": "string"},
"outport": {"type": {"key": {"type": "uuid",
"refTable": "Logical_Switch_Port",
"refType": "strong"},
"min": 1, "max": 1}},
"inport": {"type": {"key": {"type": "uuid",
"refTable": "Logical_Switch_Port",
"refType": "strong"},
"min": 1, "max": 1}},
"health_check": {"type": {
"key": {"type": "uuid",
"refTable": "Network_Function_Health_Check",
"refType": "strong"},
"min": 0, "max": 1}},
"external_ids": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
"isRoot": true},
"Network_Function_Group": {
"columns": {
"name": {"type": "string"},
"network_function": {"type":
{"key": {"type": "uuid",
"refTable": "Network_Function",
"refType": "strong"},
"min": 0, "max": "unlimited"}},
"id": {
"type": {"key": {"type": "integer",
"minInteger": 0,
"maxInteger": 255}}},
"external_ids": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
"isRoot": true},
Modified NB table
-----------------
ACL: The ACL entity would have a new optional field that is a reference to a
Network_Function_Group entity. This field can be present only for stateful
allow ACLs.
"ACL": {
"columns": {
"network_function_group": {"type": {"key": {"type": "uuid",
"refTable": "Network_Function_Group",
"refType": "strong"},
"min": 0,
"max": 1}},
New options for Logical_Switch_Port
-----------------------------------
receive_multicast=<boolean>: If set to false, the LS will not forward
broadcast/multicast traffic to this port. This prevents such packets from
being forwarded to the service ports and causing flooding and loops.
learn_fdb=<boolean>: If set to false, fdb learning will be skipped for packets
coming out of this port. These packets carry the originating VM port's MAC
address as the source MAC, and this MAC should not be learnt on the service
VM's port.
The CMS would need to set the above two options for the service ports. In
addition, the CMS needs to disable the port security check for these ports as
well (an existing LSP config). This is required to allow packets carrying the
originating VM's source MAC to be forwarded out of the service VM's port.
3. SB tables
============
Port_Binding:
Apart from the options (receive_multicast and learn_fdb) copied over from the
Logical_Switch_Port, there will be the following two new options:
network_function=<boolean>: This is to indicate this port belongs to a network
function.
network_function_linked_port=<lsp-name>: This is the name of the LSP linked to
this port, i.e. the two ports are bound by the same NF entity. If the NF port
pair is {sp1, sp2}, then for sp1 the linked port is sp2, and vice versa.
These two options can either come from the Logical_Switch_Port (i.e. the CMS
sets them), or northd can populate them based on the NF configuration.
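To make the linked-port relationship concrete, here is a minimal sketch of how
northd could derive these Port_Binding options from one Network_Function row.
The helper name is hypothetical, not actual OVN code.

```python
# Hypothetical sketch: derive the two new Port_Binding options for both
# service ports of one NF {inport: sp1, outport: sp2}. Each port's
# linked port is the other member of the pair.

def derive_nf_port_options(nf_inport, nf_outport):
    """Return the per-port options for both service ports of one NF."""
    return {
        nf_inport: {
            "network_function": "true",
            "network_function_linked_port": nf_outport,
        },
        nf_outport: {
            "network_function": "true",
            "network_function_linked_port": nf_inport,
        },
    }

opts = derive_nf_port_options("sp1", "sp2")
# sp1's linked port is sp2, and vice versa.
assert opts["sp1"]["network_function_linked_port"] == "sp2"
assert opts["sp2"]["network_function_linked_port"] == "sp1"
```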
Service_Monitor:
This table is currently used by the load balancer. It has a logical_port
field; in addition, we would add a logical_input_port field. The probe packets
would be injected into the logical_input_port and monitored coming out of the
logical_port.
northd would create one Service_Monitor entity for each NF. The
logical_input_port and logical_port would be populated from the NF inport and
outport fields respectively.
"Service_Monitor": {
"columns": {
"logical_input_port": {"type": "string"},
4. Logical Flows
================
Logical Switch ingress pipeline:
- A new table in_network_function would be added after the stateful stage.
- There would be modifications to in_acl_eval, in_stateful and in_l2_lookup
stages.
Logical Switch egress pipeline:
- A new table out_network_function would be added before the stateful stage.
- There would be modifications to out_acl_eval and out_stateful stages.
4.1 from-lport ACL
------------------
The diagram shows the request path for packets from VM1 port p1, which is a
member of the pg being protected. The response would follow the reverse path,
i.e. the packet would be redirected to sp2, come out of sp1 and be forwarded
to p1.
Note that p2 does not need to be on the same LS; only p1, sp1 and sp2 need to
be on the same LS.
----- -------- -----
| VM1 | | SVC VM | | VM2 |
----- -------- -----
| /\ | / \
| | | |
\ / | \ / |
-------------------------------------------------------------
| p1 sp1 sp2 p2 |
| |
| Logical Switch |
| |
-------------------------------------------------------------
pg1: [p1] as2: [p2-ip]
ovn-nbctl network-function-add nf1 sp1 sp2
ovn-nbctl network-function-group-add nfg1 nf1 123
ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
allow-related nfg1
The unique id for the nfg is 123.
4.1.1 Request processing
------------------------
The below set of rules ensures that packets (from p1) that match a from-lport
ACL with an nfg are redirected to the ingress service port (sp1) and that the
nfg id is committed to the CT label in p1's zone.
When the same packet comes out of the egress service port (sp2), it gets
forwarded the normal way.
in_acl_eval: Packets coming out of VM1 (p1) and destined to VM2 will match
the ACL. Since the ACL has an nfg, the existing flow's action will be enhanced
to set the following registers:
- reg0[18] = 1: set to 1 to indicate that packet has matched a rule with nfg
- reg5[0..7] = the unique id of the network_function_group
- reg0[19] = 0: packet direction (0: request, 1: response)
table=8 (ls_in_acl_eval ), priority=1200 , match=(reg0[7] == 1 &&
(inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[1] = 1; reg0[18]
= 1; reg0[19] = 0; reg5[0..7] = 123; next;)
table=8 (ls_in_acl_eval ), priority=1200 , match=(reg0[8] == 1 &&
(inport==@pg1 && ip4.dst == $as2)), action=(reg8[16] = 1; reg0[18] = 1;
reg0[19] = 0; reg5[0..7] = 123; next;)
in_stateful: For the service insertion case (reg0[18] == 1), include the nfg id
in the CT label. The CT entry committed to p1’s zone will have the nfg id 123.
- bit 4 (ct_label.network_function_group): Set to 1 to indicate service
insertion.
- bits 17 to 24 (ct_label.network_function_group_id): Stores the 8 bit nfg id
table=20(ls_in_stateful ), priority=101 , match=(reg0[1] == 1 &&
reg0[18] == 1), action=(ct_commit { ct_mark.blocked = 0;
ct_label.network_function_group = 1; ct_label.network_function_group_id =
reg5[0..7]; }; next;)
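The bit layout committed above can be sketched as follows; this is an
illustrative model of the described ct_label usage (bit 4 as the flag, bits
17..24 as the 8-bit nfg id), not OVN code.

```python
# Illustrative sketch of the proposed ct_label bit layout: bit 4 is the
# network_function_group flag and bits 17..24 hold the 8-bit nfg id.
# ct_label itself is a 128-bit value; plain Python ints model it here.

NFG_FLAG_BIT = 4
NFG_ID_SHIFT = 17
NFG_ID_MASK = 0xff

def commit_nfg(ct_label, nfg_id):
    """Mimic ct_commit setting the nfg flag and id in the label."""
    assert 1 <= nfg_id <= 255
    ct_label |= 1 << NFG_FLAG_BIT
    ct_label &= ~(NFG_ID_MASK << NFG_ID_SHIFT)   # clear any old id bits
    ct_label |= (nfg_id & NFG_ID_MASK) << NFG_ID_SHIFT
    return ct_label

def extract_nfg(ct_label):
    """Return the nfg id, or None if the flag bit is not set."""
    if not (ct_label >> NFG_FLAG_BIT) & 1:
        return None
    return (ct_label >> NFG_ID_SHIFT) & NFG_ID_MASK

label = commit_nfg(0, 123)
assert extract_nfg(label) == 123
```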
in_network_function: A new stage that would redirect the packet (by setting the
outport) to the service port based on the nfg-id.
A high priority rule would forward all packets coming out of egress service
port (sp2). A low priority rule would send the packet to ingress port (sp1).
table=21(ls_in_network_function ), priority=100  , match=(inport ==
"sp2"), action=(next;)
table=21(ls_in_network_function ), priority=99 , match=(reg0[18] == 1
&& reg0[19] == 0 && reg5[0..7] == 123), action=(outport = "sp1"; output;)
table=21(ls_in_network_function ), priority=0 , match=(1),
action=(next;)
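The three flows above can be summarized as a small decision function, tried
highest priority first. This is a simplified model assuming the single NF
{sp1, sp2} in nfg 123 from the example; the function name is illustrative.

```python
# Simplified model of the ls_in_network_function match order for the
# example NF {sp1, sp2} in nfg 123. Rules are tried highest priority
# first, mirroring the three logical flows.

def in_network_function(inport, nfg_match, direction, nfg_id):
    # priority 100: packet already processed by the service VM
    if inport == "sp2":
        return "next"
    # priority 99: request packet matched by an ACL with nfg 123
    if nfg_match and direction == 0 and nfg_id == 123:
        return "output:sp1"
    # priority 0: everything else passes through
    return "next"

# A request from p1 matching the ACL is redirected to sp1 ...
assert in_network_function("p1", True, 0, 123) == "output:sp1"
# ... and the same packet re-entering from sp2 is forwarded normally.
assert in_network_function("sp2", True, 0, 123) == "next"
```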
4.1.2 Response processing
-------------------------
For the response packet (destination is p1's MAC), ingress processing would set
the outport to p1 and the packet would enter the egress pipeline where the ct
lookup would happen in p1's ct zone.
out_acl_eval: There is a high priority rule that allows response packets to go
through. Now, an even higher priority rule will have the same match conditions,
plus an added match on whether the nfg bit is set in the ct_label. If so, the
action would copy the nfg_id from the ct_label to reg5.
table=4 (ls_out_acl_eval ), priority=65533, match=(ct.est && !ct.rel &&
!ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0 &&
ct_label.network_function_group == 1), action=(reg5[0..7] =
ct_label.network_function_group_id; reg0[18] = 1; reg0[19] = 1; reg0[9] = 0;
reg0[10] = 0; reg0[17] = 1; reg8[16] = 1; next;)
out_network_function: A high priority rule forwards all packets from ingress
service port (sp1) to the next table. A low priority rule matches on the nfg_id
and sets the outport to the egress port (sp2) of the nf.
Then the packet is injected back to the ingress pipeline l2_lkup stage.
Reg0[20] is set to indicate that it is a packet being sent back from egress to
ingress.
table=6 (ls_out_network_function ), priority=100 , match=(inport ==
"sp1"), action=(next;)
table=6 (ls_out_network_function ), priority=99 , match=(reg0[18] ==
1 && reg0[19] == 1 && reg5[0..7] == 123), action=(outport = "sp2";reg0[20] = 1;
next(pipeline=ingress, table=28);)
in_l2_lkup: This stage has destination MAC based rules to set the outport.
Now a higher priority rule would check if reg0[20] is set to 1, and in that
case simply output the packet.
table=28(ls_in_l2_lkup ), priority=100 , match=(reg0[20] == 1),
action=(output;)
The above set of rules ensures that the response packet is sent to sp2. When
the same packet comes out of sp1, the ingress pipeline would again set the
outport to p1 and the egress pipeline would do a CT lookup in p1's zone to get
the nfg from the label. However, in this case the packet would hit the
priority 100 rule in the ls_out_network_function table, and be forwarded the
normal way.
4.2 to-lport ACL
----------------
----- -------- -----
| VM1 | | SVC VM | | VM2 |
----- -------- -----
/ \ / \ | |
| | | |
| | \ / \ /
-------------------------------------------------------------
| p1 sp1 sp2 p2 |
| |
| Logical Switch |
| |
-------------------------------------------------------------
ovn-nbctl acl-add pg1 to-lport 100 'outport == @pg1 && ip4.src == $as2'
allow-related nfg1
The diagram shows the request traffic path. The response will follow the
reverse path.
4.2.1 Request processing
------------------------
Ingress pipeline sets the outport based on destination MAC lookup. The packet
would enter the egress pipeline with outport set to p1.
out_acl_eval: Packets from VM2 with outport p1 will match the ACL. For ACLs
with an nfg, the existing rule's action will be enhanced to set the following
registers:
- reg0[18] = 1: set to 1 to indicate that packet has matched a rule with nfg
- reg5[0..7] = the unique id of the network_function_group
- reg0[19] = 0: packet direction (0: request, 1: response)
table=4 (ls_out_acl_eval ), priority=1100 , match=(reg0[7] == 1 &&
(outport == @pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[1] = 1;
reg0[18] = 1; reg0[19] = 0; reg5[0..7] = 123; next;)
table=4 (ls_out_acl_eval ), priority=1100 , match=(reg0[8] == 1 &&
(outport == @pg1 && ip4.src == $as2)), action=(reg8[16] = 1; reg0[18] = 1;
reg0[19] = 0; reg5[0..7] = 123; next;)
out_network_function: A high priority rule forwards packets from egress service
port to the next table. A low priority rule matches on the nfg_id and sets the
outport to the ingress port (sp1) of the nf.
Then the packet is injected back to the ingress pipeline l2_lkup stage.
Reg0[20] is set to indicate that it is a packet being sent back from egress to
ingress.
table=6 (ls_out_network_function ), priority=100 , match=(inport ==
"sp2"), action=(next;)
table=6 (ls_out_network_function ), priority=99 , match=(reg0[18] ==
1 && reg0[19] == 0 && reg5[0..7] == 123), action=(outport = "sp1"; reg0[20] =
1; next(pipeline=ingress, table=28);)
table=6 (ls_out_network_function ), priority=0 , match=(1),
action=(next;)
in_l2_lkup: This stage has destination MAC based rules to set the outport.
Now a higher priority rule would check if reg0[20] is set to 1, and in that
case simply output the packet.
table=28(ls_in_l2_lkup ), priority=100 , match=(reg0[20] == 1),
action=(output;)
From l2_lkup, the packet will enter the egress pipeline and reach the service
VM via port sp1.
Then the same packet comes out of sp2 and goes through ingress processing,
where the outport gets set to p1. The egress pipeline performs the ACL match
and again sets the nfg id. However, this time the priority 100 rule in
ls_out_network_function will be hit, the packet will move on to the next
stage, and the CT entry will get committed to p1's zone with the nfg id
populated.
out_stateful: For the service insertion case (reg0[18] == 1), include the nfg
id in the CT label. Following are the CT label bits to be used:
bit 4 (ct_label.network_function_group): Set to 1 to indicate service
insertion.
bits 17 to 24 (ct_label.network_function_group_id): Stores the 8 bit nfg id.
table=9 (ls_out_stateful ), priority=101 , match=(reg0[1] == 1 &&
reg0[18] == 1), action=(ct_commit { ct_mark.blocked = 0;
ct_label.network_function_group = 1; ct_label.network_function_group_id =
reg5[0..7]; }; next;)
4.2.2 Response processing
-------------------------
Packet enters from p1 and ingress pipeline looks up the CT entry from p1’s CT
zone. This entry has the nfg id set.
in_acl_eval: There is a high priority rule that allows response packets to go
through. Now, an even higher priority rule will have the same match conditions,
plus an added match on whether the nfg bit is set in the ct_label. If so, the
action would copy the nfg_id from the ct_label to reg5.
table=8 (ls_in_acl_eval ), priority=65533, match=(ct.est && !ct.rel &&
!ct.new && !ct.inv && ct.rpl && ct_mark.blocked == 0 &&
ct_label.network_function_group == 1), action=(reg5[0..7] =
ct_label.network_function_group_id; reg0[18] = 1; reg0[19] = 1; reg0[9] = 0;
reg0[10] = 0; reg0[17] = 1; reg8[16] = 1; next;)
in_network_function: This stage matches reg5 for the nfg id and reg0[19] for
the direction. For the response direction, a priority 99 rule sets the outport
to the egress service port (sp2) and outputs the packet from there.
There is also a higher priority rule that compares the inport with the service
ingress port (sp1) and sends the packet to the next stage.
table=21(ls_in_network_function ), priority=100 , match=(inport ==
"sp1"), action=(next;)
table=21(ls_in_network_function ), priority=99 , match=(reg0[18] ==
1 && reg0[19] == 1 && reg5[0..7] == 123), action=(outport = "sp2"; output;)
5. Cross-host Traffic for VLAN Network
======================================
For overlay subnets, all cross-host traffic exchanges are tunneled. In the case
of VLAN subnets, there needs to be special handling to selectively tunnel only
the traffic to or from the service ports.
Take the example of a from-lport ACL. Packets from p1 to p2 get redirected to
sp1 on host1. If this packet were simply sent out from host1, the physical
network would forward it directly to host2, where VM2 is. So we need to tunnel
the redirected packets from host1 to host3. Then, once the packets come out of
sp2, if host3 sent them out directly, the physical network would learn p1's
MAC as coming from host3. So these packets need to be tunneled back to host1;
from there the packet is forwarded to VM2 via the physical network.
----- ----- --------
| VM2 | | VM1 | | SVC VM |
----- ----- --------
/ \ | / \ |
| (7) | (1) (3)| |(4)
| \ / | \ /
-------------- -------------- (2) ---------------
| p2 | (6) | p1 |______\ | sp1 sp2 |
| |/____ | |------/ | |
| host2 |\ | host1 |/______ | host3 |
| | | |\------ | |
-------------- -------------- (5) --------------
The above figure shows the request packet path for a from-lport ACL. Response
would follow the same path in reverse direction.
To achieve this, the following would be done:
On the host where the ACL port group members are present (host1)
----------------------------------------------------------------
Packet going out of host1 with outport set to sp1:
- New flows would be installed by ovn-controller in host1's REMOTE_TABLE to
tunnel packets (geneve encap) destined to non-local service ports (sp1 and
sp2), to the associated host (host3). Similar rules are currently installed for
non-local overlay ports. This ensures that once host1 sets the outport to sp1,
the packets are sent via tunnel to host3.
On the host where the service ports are present (host3)
-------------------------------------------------------
Same packet received on host3 via tunnel interface:
- Table 0 (PHY_TO_LOG) has priority 100 flows to process incoming packets on
each tunnel interface. Action for these flows would be enhanced to store the
tunnel interface id in a register.
in_port="ovn-xxxxxx-x" actions=move tunnel id to metadata reg, populate reg14
(inport) and reg15 (outport) from tunnel metadata, load tunnel interface id
into reg5[16..31]
- The packet would then be submitted to the egress pipeline with outport sp1.
The ls_out_stateful stage would copy the tunnel interface id from the register
to the ct_label.tun_if_id. This action will be done only if outport is a
service port.
table=9 (ls_out_stateful ), priority=101 , match=(reg5[16..31] != 0 &&
outport == "sp1"), action=(ct_commit { ct_mark.blocked = 0; ct_label.tun_if =
1; ct_label.tun_if_id = reg5[16..31]; }; next;)
The above two flows ensure that if a packet is received via tunnel on host3,
with outport as sp1, the tunnel interface id is committed to the ct entry in
sp1's zone.
Same packet coming out of sp2 on host3:
- When the packet comes out of sp2, and the ingress processing completes, the
destination port would be set to p2. If p2 is on the same host, i.e. host3, it
would match the existing priority 100 rule in LOCAL_TABLE and be forwarded to
the next stage.
- If p2 is on a different host:
  - A new flow in LOCAL_TABLE would process the packet through the CT zone of
sp1 (i.e. the CT zone of the service port linked to the packet's inport; in
this case the inport is sp2 and the linked port is sp1). The packet would be
submitted to LOCAL_TABLE again after the CT processing. A register bit is set
before the ct call to differentiate the recirculated packet from the original
one.
priority 90: for each local service_port, if the inport is sp_i &&
MLF_RECIRC_BIT == 0 --> action: set the MLF_RECIRC_BIT = 1,
ct(zone=zone_of_sp_i_linked_port, table=LOCAL_TABLE)
- Another new flow is installed to match the recirculated packet. If
ct_label.tun_if_id is set, it uses that value to send the packet back to host1
over the same tunnel interface where the packet was originally received.
priority 90: for each local service_port, if the inport is sp_i,
MLF_RECIRC_BIT = 1, ct_label.tun_if = 1 -> action: tunnel the packet using
ct_label.tun_if_id
In case the original packet was not received on a tunnel, i.e. VM1 was on the
same host as the SVC VM, the recirculated packet would not match the second
priority 90 flow (since ct_label.tun_if is not 1). It would then fall through,
match the lower priority rules and be forwarded the normal way. The existing
priority 100 rules for remote ports and localnet ports would need to be
pushed down to priority 80.
The order of rules in LOCAL_TABLE would be:
100: If the outport is local, send it to the next table. --> Existing flow,
unchanged.
90: If inport is svc port and recirc is 0, ct(linked port’s zone) and resubmit
to the same table. --> new flow
90: If inport is svc port and recirc is 1, and ct_label.tun_if is set, send the
packet via tunnel. --> new flow
80: If outport is remote, set outport to localnet and resubmit to the same
table --> existing flow, priority changed from 100 to 80
80: If outport is a localnet port, submit to the next table. --> existing
flow, priority changed from 100 to 80
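The rule ordering above can be simulated with a small function, for a host
where sp1/sp2 are local. This is a simplified sketch: the ct() lookup is
stubbed out and the names are illustrative, not actual ovn-controller tables.

```python
# Simplified simulation of the proposed LOCAL_TABLE ordering on host3,
# where sp1/sp2 are the local service ports. ct lookup is stubbed via
# the ct_entry argument.

def local_table(pkt, local_ports, svc_ports, ct_entry):
    # priority 100: outport resides on this host
    if pkt["outport"] in local_ports:
        return "next_table"
    # priority 90: first pass for a packet from a service port: run it
    # through the linked port's CT zone and resubmit to this table.
    if pkt["inport"] in svc_ports and not pkt["recirc"]:
        pkt["recirc"] = True
        pkt["ct_label"] = ct_entry          # stubbed ct() result
        return local_table(pkt, local_ports, svc_ports, ct_entry)
    # priority 90: recirculated packet whose CT entry records a tunnel
    # goes back out the same tunnel interface it arrived on.
    if pkt["inport"] in svc_ports and pkt["recirc"] \
            and pkt["ct_label"].get("tun_if"):
        return "tunnel:%d" % pkt["ct_label"]["tun_if_id"]
    # priority 80: remote/localnet handling (existing flows).
    return "localnet"

pkt = {"inport": "sp2", "outport": "p2", "recirc": False, "ct_label": {}}
out = local_table(pkt, {"sp1", "sp2"}, {"sp1", "sp2"},
                  {"tun_if": 1, "tun_if_id": 7})
assert out == "tunnel:7"   # p2 is remote, original packet was tunneled
```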
6. Service insertion across logical switches
============================================
If the port-group where the ACL is being applied has members across multiple
logical switches, there needs to be a service port pair on each of these
switches.
The service VM will have only one inport and one outport. The CMS is expected
to create child ports linked to these ports on each logical switch where pg
members are present.
The network-function entity would be configured with the parent ports only.
When the CMS creates the child ports, it does not need to change any of the
NF, NFG or ACL configuration.
When northd configures the redirection rules for a specific LS, it will use
the parent or child ports depending on which it finds on that LS.
--------
| SVC VM |
--------
| |
----- | | -----
| VM1 | sp1 sp2 | VM2 |
---- - | | -------------- ----- | |
| | | | SVC LS | | | |
p1| sp1_ch1 sp2_ch1 -------------- p3| sp1_ch2 sp2_ch2
-------------------- --------------------
| LS1 | | LS2 |
-------------------- --------------------
In this example, the CMS created the parent ports for the SVC VM on an LS named
SVC LS. The ports are sp1 and sp2. The CMS configures the NF using these ports:
ovn-nbctl network-function-add nf1 sp1 sp2
ovn-nbctl network-function-group-add nfg1 nf1 123
ovn-nbctl acl-add pg1 from-lport 200 'inport==@pg1 && ip4.dst == $as2'
allow-related nfg1
The port group to which the ACL is applied is pg1 and pg1 has two ports: p1 on
LS1 and p3 on LS2.
The CMS needs to create child ports for the service ports on LS1 and LS2: on
LS1, sp1_ch1 and sp2_ch1; on LS2, sp1_ch2 and sp2_ch2.
When northd is creating rules on LS1 for the network_function table, it would
use sp1_ch1 and sp2_ch1.
table=21(ls_in_network_function ), priority=100 , match=(inport ==
"sp2_ch1"), action=(next;)
table=21(ls_in_network_function ), priority=99 , match=(reg0[18] == 1
&& reg0[19] == 0 && reg5[0..7] == 123), action=(outport = "sp1_ch1"; output;)
When northd is creating rules on LS2 for the network_function table, it would
use sp1_ch2 and sp2_ch2.
table=21(ls_in_network_function ), priority=100 , match=(inport ==
"sp2_ch2"), action=(next;)
table=21(ls_in_network_function ), priority=99 , match=(reg0[18] == 1
&& reg0[19] == 0 && reg5[0..7] == 123), action=(outport = "sp1_ch2"; output;)
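The per-LS port selection northd performs above can be sketched as a small
helper: use the parent ports when the NF's ports live on the LS being
processed, otherwise use the child ports the CMS created there. The function
name and the children mapping are hypothetical.

```python
# Hypothetical sketch of northd's per-LS service port selection for the
# example above: parent ports on SVC LS, child ports on LS1/LS2.

def select_service_ports(ls_ports, nf_inport, nf_outport, children):
    """children maps each parent port to its child port on this LS."""
    if nf_inport in ls_ports and nf_outport in ls_ports:
        return nf_inport, nf_outport
    return children[nf_inport], children[nf_outport]

# On LS1, sp1/sp2 are not present, so the child ports are used.
ls1_children = {"sp1": "sp1_ch1", "sp2": "sp2_ch1"}
assert select_service_ports({"p1", "sp1_ch1", "sp2_ch1"},
                            "sp1", "sp2",
                            ls1_children) == ("sp1_ch1", "sp2_ch1")
```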
7. Health Monitoring
====================
ovn-controller will get the two service ports from the Service_Monitor table
and periodically inject probe packets into the service VM inport. Special
rules would be installed to match the packets coming out of the service VM
outport and send them to ovn-controller for processing. ovn-controller would
detect missing packets from the outport and, based on the failure_count
config, set the status in Service_Monitor to offline. It would keep sending
packets to the inport, and if it detects success_count consecutive packets
from the outport, it would set the status back to online. The packet types
being considered are ICMP echo request and ARP.
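The probe bookkeeping described above can be sketched as a small state
machine, assuming the options {success_count, failure_count} gate the
online/offline transitions as with the existing load-balancer health checks.
The class name is illustrative.

```python
# Illustrative state machine for the passthrough health check: probes
# are injected each interval, and consecutive misses/hits drive the
# Service_Monitor status between online and offline.

class NfMonitor:
    def __init__(self, success_count, failure_count):
        self.success_count = success_count
        self.failure_count = failure_count
        self.successes = 0
        self.failures = 0
        self.status = "online"

    def record(self, probe_seen):
        """Called once per probe interval; probe_seen is True if the
        injected packet was observed coming out of the outport."""
        if probe_seen:
            self.successes += 1
            self.failures = 0
            if (self.status == "offline"
                    and self.successes >= self.success_count):
                self.status = "online"
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_count:
                self.status = "offline"

mon = NfMonitor(success_count=2, failure_count=3)
for _ in range(3):
    mon.record(False)       # three missed probes -> offline
assert mon.status == "offline"
mon.record(True)
mon.record(True)            # two consecutive hits -> back online
assert mon.status == "online"
```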
Thanks,
Sragdhara