Support a new logical router option "always_learn_from_arp_request" that controls behavior when handling ARP requests or IPv4 ND-NS packets.
"true" - Always learn the MAC/IP binding and add a new MAC_Binding entry (default behavior) "false" - If there is a MAC_binding for that IP and the MAC is different, or, if TPA of ARP request belongs to any router port on this router, then update/add that MAC/IP binding. Otherwise, don't update/add entries. Reported-by: Girish Moodalbail <gmoodalb...@nvidia.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049995.html Signed-off-by: Han Zhou <hz...@ovn.org> --- northd/ovn-northd.8.xml | 109 ++++++++++++++++++++++++++++++++++++++++-------- northd/ovn-northd.c | 96 +++++++++++++++++++++++++++++++++++------- ovn-nb.xml | 27 ++++++++++++ tests/ovn.at | 18 ++++++++ 4 files changed, 217 insertions(+), 33 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 9f2c70f..30936ab 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -1537,58 +1537,126 @@ output; <p> For each router port <var>P</var> that owns IP address <var>A</var>, which belongs to subnet <var>S</var> with prefix length <var>L</var>, - a priority-100 flow is added which matches - <code>inport == <var>P</var> && - arp.spa == <var>S</var>/<var>L</var> && arp.op == 1</code> - (ARP request) with the - following actions: + if the option <code>always_learn_from_arp_request</code> is + <code>true</code> for this router, a priority-100 flow is added which + matches <code>inport == <var>P</var> && arp.spa == + <var>S</var>/<var>L</var> && arp.op == 1</code> (ARP request) + with the following actions: + </p> + + <pre> +reg9[2] = lookup_arp(inport, arp.spa, arp.sha); +next; + </pre> + + <p> + If the option <code>always_learn_from_arp_request</code> is + <code>false</code>, the following two flows are added. + </p> + + <p> + A priority-110 flow is added which matches <code>inport == + <var>P</var> && arp.spa == <var>S</var>/<var>L</var> + && arp.tpa == <var>A</var> && arp.op == 1</code> + (ARP request) with the following actions: </p> <pre> reg9[2] = lookup_arp(inport, arp.spa, arp.sha); +reg9[3] = 1; +next; + </pre> + + <p> + A priority-100 flow is added which matches <code>inport == + <var>P</var> && arp.spa == <var>S</var>/<var>L</var> + && arp.op == 1</code> (ARP request) with the following + actions: + </p> + + <pre> +reg9[2] = lookup_arp(inport, arp.spa, arp.sha); +reg9[3] = lookup_arp_ip(inport, arp.spa); next; </pre> <p> If the logical router port <var>P</var> is a distributed gateway router port, additional match - <code>is_chassis_resident(cr-<var>P</var>)</code> is added so that - the resident gateway chassis handles the neighbor lookup. + <code>is_chassis_resident(cr-<var>P</var>)</code> is added for all + these flows. </p> </li> <li> <p> A priority-100 flow which matches on ARP reply packets and applies - the actions: + the actions if the option <code>always_learn_from_arp_request</code> + is <code>true</code>: </p> <pre> reg9[2] = lookup_arp(inport, arp.spa, arp.sha); next; </pre> + + <p> + If the option <code>always_learn_from_arp_request</code> + is <code>false</code>, the above actions will be: + </p> + + <pre> +reg9[2] = lookup_arp(inport, arp.spa, arp.sha); +reg9[3] = 1; +next; + </pre> + </li> <li> <p> A priority-100 flow which matches on IPv6 Neighbor Discovery - advertisement packet and applies the actions: + advertisement packet and applies the actions if the option + <code>always_learn_from_arp_request</code> is <code>true</code>: </p> <pre> reg9[2] = lookup_nd(inport, nd.target, nd.tll); next; </pre> + + <p> + If the option <code>always_learn_from_arp_request</code> + is <code>false</code>, the above actions will be: + </p> + + <pre> +reg9[2] = lookup_nd(inport, nd.target, nd.tll); +reg9[3] = 1; +next; + </pre> </li> <li> <p> A priority-100 flow which matches on IPv6 Neighbor Discovery - solicitation packet and applies the actions: + solicitation packet and applies the actions if the option + <code>always_learn_from_arp_request</code> is <code>true</code>: + </p> + + <pre> +reg9[2] = lookup_nd(inport, ip6.src, nd.sll); +next; + </pre> + + <p> + If the option <code>always_learn_from_arp_request</code> + is <code>false</code>, the above actions will be: </p> <pre> reg9[2] = lookup_nd(inport, ip6.src, nd.sll); +reg9[3] = lookup_nd_ip(inport, ip6.src); next; </pre> </li> @@ -1604,21 +1672,28 @@ next; <p> This table adds flows to learn the mac bindings from the ARP and - IPv6 Neighbor Solicitation/Advertisement packets if ARP/ND lookup - failed in the previous table. + IPv6 Neighbor Solicitation/Advertisement packets if it is needed + according to the lookup results from the previous stage. </p> <p> reg9[2] will be <code>1</code> if the <code>lookup_arp/lookup_nd</code> - in the previous table was successful, or if there was no need to do the - lookup. + in the previous table was successful or skipped, meaning no need + to learn mac binding from the packet. + </p> + + <p> + reg9[3] will be <code>1</code> if the + <code>lookup_arp_ip/lookup_nd_ip</code> in the previous table was + successful or skipped, meaning it is ok to learn mac binding from + the packet (if reg9[2] is 0). </p> <ul> <li> - A priority-100 flow with the match - <code>reg9[2] == 1</code> and advances the packet - to the next table as there is no need to learn the neighbor. + A priority-100 flow with the match <code>reg9[2] == 1 || reg9[3] == + 0</code> and advances the packet to the next table as there is no need + to learn the neighbor. </li> <li> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 425f522..b5e7c08 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -221,6 +221,7 @@ enum ovn_stage { /* Register to store the result of check_pkt_larger action. */ #define REGBIT_PKT_LARGER "reg9[1]" #define REGBIT_LOOKUP_NEIGHBOR_RESULT "reg9[2]" +#define REGBIT_LOOKUP_NEIGHBOR_IP_RESULT "reg9[3]" /* Register to store the eth address associated to a router port for packets * received in S_ROUTER_IN_ADMISSION. @@ -8242,36 +8243,62 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, * For ARP packets, table LOOKUP_NEIGHBOR does a lookup for the * (arp.spa, arp.sha) in the mac binding table using the 'lookup_arp' * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_RESULT bit. + * If "always_learn_from_arp_request" is set to false, it will also + * lookup for the (arp.spa) in the mac binding table using the + * "lookup_arp_ip" action for ARP request packets, and stores the + * result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit; or set that bit + * to "1" directly for ARP response packets. * * For IPv6 ND NA packets, table LOOKUP_NEIGHBOR does a lookup * for the (nd.target, nd.tll) in the mac binding table using the * 'lookup_nd' action and stores the result in - * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If + * "always_learn_from_arp_request" is set to false, + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT bit is set. * * For IPv6 ND NS packets, table LOOKUP_NEIGHBOR does a lookup * for the (ip6.src, nd.sll) in the mac binding table using the * 'lookup_nd' action and stores the result in - * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit. If + * "always_learn_from_arp_request" is set to false, it will also lookup + * for the (ip6.src) in the mac binding table using the "lookup_nd_ip" + * action and stores the result in REGBIT_LOOKUP_NEIGHBOR_IP_RESULT + * bit. * * Table LEARN_NEIGHBOR learns the mac-binding using the action - * - 'put_arp/put_nd' only if REGBIT_LOOKUP_NEIGHBOR_RESULT bit - * is not set. + * - 'put_arp/put_nd'. Learning mac-binding is skipped if + * REGBIT_LOOKUP_NEIGHBOR_RESULT bit is set or + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT is not set. * * */ /* Flows for LOOKUP_NEIGHBOR. */ + bool learn_from_arp_request = smap_get_bool(&od->nbr->options, + "always_learn_from_arp_request", true); + ds_clear(&actions); + ds_put_format(&actions, REGBIT_LOOKUP_NEIGHBOR_RESULT + " = lookup_arp(inport, arp.spa, arp.sha); %snext;", + learn_from_arp_request ? "" : + REGBIT_LOOKUP_NEIGHBOR_IP_RESULT" = 1; "); ovn_lflow_add(lflows, od, S_ROUTER_IN_LOOKUP_NEIGHBOR, 100, - "arp.op == 2", - REGBIT_LOOKUP_NEIGHBOR_RESULT" = " - "lookup_arp(inport, arp.spa, arp.sha); next;"); + "arp.op == 2", ds_cstr(&actions)); + ds_clear(&actions); + ds_put_format(&actions, REGBIT_LOOKUP_NEIGHBOR_RESULT + " = lookup_nd(inport, nd.target, nd.tll); %snext;", + learn_from_arp_request ? "" : + REGBIT_LOOKUP_NEIGHBOR_IP_RESULT" = 1; "); ovn_lflow_add(lflows, od, S_ROUTER_IN_LOOKUP_NEIGHBOR, 100, "nd_na", - REGBIT_LOOKUP_NEIGHBOR_RESULT" = " - "lookup_nd(inport, nd.target, nd.tll); next;"); + ds_cstr(&actions)); + ds_clear(&actions); + ds_put_format(&actions, REGBIT_LOOKUP_NEIGHBOR_RESULT + " = lookup_nd(inport, ip6.src, nd.sll); %snext;", + learn_from_arp_request ? "" : + REGBIT_LOOKUP_NEIGHBOR_IP_RESULT + " = lookup_nd_ip(inport, ip6.src); "); ovn_lflow_add(lflows, od, S_ROUTER_IN_LOOKUP_NEIGHBOR, 100, "nd_ns", - REGBIT_LOOKUP_NEIGHBOR_RESULT" = " - "lookup_nd(inport, ip6.src, nd.sll); next;"); + ds_cstr(&actions)); /* For other packet types, we can skip neighbor learning. * So set REGBIT_LOOKUP_NEIGHBOR_RESULT to 1. */ @@ -8280,8 +8307,12 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, /* Flows for LEARN_NEIGHBOR. */ /* Skip Neighbor learning if not required. */ + ds_clear(&match); + ds_put_format(&match, REGBIT_LOOKUP_NEIGHBOR_RESULT" == 1%s", + learn_from_arp_request ? "" : + " || "REGBIT_LOOKUP_NEIGHBOR_IP_RESULT" == 0"); ovn_lflow_add(lflows, od, S_ROUTER_IN_LEARN_NEIGHBOR, 100, - REGBIT_LOOKUP_NEIGHBOR_RESULT" == 1", "next;"); + ds_cstr(&match), "next;"); ovn_lflow_add(lflows, od, S_ROUTER_IN_LEARN_NEIGHBOR, 90, "arp", "put_arp(inport, arp.spa, arp.sha); next;"); @@ -8298,8 +8329,37 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, continue; } + bool learn_from_arp_request = smap_get_bool(&op->od->nbr->options, + "always_learn_from_arp_request", true); + /* Check if we need to learn mac-binding from ARP requests. */ for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + if (!learn_from_arp_request) { + /* ARP request to this address should always get learned, + * so add a priority-110 flow to set + * REGBIT_LOOKUP_NEIGHBOR_IP_RESULT to 1. */ + ds_clear(&match); + ds_put_format(&match, + "inport == %s && arp.spa == %s/%u && " + "arp.tpa == %s && arp.op == 1", + op->json_key, + op->lrp_networks.ipv4_addrs[i].network_s, + op->lrp_networks.ipv4_addrs[i].plen, + op->lrp_networks.ipv4_addrs[i].addr_s); + if (op->od->l3dgw_port && op == op->od->l3dgw_port + && op->od->l3redirect_port) { + ds_put_format(&match, " && is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + } + const char *actions_s = REGBIT_LOOKUP_NEIGHBOR_RESULT + " = lookup_arp(inport, arp.spa, arp.sha); " + REGBIT_LOOKUP_NEIGHBOR_IP_RESULT" = 1;" + " next;"; + ovn_lflow_add_with_hint(lflows, op->od, + S_ROUTER_IN_LOOKUP_NEIGHBOR, 110, + ds_cstr(&match), actions_s, + &op->nbrp->header_); + } ds_clear(&match); ds_put_format(&match, "inport == %s && arp.spa == %s/%u && arp.op == 1", @@ -8311,12 +8371,16 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, ds_put_format(&match, " && is_chassis_resident(%s)", op->od->l3redirect_port->json_key); } + ds_clear(&actions); + ds_put_format(&actions, REGBIT_LOOKUP_NEIGHBOR_RESULT + " = lookup_arp(inport, arp.spa, arp.sha); %snext;", + learn_from_arp_request ? "" : + REGBIT_LOOKUP_NEIGHBOR_IP_RESULT + " = lookup_arp_ip(inport, arp.spa); "); ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_IN_LOOKUP_NEIGHBOR, 100, - ds_cstr(&match), - REGBIT_LOOKUP_NEIGHBOR_RESULT" = " - "lookup_arp(inport, arp.spa, arp.sha); " - "next;", &op->nbrp->header_); + ds_cstr(&match), ds_cstr(&actions), + &op->nbrp->header_); } } diff --git a/ovn-nb.xml b/ovn-nb.xml index e8450aa..415d30a 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -1859,6 +1859,33 @@ send traffic between each other. </p> </column> + <column name="options" key="always_learn_from_arp_request" type='{"type": "boolean"}'> + <p> + This option controls the behavior when handling IPv4 ARP requests or + IPv6 ND-NS packets - whether a dynamic neighbor (MAC binding) entry + is added/updated. + </p> + + <p> + <code>true</code> - Always learn the MAC-IP binding, and add/update + the MAC binding entry. + </p> + + <p> + <code>false</code> - If there is a MAC binding for that IP and the + MAC is different, or, if TPA of ARP request belongs to any router + port on this router, then update/add that MAC-IP binding. Otherwise, + don't update/add entries. + </p> + + <p> + It is <code>true</code> by default. It is recommended to set to + <code>false</code> when a large number of logical routers are + connected to the same logical switch but most of them never need to + send traffic between each other, to reduce the size of the MAC + binding table. + </p> + </column> </group> <group title="Common Columns"> diff --git a/tests/ovn.at b/tests/ovn.at index 67fb754..fc811b4 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -4031,6 +4031,18 @@ ip_to_hex() { sha=f00000000011 spa=`ip_to_hex 192 168 1 100` tpa=$spa + +# When always_learn_from_arp_request=false, the new mac-binding will not be learned +# through GARP request. +ovn-nbctl --wait=hv set logical_router lr0 options:always_learn_from_arp_request=false + +test_arp 11 $sha $spa $tpa +sleep 1 +AT_CHECK([ovn-sbctl find mac_binding ip="192.168.1.100"], [0], []) + +# When always_learn_from_arp_request=true, the new mac-binding will be learned. +ovn-nbctl --wait=hv set logical_router lr0 options:always_learn_from_arp_request=true + test_arp 11 $sha $spa $tpa OVS_WAIT_UNTIL([test `ovn-sbctl find mac_binding ip="192.168.1.100" | wc -l` -gt 0]) ovn-nbctl --wait=hv sync @@ -4045,6 +4057,12 @@ test_ip 21 $smac $dmac $sip $dip 11 # lp12 send GARP request to announce ownership of 192.168.1.100. +# Even when always_learn_from_arp_request=false, the existing mac-binding should be +# updated through GARP request. +ovn-nbctl --wait=hv set logical_router lr0 options:always_learn_from_arp_request=false +ovn-sbctl lflow-list +as hv2 ovs-ofctl dump-flows br-int + sha=f00000000012 test_arp 12 $sha $spa $tpa OVS_WAIT_UNTIL([ovn-sbctl find mac_binding ip="192.168.1.100" | grep f0:00:00:00:00:12]) -- 2.1.0 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev