On 3/26/21 10:12 AM, Dumitru Ceara wrote:
On 3/26/21 2:52 PM, Mark Michelson wrote:
On 3/26/21 7:35 AM, Dumitru Ceara wrote:
On 3/26/21 12:32 PM, Dumitru Ceara wrote:
On 3/24/21 12:44 AM, Mark Michelson wrote:
On 3/23/21 6:55 PM, Dumitru Ceara wrote:
On 3/23/21 10:02 PM, Mark Michelson wrote:
On 3/23/21 12:13 PM, Dumitru Ceara wrote:
On 3/23/21 5:05 PM, Numan Siddique wrote:
On Fri, Mar 19, 2021 at 2:20 AM Mark Michelson <[email protected]> wrote:

Self-originated ARPs are intended to be sent to the "owning" router for
a given IP address, as well as flooded to non-router ports on a logical
switch.

When trying to determine the owning router for an IP address, we would
iterate through load balancer and NAT addresses, and check if these IP
addresses were "reachable" on this particular router. Reachable here
means that the NAT external IP or load balancer VIP falls within any of
the networks served by this router. If an IP address were determined not
to be reachable, then we would not try to send ARPs for that particular
address to the router.

However, it is possible (and sometimes desired) to configure NAT
floating IPs on a router that are not in any subnet handled by that
router. In this case, OVN refuses to send ARP requests to the router on
which the floating IP has been configured. The result is that
internally-generated traffic that targets the floating IP cannot reach
its destination, since the router on which the floating IP is configured
never receives ARPs for the floating IP.

This patch fixes the issue by removing the reachability checks
altogether. If a router has a NAT external IP or load balancer VIP that
is outside the range of any of its configured subnets, we should still
send ARPs to that router for those requested addresses.

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1929901

Signed-off-by: Mark Michelson <[email protected]>

Thanks for addressing this.

Acked-by: Numan Siddique <[email protected]>

@Dumitru - Since you added the code to limit ARPs, can you please
also take a look at this patch?


Hi Mark, Numan,

I've been thinking about this for a while.  I think this needs at
least:

Fixes: 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS
broadcast domain.")

OK, I can either just add that to the commit when I merge, or, if I need
to put a new version up for review, I can add it to the new version
then.

Sounds good.

I investigated some more and the problem is actually because of the
combination of:

1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS
broadcast domain.")

and

8c6a5bc21847 ("ovn-northd: Limit self originated ARP/ND broadcast
domain.")

Without 8c6a5bc21847, the "reachable" functions just optimize ARP
request flooding, but when the flows using them are not hit we always
fall back to flooding in the MC_FLOOD group on LS-PUB.
With 8c6a5bc21847, we instead flood all logical router self-originated
ARP requests in the MC_FLOOD_L2 group.
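For anyone following along, the contents of the two groups can be
inspected on a running system with existing ovn-sbctl commands (the
switch name below is illustrative; MC_FLOOD contains all logical ports
on a switch, while MC_FLOOD_L2 contains only the non-router ports):

  ovn-sbctl --columns=name,ports list Multicast_Group
  ovn-sbctl lflow-list LS-PUB | grep ls_in_l2_lkup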




With the latest changes in ovn-kubernetes I think that the above was not
needed anyway.  Mark, do you have more details about this by any chance?

I'm not sure which changes in ovn-kubernetes you're referring to here.

I was thinking of this:

https://github.com/ovn-org/ovn-kubernetes/pull/2055

Specifically:

https://github.com/ovn-org/ovn-kubernetes/commit/ce2d312c7266c8d55173c9660002c67715a5444b



However, after a closer look, I'm not sure if dnat_and_snat usage is
completely removed in ovn-kubernetes.  This makes me wonder whether
reverting 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS
broadcast domain.") would cause a regression in the ovn-kubernetes use
case.



Initial bug report was here:
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050287.html



To quote from that e-mail:

"Question though is why any Pod on the logical switch would send an ARP
for an IP that is not in its subnet. A packet from a Pod towards a non
subnet IP should ARP only for the default gateway IP."

The above holds true for the scenario I'm working on here. The key is
that this is a self-originated ARP from OVN, not an ARP from a pod/VM.

The use case I'm solving here is OpenStack floating IPs. It's not
common, but it is expected that a user is able to assign a floating IP
to a router that does not service that floating IP's subnet. They also
expect all other VMs in the cluster to be able to reach the target VM
via its floating IP.

So let's say you have a setup like:


                         EXT-ROUTER
                             ^
                             |
                             v
VM1 <-> LS1 <-> LR1 <-> LS-PUB <-> LR2 <-> LS2 <-> VM2

In this case, LR1 and LR2 both have distributed gateway router ports on
them (the ports going to LS-PUB). They also have default static routes
that point to the EXT-ROUTER. LR2's gateway router port IP address is
172.18.1.1/24. An OpenStack user configures a floating IP for VM2 of
172.18.2.100. This translates to a dnat_and_snat entry on LR2 for that
address, which is NOT in the subnet configured on that router. Now VM1
wants to ping 172.18.2.100. In this case, when the ping reaches LR1, LR1
needs to ARP to find where 172.18.2.100 is. The ARP needs to target LR2,
since that's where the floating IP is configured. If the ARP only goes
to EXT-ROUTER, EXT-ROUTER won't be able to respond to the ARP properly,
and connectivity cannot be established.

I might be wrong, but I'm still not sure how valid it is to define a
floating IP (172.18.2.100) on a router that doesn't have a network that
includes 172.18.2.100.

Unfortunately, this battle has been fought a couple of times already,
and in the case of OpenStack, this is something they expect to work and
currently does work with ml2-ovs.


Would it work if neutron-ovn configured a 172.18.2.100/32 network on the
LR2-to-LS-PUB router port?

Yes, that would make the router appear to be "reachable" with the
current OVN code.


Otherwise, I think we need to clarify whether ovn-kubernetes would still
have a scale issue.

A less pretty alternative is to allow such floating IPs (without a
network that includes that IP) only if a new config knob is set.

I think there are a couple of valid options here:

1) A configuration option. The option could be configured on the NAT
itself to allow for that address to be considered "reachable" on the
router even if it lies outside the router subnet.

2) Automatically allow this behavior for "floating IP" NATs
(dnat_and_snat, external_mac set, and port set).
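To make the two options concrete from the CMS side: the per-NAT knob in
(1) is hypothetical (the option name below does not exist in OVN today),
while (2) uses the existing lr-nat-add syntax for a floating IP with
logical_port and external_mac set (addresses and names are illustrative):

  # (1) hypothetical per-NAT option:
  ovn-nbctl set NAT <nat-uuid> options:allow-unreachable=true

  # (2) existing syntax for a "floating IP" style NAT:
  ovn-nbctl lr-nat-add lr2 dnat_and_snat 172.18.2.100 10.0.0.5 \
      vm2-port 0a:00:00:00:00:05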


(2) has the advantage of "just working" for the OpenStack FIP case while
also not causing a flow explosion for ovn-kubernetes. However, (1) is
more flexible, allowing the option to be set on other DNATs as well.

Maybe an option (3) that would make all CMSs happy could be:

For all dnat_and_snat logical IPs that are not part of router subnets,
add a single logical flow in table in_l2_lkup on LS-PUB flooding ARP
requests that target them in the MC_FLOOD group.  This shouldn't create
a scale issue because it's a single flow for all such IPs. If this flow
has higher priority than the flow restricting router self-originated
ARPs to MC_FLOOD_L2 then it should also work fine.
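To sketch the idea, the extra flow could look roughly like this
(illustrative match and priority, not actual ovn-northd output;
172.18.2.100 stands in for the whole set of such FIPs):

  table=ls_in_l2_lkup, priority=80,
    match=(arp.op == 1 && arp.tpa == {172.18.2.100}),
    action=(outport = "_MC_flood"; output;)

with the existing flow that sends router self-originated ARPs to
MC_FLOOD_L2 sitting at a lower priority.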

OK, the idea here is that you put the known "unreachable" IP
destinations at a higher priority than the known "reachable" ones, so
that if one of the unreachable ones is targeted, we flood out all ports
instead of just L2. That should work well for self-originated ARPs or
ARPs from outside.

I think this would work. But also see my comments below.


The downside is that we *might* hit the OVS resubmit limit for ARPs
targeting such FIPs on scaled setups but, I guess, that's a different
story.

One thing Terry brought up with me yesterday is that it's odd we have to
self-originate ARPs at all for NAT addresses that are configured on the
logical network. We should be able to add MAC_Binding entries for those
addresses without actually sending an ARP. You told me yesterday in chat
that in pinctrl we have send_garp_locally(), which is supposed to do
this. In my setup (and presumably with OpenStack), this is not
happening, hence the need for ARPs.

That's true, we should figure out why the MAC_Binding entries are not
automatically inserted by ovn-controller in this case.

I've looked into this a bit more, and I can answer about my particular setup, but I can't comment about OpenStack since I don't have a setup I can easily inspect at the moment.

In my case, there were two reasons why I was not seeing gARPs for the NAT addresses.

* I had not set options:nat-addresses on the ls-pub switch.
* Even if I had set options:nat-addresses on the ls-pub switch, ls-pub did not have any ports of type "localnet".

If we go back to a time before the existence of send_garp_locally(), this makes good sense. You only want to send gARPs for NAT addresses if there is an external network, and you want to pick and choose which addresses you publicize. For "internal" traffic wishing to reach a pod/VM via a NAT address, we send ARPs to learn the MAC binding as needed.
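For context, these are the two pieces of configuration in question;
nat-addresses is set on the switch's router-type port, and a localnet
port needs a network_name (port and network names below are
illustrative):

  ovn-nbctl lsp-set-options ls-pub-lr2 nat-addresses=router
  ovn-nbctl lsp-add ls-pub ln-port \
      -- lsp-set-type ln-port localnet \
      -- lsp-set-options ln-port network_name=physnet1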

I think that we should send local gARPs (i.e. create MAC_Bindings) for NAT addresses even if the connected switch does not have localnet ports. What's nice is that we already have an option for this case: if someone doesn't want to fill their MAC_Binding table unnecessarily, they can set always_learn_from_arp_request=false and they'll still use self-generated ARPs to learn the addresses as needed.
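For reference, that is an existing northbound option ("." selects the
single NB_Global row):

  ovn-nbctl set NB_Global . options:always_learn_from_arp_request=false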

What do you think?



If we could get the NAT addresses added to MAC_Binding without sending
any ARPs, I think this would be for the best. For deployments worried
about the size of the MAC_Binding table, Han added the
"always_learn_from_arp_request" option to force the need for an ARP in
those cases. And in those cases, your suggestion about adding the new
flow in ls_in_l2_lkup should work.

But I think we need the new flow regardless of the
always_learn_from_arp_request value.  That's because an operator/the CMS
can clear the MAC_Binding table externally and ovn-controller will not
repopulate it automatically.  IIRC, the number of gARPs sent by
ovn-controller is limited.

Yes, we'd still need to add the new flow, both for the reason you mentioned, as well as to work with the always_learn_from_arp_request=false case.





I'm just bracing for when someone wants the same behavior for load
balancers...


Also, I think the system test you're adding needs a small change as it
currently passes both with and without your patch.  We need to remove:

ovn-nbctl lr-route-add lr1 172.18.2.12 172.18.1.173 lr1-ls-pub

What do you think?

In this particular case, I can remove that static route. The only reason
I can remove it in this case is because the floating IP happens to be in
lr1's subnet. Otherwise, lr_in_ip_routing would drop the ping request.
We would not even reach lr_in_arp_request. In the generalized case where
a FIP could be outside the subnet of EITHER of the logical routers, then
the static route would be necessary.

Makes sense.


In the reference network I used from Terry, a difference was that there
was an external router connected to ls-pub via a localnet port. lr1 and
lr2 had default static routes added. Essentially:

ovn-nbctl lr-route-add lr1 0.0.0.0/0 <ext_router_ip> lr1-ls-pub
ovn-nbctl lr-route-add lr2 0.0.0.0/0 <ext_router_ip> lr2-ls-pub

This way, the pings (both request and response) would always make it
past lr_in_ip_routing.

In my test, I was trying for a more minimal setup, with no external
router. In order for the test to work, I needed to add static routes.

I think I need to just add a fake "external router" VM to the test and
include the same default routes on lr1 and lr2 in order to simulate the
OpenStack scenario better. Having no external router means I have to add
static routes on each router (for the general case), which trivializes
the scenario. I will also change the test so that the FIP is not in
either logical router's subnet.

I think the current test is fine as long as we remove the route on lr1.
We're actually testing the way lr2 replies to ARPs for the FIP so it
should be enough like this IMO.



Regards,
Dumitru




Regards,
Dumitru


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
