On 3/23/21 10:02 PM, Mark Michelson wrote: > On 3/23/21 12:13 PM, Dumitru Ceara wrote: >> On 3/23/21 5:05 PM, Numan Siddique wrote: >>> On Fri, Mar 19, 2021 at 2:20 AM Mark Michelson <[email protected]> >>> wrote: >>>> >>>> Self-originated ARPs are intended to be sent to the "owning" router for >>>> a given IP address, as well as flooded to non-router ports on a logical >>>> switch. >>>> >>>> When trying to determine the owning router for an IP address, we would >>>> iterate through load balancer and NAT addresses, and check if these IP >>>> addresses were "reachable" on this particular router. Reachable here >>>> means that the NAT external IP or load balancer VIP falls within any of >>>> the networks served by this router. If an IP address were determined >>>> not >>>> to be reachable, then we would not try to send ARPs for that particular >>>> address to the router. >>>> >>>> However, it is possible (and sometimes desired) to configure NAT >>>> floating >>>> IPs on a router that are not in any subnet handled by that router. >>>> In this case, OVN refuses to send ARP requests to the router on >>>> which the >>>> floating IP has been configured. The result is that >>>> internally-generated >>>> traffic that targets the floating IP cannot reach its destination, >>>> since the router on which the floating IP is configured never >>>> receives ARPs >>>> for the floating IP. >>>> >>>> This patch fixes the issue by removing the reachability checks >>>> altogether. If a router has a NAT external IP or load balancer VIP that >>>> is outside the range of any of its configured subnets, we still should >>>> send ARPs to that router for those requested addresses. >>>> >>>> Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1929901 >>>> >>>> Signed-off-by: Mark Michelson <[email protected]> >>> >>> Thanks for addressing this. >>> >>> Acked-by: Numan Siddique <[email protected]> >>> >>> @Dumitru - Since you had added the code to limit ARPs. Can you please >>> also take a look at this patch ? >>> >> >> Hi Mark, Numan, >> >> I've been thinking about this for a while. I think this needs at least: >> >> Fixes: 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS >> broadcast domain.") > > OK, I can either just add that to the commit when I merge, or if I need > to put a new version up for review, I can add it to the new version then.
Sounds good. > >> >> With the latest changes in ovn-kubernetes I think that the above was not >> needed anyway. Mark, do you have more details about this by any chance? > > I'm not sure which changes in ovn-kubernetes you're referring to here. I was thinking of this: https://github.com/ovn-org/ovn-kubernetes/pull/2055 Specifically: https://github.com/ovn-org/ovn-kubernetes/commit/ce2d312c7266c8d55173c9660002c67715a5444b However, after a closer look, I'm not sure if dnat_and_snat usage is completely removed in ovn-kubernetes. This makes me wonder if reverting 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS broadcast domain.") will not cause a regression in the ovn-kubernetes use case. > >> >> Initial bug report was here: >> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050287.html > > To quote from that e-mail: > > "Question though is why any Pod on the logical switch would send an ARP > for an IP that is not in its subnet. A packet from a Pod towards a non > subnet IP should ARP only for the default gateway IP." > > The above holds true for the scenario I'm working here. The key is that > this is a self-originated ARP from OVN, not an ARP from a pod/VM. > > The use case I'm solving here is OpenStack floating IPs. It's not > common, but it is expected that a user is able to assign a floating IP > to a router that does not service that floating IP's subnet. They also > expect all other VMs in the cluster to be able to reach the target VM > via its floating IP. > > So let's say you have a setup like: > > > EXT-ROUTER > ^ > | > v > VM1 <-> LS1 <-> LR1 <-> LS-PUB <-> LR2 <-> LS2 <-> VM2 > > In this case, LR1 and LR2 both have distributed gateway router ports on > them (the ports going to LS-PUB). They also have default static routes > that point to the EXT-ROUTER. LR2's gateway router port IP address is > 172.18.1.1/24. An OpenStack user configures a floating IP for VM2 of > 172.18.2.100. This translates to a dnat_and_snat entry on LR2 for that > address, which is NOT in the subnet configured on that router. Now VM1 > wants to ping 172.18.2.100. In this case, when the ping reaches LR1, LR1 > needs to ARP to find where 172.18.2.100 is. The ARP needs to target LR2 > since that's where the floating IP is configured. If the ARP only goes > to EXT-ROUTER, EXT-ROUTER won't be able to respond to the ARP properly, > and connectivity cannot be established. I might be wrong but I'm still not sure how valid it is to define a floating IP (172.18.2.100) on a router that doesn't have network that includes 172.18.2.100. Would it work if neutron-ovn configured a 172.18.2.100/32 network on the LR2-to-LS-PUB router port? Otherwise, I think we need to clarify if ovn-kubernetes would still have a scale issue or not. A less pretty alternative is to allow such floating IPs (without a network that includes that IP) only if a new config knob is set. Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
