Hi Han,

Yes, I agree that the patch is not enough. I'll take a look at the GARP thing because it's either not implemented or not working. Here's a reproducer while I jump back into it.
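Side note before the reproducer: the only workaround I know of today is the one networking-ovn uses, i.e. deleting the stale MAC_Binding rows for the affected IP by hand. A minimal sketch of that, assuming the Southbound DB is reachable with ovn-sbctl. The names flush_mac_binding and OVN_SBCTL are made up here for illustration, they are not something OVN ships:

```shell
#!/bin/bash
# Hypothetical helper (not part of OVN): delete every Southbound MAC_Binding
# row whose 'ip' column equals the given address.
# OVN_SBCTL is an assumed override hook so the command can be swapped out.
OVN_SBCTL=${OVN_SBCTL:-ovn-sbctl}

flush_mac_binding() {
    local ip=$1 uuid
    # 'find' with --bare --columns=_uuid prints only the matching row UUIDs.
    for uuid in $("$OVN_SBCTL" --bare --columns=_uuid find mac_binding ip="$ip"); do
        "$OVN_SBCTL" destroy mac_binding "$uuid"
    done
}
```

For the case below that would be `flush_mac_binding 172.24.4.200` right after lr1 is recreated. It only hides the symptom, so it is no substitute for getting the GARP path working.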
When you ping 172.24.4.200 from namespace ns1 for the first time, a MAC_Binding entry gets created:

# ovn-sbctl list mac_binding | grep 200 -C2
_uuid               : 07967416-c89c-4233-8cc2-4dc929720838
datapath            : 918a9363-fa6e-4086-98ee-8d073b924d29
ip                  : "172.24.4.200"
logical_port        : "lr0-public"
mac                 : "00:00:20:20:12:15"

After recreating lr1 and sw1 using a different MAC address for lr1-public, 172.24.4.200 becomes unreachable from sw0 as the MAC_Binding entry never gets updated.

reproducer.sh

#!/bin/bash

# Clean up whatever a previous run left behind.
for i in $(ovn-sbctl list mac_binding | grep uuid | awk '{print $3}'); do
    ovn-sbctl destroy mac_binding $i
done
ip netns del ns1
ip netns del ns2
ovs-vsctl del-port ns1
ovs-vsctl del-port ns2
ovn-nbctl lr-del lr0
ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw0
ovn-nbctl ls-del sw1
ovn-nbctl ls-del public

chassis_name=$(ovn-sbctl find chassis | grep ^name | awk '{print $3}')

ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.10"

ovn-nbctl lr-add lr0
# Connect sw0 to lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.254/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0

ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.24.4.220/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public

# localnet port
ovn-nbctl lsp-add public ln-public
ovn-nbctl lsp-set-type ln-public localnet
ovn-nbctl lsp-set-addresses ln-public unknown
ovn-nbctl lsp-set-options ln-public network_name=public

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1

ovn-nbctl lrp-add lr1 lr1-public 00:00:20:20:12:15 172.24.4.221/24
ovn-nbctl lsp-add public public-lr1
ovn-nbctl lsp-set-type public-lr1 router
ovn-nbctl lsp-set-addresses public-lr1 router
ovn-nbctl lsp-set-options public-lr1 router-port=lr1-public

ovn-nbctl lr-nat-add lr0 snat 172.24.4.220 10.0.0.0/24
ovn-nbctl lr-nat-add lr1 snat 172.24.4.221 20.0.0.0/24

# Create the FIPs
ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.24.4.100 10.0.0.10
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

# Schedule the gateways
ovn-nbctl lrp-set-gateway-chassis lr0-public $chassis_name 20
ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name 20

# Create a namespace with an OVS internal port bound to a logical switch port.
add_phys_port() {
    name=$1
    mac=$2
    ip=$3
    mask=$4
    gw=$5
    iface_id=$6
    ip netns add $name
    ovs-vsctl add-port br-int $name -- set interface $name type=internal
    ip link set $name netns $name
    ip netns exec $name ip link set $name address $mac
    ip netns exec $name ip addr add $ip/$mask dev $name
    ip netns exec $name ip link set $name up
    ip netns exec $name ip route add default via $gw
    ovs-vsctl set Interface $name external_ids:iface-id=$iface_id
}

add_phys_port ns1 50:54:00:00:00:01 10.0.0.10 24 10.0.0.254 sw0-port1
add_phys_port ns2 50:57:00:00:00:02 20.0.0.10 24 20.0.0.254 sw1-port1

# Pinging from sw0
ip netns exec ns1 ping -c 4 172.24.4.200

# Delete lr1 and sw1, then recreate them.
ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw1

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1

# Change the MAC address of the LRP
ovn-nbctl lrp-add lr1 lr1-public 00:00:20:20:12:95 172.24.4.221/24
ovn-nbctl lr-nat-add lr1 snat 172.24.4.221 20.0.0.0/24
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10
ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name 20

# Pinging from sw0 won't work now. From the outside it will.
ip netns exec ns1 ping -c 4 172.24.4.200

On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <zhou...@gmail.com> wrote:
>
> On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson <mmich...@redhat.com> wrote:
> >
> > Hi Daniel,
> >
> > I agree with Numan that this seems like a good approach to take.
> >
> > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > >
> > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > > > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > > > <dalva...@redhat.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > After digging further. The problem seems to be reduced to reusing
> > > > > > an old gateway IP address for a dnat_and_snat entry.
> > > > > > When a gateway port is bound to a chassis, its entry will show up
> > > > > > in the MAC_Binding table (at least when that Logical Switch is
> > > > > > connected to more than one Logical Router). After deleting the
> > > > > > Logical Router and all its ports, this entry will remain there. If
> > > > > > a new Logical Router is created and a Floating IP (dnat_and_snat)
> > > > > > is assigned to a VM with the old gw IP address, it will become
> > > > > > unreachable.
> > > > > >
> > > > > > A workaround now from networking-ovn (OpenStack integration) is to
> > > > > > delete MAC_Binding entries for that IP address upon a FIP
> > > > > > creation. I think that this however should be done from OVN, what
> > > > > > do you folks think?
> > > > > >
> > > > >
> > > > > Agree. Since the MAC_Binding table row is created by ovn-controller,
> > > > > it should be handled properly within OVN.
> > > >
> > > > I see that this has been sitting here for a while. The solution seems
> > > > reasonable to me. Are either of you working on it?
> > >
> > > I started working on it. I came up with a solution (see patch below)
> > > which works but I wanted to give you a bit more of context and get your
> > > feedback:
> > >
> > >
> > >                              ^ localnet
> > >                              |
> > >                          +---+---+
> > >                          |       |
> > >                   +------+  pub  +------+
> > >                   |      |       |      |
> > >                   |      +-------+      |
> > >                   |    172.24.4.0/24    |
> > >                   |                     |
> > >      172.24.4.220 |                     | 172.24.4.221
> > >               +---+---+             +---+---+
> > >               |       |             |       |
> > >               |  LR0  |             |  LR1  |
> > >               |       |             |       |
> > >               +---+---+             +---+---+
> > >        10.0.0.254 |                     | 20.0.0.254
> > >                   |                     |
> > >               +---+---+             +---+---+
> > >               |       |             |       |
> > >   10.0.0.0/24 |  SW0  |             |  SW1  | 20.0.0.0/24
> > >               |       |             |       |
> > >               +---+---+             +---+---+
> > >                   |                     |
> > >                   |                     |
> > >               +---+---+             +---+---+
> > >               |       |             |       |
> > >               |  VM0  |             |  VM1  |
> > >               |       |             |       |
> > >               +-------+             +-------+
> > >               10.0.0.10             20.0.0.10
> > >              172.24.4.100          172.24.4.200
> > >
> > >
> > > When I ping VM1 floating IP from the external network, a new entry for
> > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
> > >
> > > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> > > ip                  : "172.24.4.221"
> > > logical_port        : "lrp02"
> > > mac                 : "00:00:02:01:02:04"
> > >
> > >
> > > Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
> > > for VM2 FIP with different MAC and new gateway IP is created (for
> > > example 172.24.4.222 00:00:02:01:02:99), VM2 FIP becomes unreachable
> > > from VM1 until the old MAC_Binding entry gets deleted as pinging
> > > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> > >
> > > With the patch below, removing LR1 results in deleting all MAC_Binding
> > > entries for every datapath where '172.24.4.221' appears in the 'ip'
> > > column so the problem goes away.
> > >
> > > Another solution would be implementing some kind of 'aging' for
> > > MAC_Binding entries but perhaps it's more complex.
> > > Looking forward for your comments :)
> > >
> > >
> > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > index 58bef7d..a86733e 100644
> > > --- a/ovn/northd/ovn-northd.c
> > > +++ b/ovn/northd/ovn-northd.c
> > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx, struct hmap *ports)
> > >      }
> > >  }
> > >
> > > +static void
> > > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > > +{
> > > +    const struct sbrec_mac_binding *b, *n;
> > > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > +        if (strstr(ip, b->ip)) {
> > > +            sbrec_mac_binding_delete(b);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +
> > >  /* Updates the southbound Port_Binding table so that it contains the logical
> > >   * switch ports specified by the northbound database.
> > >   *
> > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> > >      /* Delete southbound records without northbound matches. */
> > >      LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> > >          ovs_list_remove(&op->list);
> > > +
> > > +        /* Delete all MAC_Binding entries which match the IP addresses of the
> > > +         * deleted logical router port (ie. port with a peer). */
> > > +        const char *peer = smap_get(&op->sb->options, "peer");
> > > +        if (peer) {
> > > +            for (int i = 0; i < op->sb->n_mac; i++) {
> > > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > > +            }
> > > +        }
> > >          sbrec_port_binding_delete(op->sb);
> > >          ovn_port_destroy(ports, op);
> > >      }
> > >
>
> Hi,
>
> Sorry that I didn't notice this discussion until now. I encountered similar
> problems before. It was not in floating IP scenario, but for external IPs -
> ports on the same networks but not aware by OVN.
> When IP relocates from one MAC to another, the previous mac-binding entry
> will not get updated and therefore the re-located IP is unreachable.
>
> This happens for external router IPs on the localnet network behind the
> gateways (which hosts the 172.24.4.221 port in Daniel's example). It also
> happens for nested workloads that run inside a VM - the VM port is known by
> OVN, but the internal workloads (e.g. containers) runs on same subnets but
> relies on mac-binding to communicate.
>
> For both of my use cases, the problem has been solved by this patch (merged):
> https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
>
> The idea is, mac-binding entry should be updated when the IP is announced in
> a new location by GARP/ARP request/ARP response. So I think the best way to
> solve the problem for floating IP is similar. We just need to generate GARP
> when a new FIP is attached. I was under the impression that OVN already
> supports GARP when a new NAT entry is added. But if the problem is still
> there it means something is wrong there (or the GARP feature is not there yet
> for the NAT case), and I need to check the code.
>
> For the patch proposed in this discussion, I think there are two problems.
>
> Firstly, I think it doesn't solve the problem completely. It only deletes
> mac-binding when a logical router port is deleted. However, in any of the
> above use cases (including FIP), IP relocation can happen without deleting
> the router port. Or did I misunderstood anything here?
>
> Secondly, northd just reconciles between current state and desired state for
> SB - it is declarative. We should avoid relying on the northd cleanup logic
> to trigger important operations. I think the design principle of northd
> should be making sure the desired state is reached, but not care about how is
> it reached.
> For example, it can be reached by deleting extra records one by one, but it
> is also correct if it deletes everything and recreate the desired entries -
> this is just an example, it may be inefficient, but it may be reasonable in
> some scenarios. Adding logic in northd that relies on *how* the desired state
> is computed would make it unreliable and hard to maintain. I think it would
> also create challenges for the DDlog implementation.
>
> For the mac-binding aging mechanism mentioned by Daniel, I agree. It is
> required for fault scenarios when SB is temporarily down. Since we rely on SB
> DB to store the ARP cache/Neighbor table for the virtual routers, if ARP
> updates happens when the DB is down, changes are lost. However, the aging
> mechanism seems tricky when scale is considered. Only the idle entries should
> be timed out, but it is costly to update states whenever a mac-binding entry
> is hit. I haven't thought about any clever way to achieve it without
> sacrificing scalability. Any thoughts here? A workaround to the problem is to
> resend GARP periodically (e.g. every 1 min).
>
> Thanks,
> Han
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss