Hi Han,

Yes, I agree that the patch is not enough. I'll take a look at the
GARP thing because it's either not implemented or not working. Here's
a reproducer while I jump back into it.

When you ping 172.24.4.200 from the namespace 1 the first time, a
MAC_Binding entry gets created:

# ovn-sbctl list mac_binding | grep 200 -C2
_uuid               : 07967416-c89c-4233-8cc2-4dc929720838
datapath            : 918a9363-fa6e-4086-98ee-8d073b924d29
ip                  : "172.24.4.200"
logical_port        : "lr0-public"
mac                 : "00:00:20:20:12:15"


After recreating lr1 and sw1 using a different MAC address,
172.24.4.200 becomes unreachable from sw0 as the MAC_Binding entry
never gets updated.


reproducer.sh

#!/bin/bash
for i in $(ovn-sbctl list mac_binding | grep uuid  | awk '{print
$3}'); do ovn-sbctl destroy mac_binding $i; done

ip net del ns1
ip net del ns2
ovs-vsctl del-port ns1
ovs-vsctl del-port ns2
ovn-nbctl lr-del lr0
ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw0
ovn-nbctl ls-del sw1
ovn-nbctl ls-del public

chassis_name=`ovn-sbctl find chassis | grep ^name | awk '{print $3}'`
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.10"


ovn-nbctl lr-add lr0
# Connect sw0 to lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.254/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0


ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0  lr0-public 00:00:20:20:12:13 172.24.4.220/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public

# localnet port
ovn-nbctl lsp-add public ln-public
ovn-nbctl lsp-set-type ln-public localnet
ovn-nbctl lsp-set-addresses ln-public unknown
ovn-nbctl lsp-set-options ln-public network_name=public

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1

ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:15 172.24.4.221/24
ovn-nbctl lsp-add public public-lr1
ovn-nbctl lsp-set-type public-lr1 router
ovn-nbctl lsp-set-addresses public-lr1 router
ovn-nbctl lsp-set-options public-lr1 router-port=lr1-public


ovn-nbctl lr-nat-add lr0 snat 172.24.4.220 10.0.0.0/24
ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24

# Create the FIPs
ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.24.4.100 10.0.0.10
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

# Schedule the gateways
ovn-nbctl lrp-set-gateway-chassis lr0-public $chassis_name 20
ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name  20


add_phys_port() {
    name=$1
    mac=$2
    ip=$3
    mask=$4
    gw=$5
    iface_id=$6
    ip netns add $name
    ovs-vsctl add-port br-int $name -- set interface $name type=internal
    ip link set $name netns $name
    ip netns exec $name ip link set $name address $mac
    ip netns exec $name ip addr add $ip/$mask dev $name
    ip netns exec $name ip link set $name up
    ip netns exec $name ip route add default via $gw
    ovs-vsctl set Interface $name external_ids:iface-id=$iface_id
}


add_phys_port ns1 50:54:00:00:00:01 10.0.0.10  24 10.0.0.254 sw0-port1
add_phys_port ns2 50:57:00:00:00:02 20.0.0.10  24 20.0.0.254 sw1-port1

# Pinging from sw0
ip net e ns1 ping -c 4 172.24.4.200

ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw1

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1


# Change the MAC address of the LRP
ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:95 172.24.4.221/24

ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

ovn-nbctl lrp-set-gateway-chassis lr1-public centosl-rdocloud 20

# Pinging from sw0 won't work now. For the outside it will.
ip net e ns1 ping -c 4 172.24.4.200
On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <zhou...@gmail.com> wrote:
>
>
>
> On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson <mmich...@redhat.com> wrote:
> >
> > Hi Daniel,
> >
> > I agree with Numan that this seems like a good approach to take.
> >
> > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > >
> > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <b...@ovn.org
> > > <mailto:b...@ovn.org>> wrote:
> > >  >
> > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > <dalva...@redhat.com <mailto:dalva...@redhat.com>>
> > >  > > wrote:
> > >  > >
> > >  > > > Hi,
> > >  > > >
> > >  > > > After digging further. The problem seems to be reduced to reusing 
> > > an
> > >  > > > old gateway IP address for a dnat_and_snat entry.
> > >  > > > When a gateway port is bound to a chassis, its entry will show up 
> > > in
> > >  > > > the MAC_Binding table (at least when that Logical Switch is 
> > > connected
> > >  > > > to more than one Logical Router). After deleting the Logical Router
> > >  > > > and all its ports, this entry will remain there. If a new Logical
> > >  > > > Router is created and a Floating IP (dnat_and_snat) is assigned to 
> > > a
> > >  > > > VM with the old gw IP address, it will become unreachable.
> > >  > > >
> > >  > > > A workaround now from networking-ovn (OpenStack integration) is to
> > >  > > > delete MAC_Binding entries for that IP address upon a FIP 
> > > creation. I
> > >  > > > think that this however should be done from OVN, what do you folks
> > >  > > > think?
> > >  > > >
> > >  > > >
> > >  > > Agree. Since the MAC_Binding table row is created by ovn-controller, 
> > > it
> > >  > > should
> > >  > > be handled properly within OVN.
> > >  >
> > >  > I see that this has been sitting here for a while.  The solution seems
> > >  > reasonable to me.  Are either of you working on it?
> > >
> > > I started working on it. I came up with a solution (see patch below)
> > > which works but I wanted to give you a bit more of context and get your
> > > feedback:
> > >
> > >
> > >                             ^ localnet
> > >                             |
> > >                         +---+---+
> > >                         |       |
> > >                  +------+  pub  +------+
> > >                  |      |       |      |
> > >                  |      +-------+      |
> > >                  | 172.24.4.0/24 <http://172.24.4.0/24>    |
> > >                  |                     |
> > >     172.24.4.220 |                     | 172.24.4.221
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > >              |  LR0  |             |  LR1  |
> > >              |       |             |       |
> > >              +---+---+             +---+---+
> > >       10.0.0.254 |                     | 20.0.0.254
> > >                  |                     |
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > > 10.0.0.0/24 <http://10.0.0.0/24> |  SW0  |             |  SW1  |
> > > 20.0.0.0/24 <http://20.0.0.0/24>
> > >              |       |             |       |
> > >              +---+---+             +---+---+
> > >                  |                     |
> > >                  |                     |
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > >              |  VM0  |             |  VM1  |
> > >              |       |             |       |
> > >              +-------+             +-------+
> > >              10.0.0.10             20.0.0.10
> > >            172.24.4.100           172.24.4.200
> > >
> > >
> > > When I ping VM1 floating IP from the external network, a new entry for
> > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
> > >
> > > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> > > ip                  : "172.24.4.221"
> > > logical_port        : "lrp02"
> > > mac                 : "00:00:02:01:02:04"
> > >
> > >
> > > Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
> > > for VM2 FIP with different MAC and new gateway IP is created (for
> > > example 172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes unreachable
> > > from VM1 until the old MAC_Binding entry gets deleted as pinging
> > > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> > >
> > > With the patch below, removing LR1 results in deleting all MAC_Binding
> > > entries for every datapath where '172.24.4.221' appears in the 'ip'
> > > column so the problem goes away.
> > >
> > > Another solution would be implementing some kind of 'aging' for
> > > MAC_Binding entries but perhaps it's more complex.
> > > Looking forward for your comments :)
> > >
> > >
> > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > index 58bef7d..a86733e 100644
> > > --- a/ovn/northd/ovn-northd.c
> > > +++ b/ovn/northd/ovn-northd.c
> > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx,
> > > struct hmap *ports)
> > >       }
> > >   }
> > >
> > > +static void
> > > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > > +{
> > > +    const struct sbrec_mac_binding *b, *n;
> > > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > +        if (strstr(ip, b->ip)) {
> > > +            sbrec_mac_binding_delete(b);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +
> > >   /* Updates the southbound Port_Binding table so that it contains the
> > > logical
> > >    * switch ports specified by the northbound database.
> > >    *
> > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> > >       /* Delete southbound records without northbound matches. */
> > >       LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> > >           ovs_list_remove(&op->list);
> > > +
> > > +        /* Delete all MAC_Binding entries which match the IP addresses
> > > of the
> > > +         * deleted logical router port (ie. port with a peer). */
> > > +        const char *peer = smap_get(&op->sb->options, "peer");
> > > +        if (peer) {
> > > +            for (int i = 0; i < op->sb->n_mac; i++) {
> > > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > > +            }
> > > +        }
> > >           sbrec_port_binding_delete(op->sb);
> > >           ovn_port_destroy(ports, op);
> > >       }
> > >
>
> Hi,
>
> Sorry that I didn't notice this discussion until now. I encountered similar 
> problems before. It was not in floating IP scenario, but for external IPs - 
> ports on the same networks but not aware by OVN. When IP relocates from one 
> MAC to another, the previous mac-binding entry will not get updated and 
> therefore the re-located IP is unreachable.
>
> This happens for external router IPs on the localnet network behind the 
> gateways (which hosts the 172.24.4.221 port in Daniel's example). It also 
> happens for nested workloads that run inside a VM - the VM port is known by 
> OVN, but the internal workloads (e.g. containers) runs on same subnets but 
> relies on mac-binding to communicate.
>
> For both of my use cases, the problem has been solved by this patch (merged): 
> https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
>
> The idea is, mac-binding entry should be updated when the IP is announced in 
> a new location by GARP/ARP request/ARP response. So I think the best way to 
> solve the problem for floating IP is similar. We just need to generate GARP 
> when a new FIP is attached. I was under the impression that OVN already 
> supports GARP when a new NAT entry is added. But if the problem is still 
> there it means something is wrong there (or the GARP feature is not there yet 
> for the NAT case), and I need to check the code.
>
> For the patch proposed in this discussion, I think there are two problems.
>
> Firstly, I think it doesn't solve the problem completely. It only deletes 
> mac-binding when a logical router port is deleted. However, in any of the 
> above use cases (including FIP), IP relocation can happen without deleting 
> the router port. Or did I misunderstood anything here?
>
> Secondly, northd just reconciles between current state and desired state for 
> SB - it is declarative. We should avoid relying on the northd cleanup logic 
> to trigger important operations. I think the design principle of northd 
> should be making sure the desired state is reached, but not care about how is 
> it reached. For example, it can be reached by deleting extra records one by 
> one, but it is also correct if it deletes everything and recreate the desired 
> entries - this is just an example, it may be inefficient, but it may be 
> reasonable in some scenarios. Adding logic in northd that relies on *how* the 
> desired state is computed would make it unreliable and hard to maintain. I 
> think it would also create challenges for the DDlog implementation.
>
> For the mac-binding aging mechanism mentioned by Daniel, I agree. It is 
> required for fault scenarios when SB is temporarily down. Since we rely on SB 
> DB to store the ARP cache/Neighbor table for the virtual routers, if ARP 
> updates happens when the DB is down, changes are lost. However, the aging 
> mechanism seems tricky when scale is considered. Only the idle entries should 
> be timed out, but it is costly to update states whenever a mac-binding entry 
> is hit. I haven't thought about any clever way to achieve it without 
> sacrificing scalability. Any thoughts here? A workaround to the problem is to 
> resend GARP periodically (e.g. every 1 min).
>
> Thanks,
> Han
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to