On Fri, Nov 30, 2018 at 7:29 AM Daniel Alvarez Sanchez
<[email protected] <mailto:[email protected]>> wrote:
>
> Thanks folks again for the discussion.
> I sent an RFC patch here [0]. I tried it out with my reproducer and it
> seems to work well. Instead of outputting the packet to the localnet
> ofport, it will inject it to the public switch pipeline so it'll get
> broadcasted to the rest of the ports resulting in other Logical
> Routers connected to the external switch updating their neighbours. As
> it's broadcasted, the GARP will also be sent out through the localnet
> port as before.
>
> Looking forward to your comments before moving on and writing tests.
>
> Thanks Numan for your help!
>
> [0]
https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354220.html
> On Wed, Nov 28, 2018 at 3:32 PM Daniel Alvarez Sanchez
> <[email protected] <mailto:[email protected]>> wrote:
> >
> > Hi all,
> >
> > As this thread is getting big I'm summarizing the issue I see so far:
> >
> > * When a dnat_and_snat entry is added to a logical router (or port
> > gets bound to a chassis), ovn-controller will send GARPs to announce
> > the MAC address of the FIP(s) (either the gw port or of the actual FIP
> > MAC address if distributed) only through localnet ports [0].
> >
> > * This means that gateway ports bound to that same chassis and
> > connected to the public switch won't get the GARPs, so they won't
> > update their MAC_Binding entries causing unreachability. In the
> > diagram of this thread, LR0 won't get the GARP sent by ovn-controller
> > if both gateway ports are bound to the same chassis.
> >
> > I tried out sending GARPs from the external network using master
> > branch and MAC_Binding entries get updated. However, in order to cover
> > missing cases, I think it would make sense to send the GARPs not only
> > to localnet ports but to all ports of those logical switches that have
> > a localnet port. What do you think?
> >
> > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073
> >
> > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
> > Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
> > <[email protected] <mailto:[email protected]>> wrote:
> > >
> > > On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <[email protected]
<mailto:[email protected]>> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson
<[email protected] <mailto:[email protected]>> wrote:
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > I agree with Numan that this seems like a good approach to take.
> > > > >
> > > > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > > > > >
> > > > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <[email protected]
<mailto:[email protected]>
> > > > > > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique
wrote:
> > > > > > > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > > > > <[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > After digging further. The problem seems to be
reduced to reusing an
> > > > > > > > > old gateway IP address for a dnat_and_snat entry.
> > > > > > > > > When a gateway port is bound to a chassis, its entry
will show up in
> > > > > > > > > the MAC_Binding table (at least when that Logical
Switch is connected
> > > > > > > > > to more than one Logical Router). After deleting the
Logical Router
> > > > > > > > > and all its ports, this entry will remain there. If
a new Logical
> > > > > > > > > Router is created and a Floating IP (dnat_and_snat)
is assigned to a
> > > > > > > > > VM with the old gw IP address, it will become
unreachable.
> > > > > > > > >
> > > > > > > > > A workaround now from networking-ovn (OpenStack
integration) is to
> > > > > > > > > delete MAC_Binding entries for that IP address upon
a FIP creation. I
> > > > > > > > > think that this however should be done from OVN,
what do you folks
> > > > > > > > > think?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > Agree. Since the MAC_Binding table row is created by
ovn-controller, it
> > > > > > > > should
> > > > > > > > be handled properly within OVN.
> > > > > > >
> > > > > > > I see that this has been sitting here for a while. The
solution seems
> > > > > > > reasonable to me. Are either of you working on it?
> > > > > >
> > > > > > I started working on it. I came up with a solution (see
patch below)
> > > > > > which works but I wanted to give you a bit more of context
and get your
> > > > > > feedback:
> > > > > >
> > > > > >
> > > > > > ^ localnet
> > > > > > |
> > > > > > +---+---+
> > > > > > | |
> > > > > > +------+ pub +------+
> > > > > > | | | |
> > > > > > | +-------+ |
> > > > > > | 172.24.4.0/24 <http://172.24.4.0/24>
<http://172.24.4.0/24> |
> > > > > > | |
> > > > > > 172.24.4.220 | | 172.24.4.221
> > > > > > +---+---+ +---+---+
> > > > > > | | | |
> > > > > > | LR0 | | LR1 |
> > > > > > | | | |
> > > > > > +---+---+ +---+---+
> > > > > > 10.0.0.254 | | 20.0.0.254
> > > > > > | |
> > > > > > +---+---+ +---+---+
> > > > > > | | | |
> > > > > > 10.0.0.0/24 <http://10.0.0.0/24> <http://10.0.0.0/24> |
SW0 | | SW1 |
> > > > > > 20.0.0.0/24 <http://20.0.0.0/24> <http://20.0.0.0/24>
> > > > > > | | | |
> > > > > > +---+---+ +---+---+
> > > > > > | |
> > > > > > | |
> > > > > > +---+---+ +---+---+
> > > > > > | | | |
> > > > > > | VM0 | | VM1 |
> > > > > > | | | |
> > > > > > +-------+ +-------+
> > > > > > 10.0.0.10 20.0.0.10
> > > > > > 172.24.4.100 172.24.4.200
> > > > > >
> > > > > >
> > > > > > When I ping VM1 floating IP from the external network, a
new entry for
> > > > > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding
table:
> > > > > >
> > > > > > _uuid : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > > > > datapath : ac5984b9-0fea-485f-84d4-031bdeced29b
> > > > > > ip : "172.24.4.221"
> > > > > > logical_port : "lrp02"
> > > > > > mac : "00:00:02:01:02:04"
> > > > > >
> > > > > >
> > > > > > Now, if LR1 gets removed and the old gateway IP
(172.24.4.221) is reused
> > > > > > for VM2 FIP with different MAC and new gateway IP is
created (for
> > > > > > example 172.24.4.222 00:00:02:01:02:99), VM2 FIP becomes
unreachable
> > > > > > from VM1 until the old MAC_Binding entry gets deleted as
pinging
> > > > > > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> > > > > >
> > > > > > With the patch below, removing LR1 results in deleting all
MAC_Binding
> > > > > > entries for every datapath where '172.24.4.221' appears in
the 'ip'
> > > > > > column so the problem goes away.
> > > > > >
> > > > > > Another solution would be implementing some kind of 'aging' for
> > > > > > MAC_Binding entries but perhaps it's more complex.
> > > > > > Looking forward for your comments :)
> > > > > >
> > > > > >
> > > > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > > > > index 58bef7d..a86733e 100644
> > > > > > --- a/ovn/northd/ovn-northd.c
> > > > > > +++ b/ovn/northd/ovn-northd.c
> > > > > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct
northd_context *ctx,
> > > > > > struct hmap *ports)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +static void
> > > > > > +delete_mac_binding_by_ip(struct northd_context *ctx, const
char *ip)
> > > > > > +{
> > > > > > + const struct sbrec_mac_binding *b, *n;
> > > > > > + SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > > > > + if (strstr(ip, b->ip)) {
> > > > > > + sbrec_mac_binding_delete(b);
> > > > > > + }
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > > /* Updates the southbound Port_Binding table so that it
contains the
> > > > > > logical
> > > > > > * switch ports specified by the northbound database.
> > > > > > *
> > > > > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> > > > > > /* Delete southbound records without northbound
matches. */
> > > > > > LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> > > > > > ovs_list_remove(&op->list);
> > > > > > +
> > > > > > + /* Delete all MAC_Binding entries which match the
IP addresses
> > > > > > of the
> > > > > > + * deleted logical router port (ie. port with a
peer). */
> > > > > > + const char *peer = smap_get(&op->sb->options, "peer");
> > > > > > + if (peer) {
> > > > > > + for (int i = 0; i < op->sb->n_mac; i++) {
> > > > > > + delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > > > > > + }
> > > > > > + }
> > > > > > sbrec_port_binding_delete(op->sb);
> > > > > > ovn_port_destroy(ports, op);
> > > > > > }
> > > > > >
> > > >
> > > > Hi,
> > > >
> > > > Sorry that I didn't notice this discussion until now. I
encountered similar problems before. It was not in floating IP scenario,
but for external IPs - ports on the same networks but not aware by OVN.
When IP relocates from one MAC to another, the previous mac-binding
entry will not get updated and therefore the re-located IP is unreachable.
> > > >
> > > > This happens for external router IPs on the localnet network
behind the gateways (which hosts the 172.24.4.221 port in Daniel's
example). It also happens for nested workloads that run inside a VM -
the VM port is known by OVN, but the internal workloads (e.g.
containers) runs on same subnets but relies on mac-binding to communicate.
> > > >
> > > > For both of my use cases, the problem has been solved by this
patch (merged):
https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
> > > >
> > > > The idea is, mac-binding entry should be updated when the IP is
announced in a new location by GARP/ARP request/ARP response. So I think
the best way to solve the problem for floating IP is similar. We just
need to generate GARP when a new FIP is attached. I was under the
impression that OVN already supports GARP when a new NAT entry is added.
But if the problem is still there it means something is wrong there (or
the GARP feature is not there yet for the NAT case), and I need to check
the code.
> > >
> > > I think we're only sending the GARPs only for distributed
floating IPs
> > > (nat_addresses field in the Port_Binding table) [0].
> > > Anyways even with that, I'm not quite sure if the MAC_Binding table
> > > would get updated as I think that first time I hit this issue it was
> > > on a DVR environment (ie. distributed FIPs, dnat_and_snat entries
with
> > > a logical_port and external_mac).
> > >
> > > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2497
> > > >
> > > > For the patch proposed in this discussion, I think there are
two problems.
> > > >
> > > > Firstly, I think it doesn't solve the problem completely. It
only deletes mac-binding when a logical router port is deleted. However,
in any of the above use cases (including FIP), IP relocation can happen
without deleting the router port. Or did I misunderstood anything here?
> > > >
> > > > Secondly, northd just reconciles between current state and
desired state for SB - it is declarative. We should avoid relying on the
northd cleanup logic to trigger important operations. I think the design
principle of northd should be making sure the desired state is reached,
but not care about how is it reached. For example, it can be reached by
deleting extra records one by one, but it is also correct if it deletes
everything and recreate the desired entries - this is just an example,
it may be inefficient, but it may be reasonable in some scenarios.
Adding logic in northd that relies on *how* the desired state is
computed would make it unreliable and hard to maintain. I think it would
also create challenges for the DDlog implementation.
> > > >
> > > > For the mac-binding aging mechanism mentioned by Daniel, I
agree. It is required for fault scenarios when SB is temporarily down.
Since we rely on SB DB to store the ARP cache/Neighbor table for the
virtual routers, if ARP updates happens when the DB is down, changes are
lost. However, the aging mechanism seems tricky when scale is
considered. Only the idle entries should be timed out, but it is costly
to update states whenever a mac-binding entry is hit. I haven't thought
about any clever way to achieve it without sacrificing scalability. Any
thoughts here? A workaround to the problem is to resend GARP
periodically (e.g. every 1 min).
> > > >
> > > > Thanks,
> > > > Han
Hi Daniel,
Thanks for digging it out! Now the root cause is well explained, and the
RFC patch looks good to me. It is a simple and correct fix. However, I'd
like to bring up an alternative for discussion.