On Mon, Dec 3, 2018 at 3:48 PM Mark Michelson <mmich...@redhat.com> wrote:
>
> On 12/01/2018 03:44 PM, Han Zhou wrote:
> >
> >
> > On Fri, Nov 30, 2018 at 7:29 AM Daniel Alvarez Sanchez
> > <dalva...@redhat.com <mailto:dalva...@redhat.com>> wrote:
> >  >
> >  > Thanks folks again for the discussion.
> >  > I sent an RFC patch here [0]. I tried it out with my reproducer and it
> >  > seems to work well. Instead of outputting the packet to the localnet
> >  > ofport, it will inject it to the public switch pipeline so it'll get
> >  > broadcasted to the rest of the ports resulting in other Logical
> >  > Routers connected to the external switch updating their neighbours. As
> >  > it's broadcasted, the GARP will also be sent out through the localnet
> >  > port as before.
> >  >
> >  > Looking forward to your comments before moving on and writing tests.
> >  >
> >  > Thanks Numan for your help!
> >  >
> >  > [0]
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354220.html
> >  > On Wed, Nov 28, 2018 at 3:32 PM Daniel Alvarez Sanchez
> >  > <dalva...@redhat.com <mailto:dalva...@redhat.com>> wrote:
> >  > >
> >  > > Hi all,
> >  > >
> >  > > As this thread is getting big I'm summarizing the issue I see so far:
> >  > >
> >  > > * When a dnat_and_snat entry is added to a logical router (or port
> >  > > gets bound to a chassis), ovn-controller will send GARPs to announce
> >  > > the MAC address of the FIP(s) (either the gw port or of the actual FIP
> >  > > MAC address if distributed) only through localnet ports [0].
> >  > >
> >  > > * This means that gateway ports bound to that same chassis and
> >  > > connected to the public switch won't get the GARPs, so they won't
> >  > > update their MAC_Binding entries causing unreachability. In the
> >  > > diagram of this thread, LR0 won't get the GARP sent by ovn-controller
> >  > > if both gateway ports are bound to the same chassis.
> >  > >
> >  > > I tried out sending GARPs from the external network using master
> >  > > branch and MAC_Binding entries get updated. However, in order to cover
> >  > > missing cases, I think it would make sense to send the GARPs not only
> >  > > to localnet ports but to all ports of those logical switches that have
> >  > > a localnet port. What do you think?
> >  > >
> >  > > [0]
> > https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073
> >  > >
> >  > > [0]
> > https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
> >  > > Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
> >  > > <dalva...@redhat.com <mailto:dalva...@redhat.com>> wrote:
> >  > > >
> >  > > > On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <zhou...@gmail.com
> > <mailto:zhou...@gmail.com>> wrote:
> >  > > > >
> >  > > > >
> >  > > > >
> >  > > > > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson
> > <mmich...@redhat.com <mailto:mmich...@redhat.com>> wrote:
> >  > > > > >
> >  > > > > > Hi Daniel,
> >  > > > > >
> >  > > > > > I agree with Numan that this seems like a good approach to take.
> >  > > > > >
> >  > > > > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> >  > > > > > >
> >  > > > > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <b...@ovn.org
> > <mailto:b...@ovn.org>
> >  > > > > > > <mailto:b...@ovn.org <mailto:b...@ovn.org>>> wrote:
> >  > > > > > >  >
> >  > > > > > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique
> > wrote:
> >  > > > > > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> >  > > > > > > <dalva...@redhat.com <mailto:dalva...@redhat.com>
> > <mailto:dalva...@redhat.com <mailto:dalva...@redhat.com>>>
> >  > > > > > >  > > wrote:
> >  > > > > > >  > >
> >  > > > > > >  > > > Hi,
> >  > > > > > >  > > >
> >  > > > > > >  > > > After digging further. The problem seems to be
> > reduced to reusing an
> >  > > > > > >  > > > old gateway IP address for a dnat_and_snat entry.
> >  > > > > > >  > > > When a gateway port is bound to a chassis, its entry
> > will show up in
> >  > > > > > >  > > > the MAC_Binding table (at least when that Logical
> > Switch is connected
> >  > > > > > >  > > > to more than one Logical Router). After deleting the
> > Logical Router
> >  > > > > > >  > > > and all its ports, this entry will remain there. If
> > a new Logical
> >  > > > > > >  > > > Router is created and a Floating IP (dnat_and_snat)
> > is assigned to a
> >  > > > > > >  > > > VM with the old gw IP address, it will become
> > unreachable.
> >  > > > > > >  > > >
> >  > > > > > >  > > > A workaround now from networking-ovn (OpenStack
> > integration) is to
> >  > > > > > >  > > > delete MAC_Binding entries for that IP address upon
> > a FIP creation. I
> >  > > > > > >  > > > think that this however should be done from OVN,
> > what do you folks
> >  > > > > > >  > > > think?
> >  > > > > > >  > > >
> >  > > > > > >  > > >
> >  > > > > > >  > > Agree. Since the MAC_Binding table row is created by
> > ovn-controller, it
> >  > > > > > >  > > should
> >  > > > > > >  > > be handled properly within OVN.
> >  > > > > > >  >
> >  > > > > > >  > I see that this has been sitting here for a while.  The
> > solution seems
> >  > > > > > >  > reasonable to me.  Are either of you working on it?
> >  > > > > > >
> >  > > > > > > I started working on it. I came up with a solution (see
> > patch below)
> >  > > > > > > which works but I wanted to give you a bit more of context
> > and get your
> >  > > > > > > feedback:
> >  > > > > > >
> >  > > > > > >
> >  > > > > > >                             ^ localnet
> >  > > > > > >                             |
> >  > > > > > >                         +---+---+
> >  > > > > > >                         |       |
> >  > > > > > >                  +------+  pub  +------+
> >  > > > > > >                  |      |       |      |
> >  > > > > > >                  |      +-------+      |
> >  > > > > > >                  | 172.24.4.0/24 <http://172.24.4.0/24>
> > <http://172.24.4.0/24>    |
> >  > > > > > >                  |                     |
> >  > > > > > >     172.24.4.220 |                     | 172.24.4.221
> >  > > > > > >              +---+---+             +---+---+
> >  > > > > > >              |       |             |       |
> >  > > > > > >              |  LR0  |             |  LR1  |
> >  > > > > > >              |       |             |       |
> >  > > > > > >              +---+---+             +---+---+
> >  > > > > > >       10.0.0.254 |                     | 20.0.0.254
> >  > > > > > >                  |                     |
> >  > > > > > >              +---+---+             +---+---+
> >  > > > > > >              |       |             |       |
> >  > > > > > > 10.0.0.0/24 <http://10.0.0.0/24> <http://10.0.0.0/24> |
> >   SW0  |             |  SW1  |
> >  > > > > > > 20.0.0.0/24 <http://20.0.0.0/24> <http://20.0.0.0/24>
> >  > > > > > >              |       |             |       |
> >  > > > > > >              +---+---+             +---+---+
> >  > > > > > >                  |                     |
> >  > > > > > >                  |                     |
> >  > > > > > >              +---+---+             +---+---+
> >  > > > > > >              |       |             |       |
> >  > > > > > >              |  VM0  |             |  VM1  |
> >  > > > > > >              |       |             |       |
> >  > > > > > >              +-------+             +-------+
> >  > > > > > >              10.0.0.10             20.0.0.10
> >  > > > > > >            172.24.4.100           172.24.4.200
> >  > > > > > >
> >  > > > > > >
> >  > > > > > > When I ping VM1 floating IP from the external network, a
> > new entry for
> >  > > > > > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding
> > table:
> >  > > > > > >
> >  > > > > > > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> >  > > > > > > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> >  > > > > > > ip                  : "172.24.4.221"
> >  > > > > > > logical_port        : "lrp02"
> >  > > > > > > mac                 : "00:00:02:01:02:04"
> >  > > > > > >
> >  > > > > > >
> >  > > > > > > Now, if LR1 gets removed and the old gateway IP
> > (172.24.4.221) is reused
> >  > > > > > > for VM2 FIP with different MAC and new gateway IP is
> > created (for
> >  > > > > > > example 172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes
> > unreachable
> >  > > > > > > from VM1 until the old MAC_Binding entry gets deleted as
> > pinging
> >  > > > > > > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> >  > > > > > >
> >  > > > > > > With the patch below, removing LR1 results in deleting all
> > MAC_Binding
> >  > > > > > > entries for every datapath where '172.24.4.221' appears in
> > the 'ip'
> >  > > > > > > column so the problem goes away.
> >  > > > > > >
> >  > > > > > > Another solution would be implementing some kind of 'aging' for
> >  > > > > > > MAC_Binding entries but perhaps it's more complex.
> >  > > > > > > Looking forward for your comments :)
> >  > > > > > >
> >  > > > > > >
> >  > > > > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >  > > > > > > index 58bef7d..a86733e 100644
> >  > > > > > > --- a/ovn/northd/ovn-northd.c
> >  > > > > > > +++ b/ovn/northd/ovn-northd.c
> >  > > > > > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct
> > northd_context *ctx,
> >  > > > > > > struct hmap *ports)
> >  > > > > > >       }
> >  > > > > > >   }
> >  > > > > > >
> >  > > > > > > +static void
> >  > > > > > > +delete_mac_binding_by_ip(struct northd_context *ctx, const
> > char *ip)
> >  > > > > > > +{
> >  > > > > > > +    const struct sbrec_mac_binding *b, *n;
> >  > > > > > > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> >  > > > > > > +        if (strstr(ip, b->ip)) {
> >  > > > > > > +            sbrec_mac_binding_delete(b);
> >  > > > > > > +        }
> >  > > > > > > +    }
> >  > > > > > > +}
> >  > > > > > > +
> >  > > > > > > +
> >  > > > > > >   /* Updates the southbound Port_Binding table so that it
> > contains the
> >  > > > > > > logical
> >  > > > > > >    * switch ports specified by the northbound database.
> >  > > > > > >    *
> >  > > > > > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> >  > > > > > >       /* Delete southbound records without northbound
> > matches. */
> >  > > > > > >       LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> >  > > > > > >           ovs_list_remove(&op->list);
> >  > > > > > > +
> >  > > > > > > +        /* Delete all MAC_Binding entries which match the
> > IP addresses
> >  > > > > > > of the
> >  > > > > > > +         * deleted logical router port (ie. port with a
> > peer). */
> >  > > > > > > +        const char *peer = smap_get(&op->sb->options, "peer");
> >  > > > > > > +        if (peer) {
> >  > > > > > > +            for (int i = 0; i < op->sb->n_mac; i++) {
> >  > > > > > > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> >  > > > > > > +            }
> >  > > > > > > +        }
> >  > > > > > >           sbrec_port_binding_delete(op->sb);
> >  > > > > > >           ovn_port_destroy(ports, op);
> >  > > > > > >       }
> >  > > > > > >
> >  > > > >
> >  > > > > Hi,
> >  > > > >
> >  > > > > Sorry that I didn't notice this discussion until now. I
> > encountered similar problems before. It was not in floating IP scenario,
> > but for external IPs - ports on the same networks but not aware by OVN.
> > When IP relocates from one MAC to another, the previous mac-binding
> > entry will not get updated and therefore the re-located IP is unreachable.
> >  > > > >
> >  > > > > This happens for external router IPs on the localnet network
> > behind the gateways (which hosts the 172.24.4.221 port in Daniel's
> > example). It also happens for nested workloads that run inside a VM -
> > the VM port is known by OVN, but the internal workloads (e.g.
> > containers) runs on same subnets but relies on mac-binding to communicate.
> >  > > > >
> >  > > > > For both of my use cases, the problem has been solved by this
> > patch (merged):
> > https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
> >  > > > >
> >  > > > > The idea is, mac-binding entry should be updated when the IP is
> > announced in a new location by GARP/ARP request/ARP response. So I think
> > the best way to solve the problem for floating IP is similar. We just
> > need to generate GARP when a new FIP is attached. I was under the
> > impression that OVN already supports GARP when a new NAT entry is added.
> > But if the problem is still there it means something is wrong there (or
> > the GARP feature is not there yet for the NAT case), and I need to check
> > the code.
> >  > > >
> >  > > > I think we're only sending the GARPs only for distributed
> > floating IPs
> >  > > > (nat_addresses field in the Port_Binding table) [0].
> >  > > > Anyways even with that, I'm not quite sure if the MAC_Binding table
> >  > > > would get updated as I think that first time I hit this issue it was
> >  > > > on a DVR environment (ie. distributed FIPs, dnat_and_snat entries
> > with
> >  > > > a logical_port and external_mac).
> >  > > >
> >  > > > [0]
> > https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2497
> >  > > > >
> >  > > > > For the patch proposed in this discussion, I think there are
> > two problems.
> >  > > > >
> >  > > > > Firstly, I think it doesn't solve the problem completely. It
> > only deletes mac-binding when a logical router port is deleted. However,
> > in any of the above use cases (including FIP), IP relocation can happen
> > without deleting the router port. Or did I misunderstood anything here?
> >  > > > >
> >  > > > > Secondly, northd just reconciles between current state and
> > desired state for SB - it is declarative. We should avoid relying on the
> > northd cleanup logic to trigger important operations. I think the design
> > principle of northd should be making sure the desired state is reached,
> > but not care about how is it reached. For example, it can be reached by
> > deleting extra records one by one, but it is also correct if it deletes
> > everything and recreate the desired entries - this is just an example,
> > it may be inefficient, but it may be reasonable in some scenarios.
> > Adding logic in northd that relies on *how* the desired state is
> > computed would make it unreliable and hard to maintain. I think it would
> > also create challenges for the DDlog implementation.
> >  > > > >
> >  > > > > For the mac-binding aging mechanism mentioned by Daniel, I
> > agree. It is required for fault scenarios when SB is temporarily down.
> > Since we rely on SB DB to store the ARP cache/Neighbor table for the
> > virtual routers, if ARP updates happens when the DB is down, changes are
> > lost. However, the aging mechanism seems tricky when scale is
> > considered. Only the idle entries should be timed out, but it is costly
> > to update states whenever a mac-binding entry is hit. I haven't thought
> > about any clever way to achieve it without sacrificing scalability. Any
> > thoughts here? A workaround to the problem is to resend GARP
> > periodically (e.g. every 1 min).
> >  > > > >
> >  > > > > Thanks,
> >  > > > > Han
> >
> > Hi Daniel,
> >
> > Thanks for digging it out! Now the root cause is well explained, and the
> > RFC patch looks good to me. It is a simple and correct fix. However, I'd
> > like to bring up an alternative for discussion.
>
> +1 from me on the patch being implemented well.
+1  https://mail.openvswitch.org/pipermail/ovs-dev/2018-December/354242.html
See comments as this patch breaks a test that was introduced by recent
commit and I need inputs :)
>
> >
> > Alternatively, since the NAT IPs are already known in NB, instead of
> > relying on dynamic ARP (mac-binding), the problem can be solved from
> > northd directly by adding static ARP-resolving flows for floating IPs,
> > just like how it is handling the IPs in "addresses" column of lports.
> > This would be more straightforward and reliable than relying on
> > mac-binding updating by GARP.
> >
> > I think we can still go ahead with the mac-binding patch, since it fixes
> > an improperly injected broadcast packet, and I don't see any negative
> > impact. In addition, we may think of the northd change as a further
> > improvement of the feature, if necessary. What do you think?
>
> I agree. I think the static flow-installation approach seems like a
> fairly straightforward thing to add. Would adding this behavior also
> require us to change the GARP-sending behavior?
IMHO, the way that the proposed patch sends the GARP is the correct
way of doing it so regardless of the static flow installation approach
I think it's ok to keep doing it like so. Imagine the case where a VM
is listening to GARPs for some other reason; perhaps there's other use
cases as well so keeping broadcast packets reaching the right
destinations seems correct to me.

Thanks!!
Daniel
>
> >
> > Thanks,
> > Han
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to