On Thu, Nov 7, 2019 at 7:43 AM Dumitru Ceara <[email protected]> wrote:
>
> On Thu, Nov 7, 2019 at 3:51 AM Han Zhou <[email protected]> wrote:
> >
> >
> >
> > On Mon, Nov 4, 2019 at 11:32 AM Numan Siddique <[email protected]> wrote:
> > >
> > > On Mon, Nov 4, 2019 at 8:37 PM Dumitru Ceara <[email protected]> wrote:
> > > >
> > > > ARP request and ND NS packets for router owned IPs were being
> > > > flooded in the complete L2 domain (using the MC_FLOOD multicast
> > > > group). However this creates a scaling issue in scenarios where
> > > > aggregation logical switches are connected to many logical routers
> > > > (~350). The logical pipelines of all routers would have to be
> > > > executed before the packet is finally replied to by a single
> > > > router, the owner of the IP address.
> > > >
> > > > This commit limits the broadcast domain by bypassing the L2 Lookup
> > > > stage for ARP requests that will be replied to by a single router.
> > > > The packets are still flooded in the L2 domain but not on any of
> > > > the other patch ports towards other routers connected to the
> > > > switch. This restricted flooding is done by using a new multicast
> > > > group (MC_ARP_ND_FLOOD).
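
To make the restricted flooding concrete: with this change an ARP request
for a router-owned IP would hit a logical flow along these lines instead
of the generic flood flow (table, priority, and port names below are
illustrative only, not taken from the patch):

    table=ls_in_l2_lkup, priority=80
      match  : arp.op == 1 && arp.tpa == 10.0.0.254
      action : clone { outport = "ls1-to-lr1"; output; };
               outport = "_MC_arp_nd_flood"; output;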
> > > >
> > > > IPs that are owned by the routers and for which this fix applies
> > > > are:
> > > > - IP addresses configured on the router ports.
> > > > - VIPs.
> > > > - NAT IPs.
> > > >
> > > > This commit also fixes function get_router_load_balancer_ips() which
> > > > was incorrectly returning a single address_family even though the
> > > > IP set could contain a mix of IPv4 and IPv6 addresses.
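
As a sketch of what that fix amounts to (hypothetical helper name, not the
exact code from the patch), each VIP has to be classified by its own
address family instead of assuming one family for the whole set:

    /* Hypothetical sketch using the OVS sset API: split a mixed set of
     * VIP strings into per-family sets. IPv6 literals contain ':',
     * IPv4 dotted quads do not. */
    #include <string.h>
    #include "sset.h"

    static void
    split_vips_by_family(const struct sset *vips,
                         struct sset *ips_v4, struct sset *ips_v6)
    {
        const char *ip;

        SSET_FOR_EACH (ip, vips) {
            sset_add(strchr(ip, ':') ? ips_v6 : ips_v4, ip);
        }
    }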
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/1756945
> > > > Reported-by: Anil Venkata <[email protected]>
> > > > Signed-off-by: Dumitru Ceara <[email protected]>
> > >
> > > Thanks Dumitru, for addressing the review comments.
> > >
> > > Acked-by: Numan Siddique <[email protected]>
> > >
> > > Han, if you can take a look at this patch and provide your comments,
> > > that would be great.
> > >
>
> Hi Han,
>
> >
> > Sorry for the delayed response :(
>
> No worries and thanks for reviewing this change.
>
> >
> > The patch looks good to me except:
> >
> > 1. It changes the behavior so that when an ARP request for an LRP's IP
> > comes to the logical switch, other routers will no longer learn the
> > MAC-IP binding of the ARP sender. This has been discussed and I tend to
> > agree with Dumitru that it should not cause a real issue. I think it is
> > just worth documenting clearly in ovn-architecture, probably in the
> > section "Logical Routers and Logical Patch Ports", because this is
> > different from a traditional switch's behavior, and network
> > administrators may be surprised.
>
> Ack, I'll add this in v6.
>
> >
> > 2. Since we are changing the behavior of ARP request handling anyway,
> > bypassing logical router ports, and we think it should not cause a real
> > problem, I wonder why we add the MC_ARP_ND group to still flood to the
> > non-router ports. Is it really useful? Maybe it is not a big deal, but
> > I think the extra MC group would add some performance cost, and it is
> > almost redundant with the regular MC_FLOOD except that there are no
> > router ports in it - that is a lot of redundancy considering that most
> > LSPs are regular VIFs in typical large-scale environments. I would
> > suggest simplifying this unless there are clear concerns about not
> > flooding to other ports.
>
> If I skip flooding to non-router ports the following test fails
> because one of the things checked by the test is that we flood
> broadcasts (including ARPs) in the broadcast domain:
>  36: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED (ovs-macros.at:220)
>
> So my concern was that people might expect the ARP requests to be
> flooded within the L2 broadcast domain.
> However, we could add a configuration knob to change the behavior and
> only flood them on localnet ports. That way we would maintain the
> current L2 flooding, but in deployments where this is not necessary
> users could disable the flooding to VIF ports, and this would remove
> the MC_ARP_ND group unless there's a localnet port in the datapath.
>
> What do you think?
>

Adding a configuration knob is a valid option, but I hoped we could
simplify the change instead of making it even more complex :)

My point is that since we changed the behavior, it might be better to
change it with a more straightforward philosophy: an ARP request for an IP
that is owned by an LRP is simply not flooded. This in fact complies with
our ARP responder behavior in the LS: for a known IP-MAC binding, the ARP
request is not flooded but answered directly to the sender. With your
change we are just extending this behavior to IPs owned by LRPs behind the
LSPs of type "router".
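
Concretely, the flow for such an IP could then be as simple as this
(table and names illustrative only, not from the patch):

    table=ls_in_l2_lkup
      match  : arp.op == 1 && arp.tpa == 172.16.0.1
      action : outport = "ls1-to-lr1"; output;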

I don't think we need to worry about the test case "36: ovn -- 3 HVs, 3
LS, 3 lports/LS, 1 LR". In fact the test already considers a case where
flooding is not expected:
            # 192.168.33.254 is configured to the switch patch port for lrp33,
            # so no ARP flooding expected for it.
Now we can just change the test case a little more to exclude the IPs
owned by LRPs, since this is the new expected behavior. What do you think?

Also, I am confused about why you singled out the localnet port as an
exception. I think we should just treat it with the same philosophy: if we
are not flooding, then don't flood to any port, no matter whether it is a
VIF or a localnet port.

(If we do believe there is a need to flood with a new group MC_ARP_ND, I'd
say let's not bother with the configuration knob for now, until we see a
performance impact in scale tests.)


> >
> > 3. Not a big thing, but the names of the functions
> > build_lswitch_rport_arp_responders() and build_lswitch_rport_arp_flow()
> > are a little confusing. They are not for the ARP responder but for
> > bypassing router ports. Would it be better to name them something like
> > build_lswitch_rport_arp_req_flows() and
> > build_lswitch_rport_arp_request_flow_for_ip()?
>
> Sure, I'll change them.
> >
> > In addition, not for this patch but for the problem itself, I think
> > we'd better quantify the number of router ports we can support within
> > the current resubmit limit (as suggested by Ben), and document
> > somewhere the limit and the impact if a user exceeds it. If the limit
> > is easily exceeded in very common deployments, we should probably
> > consider increasing the resubmit limit.
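
(For reference: if I read ofproto-dpif-xlate.c correctly, translation is
capped at MAX_DEPTH 64 and MAX_RESUBMITS at twice that, so that is the
budget all the router pipelines have to fit in. Worth double-checking
when we document it.)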
>
> Agreed, that should be done too.
>
> Thanks,
> Dumitru
>
> >
> > Thanks,
> > Han
>