On Thu, May 28, 2020 at 7:26 AM Dumitru Ceara <dce...@redhat.com> wrote:

> On 5/28/20 12:48 PM, Daniel Alvarez Sanchez wrote:
> > Hi all
> >
> > Sorry for top posting. I want to thank you all for the discussion and
> > also give some feedback from the OpenStack perspective, which is
> > affected by the problem described here.
> >
> > In OpenStack, it's fairly common to have a shared external network
> > (a logical switch with a localnet port) across many tenants. Each tenant
> > user may create their own router, to which their instances are connected
> > in order to access the external network.
> >
> > In such a scenario, we are hitting the issue described here. In
> > particular, our tests exercise 3K VIFs (each with 1 FIP) spread across
> > 300 LSs; each LS is connected to an LR (i.e. 300 LRs), and each router is
> > connected to the public LS. This creates a huge problem in terms of
> > performance, with tons of events due to the MAC_Binding entries
> > generated as a consequence of the GARPs sent for the floating IPs.
> >
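For a sense of scale, here is a back-of-the-envelope estimate. It is an assumption, not a measured figure from the thread: if every router attached to the public LS learns every FIP it does not own, the MAC_Binding table grows as FIPs x (routers - 1). The function name is hypothetical.

```python
# Back-of-the-envelope size of the MAC_Binding table when every logical
# router attached to the shared public LS learns every FIP it does not own.
# ASSUMPTION: one FIP per VIF, each FIP owned by exactly one router, and
# every other router learns it (from the GARP or from a next-hop ARP request).


def estimated_mac_bindings(num_fips: int, num_routers: int) -> int:
    """Each FIP is learned by all routers except the one that owns it."""
    return num_fips * (num_routers - 1)


if __name__ == "__main__":
    # Figures from the test described above: 3K FIPs, 300 LRs.
    print(estimated_mac_bindings(3_000, 300))  # 897000
```

This is only an upper-bound style estimate. Real deployments will also carry entries for router gateway IPs and external next hops.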
>
> Just as an addition to this, GARPs wouldn't be the only reason why all
> routers would learn the MAC_Binding. Even if we weren't sending GARPs
> for the FIPs, when a VM behind a FIP sends traffic to the outside, the
> router will generate an ARP request for the next hop using the FIP-IP
> and FIP-MAC. This will be broadcast to all routers connected to the
> public LS and will trigger them to learn the FIP-IP:FIP-MAC binding.
>

Yeah, we shouldn't be learning from regular ARP requests.


>
> > Thanks,
> > Daniel
> >
> >
> > On Thu, May 28, 2020 at 10:51 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> On 5/28/20 8:34 AM, Han Zhou wrote:
> >>>
> >>>
> >>> On Wed, May 27, 2020 at 1:10 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>
> >>>> Hi Girish, Han,
> >>>>
> >>>> On 5/26/20 11:51 PM, Han Zhou wrote:
> >>>>>
> >>>>>
> >>>>> On Tue, May 26, 2020 at 1:07 PM Girish Moodalbail <gmoodalb...@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 26, 2020 at 12:42 PM Han Zhou <zhou...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Girish,
> >>>>>>>
> >>>>>>> Thanks for the summary. I agree with you that GARP request vs. reply
> >>>>>>> is irrelevant to the problem here.
> >>>>
> >>>> Well, actually I think GARP request vs. reply is relevant (at least for
> >>>> case 1 below), because if OVN generated GARP replies we wouldn't need
> >>>> the priority 80 flow to determine whether an ARP request packet is
> >>>> actually an OVN self-originated GARP that needs to be flooded in the L2
> >>>> broadcast domain.
> >>>>
> >>>> On the other hand, router3 would be learning mac_binding IP2,M2 from
> >>>> the GARP reply originated by router2 and vice versa, so we'd have to
> >>>> restrict flooding of GARP replies to non-patch ports.
> >>>>
> >>>
> >>> Hi Dumitru, the point was that, on the external LS, the GRs will have to
> >>> send ARP requests to resolve unknown IPs (at least for the external GW),
> >>> and these have to be broadcast, which will cause all the GRs to learn
> >>> the MACs of all the other GRs. This is regardless of the GARP behavior.
> >>> You are right that if we only consider the Join switch, then GARP
> >>> request vs. reply does make a difference. However, GARP request/reply
> >>> may really only be needed on the external LS.
> >>>
> >>
> >> Ok, but do you see an easy way to determine if we need to add the
> >> logical flows that flood self originated GARP packets on a given logical
> >> switch? Right now we add them on all switches.
> >>
> >>>>>>> Please see my comment inline below.
> >>>>>>>
> >>>>>>> On Tue, May 26, 2020 at 12:09 PM Girish Moodalbail <gmoodalb...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hello Dumitru,
> >>>>>>>>
> >>>>>>>> There are several things being discussed on this thread. Let me see
> >>>>>>>> if I can tease them out for clarity.
> >>>>>>>>
> >>>>>>>> 1. All the router IPs are known to OVN (the join switch case).
> >>>>>>>> 2. Some IPs are known and some are not (the external logical switch
> >>>>>>>>    that connects to the physical network case).
> >>>>>>>>
> >>>>>>>> Let us look at each of the case above:
> >>>>>>>>
> >>>>>>>> 1. Join Switch Case
> >>>>>>>>
> >>>>>>>> +----------------+        +----------------+
> >>>>>>>> |   l3gateway    |        |   l3gateway    |
> >>>>>>>> |    router2     |        |    router3     |
> >>>>>>>> +-------------+--+        +-+--------------+
> >>>>>>>>             IP2,M2         IP3,M3
> >>>>>>>>               |             |
> >>>>>>>>            +--+-------------+---+
> >>>>>>>>            |    join switch     |
> >>>>>>>>            +---------+----------+
> >>>>>>>>                      |
> >>>>>>>>                   IP1,M1
> >>>>>>>>              +-------+--------+
> >>>>>>>>              |  distributed   |
> >>>>>>>>              |     router     |
> >>>>>>>>              +----------------+
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Say GR router2 wants to send a packet out to the DR, and we don't
> >>>>>>>> have static MAC-to-IP mappings in the lr_in_arp_resolve table on GR
> >>>>>>>> router2 (with Han's patch of dynamic_neigh_routes=true for all the
> >>>>>>>> Gateway Routers). With this in mind, when an ARP request is sent out
> >>>>>>>> by router2's hypervisor, the packet should be sent directly to the
> >>>>>>>> distributed router alone. Your commit 32f5ebb0622 (ovn-northd: Limit
> >>>>>>>> ARP/ND broadcast domain whenever possible) should have allowed only
> >>>>>>>> unicast. However, in the ls_in_l2_lkup table we have:
> >>>>>>>>
> >>>>>>>>   table=19(ls_in_l2_lkup      ), priority=80   , match=(eth.src == { M2 } && (arp.op == 1 || nd_ns)), action=(outport = "_MC_flood"; output;)
> >>>>>>>>   table=19(ls_in_l2_lkup      ), priority=75   , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == { IP1}), action=(outport = "jtor-router2"; output;)
> >>>>>>>>
> >>>>>>>> As you can see, the `priority=80` flow will always be hit, and the
> >>>>>>>> request is sent out to all the GRs; the `priority=75` flow is never
> >>>>>>>> hit. So we will see ARP packets on the GENEVE tunnel. We need to
> >>>>>>>> change the `priority=80` flow to match only GARP request packets.
> >>>>>>>> That way, for the known OVN IPs case, we don't broadcast.
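A toy Python model (not OVN code) of why the priority-75 flow is never reached. Logical flows in a table are tried from highest priority down, and the first match wins. The `Packet`/`Flow` classes and `lookup()` are hypothetical helpers, the two flows are simplified from the ones quoted above, and the `nd_ns` half of the match is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    eth_src: str
    arp_op: int      # 1 = request, 2 = reply
    arp_tpa: str
    flags1: int = 0  # stands in for flags[1] in the logical flow match


@dataclass
class Flow:
    priority: int
    match: Callable[[Packet], bool]
    outport: str


def lookup(flows: List[Flow], pkt: Packet) -> Optional[str]:
    """Return the outport of the highest-priority matching flow, if any."""
    for flow in sorted(flows, key=lambda f: f.priority, reverse=True):
        if flow.match(pkt):
            return flow.outport
    return None


# Simplified ls_in_l2_lkup flows; M2/IP1 are the placeholders from the diagram.
current_flows = [
    Flow(80, lambda p: p.eth_src == "M2" and p.arp_op == 1, "_MC_flood"),
    Flow(75, lambda p: p.flags1 == 0 and p.arp_op == 1 and p.arp_tpa == "IP1",
         "jtor-router2"),
]

# router2 resolving the DR's address IP1: a regular ARP request, not a GARP.
arp_req = Packet(eth_src="M2", arp_op=1, arp_tpa="IP1")
print(lookup(current_flows, arp_req))  # _MC_flood: priority 80 shadows 75
```

Every ARP request sourced from M2 matches the broader priority-80 flow first, so the more specific priority-75 "unicast" flow only applies to requests from other sources.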
> >>>>>>>
> >>>>>>> Since the solution to case 2) below (i.e.
> >>>>>>> learn_from_arp_request=false) solves the problem of case 1) too, I
> >>>>>>> think we don't need this change just for case 1). As @Dumitru Ceara
> >>>>>>> mentioned, there is some cost because it adds extra flows. It would
> >>>>>>> be a significant number of flows if there are a lot of dnat_and_snat
> >>>>>>> IPs. What do you think?
> >>>>
> >>>> I think the following might be a solution, although with the cost of
> >>>> adding as many flows as dnat_and_snat IPs are configured:
> >>>>
> >>>> - priority 80: explicitly determine if an ARP request is a self
> >>>> originated GARP for configured IP addresses and dnat_and_snat IPs (by
> >>>> matching on all eth.src and arp.tpa pairs) and if so flood on all
> >>>> non-patch ports.
> >>>> - priority 75: if arp.tpa is owned by an OVN logical router port,
> >>>> "unicast" it only on the patch port towards the router.
> >>>> - priority 1: flood any broadcast packet.
> >>>>
> >>>> Together with the learn_from_arp_request=false knob this would cover
> >>>> both case 1 (join switch) and case 2 (external switch).
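A minimal Python sketch of the three-tier lookup proposed above, assuming a hypothetical table of router ports, each with its MAC and the IPs it owns (router IPs plus dnat_and_snat IPs). The port name `jtor-dr`, the return strings, and `l2_lookup()` are illustrative, not OVN identifiers.

```python
from typing import Dict, Set, Tuple

# Hypothetical inputs: router port -> (MAC, IPs owned by that port).
ROUTER_PORTS: Dict[str, Tuple[str, Set[str]]] = {
    "jtor-router2": ("M2", {"IP2"}),
    "jtor-router3": ("M3", {"IP3"}),
    "jtor-dr": ("M1", {"IP1"}),
}


def l2_lookup(eth_src: str, arp_op: int, arp_tpa: str) -> str:
    """Forwarding decision for a broadcast packet under the proposed flows."""
    if arp_op == 1:
        # Priority 80: self-originated GARP, i.e. the (eth.src, arp.tpa) pair
        # belongs to one router port -> flood, but only on non-patch ports.
        for mac, ips in ROUTER_PORTS.values():
            if eth_src == mac and arp_tpa in ips:
                return "flood-non-patch"
        # Priority 75: ARP request for an IP owned by an OVN router port ->
        # "unicast" it on the patch port towards that router.
        for port, (_mac, ips) in ROUTER_PORTS.items():
            if arp_tpa in ips:
                return port
    # Priority 1: anything else (in this model, any broadcast) is flooded.
    return "flood"


if __name__ == "__main__":
    print(l2_lookup("M2", 1, "IP2"))         # router2's own GARP
    print(l2_lookup("M2", 1, "IP1"))         # router2 resolving the DR
    print(l2_lookup("M2", 1, "10.10.10.1"))  # external next hop
```

The priority-80 tier needs one (eth.src, arp.tpa) pair per configured IP, which is the flow-count cost mentioned in the thread. The ordering also matters: if the priority-75 check ran first, `l2_lookup("M2", 1, "IP2")` would return `jtor-router2`, sending router2's own GARP back to itself.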
> >>>>
> >>>> Wdyt?
> >>>>
> >>> Would the "learn_from_arp_request=false knob" cover both cases? If yes,
> >>> we don't need to add more flows of priority 80, or more accurately:
> >>> whether to update the priority-80 flows is not directly related to the
> >>> current problem.
> >>>
> >>
> >> Yes, it would, except that the ARP requests would still be flooded to
> >> all routers (and ignored at the destination), which, afaiu, is what
> >> Girish was worried about. To address that part too, I'm afraid we have
> >> to update the priority-80 flows.
> >>
> >> Regards,
> >> Dumitru
> >>
> >>>>>>
> >>>>>>
> >>>>>> Han, yes it will work. However, my only concern is that we would send
> >>>>>> all these ARP requests via the tunnel to each of the 1000 hypervisors,
> >>>>>> and these hypervisors will just drop them on the floor when they see
> >>>>>> learn_from_arp_request=false.
> >>>>>
> >>>>> I think maybe it is not a problem, since it happens only once on the
> >>>>> Join switch. Once the MAC is learned, it won't be broadcast again. It
> >>>>> may be more of a problem on the external LS if periodic GARP is
> >>>>> required there. However, I'd suggest running some tests to see whether
> >>>>> it is really a problem before trying to solve it.
> >>>>>
> >>>>>>
> >>>>>> Han, Dumitru,
> >>>>>>
> >>>>>> Why can't we swap the priorities of the above two flows so that ARP
> >>>>>> requests for next-hop IPs known to OVN will always be sent via
> >>>>>> `unicast`?
> >>>>>
> >>>>> If swapped, even GARPs won't get broadcast. Maybe that's not the
> >>>>> desired behavior.
> >>>>>
> >>>>
> >>>> This is definitely not desired, as we'd be hitting the prio 75 flow
> >>>> that would send the self-originated GARP request (IPx) packet back
> >>>> towards the router port that owns IPx.
> >>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> ~Girish
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> 2. External Logical Switch Case
> >>>>>>>>
> >>>>>>>>                        10.10.10.0/24
> >>>>>>>>    -------------------------+--------------------------
> >>>>>>>>                             |
> >>>>>>>>                          localnet
> >>>>>>>>                       +-----+-----+
> >>>>>>>>                       | external  |
> >>>>>>>>          +------------+    LS1    +-------------+
> >>>>>>>>          |            +-----+-----+             |
> >>>>>>>>          |                  |                   |
> >>>>>>>>      10.10.10.2         10.10.10.3          10.10.10.4
> >>>>>>>>         SNAT               SNAT                SNAT
> >>>>>>>>    +-----+-----+      +-----+-----+       +-----------+
> >>>>>>>>    | l3gateway |      | l3gateway |       | l3gateway |
> >>>>>>>>    |   node1   |      |   node2   |       |   node3   |
> >>>>>>>>    +-----------+      +-----------+       +-----------+
> >>>>>>>>
> >>>>>>>> In this case, we have some of the IPs in OVN and some in the
> >>>>>>>> physical network. If we fix (1) above, all the ARP requests for
> >>>>>>>> OVN's router IPs will be unicast. However, all the ARP requests for
> >>>>>>>> external IPs, say 10.10.10.1 on the "physical router", will be
> >>>>>>>> broadcast. We will then see these ARP broadcasts on all the L3
> >>>>>>>> gateway routers. With 'learn_from_arp_request=false' [a], the
> >>>>>>>> MAC_Binding table will not explode for either ARP or GARP requests.
> >>>>>>>>
> >>>>>>>> So, I don't think GARP request vs. reply is the issue here.
> >>>>>>>> Furthermore, learning from GARP replies is blocked on certain
> >>>>>>>> routers. For example,
> >>>>>>>> https://www.juniper.net/documentation/en_US/junose15.1/topics/concept/ip-gratuitous-arps-transmission-overview.html
> >>>>>>>> says "By default, updating the ARP cache on GARP replies is disabled
> >>>>>>>> on the router." So our NAT address mappings will not be learned.
> >>>>
> >>>> Just as a side note, the above doesn't mean Juniper boxes don't support
> >>>> learning from GARP replies, just that they'd need extra configuration.
> >>>> I don't necessarily think that's a bad thing if it's properly
> >>>> documented in OVN that we would be generating GARP replies.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> ~Girish
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [a] - From Han's mail, the meaning of learn_from_arp_request=false:
> >>>>>>>>> --> if the TPA is on the router, add a new entry (it means the
> >>>>>>>>>     remote wants to communicate with this node, so it makes sense to
> >>>>>>>>>     learn the remote as well). Otherwise, ignore it and no new
> >>>>>>>>>     entry is added.
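A minimal Python sketch of the learn_from_arp_request=false rule as described in [a], applied to one router's MAC_Binding view (IP to MAC). The function name and MAC placeholders are hypothetical. One behaviour is an explicit assumption not stated in the thread: an already-known binding is still refreshed, since [a] only rules out adding *new* entries.

```python
from typing import Dict, Set


def handle_arp_request(mac_bindings: Dict[str, str], router_ips: Set[str],
                       spa: str, sha: str, tpa: str,
                       learn_from_arp_request: bool = True) -> None:
    """Update one router's IP -> MAC bindings on receipt of an ARP request."""
    if learn_from_arp_request or tpa in router_ips:
        # Default behaviour, or the remote wants to talk to us: learn it.
        mac_bindings[spa] = sha
    elif spa in mac_bindings:
        # ASSUMPTION: an existing entry is refreshed; no new entry is created.
        mac_bindings[spa] = sha
    # Otherwise: ignore the request and add no new entry.


if __name__ == "__main__":
    bindings: Dict[str, str] = {}
    node1_ips = {"10.10.10.2"}  # node1's SNAT IP from the diagram above
    # GARP from node2 (tpa == spa, not one of node1's IPs): ignored.
    handle_arp_request(bindings, node1_ips, spa="10.10.10.3", sha="M_node2",
                       tpa="10.10.10.3", learn_from_arp_request=False)
    # The physical router asking for node1's IP: learned.
    handle_arp_request(bindings, node1_ips, spa="10.10.10.1", sha="M_phys",
                       tpa="10.10.10.2", learn_from_arp_request=False)
    print(bindings)  # {'10.10.10.1': 'M_phys'}
```

A GARP carries the sender's own IP as its target, so under this rule the other gateway routers never create entries for it. This is the property that keeps the MAC_Binding table from exploding.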
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
> >>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernetes+unsubscr...@googlegroups.com.
> >>>>>> To view this discussion on the web visit
> >>>>>> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STRnem2PeSahuwhro1t%2BQJxchZNC7viq8n-ngM9KU%2B%2B-Xw%40mail.gmail.com
> >>>>
> >>>
> >>
> >> _______________________________________________
> >> discuss mailing list
> >> disc...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
>
>