On Wed, Nov 13, 2019 at 8:14 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On Wed, Nov 13, 2019 at 4:30 PM Han Zhou <hz...@ovn.org> wrote:
> >
> >
> >
> > On Wed, Nov 13, 2019 at 2:42 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> On Tue, Nov 12, 2019 at 8:50 PM Han Zhou <hz...@ovn.org> wrote:
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Nov 12, 2019 at 10:10 AM Dumitru Ceara <dce...@redhat.com>
wrote:
> >> > >
> >> > > On Tue, Nov 12, 2019 at 6:17 PM Han Zhou <zhou...@gmail.com> wrote:
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, Nov 12, 2019 at 2:29 AM Dumitru Ceara <dce...@redhat.com>
wrote:
> >> > > > >
> >> > > > > ARP request and ND NS packets for router owned IPs were being
> >> > > > > flooded in the complete L2 domain (using the MC_FLOOD
multicast group).
> >> > > > > However this creates a scaling issue in scenarios where
aggregation
> >> > > > > logical switches are connected to more logical routers (~350).
The
> >> > > > > logical pipelines of all routers would have to be executed
before the
> >> > > > > packet is finally replied to by a single router, the owner of
the IP
> >> > > > > address.
> >> > > > >
> >> > > > > This commit limits the broadcast domain by bypassing the L2
Lookup stage
> >> > > > > for ARP requests that will be replied by a single router. The
packets
> >> > > > > are forwarded only to the router port that owns the target IP
address.
> >> > > > >
> >> > > > > IPs that are owned by the routers and for which this fix
applies are:
> >> > > > > - IP addresses configured on the router ports.
> >> > > > > - VIPs.
> >> > > > > - NAT IPs.
> >> > > > >
> >> > > > > Reported-at: https://bugzilla.redhat.com/1756945
> >> > > > > Reported-by: Anil Venkata <vkomm...@redhat.com>
> >> > > > > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> >> > > > >
> >> > > > > ---
> >> > > > > v7:
> >> > > > > - Address Han's comments:
> >> > > > >     - Remove flooding for all ARPs received on VLAN networks.
To avoid
> >> > > > >       that we now identify self originated (G)ARPs by matching
on source
> >> > > > >       MAC address too.
> >> > > > >     - Rename REGBIT_NOT_VXLAN to FLAGBIT_NOT_VXLAN.
> >> > > > > - Fix ovn-sb manpage.
> >> > > > > - Split patch in a series of 2:
> >> > > > >     - patch1: fixes the get_router_load_balancer_ips()
function.
> >> > > > >     - patch2: limits the ARP/ND broadcast domain.
> >> > > > > v6:
> >> > > > > - Address Han's comments:
> >> > > > >     - remove flooding of ARPs targeting OVN owned IP addresses.
> >> > > > >     - update ovn-architecture documentation.
> >> > > > >     - rename ARP handling functions.
> >> > > > >     - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest
to take into
> >> > > > >     account the new way of forwarding ARPs.
> >> > > > > - Also, properly deal with ARP packets on VLAN-backed networks.
> >> > > > > v5: Address Numan's comments: update comments & make autotest
more
> >> > > > >     robust.
> >> > > > > v4: Rebase.
> >> > > > > v3: Properly deal with VXLAN traffic. Address review comments
from
> >> > > > >     Numan (add autotests). Fix function
get_router_load_balancer_ips.
> >> > > > >     Rebase -> deal with IPv6 NAT too.
> >> > > > > v2: Move ARP broadcast domain limiting to table
S_SWITCH_IN_L2_LKUP to
> >> > > > > address localnet ports too.
> >> > > > > ---
> >> > > > >  northd/ovn-northd.8.xml |   14 ++
> >> > > > >  northd/ovn-northd.c     |  230
+++++++++++++++++++++++++++++++----
> >> > > > >  ovn-architecture.7.xml  |   19 +++
> >> > > > >  tests/ovn.at            |  307
+++++++++++++++++++++++++++++++++++++++++++++--
> >> > > > >  4 files changed, 530 insertions(+), 40 deletions(-)
> >> > > > >
> >> > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> >> > > > > index 0a33dcd..344cc0d 100644
> >> > > > > --- a/northd/ovn-northd.8.xml
> >> > > > > +++ b/northd/ovn-northd.8.xml
> >> > > > > @@ -1005,6 +1005,20 @@ output;
> >> > > > >        </li>
> >> > > > >
> >> > > > >        <li>
> >> > > > > +        Priority-80 flows for each port connected to a
logical router
> >> > > > > +        matching self originated GARP/ARP request/ND packets.
These packets
> >> > > > > +        are flooded to the <code>MC_FLOOD</code> which
contains all logical
> >> > > > > +        ports.
> >> > > > > +      </li>
> >> > > > > +
> >> > > > > +      <li>
> >> > > > > +        Priority-75 flows for each IP address/VIP/NAT address
owned by a
> >> > > > > +        router port connected to the switch. These flows
match ARP requests
> >> > > > > +        and ND packets for the specific IP addresses.
Matched packets are
> >> > > > > +        forwarded only to the router that owns the IP address.
> >> > > > > +      </li>
> >> > > > > +
> >> > > > > +      <li>
> >> > > > >          A priority-70 flow that outputs all packets with an
Ethernet broadcast
> >> > > > >          or multicast <code>eth.dst</code> to the
<code>MC_FLOOD</code>
> >> > > > >          multicast group.
> >> > > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >> > > > > index 32f3200..d6beb97 100644
> >> > > > > --- a/northd/ovn-northd.c
> >> > > > > +++ b/northd/ovn-northd.c
> >> > > > > @@ -210,6 +210,8 @@ enum ovn_stage {
> >> > > > >  #define REGBIT_LOOKUP_NEIGHBOR_RESULT "reg9[4]"
> >> > > > >  #define REGBIT_SKIP_LOOKUP_NEIGHBOR "reg9[5]"
> >> > > > >
> >> > > > > +#define FLAGBIT_NOT_VXLAN "flags[1] == 0"
> >> > > > > +
> >> > > > >  /* Returns an "enum ovn_stage" built from the arguments. */
> >> > > > >  static enum ovn_stage
> >> > > > >  ovn_stage_build(enum ovn_datapath_type dp_type, enum
ovn_pipeline pipeline,
> >> > > > > @@ -1202,6 +1204,34 @@ ovn_port_allocate_key(struct
ovn_datapath *od)
> >> > > > >                            1, (1u << 15) - 1,
&od->port_key_hint);
> >> > > > >  }
> >> > > > >
> >> > > > > +/* Returns true if the logical switch port 'enabled' column
is empty or
> >> > > > > + * set to true.  Otherwise, returns false. */
> >> > > > > +static bool
> >> > > > > +lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
> >> > > > > +{
> >> > > > > +    return !lsp->n_enabled || *lsp->enabled;
> >> > > > > +}
> >> > > > > +
> >> > > > > +/* Returns true only if the logical switch port 'up' column
is set to true.
> >> > > > > + * Otherwise, if the column is not set or set to false,
returns false. */
> >> > > > > +static bool
> >> > > > > +lsp_is_up(const struct nbrec_logical_switch_port *lsp)
> >> > > > > +{
> >> > > > > +    return lsp->n_up && *lsp->up;
> >> > > > > +}
> >> > > > > +
> >> > > > > +static bool
> >> > > > > +lsp_is_external(const struct nbrec_logical_switch_port *nbsp)
> >> > > > > +{
> >> > > > > +    return !strcmp(nbsp->type, "external");
> >> > > > > +}
> >> > > > > +
> >> > > > > +static bool
> >> > > > > +lrport_is_enabled(const struct nbrec_logical_router_port
*lrport)
> >> > > > > +{
> >> > > > > +    return !lrport->enabled || *lrport->enabled;
> >> > > > > +}
> >> > > > > +
> >> > > > >  static char *
> >> > > > >  chassis_redirect_name(const char *port_name)
> >> > > > >  {
> >> > > > > @@ -3750,28 +3780,6 @@ build_port_security_ip(enum
ovn_pipeline pipeline, struct ovn_port *op,
> >> > > > >
> >> > > > >  }
> >> > > > >
> >> > > > > -/* Returns true if the logical switch port 'enabled' column
is empty or
> >> > > > > - * set to true.  Otherwise, returns false. */
> >> > > > > -static bool
> >> > > > > -lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
> >> > > > > -{
> >> > > > > -    return !lsp->n_enabled || *lsp->enabled;
> >> > > > > -}
> >> > > > > -
> >> > > > > -/* Returns true only if the logical switch port 'up' column
is set to true.
> >> > > > > - * Otherwise, if the column is not set or set to false,
returns false. */
> >> > > > > -static bool
> >> > > > > -lsp_is_up(const struct nbrec_logical_switch_port *lsp)
> >> > > > > -{
> >> > > > > -    return lsp->n_up && *lsp->up;
> >> > > > > -}
> >> > > > > -
> >> > > > > -static bool
> >> > > > > -lsp_is_external(const struct nbrec_logical_switch_port *nbsp)
> >> > > > > -{
> >> > > > > -    return !strcmp(nbsp->type, "external");
> >> > > > > -}
> >> > > > > -
> >> > > > >  static bool
> >> > > > >  build_dhcpv4_action(struct ovn_port *op, ovs_be32 offer_ip,
> >> > > > >                      struct ds *options_action, struct ds
*response_action,
> >> > > > > @@ -5174,6 +5182,170 @@ build_lrouter_groups(struct hmap
*ports, struct ovs_list *lr_list)
> >> > > > >      }
> >> > > > >  }
> >> > > > >
> >> > > > > +/*
> >> > > > > + * Ingress table 17: Flows that flood self originated ARP/ND
packets in the
> >> > > > > + * switching domain.
> >> > > > > + */
> >> > > > > +static void
> >> > > > > +build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port
*op,
> >> > > > > +                                           uint32_t priority,
> >> > > > > +                                           struct
ovn_datapath *od,
> >> > > > > +                                           struct hmap
*lflows)
> >> > > > > +{
> >> > > > > +    struct ds match = DS_EMPTY_INITIALIZER;
> >> > > > > +    struct ds eth_src = DS_EMPTY_INITIALIZER;
> >> > > > > +
> >> > > > > +    /* Self originated (G)ARP requests/ND need to be flooded
as usual.
> >> > > > > +     * Determine that packets are self originated by also
matching on
> >> > > > > +     * source MAC. Matching on ingress port is not reliable
in case this
> >> > > > > +     * is a VLAN-backed network.
> >> > > > > +     * Priority: 80.
> >> > > > > +     */
> >> > > > > +    ds_put_format(&eth_src, "{ %s, ", op->lrp_networks.ea_s);
> >> > > > > +    for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
> >> > > > > +        const struct nbrec_nat *nat = op->od->nbr->nat[i];
> >> > > > > +
> >> > > > > +        if (!nat->external_mac) {
> >> > > > > +            continue;
> >> > > > > +        }
> >> > > > > +
> >> > > > > +        ds_put_format(&eth_src, "%s, ", nat->external_mac);
> >> > > > > +    }
> >> > > >
> >> > > > As discussed we need to add chassis unique MAC that are
configured in external-ids:ovn-chassis-mac-mappings of Chassis records, but
I didn't find this in the patch. VLAN backed logical router may not work
without this.
> >> > >
> >> > > Hi Han,
> >> > >
> >> > > Maybe I misunderstood but in the discussion on v6 I mentioned that
I
> >> > > don't think we need to add the MACs from
> >> > > external-ids:ovn-chassis-mac-mappings.
> >> > >
> >> > > Whenever chassis MACs are configured, in ovn-controller we create a
> >> > > conjunctive flow matching on any of the remote chassis MAC
addresses:
> >> > >
https://github.com/ovn-org/ovn/blob/master/controller/physical.c#L501
> >> > >
> >> > > And for all incoming traffic that matches this conjunction and
VLAN-id
> >> > > we change the MAC back to that of the logical router port:
> >> > >
https://github.com/ovn-org/ovn/blob/master/controller/physical.c#L558
> >> > >
> >> > > Isn't this enough to cover the self originated ARP packets?
> >> > >
> >> > > Thanks,
> >> > > Dumitru
> >> > >
> >> >
> >> > Dumitru, sorry that I misunderstood that you actually meant it was
ok to not adding chassis unique macs. Also I didn't realize that there are
already flows to change the chassis unique MACs back to the logical router
port's MACs.
> >> > With this precondition I think your patch should be good enough.
> >> >
> >> > However, I revisited the function put_replace_chassis_mac_flows()
and had some difficulty to understand how would it work. For these flows,
the match conditions are:
> >> > - in_port of the localnet port
> >> > - conjunction id: CHASSIS_MAC_TO_ROUTER_MAC_CONJID (value 100)
> >> > - vlan tag associated with the localnet port
> >> > The flow is added in a loop for each peer port to replace mac for
each router port on that logical switch. Since the match condition is all
the same, wouldn't it result in only one flow taking effect and others
getting dropped? I wonder if any other port-specific match condition should
be added so that MAC can be replaced back to its original router port mac
accordingly.
> >>
> >> If I understand correctly the differentiator is the ingress localnet
> >> port id and VLAN-ID.
> >>
> >> Looking at the autotest for "ovn -- 2 HVs, 2 lports/HV, localnet
> >> ports, DVR chassis mac" the network is:
> >>
> >> $ ovn-nbctl --db=unix:$PWD/./ovn-nb/ovn-nb.sock show
> >> switch df662f28-4a42-4ac4-aadb-89563347cae1 (ls1)
> >>     port ls1-to-router
> >>         type: router
> >>         router-port: router-to-ls1
> >>     port lp11
> >>         addresses: ["f0:00:00:00:00:11 192.168.1.1"]
> >>     port ln1
> >>         type: localnet
> >>         parent:
> >>         tag: 101
> >>         addresses: ["unknown"]
> >> switch 91ddb7b2-df8c-42de-a899-6bb35ee08a16 (ls2)
> >>     port ls2-to-router
> >>         type: router
> >>         router-port: router-to-ls2
> >>     port lp22
> >>         addresses: ["f0:00:00:00:00:22 192.168.2.2"]
> >>     port ln2
> >>         type: localnet
> >>         parent:
> >>         tag: 201
> >>         addresses: ["unknown"]
> >> router 4186cb04-3370-401c-9d65-29cb2af48af1 (router)
> >>     port router-to-ls1
> >>         mac: "00:00:01:01:02:03"
> >>         networks: ["192.168.1.3/24"]
> >>     port router-to-ls2
> >>         mac: "00:00:01:01:02:05"
> >>         networks: ["192.168.2.3/24"]
> >>
> >> $ ovn-sbctl --db=unix:$PWD/./ovn-sb/ovn-sb.sock list chassis | grep -E
> >> "uuid|external_ids"
> >> _uuid               : 31ba7484-e0af-4326-a71b-c3f32e52e547
> >> external_ids        : {datapath-type="",
> >>
iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >> ovn-bridge-mappings="phys:br-phys",
> >> ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:11", ovn-cms-options=""}
> >>
> >> _uuid               : a04b5ad2-d76f-42c4-9f2b-21816e2624e2
> >> external_ids        : {datapath-type="",
> >>
iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >> ovn-bridge-mappings="phys:br-phys",
> >> ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:22", ovn-cms-options=""}
> >>
> >> On HV1 (chassis-mac aa:bb:cc:dd:ee:11) we have the following flow to
> >> replace source MAC for already routed packets:
> >>
> >> $ OVS_RUNDIR=$PWD/hv1 ovs-ofctl dump-flows br-int | grep
> >> aa:bb:cc:dd:ee:11
> >>  cookie=0xe0f6b198, duration=580.460s, table=65, n_packets=0,
> >> n_bytes=0, idle_age=580,
> >> priority=150,reg15=0x1,metadata=0x1,dl_src=00:00:01:01:02:03
> >> actions=mod_dl_src:aa:bb:cc:dd:ee:11,mod_vlan_vid:101,output:1
> >>  cookie=0xfddc60a, duration=580.420s, table=65, n_packets=1,
> >> n_bytes=42, idle_age=580,
> >> priority=150,reg15=0x1,metadata=0x2,dl_src=00:00:01:01:02:05
> >> actions=mod_dl_src:aa:bb:cc:dd:ee:11,mod_vlan_vid:201,output:3
> >>
> >> In the above output, the first flow is for packets destined to hosts
> >> in LS1 (VLAN 101) and the second flow is for packets destined to hosts
> >> in LS2 (VLAN 201).
> >>
> >> If we inject a packet from lp11:
> >> in_port=lp11, eth.src=f0:00:00:00:00:11, eth.dst=00:00:01:01:02:03,
> >> ip.src=192.168.1.1, ip.dst=192.168.2.2
> >>
> >> The router pipeline is executed on HV1, the eth.src address that of
> >> the router-to-ls2 port (00:00:01:01:02:05) and finally the entry in
> >> table=65 is hit and eth.src is changed to the configured chassis-mac
> >> (aa:bb:cc:dd:ee:11).
> >>
> >> The packet is sent out on port ln2:
> >> $ OVS_RUNDIR=$PWD/hv1 ovs-vsctl --column ofport find interface
> >> name=patch-br-int-to-ln2
> >> ofport              : 3
> >>
> >> Then it is received on HV2, where we have the following flows:
> >> $ OVS_RUNDIR=$PWD/hv2 ovs-ofctl dump-flows br-int | grep conj
> >>  cookie=0x31ba7484, duration=1545.430s, table=0, n_packets=0,
> >> n_bytes=0, idle_age=1545, priority=180,dl_src=aa:bb:cc:dd:ee:11
> >> actions=conjunction(100,1/2)
> >>  cookie=0x2c6a49b3, duration=1545.368s, table=0, n_packets=1,
> >> n_bytes=46, idle_age=1545,
> >> priority=180,conj_id=100,in_port=2,dl_vlan=201
> >>
actions=strip_vlan,load:0x4->NXM_NX_REG13[],load:0x3->NXM_NX_REG11[],load:0x2->NXM_NX_REG12[],load:0x2->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:00:00:01:01:02:05,resubmit(,8)
> >>  cookie=0xd8937e1d, duration=1545.332s, table=0, n_packets=0,
> >> n_bytes=0, idle_age=1545,
> >> priority=180,conj_id=100,in_port=3,dl_vlan=101
> >>
actions=strip_vlan,load:0x9->NXM_NX_REG13[],load:0x5->NXM_NX_REG11[],load:0x6->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:00:00:01:01:02:03,resubmit(,8)
> >>  cookie=0x2c6a49b3, duration=1545.368s, table=0, n_packets=0,
> >> n_bytes=0, idle_age=1545, priority=180,dl_vlan=201
> >> actions=conjunction(100,2/2)
> >>  cookie=0xd8937e1d, duration=1545.332s, table=0, n_packets=0,
> >> n_bytes=0, idle_age=1545, priority=180,dl_vlan=101
> >> actions=conjunction(100,2/2)
> >>
> >> The packet matches:
> >> - clause 1 of the conjunction (100) because eth.src is the chassis-mac
> >> of HV1 (aa:bb:cc:dd:ee:11).
> >> - clause 2 of the conjunction because VLAN_ID is 201.
> >>
> >> And then matches this flow that fixes the eth.src in the packet to
> >> that of router-to-ls2:
> >> priority=180,conj_id=100,in_port=2,dl_vlan=201
> >> actions=.....,mod_dl_src:00:00:01:01:02:05,....
> >
> >
> > The test has only a single router with multiple lswitches, but the loop
in the code is trying to handle the case when there are multiple router
ports on the same lswitch (with same localnet port). In the loop the inport
and vlan doesn’t change across iterations.
>
> Ok, I see what you mean now, I was under the impression that we'd
> always have a single router port per lswitch with vlan networks.
>
> I'll let Ankur comment on this because it seems to me that we can't
> restore the router port MAC if multiple router-ports map to the same
> VLAN-ID + localnet-port.

Ok, now that this is clarified (at least common understanding of the
problem), I think we may move this forward by merging the current patch
first, and address the source MAC conversion problem separately in a new
thread discussing with Ankur.
I think it shouldn't affect the solution of this patch, either with current
implementation which replaces chassis mac to a probably wrong router's mac,
or with a new implementation that properly replaces chassis mac back to the
corresponding router's mac, the current patch should work well without
considering the chassis mac.
I applied this to master.

Thanks,
Han
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to