On Wed, Nov 13, 2019 at 8:52 PM Han Zhou <[email protected]> wrote:
>
>
>
> On Wed, Nov 13, 2019 at 8:14 AM Dumitru Ceara <[email protected]> wrote:
> >
> > On Wed, Nov 13, 2019 at 4:30 PM Han Zhou <[email protected]> wrote:
> > >
> > >
> > >
> > > On Wed, Nov 13, 2019 at 2:42 AM Dumitru Ceara <[email protected]> wrote:
> > >>
> > >> On Tue, Nov 12, 2019 at 8:50 PM Han Zhou <[email protected]> wrote:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Nov 12, 2019 at 10:10 AM Dumitru Ceara <[email protected]> 
> > >> > wrote:
> > >> > >
> > >> > > On Tue, Nov 12, 2019 at 6:17 PM Han Zhou <[email protected]> wrote:
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Nov 12, 2019 at 2:29 AM Dumitru Ceara <[email protected]> 
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > ARP request and ND NS packets for router owned IPs were being
> > >> > > > > flooded in the complete L2 domain (using the MC_FLOOD multicast 
> > >> > > > > group).
> > >> > > > > However this creates a scaling issue in scenarios where 
> > >> > > > > aggregation
> > >> > > > > logical switches are connected to more logical routers (~350). 
> > >> > > > > The
> > >> > > > > logical pipelines of all routers would have to be executed 
> > >> > > > > before the
> > >> > > > > packet is finally replied to by a single router, the owner of 
> > >> > > > > the IP
> > >> > > > > address.
> > >> > > > >
> > >> > > > > This commit limits the broadcast domain by bypassing the L2 
> > >> > > > > Lookup stage
> > >> > > > > for ARP requests that will be replied by a single router. The 
> > >> > > > > packets
> > >> > > > > are forwarded only to the router port that owns the target IP 
> > >> > > > > address.
> > >> > > > >
> > >> > > > > IPs that are owned by the routers and for which this fix applies 
> > >> > > > > are:
> > >> > > > > - IP addresses configured on the router ports.
> > >> > > > > - VIPs.
> > >> > > > > - NAT IPs.
> > >> > > > >
> > >> > > > > Reported-at: https://bugzilla.redhat.com/1756945
> > >> > > > > Reported-by: Anil Venkata <[email protected]>
> > >> > > > > Signed-off-by: Dumitru Ceara <[email protected]>
> > >> > > > >
> > >> > > > > ---
> > >> > > > > v7:
> > >> > > > > - Address Han's comments:
> > >> > > > >     - Remove flooding for all ARPs received on VLAN networks. To 
> > >> > > > > avoid
> > >> > > > >       that we now identify self originated (G)ARPs by matching 
> > >> > > > > on source
> > >> > > > >       MAC address too.
> > >> > > > >     - Rename REGBIT_NOT_VXLAN to FLAGBIT_NOT_VXLAN.
> > >> > > > > - Fix ovn-sb manpage.
> > >> > > > > - Split patch in a series of 2:
> > >> > > > >     - patch1: fixes the get_router_load_balancer_ips() function.
> > >> > > > >     - patch2: limits the ARP/ND broadcast domain.
> > >> > > > > v6:
> > >> > > > > - Address Han's comments:
> > >> > > > >     - remove flooding of ARPs targeting OVN owned IP addresses.
> > >> > > > >     - update ovn-architecture documentation.
> > >> > > > >     - rename ARP handling functions.
> > >> > > > >     - Adapt "ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR" autotest to 
> > >> > > > > take into
> > >> > > > >     account the new way of forwarding ARPs.
> > >> > > > > - Also, properly deal with ARP packets on VLAN-backed networks.
> > >> > > > > v5: Address Numan's comments: update comments & make autotest 
> > >> > > > > more
> > >> > > > >     robust.
> > >> > > > > v4: Rebase.
> > >> > > > > v3: Properly deal with VXLAN traffic. Address review comments 
> > >> > > > > from
> > >> > > > >     Numan (add autotests). Fix function 
> > >> > > > > get_router_load_balancer_ips.
> > >> > > > >     Rebase -> deal with IPv6 NAT too.
> > >> > > > > v2: Move ARP broadcast domain limiting to table 
> > >> > > > > S_SWITCH_IN_L2_LKUP to
> > >> > > > > address localnet ports too.
> > >> > > > > ---
> > >> > > > >  northd/ovn-northd.8.xml |   14 ++
> > >> > > > >  northd/ovn-northd.c     |  230 
> > >> > > > > +++++++++++++++++++++++++++++++----
> > >> > > > >  ovn-architecture.7.xml  |   19 +++
> > >> > > > >  tests/ovn.at            |  307 
> > >> > > > > +++++++++++++++++++++++++++++++++++++++++++++--
> > >> > > > >  4 files changed, 530 insertions(+), 40 deletions(-)
> > >> > > > >
> > >> > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > >> > > > > index 0a33dcd..344cc0d 100644
> > >> > > > > --- a/northd/ovn-northd.8.xml
> > >> > > > > +++ b/northd/ovn-northd.8.xml
> > >> > > > > @@ -1005,6 +1005,20 @@ output;
> > >> > > > >        </li>
> > >> > > > >
> > >> > > > >        <li>
> > >> > > > > +        Priority-80 flows for each port connected to a logical 
> > >> > > > > router
> > >> > > > > +        matching self originated GARP/ARP request/ND packets. 
> > >> > > > > These packets
> > >> > > > > +        are flooded to the <code>MC_FLOOD</code> which contains 
> > >> > > > > all logical
> > >> > > > > +        ports.
> > >> > > > > +      </li>
> > >> > > > > +
> > >> > > > > +      <li>
> > >> > > > > +        Priority-75 flows for each IP address/VIP/NAT address 
> > >> > > > > owned by a
> > >> > > > > +        router port connected to the switch. These flows match 
> > >> > > > > ARP requests
> > >> > > > > +        and ND packets for the specific IP addresses.  Matched 
> > >> > > > > packets are
> > >> > > > > +        forwarded only to the router that owns the IP address.
> > >> > > > > +      </li>
> > >> > > > > +
> > >> > > > > +      <li>
> > >> > > > >          A priority-70 flow that outputs all packets with an 
> > >> > > > > Ethernet broadcast
> > >> > > > >          or multicast <code>eth.dst</code> to the 
> > >> > > > > <code>MC_FLOOD</code>
> > >> > > > >          multicast group.
> > >> > > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > >> > > > > index 32f3200..d6beb97 100644
> > >> > > > > --- a/northd/ovn-northd.c
> > >> > > > > +++ b/northd/ovn-northd.c
> > >> > > > > @@ -210,6 +210,8 @@ enum ovn_stage {
> > >> > > > >  #define REGBIT_LOOKUP_NEIGHBOR_RESULT "reg9[4]"
> > >> > > > >  #define REGBIT_SKIP_LOOKUP_NEIGHBOR "reg9[5]"
> > >> > > > >
> > >> > > > > +#define FLAGBIT_NOT_VXLAN "flags[1] == 0"
> > >> > > > > +
> > >> > > > >  /* Returns an "enum ovn_stage" built from the arguments. */
> > >> > > > >  static enum ovn_stage
> > >> > > > >  ovn_stage_build(enum ovn_datapath_type dp_type, enum 
> > >> > > > > ovn_pipeline pipeline,
> > >> > > > > @@ -1202,6 +1204,34 @@ ovn_port_allocate_key(struct ovn_datapath 
> > >> > > > > *od)
> > >> > > > >                            1, (1u << 15) - 1, 
> > >> > > > > &od->port_key_hint);
> > >> > > > >  }
> > >> > > > >
> > >> > > > > +/* Returns true if the logical switch port 'enabled' column is 
> > >> > > > > empty or
> > >> > > > > + * set to true.  Otherwise, returns false. */
> > >> > > > > +static bool
> > >> > > > > +lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
> > >> > > > > +{
> > >> > > > > +    return !lsp->n_enabled || *lsp->enabled;
> > >> > > > > +}
> > >> > > > > +
> > >> > > > > +/* Returns true only if the logical switch port 'up' column is 
> > >> > > > > set to true.
> > >> > > > > + * Otherwise, if the column is not set or set to false, returns 
> > >> > > > > false. */
> > >> > > > > +static bool
> > >> > > > > +lsp_is_up(const struct nbrec_logical_switch_port *lsp)
> > >> > > > > +{
> > >> > > > > +    return lsp->n_up && *lsp->up;
> > >> > > > > +}
> > >> > > > > +
> > >> > > > > +static bool
> > >> > > > > +lsp_is_external(const struct nbrec_logical_switch_port *nbsp)
> > >> > > > > +{
> > >> > > > > +    return !strcmp(nbsp->type, "external");
> > >> > > > > +}
> > >> > > > > +
> > >> > > > > +static bool
> > >> > > > > +lrport_is_enabled(const struct nbrec_logical_router_port 
> > >> > > > > *lrport)
> > >> > > > > +{
> > >> > > > > +    return !lrport->enabled || *lrport->enabled;
> > >> > > > > +}
> > >> > > > > +
> > >> > > > >  static char *
> > >> > > > >  chassis_redirect_name(const char *port_name)
> > >> > > > >  {
> > >> > > > > @@ -3750,28 +3780,6 @@ build_port_security_ip(enum ovn_pipeline 
> > >> > > > > pipeline, struct ovn_port *op,
> > >> > > > >
> > >> > > > >  }
> > >> > > > >
> > >> > > > > -/* Returns true if the logical switch port 'enabled' column is 
> > >> > > > > empty or
> > >> > > > > - * set to true.  Otherwise, returns false. */
> > >> > > > > -static bool
> > >> > > > > -lsp_is_enabled(const struct nbrec_logical_switch_port *lsp)
> > >> > > > > -{
> > >> > > > > -    return !lsp->n_enabled || *lsp->enabled;
> > >> > > > > -}
> > >> > > > > -
> > >> > > > > -/* Returns true only if the logical switch port 'up' column is 
> > >> > > > > set to true.
> > >> > > > > - * Otherwise, if the column is not set or set to false, returns 
> > >> > > > > false. */
> > >> > > > > -static bool
> > >> > > > > -lsp_is_up(const struct nbrec_logical_switch_port *lsp)
> > >> > > > > -{
> > >> > > > > -    return lsp->n_up && *lsp->up;
> > >> > > > > -}
> > >> > > > > -
> > >> > > > > -static bool
> > >> > > > > -lsp_is_external(const struct nbrec_logical_switch_port *nbsp)
> > >> > > > > -{
> > >> > > > > -    return !strcmp(nbsp->type, "external");
> > >> > > > > -}
> > >> > > > > -
> > >> > > > >  static bool
> > >> > > > >  build_dhcpv4_action(struct ovn_port *op, ovs_be32 offer_ip,
> > >> > > > >                      struct ds *options_action, struct ds 
> > >> > > > > *response_action,
> > >> > > > > @@ -5174,6 +5182,170 @@ build_lrouter_groups(struct hmap *ports, 
> > >> > > > > struct ovs_list *lr_list)
> > >> > > > >      }
> > >> > > > >  }
> > >> > > > >
> > >> > > > > +/*
> > >> > > > > + * Ingress table 17: Flows that flood self originated ARP/ND 
> > >> > > > > packets in the
> > >> > > > > + * switching domain.
> > >> > > > > + */
> > >> > > > > +static void
> > >> > > > > +build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> > >> > > > > +                                           uint32_t priority,
> > >> > > > > +                                           struct ovn_datapath 
> > >> > > > > *od,
> > >> > > > > +                                           struct hmap *lflows)
> > >> > > > > +{
> > >> > > > > +    struct ds match = DS_EMPTY_INITIALIZER;
> > >> > > > > +    struct ds eth_src = DS_EMPTY_INITIALIZER;
> > >> > > > > +
> > >> > > > > +    /* Self originated (G)ARP requests/ND need to be flooded as 
> > >> > > > > usual.
> > >> > > > > +     * Determine that packets are self originated by also 
> > >> > > > > matching on
> > >> > > > > +     * source MAC. Matching on ingress port is not reliable in 
> > >> > > > > case this
> > >> > > > > +     * is a VLAN-backed network.
> > >> > > > > +     * Priority: 80.
> > >> > > > > +     */
> > >> > > > > +    ds_put_format(&eth_src, "{ %s, ", op->lrp_networks.ea_s);
> > >> > > > > +    for (size_t i = 0; i < op->od->nbr->n_nat; i++) {
> > >> > > > > +        const struct nbrec_nat *nat = op->od->nbr->nat[i];
> > >> > > > > +
> > >> > > > > +        if (!nat->external_mac) {
> > >> > > > > +            continue;
> > >> > > > > +        }
> > >> > > > > +
> > >> > > > > +        ds_put_format(&eth_src, "%s, ", nat->external_mac);
> > >> > > > > +    }
> > >> > > >
> > >> > > > As discussed we need to add chassis unique MAC that are configured 
> > >> > > > in external-ids:ovn-chassis-mac-mappings of Chassis records, but I 
> > >> > > > didn't find this in the patch. VLAN backed logical router may not 
> > >> > > > work without this.
> > >> > >
> > >> > > Hi Han,
> > >> > >
> > >> > > Maybe I misunderstood but in the discussion on v6 I mentioned that I
> > >> > > don't think we need to add the MACs from
> > >> > > external-ids:ovn-chassis-mac-mappings.
> > >> > >
> > >> > > Whenever chassis MACs are configured, in ovn-controller we create a
> > >> > > conjunctive flow matching on any of the remote chassis MAC addresses:
> > >> > > https://github.com/ovn-org/ovn/blob/master/controller/physical.c#L501
> > >> > >
> > >> > > And for all incoming traffic that matches this conjunction and 
> > >> > > VLAN-id
> > >> > > we change the MAC back to that of the logical router port:
> > >> > > https://github.com/ovn-org/ovn/blob/master/controller/physical.c#L558
> > >> > >
> > >> > > Isn't this enough to cover the self originated ARP packets?
> > >> > >
> > >> > > Thanks,
> > >> > > Dumitru
> > >> > >
> > >> >
> > >> > Dumitru, sorry that I misunderstood that you actually meant it was ok 
> > >> > to not adding chassis unique macs. Also I didn't realize that there 
> > >> > are already flows to change the chassis unique MACs back to the 
> > >> > logical router port's MACs.
> > >> > With this precondition I think your patch should be good enough.
> > >> >
> > >> > However, I revisited the function put_replace_chassis_mac_flows() and 
> > >> > had some difficulty to understand how would it work. For these flows, 
> > >> > the match conditions are:
> > >> > - in_port of the localnet port
> > >> > - conjunction id: CHASSIS_MAC_TO_ROUTER_MAC_CONJID (value 100)
> > >> > - vlan tag associated with the localnet port
> > >> > The flow is added in a loop for each peer port to replace mac for each 
> > >> > router port on that logical switch. Since the match condition is all 
> > >> > the same, wouldn't it result in only one flow taking effect and others 
> > >> > getting dropped? I wonder if any other port-specific match condition 
> > >> > should be added so that MAC can be replaced back to its original 
> > >> > router port mac accordingly.
> > >>
> > >> If I understand correctly the differentiator is the ingress localnet
> > >> port id and VLAN-ID.
> > >>
> > >> Looking at the autotest for "ovn -- 2 HVs, 2 lports/HV, localnet
> > >> ports, DVR chassis mac" the network is:
> > >>
> > >> $ ovn-nbctl --db=unix:$PWD/./ovn-nb/ovn-nb.sock show
> > >> switch df662f28-4a42-4ac4-aadb-89563347cae1 (ls1)
> > >>     port ls1-to-router
> > >>         type: router
> > >>         router-port: router-to-ls1
> > >>     port lp11
> > >>         addresses: ["f0:00:00:00:00:11 192.168.1.1"]
> > >>     port ln1
> > >>         type: localnet
> > >>         parent:
> > >>         tag: 101
> > >>         addresses: ["unknown"]
> > >> switch 91ddb7b2-df8c-42de-a899-6bb35ee08a16 (ls2)
> > >>     port ls2-to-router
> > >>         type: router
> > >>         router-port: router-to-ls2
> > >>     port lp22
> > >>         addresses: ["f0:00:00:00:00:22 192.168.2.2"]
> > >>     port ln2
> > >>         type: localnet
> > >>         parent:
> > >>         tag: 201
> > >>         addresses: ["unknown"]
> > >> router 4186cb04-3370-401c-9d65-29cb2af48af1 (router)
> > >>     port router-to-ls1
> > >>         mac: "00:00:01:01:02:03"
> > >>         networks: ["192.168.1.3/24"]
> > >>     port router-to-ls2
> > >>         mac: "00:00:01:01:02:05"
> > >>         networks: ["192.168.2.3/24"]
> > >>
> > >> $ ovn-sbctl --db=unix:$PWD/./ovn-sb/ovn-sb.sock list chassis | grep -E
> > >> "uuid|external_ids"
> > >> _uuid               : 31ba7484-e0af-4326-a71b-c3f32e52e547
> > >> external_ids        : {datapath-type="",
> > >> iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> > >> ovn-bridge-mappings="phys:br-phys",
> > >> ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:11", ovn-cms-options=""}
> > >>
> > >> _uuid               : a04b5ad2-d76f-42c4-9f2b-21816e2624e2
> > >> external_ids        : {datapath-type="",
> > >> iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> > >> ovn-bridge-mappings="phys:br-phys",
> > >> ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:22", ovn-cms-options=""}
> > >>
> > >> On HV1 (chassis-mac aa:bb:cc:dd:ee:11) we have the following flow to
> > >> replace source MAC for already routed packets:
> > >>
> > >> $ OVS_RUNDIR=$PWD/hv1 ovs-ofctl dump-flows br-int | grep
> > >> aa:bb:cc:dd:ee:11
> > >>  cookie=0xe0f6b198, duration=580.460s, table=65, n_packets=0,
> > >> n_bytes=0, idle_age=580,
> > >> priority=150,reg15=0x1,metadata=0x1,dl_src=00:00:01:01:02:03
> > >> actions=mod_dl_src:aa:bb:cc:dd:ee:11,mod_vlan_vid:101,output:1
> > >>  cookie=0xfddc60a, duration=580.420s, table=65, n_packets=1,
> > >> n_bytes=42, idle_age=580,
> > >> priority=150,reg15=0x1,metadata=0x2,dl_src=00:00:01:01:02:05
> > >> actions=mod_dl_src:aa:bb:cc:dd:ee:11,mod_vlan_vid:201,output:3
> > >>
> > >> In the above output, the first flow is for packets destined to hosts
> > >> in LS1 (VLAN 101) and the second flow is for packets destined to hosts
> > >> in LS2 (VLAN 201).
> > >>
> > >> If we inject a packet from lp11:
> > >> in_port=lp11, eth.src=f0:00:00:00:00:11, eth.dst=00:00:01:01:02:03,
> > >> ip.src=192.168.1.1, ip.dst=192.168.2.2
> > >>
> > >> The router pipeline is executed on HV1, the eth.src address that of
> > >> the router-to-ls2 port (00:00:01:01:02:05) and finally the entry in
> > >> table=65 is hit and eth.src is changed to the configured chassis-mac
> > >> (aa:bb:cc:dd:ee:11).
> > >>
> > >> The packet is sent out on port ln2:
> > >> $ OVS_RUNDIR=$PWD/hv1 ovs-vsctl --column ofport find interface
> > >> name=patch-br-int-to-ln2
> > >> ofport              : 3
> > >>
> > >> Then it is received on HV2, where we have the following flows:
> > >> $ OVS_RUNDIR=$PWD/hv2 ovs-ofctl dump-flows br-int | grep conj
> > >>  cookie=0x31ba7484, duration=1545.430s, table=0, n_packets=0,
> > >> n_bytes=0, idle_age=1545, priority=180,dl_src=aa:bb:cc:dd:ee:11
> > >> actions=conjunction(100,1/2)
> > >>  cookie=0x2c6a49b3, duration=1545.368s, table=0, n_packets=1,
> > >> n_bytes=46, idle_age=1545,
> > >> priority=180,conj_id=100,in_port=2,dl_vlan=201
> > >> actions=strip_vlan,load:0x4->NXM_NX_REG13[],load:0x3->NXM_NX_REG11[],load:0x2->NXM_NX_REG12[],load:0x2->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:00:00:01:01:02:05,resubmit(,8)
> > >>  cookie=0xd8937e1d, duration=1545.332s, table=0, n_packets=0,
> > >> n_bytes=0, idle_age=1545,
> > >> priority=180,conj_id=100,in_port=3,dl_vlan=101
> > >> actions=strip_vlan,load:0x9->NXM_NX_REG13[],load:0x5->NXM_NX_REG11[],load:0x6->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:00:00:01:01:02:03,resubmit(,8)
> > >>  cookie=0x2c6a49b3, duration=1545.368s, table=0, n_packets=0,
> > >> n_bytes=0, idle_age=1545, priority=180,dl_vlan=201
> > >> actions=conjunction(100,2/2)
> > >>  cookie=0xd8937e1d, duration=1545.332s, table=0, n_packets=0,
> > >> n_bytes=0, idle_age=1545, priority=180,dl_vlan=101
> > >> actions=conjunction(100,2/2)
> > >>
> > >> The packet matches:
> > >> - clause 1 of the conjunction (100) because eth.src is the chassis-mac
> > >> of HV1 (aa:bb:cc:dd:ee:11).
> > >> - clause 2 of the conjunction because VLAN_ID is 201.
> > >>
> > >> And then matches this flow that fixes the eth.src in the packet to
> > >> that of router-to-ls2:
> > >> priority=180,conj_id=100,in_port=2,dl_vlan=201
> > >> actions=.....,mod_dl_src:00:00:01:01:02:05,....
> > >
> > >
> > > The test has only a single router with multiple lswitches, but the loop 
> > > in the code is trying to handle the case when there are multiple router 
> > > ports on the same lswitch (with same localnet port). In the loop the 
> > > inport and vlan doesn’t change across iterations.
> >
> > Ok, I see what you mean now, I was under the impression that we'd
> > always have a single router port per lswitch with vlan networks.
> >
> > I'll let Ankur comment on this because it seems to me that we can't
> > restore the router port MAC if multiple router-ports map to the same
> > VLAN-ID + localnet-port.
>
> Ok, now that this is clarified (at least common understanding of the 
> problem), I think we may move this forward by merging the current patch 
> first, and address the source MAC conversion problem separately in a new 
> thread discussing with Ankur.
> I think it shouldn't affect the solution of this patch, either with current 
> implementation which replaces chassis mac to a probably wrong router's mac, 
> or with a new implementation that properly replaces chassis mac back to the 
> corresponding router's mac, the current patch should work well without 
> considering the chassis mac.
> I applied this to master.
>
> Thanks,
> Han
>


Thank you!

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to