On Fri, Sep 28, 2018 at 12:36 AM Guru Shetty <[email protected]> wrote: > > > On Mon, 24 Sep 2018 at 03:10, Miguel Angel Ajo Pelayo <[email protected]> > wrote: > >> >> >> No worries Guru, I understand your feeling, I worked with Anil on >> developing this feature, and it's indeed rather complex (we are actually >> replacing keepalived + VRRP with openflow and BFD). >> >> I'm happy to work on a more detailed documentation, I guess that Numan, >> Anil and I could team up to help that put documentation in shape where [1] >> should be a good starting point for the admin side, and may be after that >> Numan can give it a thought for making the code simpler where there's room >> for that. It's easier to have others help simplify once we have good >> documentation. >> >> I agree with Numans that, may be we should take also this opportunity >> (this patch Anil has been working on to make VLAN private networks possible >> with l3ha) to also support SR-IOV, since it's actually not far from there >> at all. >> >> Guru, what's your biggest concern, the design documentation? the user >> side ? both?. My worry with design documentaiton is that tends to rot when >> new patches comes, do we have any example of internals documentation that >> talk about code in tree? >> > > My biggest concern is that reading the code does not give out a story. > Everything in ovn-northd can be understood by just reading the code. But > when it comes to gateway redirect, it is hard. > > For e.g., I am pasting a snippet of code from one of the patches in this > series. > > ++ if (op->peer && op->peer->od->localnet_port &&+ > op->od->l3dgw_port && op->od->l3redirect_port &&+ (op != > op->od->l3redirect_port) &&+ (op != op->od->l3dgw_port)) {+ > ds_put_format(&match, " && is_chassis_resident(%s)",+ > op->od->l3redirect_port->json_key);+ ds_put_format(&actions, "; > flags.rcv_from_vlan = 1");+ } > > You need to go back and read the first patches of the feature (gateway > redirect) and its commit messages to understand the implications of the > above if condition. The primary reason why it is hard is because, we try to > move back and forth with packet flow and force it to move in non-natural > directions. > > I feel that writing a documentation (A hybrid of man pages of > ovn-architecture and ovn-northd) would make it easy to follow. For e.g., we > follow a packet when it leaves the VM and heads to the gateway and gets > SNATted. Similarly, we follow the packet for DNAT. We can then enhance it > with other use cases (for e.g this patch series). > > On a larger note, OVN is a network virtualization project. We implement > the functionality of a router, switch, firewall etc and then when we > interconnect them, things just work well. Code around Switches, routers, > load-balancers etc are easy to understand because they are all in logical > space. But gateways transcend the logical boundary to physical boundary. > The reason why the "gateway redirect" code is hard to understand is because > we are trying to implement it in the same router that is also used in > logical space. A clean separation of boundaries does not exist. I tried to > implement the clean separation with gateway routers, wherein the there is a > separate router that connects to the physical world and is chassis bound. > But OpenStack-OVN as a client did not want to handle the complication of > maintaining 2 routers. > > I was involved in moving away from the Gateway router to a distributed gateway router port once Mickey Spiegel added the redirect chassis. It's true that we didn't want to maintain 2 logical routers in networking-ovn. But we moved mainly because we were co relating with the default neutron reference implementation and wanted to bridge gaps between it (with various agents ) and neutron + OVN. With the new redirect chassis support, we could add the floating ip support for the VMs with distributed routing (instead of centralized routing) and also later- HA support for centralized routing.
I have submitted a patch series as an alternate solution to solve this issue. Can you please take a look at it - https://patchwork.ozlabs.org/project/openvswitch/list/?series=69280 and provide your comments. I will also work on having a documentation as you have suggested above. Thanks Numan > > > > >> >> [1] >> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst >> >> >> >> On Fri, Sep 21, 2018 at 9:28 PM Numan Siddique <[email protected]> >> wrote: >> >>> On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <[email protected]> wrote: >>> >>> > > >>> > > >>> > > I have tried to make sense of this patch series a few times. I think >>> > > adding increasing complications like this will make gateway code >>> > > unmaintainable. The whole gateway redirect chassis already makes it >>> > > un-understandable and now this will mean that no one will be able to >>> > > understand it as we go forward. [/rant off] >>> > > >>> > >>> > Sorry about the "rant" if it came out as rude - was not my intention. I >>> > will try to offer something more constructive. When you initially >>> looked at >>> > the gateway redirect code, how long did it take for you to understand >>> it? >>> > My main concern is not with your code, but the underlying code. I >>> initially >>> > reviewed the underlying redirect code. And it took me a good 2 days to >>> > understand it. Back then, we wanted something working and it looked >>> like >>> > something that can eventually be simplified. But, we have been adding >>> more >>> > things to it. And right now, if this patch series is added, it will >>> likely >>> > get very very hard to understand it as we go forward. Do you reckon >>> that a >>> > year from now, you will be able to debug a corner case in the code? >>> > >>> > If there are others who have understood the "redirect" gateway code >>> easily, >>> > please do chime in. >>> > >>> > My suggestion is to re-look at the redirect gateway design and see >>> whether >>> > the code can be simplified. Or whether we can write detailed >>> documentation >>> > about it to make it easy to understand. Only once we are satisfied >>> with it, >>> > we can attempt to add more stuff to it. >>> > >>> > >>> I always forget the gateway code and the design. (May be because I >>> haven't >>> worked >>> on this part of the code much) and I have to refresh my memory everytime. >>> I agree that we need to have a detailed documentation. May be "a journey >>> of >>> a packet " >>> kind of documentation in ovn-architecture would also help. >>> >>> The proposed solution in this patch series, If I understand it correctly >>> runs the router pipeline >>> in the source chassis and then resumes it on the gateway chassis. >>> Recently >>> I was looking into >>> SR-IOV kind of scenarios, where the VM traffic by passes the host kernel, >>> and I think the >>> proposed solution will not work. In order to support N/S traffic for >>> SR-IOV >>> instances >>> we have to find probably another way and that would also complicate the >>> code further. >>> >>> However I think it is possible to have one solution to support gateway >>> traffic for >>> VLAN tenant networks which covers SR-IOV and the normal case. I think the >>> router pipeline >>> should be executed only on the gateway chassis. I will have a discussion >>> with >>> Anil on this, this monday and see if it's possible. >>> >>> But unfortunately, supporting these use cases will add some amount of >>> complexity :(. >>> >>> Thanks >>> Numan >>> >>> >>> > >>> > > >>> > >> --- >>> > >> >>> > >> v6->v7: >>> > >> * Added this patch >>> > >> >>> > >> >>> > >> ovn/controller/physical.c | 60 >>> > >> ++++++++++++++++++++++++++++++++++++++++++--- >>> > >> ovn/northd/ovn-northd.8.xml | 10 ++++++++ >>> > >> ovn/northd/ovn-northd.c | 29 ++++++++++++++++++++++ >>> > >> ovn/ovn-architecture.7.xml | 4 ++- >>> > >> 4 files changed, 99 insertions(+), 4 deletions(-) >>> > >> >>> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c >>> > >> index f269a1d..1f41f59 100644 >>> > >> --- a/ovn/controller/physical.c >>> > >> +++ b/ovn/controller/physical.c >>> > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding >>> > *binding, >>> > >> static void >>> > >> put_local_common_flows(uint32_t dp_key, uint32_t port_key, >>> > >> bool nested_container, const struct zone_ids >>> > >> *zone_ids, >>> > >> - struct ofpbuf *ofpacts_p, struct hmap >>> > *flow_table) >>> > >> + struct ofpbuf *ofpacts_p, struct hmap >>> > *flow_table, >>> > >> + struct local_datapath *ld, >>> > >> + const struct hmap *local_datapaths) >>> > >> { >>> > >> struct match match; >>> > >> >>> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, >>> uint32_t >>> > >> port_key, >>> > >> } >>> > >> } >>> > >> >>> > >> + struct ofpbuf *clone = NULL; >>> > >> + clone = ofpbuf_clone(ofpacts_p); >>> > >> + >>> > >> /* Resubmit to table 34. */ >>> > >> put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p); >>> > >> ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0, >>> > >> &match, ofpacts_p); >>> > >> >>> > >> + /* For a reply packet from gateway with VLAN switch port as >>> > >> destination >>> > >> + * (excluding localnet_port and external VLAN networks), >>> gateway >>> > >> router >>> > >> + * will use gateway MAC address as source MAC instead of router >>> > >> internal >>> > >> + * port MAC, so that external switches can learn gateway MAC >>> > address. >>> > >> + * Here (before packet is given to the port) we replace router >>> > >> gateway >>> > >> + * MAC address with router internal port MAC. */ >>> > >> + if (ld->localnet_port && (port_key != >>> > >> ld->localnet_port->tunnel_key)) { >>> > >> + for (int i = 0; i < ld->n_peer_dps; i++) { >>> > >> + struct local_datapath *peer_ldp = get_local_datapath( >>> > >> + local_datapaths, >>> ld->peer_dps[i]->peer_dp->tunnel_key); >>> > >> + const struct sbrec_port_binding *crp; >>> > >> + crp = peer_ldp->chassisredirect_port; >>> > >> + if (!crp) { >>> > >> + continue; >>> > >> + } >>> > >> + >>> > >> + if (strcmp(smap_get(&crp->options, "distributed-port"), >>> > >> + ld->peer_dps[i]->peer->logical_port) && >>> > >> + (port_key != ld->peer_dps[i]->patch->tunnel_key)) { >>> > >> + for (int j = 0; j < crp->n_mac; j++) { >>> > >> + struct lport_addresses laddrs; >>> > >> + if (!extract_lsp_addresses(crp->mac[j], >>> &laddrs)) { >>> > >> + continue; >>> > >> + } >>> > >> + match_set_dl_src(&match, laddrs.ea); >>> > >> + destroy_lport_addresses(&laddrs); >>> > >> + break; >>> > >> + } >>> > >> + for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; >>> j++) >>> > { >>> > >> + struct lport_addresses laddrs; >>> > >> + uint64_t mac64; >>> > >> + if (!extract_lsp_addresses( >>> > >> + ld->peer_dps[i]->peer->mac[j], &laddrs)) { >>> > >> + continue; >>> > >> + } >>> > >> + mac64 = eth_addr_to_uint64(laddrs.ea); >>> > >> + put_load(mac64, >>> > >> + MFF_ETH_SRC, 0, 48, clone); >>> > >> + destroy_lport_addresses(&laddrs); >>> > >> + break; >>> > >> + } >>> > >> + put_resubmit(OFTABLE_CHECK_LOOPBACK, clone); >>> > >> + ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, >>> 150, >>> > 0, >>> > >> + &match, clone); >>> > >> + } >>> > >> + } >>> > >> + } >>> > >> + ofpbuf_delete(clone); >>> > >> + >>> > >> /* Table 34, Priority 100. >>> > >> * ======================= >>> > >> * >>> > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index >>> > >> *sbrec_chassis_by_name, >>> > >> >>> > >> struct zone_ids binding_zones = get_zone_ids(binding, >>> > ct_zones); >>> > >> put_local_common_flows(dp_key, port_key, false, >>> &binding_zones, >>> > >> - ofpacts_p, flow_table); >>> > >> + ofpacts_p, flow_table, ld, >>> > >> local_datapaths); >>> > >> >>> > >> match_init_catchall(&match); >>> > >> ofpbuf_clear(ofpacts_p); >>> > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index >>> > >> *sbrec_chassis_by_name, >>> > >> >>> > >> struct zone_ids zone_ids = get_zone_ids(binding, ct_zones); >>> > >> put_local_common_flows(dp_key, port_key, nested_container, >>> > >> &zone_ids, >>> > >> - ofpacts_p, flow_table); >>> > >> + ofpacts_p, flow_table, ld, >>> > >> local_datapaths); >>> > >> >>> > >> /* Table 0, Priority 200, 150 and 100. >>> > >> * ============================== >>> > >> diff --git a/ovn/northd/ovn-northd.8.xml >>> b/ovn/northd/ovn-northd.8.xml >>> > >> index 8fa5272..876c121 100644 >>> > >> --- a/ovn/northd/ovn-northd.8.xml >>> > >> +++ b/ovn/northd/ovn-northd.8.xml >>> > >> @@ -2013,6 +2013,16 @@ next; >>> > >> </li> >>> > >> >>> > >> <li> >>> > >> + A priority-100 logical flow with match >>> > >> + <code>inport == <var>GW</var> && >>> > >> + flags.rcv_from_vlan == 1</code> has actions >>> > >> + <code>eth.dst = <var>E</var>; next;</code>, where >>> > >> + <var>GW</var> is the logical router distributed gateway >>> > >> + port and <var>E</var> is the MAC address of router >>> > >> + distributed gateway port. >>> > >> + </li> >>> > >> + >>> > >> + <li> >>> > >> For each NAT rule in the OVN Northbound database that can >>> > >> be handled in a distributed manner, a priority-100 logical >>> > >> flow with match <code>ip4.src == <var>B</var> && >>> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >>> > >> index bcf0b66..d012bb8 100644 >>> > >> --- a/ovn/northd/ovn-northd.c >>> > >> +++ b/ovn/northd/ovn-northd.c >>> > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct >>> > >> ovn_port *op, >>> > >> } else { >>> > >> ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6"); >>> > >> } >>> > >> + >>> > >> + if (op->peer && op->peer->od->localnet_port && >>> > >> + op->od->l3dgw_port && op->od->l3redirect_port && >>> > >> + (op != op->od->l3redirect_port) && >>> > >> + (op != op->od->l3dgw_port)) { >>> > >> + ds_put_format(&match, " && is_chassis_resident(%s)", >>> > >> + op->od->l3redirect_port->json_key); >>> > >> + ds_put_format(&actions, "; flags.rcv_from_vlan = 1"); >>> > >> + } >>> > >> ds_put_format(&actions, "; " >>> > >> "%sreg1 = %s; " >>> > >> "eth.src = %s; " >>> > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, >>> > struct >>> > >> hmap *ports, >>> > >> op->lrp_networks.ipv6_addrs[i].network_s, >>> > >> op->lrp_networks.ipv6_addrs[i].plen, NULL, >>> NULL); >>> > >> } >>> > >> + >>> > >> + /* For a reply packet from gateway with VLAN switch port as >>> > >> + * destination, replace router internal port MAC with >>> router >>> > >> gateway >>> > >> + * MAC address, so that external switches can learn >>> gateway MAC >>> > >> + * address. Later before delivering the packet to the port, >>> > >> + * controller will replace the gateway MAC with router >>> internal >>> > >> port >>> > >> + * MAC in table 33. */ >>> > >> + if (op->od->l3dgw_port && (op == op->od->l3dgw_port) && >>> > >> + op->od->l3redirect_port) { >>> > >> + ds_clear(&actions); >>> > >> + ds_clear(&match); >>> > >> + ds_put_format(&match, "inport == %s", op->json_key); >>> > >> + ds_put_format(&match, " && flags.rcv_from_vlan == 1"); >>> > >> + ds_put_format(&match, " && is_chassis_resident(%s)", >>> > >> + op->od->l3redirect_port->json_key); >>> > >> + ds_put_format(&actions, >>> > >> + "eth.src = %s; next;", >>> > op->lrp_networks.ea_s); >>> > >> + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, >>> 100, >>> > >> + ds_cstr(&match), ds_cstr(&actions)); >>> > >> + } >>> > >> } >>> > >> >>> > >> /* Convert the static routes to flows. */ >>> > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml >>> > >> index ad2101c..0de41d2 100644 >>> > >> --- a/ovn/ovn-architecture.7.xml >>> > >> +++ b/ovn/ovn-architecture.7.xml >>> > >> @@ -1067,7 +1067,9 @@ >>> > >> >>> > >> <p> >>> > >> Flows in table 33 resemble those in table 32 but for >>> logical >>> > >> ports that >>> > >> - reside locally rather than remotely. For unicast logical >>> > output >>> > >> ports >>> > >> + reside locally rather than remotely. If these are VLAN >>> ports >>> > and >>> > >> + packet has router gateway port MAC address as source, >>> replace >>> > it >>> > >> with >>> > >> + router internal port MAC address. For unicast logical >>> output >>> > >> ports >>> > >> on the local hypervisor, the actions just resubmit to >>> table 34. >>> > >> For >>> > >> multicast output ports that include one or more logical >>> ports >>> > on >>> > >> the >>> > >> local hypervisor, for each such logical port <var>P</var>, >>> the >>> > >> actions >>> > >> -- >>> > >> 1.8.3.1 >>> > >> >>> > >> _______________________________________________ >>> > >> dev mailing list >>> > >> [email protected] >>> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > >> >>> > > >>> > _______________________________________________ >>> > dev mailing list >>> > [email protected] >>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > >>> _______________________________________________ >>> dev mailing list >>> [email protected] >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >> >> >> -- >> Miguel Ángel Ajo >> OSP / Networking DFG, OVN Squad Engineering >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
