On Tue, Nov 26, 2019 at 9:54 PM Numan Siddique <[email protected]> wrote: > > > > On Wed, Nov 27, 2019, 7:25 AM Han Zhou <[email protected]> wrote: >> >> On Sun, Nov 17, 2019 at 10:55 PM Numan Siddique <[email protected]> wrote: >> > >> > On Mon, Nov 18, 2019 at 8:22 AM Han Zhou <[email protected]> wrote: >> > > >> > > On Sat, Nov 16, 2019 at 4:03 AM Numan Siddique <[email protected]> wrote: >> > > > >> > > > >> > > > >> > > > On Fri, Nov 15, 2019 at 1:18 PM Han Zhou <[email protected]> wrote: >> > > >> >> > > >> >> > > >> >> > > >> On Thu, Nov 14, 2019 at 10:25 PM Numan Siddique <[email protected]> >> wrote: >> > > >> > >> > > >> > On Thu, Nov 14, 2019 at 11:44 PM Han Zhou <[email protected]> wrote: >> > > >> > > >> > > >> > > On Wed, Nov 13, 2019 at 10:27 AM Numan Siddique < [email protected]> >> > > wrote: >> > > >> > > > >> > > >> > > > On Thu, Oct 31, 2019 at 2:43 AM Han Zhou <[email protected]> wrote: >> > > >> > > > > >> > > >> > > > > The series supports interconnecting multiple OVN deployments >> > > (e.g. >> > > >> > > located at >> > > >> > > > > multiple data centers) through logical routers connected with >> > > tansit >> > > >> > > logical >> > > >> > > > > switches with overlay tunnels, managed through OVN control >> > > plane. See >> > > >> > > the >> > > >> > > > > ovn-architecture.rst document updates for more details, and >> find >> > > the >> > > >> > > > > instructions in >> Documentation/tutorials/ovn-interconnection.rst. >> > > >> > > > > >> > > >> > > > > Han Zhou (13): >> > > >> > > > > ovn-architecture: Add documentation for OVN interconnection >> > > feature. >> > > >> > > > > ovn-inb: Interconnection northbound DB schema and CLI. >> > > >> > > > > ovn-isb: Interconnection southbound DB schema and CLI. >> > > >> > > > > ovn-ic: Interconnection controller with AZ registeration. >> > > >> > > > > ovn-northd.c: Refactor allocate_tnlid. >> > > >> > > > > ovn-ic: Transit switch controller. >> > > >> > > > > ovn-sb: Add columns is_interconn and is_remote to Chassis. >> > > >> > > > > ovn-ic: Interconnection gateway controller. >> > > >> > > > > ovn-ic: Interconnection port controller. >> > > >> > > > > ovn.at: e2e test for OVN interconnection. >> > > >> > > > > ovn-ctl: Refactor to reduce redundant code. >> > > >> > > > > ovn-ctl: Support commands for interconnection. >> > > >> > > > > tutorial: Add tutorial for OVN Interconnection. >> > > >> > > > > >> > > >> > > > >> > > >> > > > Hi Han, >> > > >> > > > >> > > >> > > > Thanks for working on this feature. It's an interesting use >> case. >> > > >> > > > >> > > >> > > > I had a quick look at all the patches. >> > > >> > > > >> > > >> > > >> > > >> > > Numan, thanks a lot for the thorough review! Please see my >> response >> > > inlined. >> > > >> > > >> > > >> > > > I have few comments >> > > >> > > > >> > > >> > > > 1. I would suggest to rename the DBs as ovn-ic-nb.ovsschema >> (and >> > > the >> > > >> > > > same for ovn-isb). >> > > >> > > > The DB name is - OVN_IC_Northbound. So it would make sense >> to >> > > have >> > > >> > > > - ovn-ic-nb.ovsschema >> > > >> > > > I would also suggest to rename the utilities to >> ovn-ic-nbctl >> > > and >> > > >> > > > ovn-ic-sbctl. >> > > >> > > > With ovn-inbctl/ovn-isbctl, it is really confusing. >> > > >> > > > >> > > >> > > Sure, I felt not quite convenient with two dashes in a command >> name. >> > > I >> > > >> > > agree that ovn-ic-nbctl and ovn-ic-sbctl are more clear. I can >> > > change it. >> > > >> > > >> > > >> > > > 2. ovn-ic service writes to interconnect south db, ovn north >> db and >> > > >> > > > ovn south db. Writing to ic south db and >> > > >> > > > ovn north db is fine. But it seems a little odd for ovn-ic >> to >> > > >> > > > write to south db. From what I understand it writes >> > > >> > > > to south db for 3 purposes >> > > >> > > > a. Updating the tunnel_key column of datapath_binding >> > > >> > > > representing the transit switch >> > > >> > > > b. Updating the key column of port_binding representing >> the >> > > >> > > > logical router port connecting to the transit switch. >> > > >> > > > c. Creating chassis rows for remote gateway chassis. >> > > >> > > > >> > > >> > > > I think it's better if ovn-ic can delegate all these to >> > > ovn-northd. >> > > >> > > > For (a) and (b), ovn-ic can set the generated tunnel key >> > > >> > > > in the other_config/options column of Logical switch/Logical >> > > switch >> > > >> > > > port. ovn-northd can internally set this value to >> > > >> > > > the south db. >> > > >> > > > >> > > >> > > > For (c), I think its better we add another table - >> > > Remote_Chassis >> > > >> > > > (or some other appropriate name) . ovn-ic will create rows >> > > >> > > > in this table for each remote chassis and ovn-northd will >> create >> > > >> > > > chassis row in south db. >> > > >> > > > We already have Gateway_Chassis table in North db. So I >> think >> > > it's >> > > >> > > > fine if we add Remote_Chassis table. The only odd thing >> > > >> > > > would be is to store the encap details in North db. >> > > >> > > > >> > > >> > > > With this, ovn-ic, doesn't need to worry about syncing >> between >> > > >> > > > interconnect south db and ovn south db. In my opinion ovn-ic >> > > >> > > > should act more like CMS to each availability zone and hence >> > > should >> > > >> > > > not do any write transactions to the south db. >> > > >> > > > >> > > >> > > > Any concerns with this proposed approach ? >> > > >> > > > >> > > >> > > We could avoid ovn-ic writing directly to SB with some extra >> logic in >> > > >> > > northd, but I don't see any problem for ovn-ic to update SB >> > > directly. First >> > > >> > > of all, we have hypervisors creating and updating SB directly for >> > > Chassis >> > > >> > > and Encap records. The design here is that ovn-ic updates Chassis >> > > and Encap >> > > >> > > on behalf of remote gateway nodes, which I think is >> straightforward >> > > and >> > > >> > > reasonable. Similarly, port-binding's chassis column is updated >> the >> > > same >> > > >> > > way as how hypervisors are updating it. Secondly, for tunnel keys >> > > updating, >> > > >> > > it may seem graceful to update from northd, since northd is the >> only >> > > client >> > > >> > > that updates tunnel keys today, but since ovn-ic is responsible >> for >> > > >> > > calculating these keys, and it already has connection to SB, I >> think >> > > it is >> > > >> > > just more natural and efficient to update it directly, to avoid >> the >> > > extra >> > > >> > > logic and unnecessary latency from northd with another round of >> > > >> > > computation. The scope of the ovn-ic is only for the >> interconnection >> > > >> > > objects, so I don't see any conflict or ownership ambiguity here. >> > > What do >> > > >> > > you think? >> > > >> > >> > > >> > Regarding the ovn-ic writing to SB DB, I would like to get other's >> > > >> > opinion as well. >> > > >> > I agree to your points, but at the same time concerned with few >> things >> > > like >> > > >> > - What about RBAC for ovn-ic ? >> > > >> > - Right now we have RBAC for ovn-controller in writing to the >> SB DB. >> > > >> > - Do we want something similar for ovn-ic if ovn-ic writes to >> SB DB. >> > > >> > - Does CMS need to do something similar for ovn-ic, like how it >> is >> > > documented >> > > >> > here - >> > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/364481.html >> > > >> > (related discussion here - >> > > >> > >> > > >> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-November/049501.html >> > > ) >> > > >> > if no RBAC for ovn-ic ? >> > > >> > >> > > >> For RBAC, I think ovn-ic and northd are at the same position, because >> > > they both are AZ level daemons, just focusing on different tasks. In >> theory >> > > they can be put in same process, but I separated them for clarity. So >> from >> > > RBAC perspective they should be just the same. >> > > >> Today there is no role for northd, which I think is a flaw, as >> discussed >> > > in the thread you pointed. It is not a big problem though, because a >> > > workaround is simply creating a separate connection and use iptable >> rules >> > > to restrict access from central nodes only. Same practice should be used >> > > for ovn-ic. >> > > >> >> > > >> > @Mark Michelson @Dumitru Ceara You have any thoughts/concerns on >> > > >> > ovn-ic writing to SB DB ? >> > > >> > >> > > >> > Regarding the tunnel key there are 2 things here >> > > >> > (1) tunnel key for transit datapath >> > > >> > (2) tunnel key for logical port connected to the transit switch >> > > >> > >> > > >> > For (1), you can completely avoid "Sync ISB TS tunnel key to AZ SB >> > > >> > datapath" by storing the tunnel id >> > > >> > in logical switch in NB DB like - >> > > >> > nbrec_logical_switch_update_other_config_setkey(ls, >> > > >> > "interconn-ts-tunnelkey, tunnelkey) >> > > >> > >> > > >> > ovn-northd when creating the datapath_binding row can set this >> value >> > > >> > directly. With this approach I think the function >> > > >> > ts_run() can be simplified a bit as it doesn't need to call - >> > > >> > SBREC_DATAPATH_BINDING_FOR_EACH() >> > > >> > >> > > >> >> > > >> I agree this approach should work. The code might be a little simpler >> > > but the latency will be added because of the extra computer iteration >> from >> > > northd to SB DB, although it might not be a big concern. So I don't have >> > > strong preference for either approach. >> > > >> >> > > >> > Right now CMS is expected to create lsp-lrp ports on the transit >> > > >> > switch - logical router on each AZ. >> > > >> > Instead of this, can't CMS/user create these links in IC-NB table ? >> > > >> > '' >> > > >> > something like >> > > >> > ovn-ic-nbctl ts-add ts0 >> > > >> > ovn-ic-nbctl tsp-add ts0 az0 az0-lr0 MAC1 IP1 >> > > >> > ovn-ic-nbctl tsp-add ts0 az1 az1-lr0 MAC2 IP2 >> > > >> > >> > > >> > (where az0-lr0 is logical router present in az0's NB DB and >> az1-lr0 >> > > >> > is logical router >> > > >> > present in az1's NB DB). >> > > >> > >> > > >> > ovn-ic will internally create lsp-lrp ports in its AZ's NB DB. It >> can >> > > >> > also create 'Port_Binding' >> > > >> > row in IC-SB DB and doesn't need to worry about syncing between >> the SB >> > > >> > DB and IC-SB DB. >> > > >> > >> > > >> > This would probably solve (2) and ovn-ic can set the tunnel key >> when >> > > >> > creating the lsp-lrp and ovn-northd >> > > >> > will make use of this tunnel key when creating the port_binding >> rows >> > > in SB DB. >> > > >> > >> > > >> > With this user/CMS doesn't have to worry about creating the >> lsp-lrps >> > > >> > in each AZ and I think >> > > >> > this seems more logical to me. >> > > >> > >> > > >> > Running "ovn-ic-nbctl show" will give a clear output to the user. >> > > >> > >> > > >> >> > > >> This is a different operation model than what I had in mind. My >> initial >> > > idea was that all configuration should be done at each AZ itself, and >> the >> > > ovn-ic from each AZ will sync between each other automatically, so >> there is >> > > no need for configuring the IC DBs by CMS/User. The reason why I end up >> > > with a IC-NB DB is the use case of multi-tenancy, when an operator >> needs to >> > > isolate different tenants on the backbone. They can create different TS >> for >> > > this purpose. Each TS can represent a tenant or segmentation-zone. This >> > > information should be specified by user and it is global, so it is >> stored >> > > in IC-NB DB. Other than this, there is no need for global level >> > > configurations. All the information that can be populated by ovn-ic is >> > > stored in IC-SB. >> > > >> >> > > >> With your proposal, LRP's from each AZ is directly configured by >> user in >> > > global IC-NB. It is a valid approach, too. >> > > >> >> > > >> However, I would prefer configuring them at AZ level for below >> > > considerations: >> > > >> - Try to avoid introducing new configurations. Other than the TS >> switch, >> > > there is nothing new. All the operations of LRP creation stays the same >> way >> > > as it is today, which I believe simplifies operation. >> > > >> - The ownership is more clear. AZ information of each LRP is >> populated >> > > automatically by ovn-ic from each AZ, and each AZ will only update its >> own >> > > LRPs to global DB. Each AZ has full control of its own intention of >> joining >> > > the global interconnection, deciding which router ports and gateways >> should >> > > get involved. >> > > >> - It might seem attractive to have a global view by configuring all >> > > these LRPs at global level, but in fact we will still need to configure >> > > static routes for each other at each AZ level. So the overall >> configuration >> > > will be half-global and half-local, which might in fact seem less >> natural. >> > > > >> > > > >> > > > I thought about the static routes. With my proposal, ovn-ic can >> > > configure the static routes too because it knows the transit switch >> > > configuration would have all the details. >> > > > >> > > I wanted to make the configuration at IC-NB as simple as possible. For >> the >> > > static routes, currently they can be configured at each AZ, but I am >> also >> > > planning a further improvement to avoid the configuration at all, by >> > > learning the routes of edge routers from each other automatically >> through >> > > IC-SB, but I will add it after the basic feature is consolidated. >> > > >> > > > >> > > >> - The command ovn-ic-nbctl show may seem not showing enough >> information, >> > > but in fact ovn-ic-sbctl show tells all details with the hierarchy of >> AZ -> >> > > GW -> Ports (for each port it also shows its TS and subnet/IP). I think >> it >> > > provides a global view of both logical and physical. >> > > >> >> > > >> If the change is for avoid updating tunnel keys to SB directly, we >> can >> > > do similar approach as you suggested for TS. >> > > >> >> > > >> > If we take this approach, ovn-ic doesn't need to write to the SB DB >> > > >> > except for creating 'Chassis' table in >> > > >> > SB DB. We could avoid this also if we add Remote_Chassis in NB DB. >> But >> > > >> > we can discuss further on this. >> > > >> > >> > > >> > Let me know what you think. >> > > >> > >> > > >> For writing to SB, I am ok to avoid writing tunnel keys (for TS and >> for >> > > port-binding) directly to SB but through northd, although this would >> > > introduce some latency which I think is not a big issue. >> > > > >> > > > >> > > > Ok. >> > > >> >> > > >> But for chassis/encap/port-binding's chassis, I would still prefer to >> > > update to SB directly, consistent with what we are doing from >> > > ovn-controller. The only difference from ovn-controller is that RBAC is >> > > necessary from ovn-controller but not from ovn-ic, since ovn-ic is AZ >> level >> > > component that only runs on central nodes. >> > > >> If in the future we want to avoid SB update from ovn-controller, too, >> > > then we can make the same change in a consistent way then for both >> > > ovn-controller and ovn-ic. >> > > >> > >> > > >> > > >> > > >> > > > 3. In patch 7,its better to rename the ovs configuration >> option - >> > > >> > > > "external_ids:is-interconn" to >> "external_ids:ovn-is-interconn". >> > > >> > > > You also need to document it in ovn-controller-8.xml. >> > > >> > > > >> > > >> > > Agree! >> > > >> > > >> > > >> > > > Or maybe we can remove this option - >> external_ids:is-interconn. >> > > We >> > > >> > > > probably don't need this if we do like below >> > > >> > > > >> > > >> > > > 2 (c) can also be done this way >> > > >> > > > - User creates transit switch. >> > > >> > > > - ovn-ic creates transit switch in north db. >> > > >> > > > - then the user adds the logical router port - logical >> > > switch >> > > >> > > > port to the transit switch in availability zone - az1 (for >> example) >> > > >> > > > - then the user creates gw chassis - for example >> > > >> > > > ovn-nbctl lrp-set-gateway-chassis lrp-lr1-ts1 >> > > <gateway >> > > >> > > > name> [priority] >> > > >> > > > - ovn-ic on az1 ,learns this and creates a >> Gateway_Chassis >> > > >> > > > row in the interconnect south db. >> > > >> > > > - ovn-ic on other az's then create Remote_Chassis >> rows in >> > > the >> > > >> > > North db. >> > > >> > > > - ovn-northd on other az's then create chassis row in >> > > south db >> > > >> > > > with "is_remote" set to true. >> > > >> > > > >> > > >> > > > I am not sure if this complicates further and hence its >> better >> > > >> > > > that ovn-ic writes to the south dbs. But we can discuss >> further on >> > > >> > > > this. >> > > >> > > > >> > > >> > > >> > > >> > > I think this is a good idea and I am incline to it, because it >> > > avoids one >> > > >> > > configuration on the gateway node, which is good. >> > > >> > > The only concern is that it makes the remote gateway sync more >> > > dynamic - it >> > > >> > > changes when LRP's gateway-chassis/ha-chassis settings change, >> which >> > > may be >> > > >> > > less efficient. The code of ovn-ic will be a little more >> complex. I >> > > think >> > > >> > > it should be ok, but I'd like to hear from you, too. >> > > >> > >> > > >> > I would prefer to go with this approach. Another advantage is that, >> > > >> > ovn-controller don't >> > > >> > have to establish tunnels with the remote chassis until it really >> has >> > > >> > to. I think with the present >> > > >> > approach it establishes tunnels even if a transit switch doesn't >> have >> > > >> > any router ports ? >> > > >> > Correct me if I am wrong here ? >> > > >> > >> > > >> >> > > >> Yes you are right. I think it is mainly about static v.s. dynamic. >> The >> > > benefit of static is that it is more predicable and efficient (maybe), >> > > while the dynamic approach avoids the extra configuration of >> > > "is-interconnection". I don't worry about the tunnel setup yet, since >> it is >> > > merely a piece of data in the ovsdb table on the chassis, and there >> won't >> > > be too many interconnection gateways. But I think I am inclined to >> change >> > > as you suggested, to avoid the extra configuration. >> > > > >> > > > >> > > > Ok. So to summarize from our discussion, v3 of this series would >> > > > - avoid tunnel key updates to the SB DB in ovn-ic >> > > > - but it will create remote chassis in SB DB right ? >> > > > >> > > >> > > Yes. In addition, I will also remove the "is-interconn" configuration >> for >> > > Chassis as you suggested, and determine the Chassis is for >> interconnection >> > > if there is a LRP@TS assigned to it, by either gateway_chassis or by >> > > ha_chassis. >> > > Does this all sound good for v3? >> > >> > Yes. Its sounds good to me. >> > >> > Numan >> > >> Hi Numan, >> >> I thought more about the "is-interconn" configuration for chassis. I want >> to keep it for now, for the below reasons: >> 1. It will be more complex to rely on checking all the LRPs on TS switches >> and then looking for related settings in gateway-chassis and ha-chassis. >> 2. LRP change in one AZ can cause all ovn-controllers in all the other AZs >> to recompute, when a gateway chassis's last port is removed or the first >> port is added, because this triggers the chassis add/delete in remote AZs, >> and chassis change is not handled incrementally (yet). If there is only LRP >> change itself doesn't cause recompute in other AZs. The recompute can also >> cause latency for LRP changes to take effect. >> >> It seems natural to me to configure it since usually admin would assign >> roles to nodes beforehand and it should be clear that which nodes are going >> to be used as gateway nodes. We can still try to implement the dynamic >> discovery logic as an enhancement, if needed. If we implement it, I think >> there is no compatibility problem to worry about in this case because the >> config would just become useless and harmless. So, do you think it is ok to >> keep this configuration at least for the first version? > > > > Thanks for the detailed explanation. Sounds reasonable to me. > > Thanks > Hi Numan,
I addressed the comments and sent out v3: https://patchwork.ozlabs.org/project/openvswitch/list/?series=155577 PTAL. Thanks, Han _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
