On Wed, Apr 19, 2023 at 5:17 AM Lucas Martins <[email protected]> wrote: > > Hi Han, all > > On Mon, Apr 17, 2023 at 8:02 PM Han Zhou <[email protected]> wrote: > > > > > > > > On Mon, Apr 17, 2023 at 7:18 AM Lucas Martins <[email protected]> wrote: > > > > > > Thanks all for the discussion and all the ideas here. > > > > > > After reading the emails, I think it boils down to two proposed > > > approaches: > > > > > > 1) CMS to continue to connect to the Southbound database if they need > > > information about the physical location of the resources. That would > > > avoid the inefficiency of having to copy data back-and-forth from the > > > Northbound and Southbound database. > > > > > > I guess the downside of this approach is that CMS will have to > > > maintain a connection with both databases (which is already the case > > > today). > > > > > > If we go with this approach, it would be good to have consensus from > > > the core OVN team where some tables in the Southbound must be kept > > > stable with backward compatibility in case of changes. Tables such > > > Chassis, Chassis_Private and Port_Binding at least will need that. I > > > guess that makes part of the Southbound database to not be considered > > > OVN internal only. > > > > It doesn't look very clean to expose the SB DB to CMS, but in practice I > > think it is not a real problem for doing so for keeping tables backward > > compatible, because even without considering CMS, OVN itself needs to keep > > the compatibility for upgrading. > > I am still open to the idea of keeping things in NB DB only, but at least > > one issue not addressed so far in this discussion (even with the status DB) > > is how to manage the orphan chassis if SB is not exposed to CMS. It seems > > still more practical to me to let CMS connect to SB directly instead of > > introducing a copy of Chassis table in NB and ovn-northd doing the > > back-and-forth data sync. So I am not sure if it should be a goal to remove > > all the SB access from CMS. > > > > Yeah the Chassis situation in OVN is a strange one, indeed. Given that > ovn-controller is the one creating it, not the CMS, shouldn't it be up > to OVN to actually manage that table ? Perhaps we need to introduce > some health check where ovn-controller could ping his own entry in the > Chassis table from time to time to indicate that it's alive. And, > ovn-northd could clean up the orphan entries if they are not updated > in X amount of time.
(Random thought) NB could have an API (ChassisDeprovisionRequest?) that would be used by CMS to request cleanup for a chassis by name. Northd could then update the object with the status of the request, or delete it once it's processed. > > Or piggybacking on the idea of the Status DB, maybe those health > checks could be done in that DB instead ? > > > On the other hand, giving another thought, it doesn't sound too much extra > > cost for propagating the "hosting-chassis" information for chassis-redirect > > ports back to NB, considering that the number of DGP/LRP records is > > relatively small, and we already update "UP" back for LSPs (which are far > > more than LRPs) when they bind to a chassis. For code complexity, we can > > make sure they are handled in similar ways (today only as part of > > ovnsb_db_run(), but I am working on improvements with incremental > > processing). > > > > Right, plus this is only applicable to chassisredirect LRP. > > > If we do that, I'd rather not use the "options" column, because this is not > > a configuration, but a status. Maybe we should introduce a new > > "status/details" column which is extensible for more information in the > > future. > > > > Absolutely, I actually thought of having a separated column while I > was working on the patch. But, I used the "options" column only > because I didn't want to do a schema update. But indeed, it does make > more sense to have its own column for this. > > > But again, we should only do this if it is really the best option for the > > CMS (see below). > > > > I think my real concern here is in fact connecting to SB or NB from every > > node, like the example use case of the ovn-bgp-agent. There are below > > options worth considering: > > - The local OVSDB on each node is also considered as an interface between > > OVN and CMS. It is possible for ovn-controller to expose some information > > to the local OVSDB (e.g. through external-ids of br-int bridge or just the > > openvswitch table). The benefit of this approach is that it reuses existing > > SB connection of ovn-controller and avoids extra connections, but I am not > > sure if this is a better or worse interface for the use case. > > - Deploy a dedicated OVSDB relay cluster (for SB, or NB if it is decided to > > propagate "hosting-chassis" to NB) for the per-node agents if scale becomes > > an issue. This would introduce some operational complexity of course. > > - A centralized CMS component replicates such information from SB/NB to > > CMS's own DB/API, and the per-node component reads through CMS's DB/API > > instead directly from OVN SB/NB. This leaves the scale problem to CMS, and > > for my understanding this is probably more applicable for K8S (thanks to > > the k8s api-server framework) but challenging for OpenStack. > > - The "status DB" proposed by Mark (see my comments below). > > Thanks for the ideas, perhaps we can explore some of them and see if > it would fit for our use case. I particularly like the idea of > potentially using the local OVSDB, something to explore. > > > > > > > > > 2) The second approach is what Mark has described below, having a > > > separated database for the runtime information. > > > > > > One thing I like about this approach is that it keep the role and > > > permissions of each database quite clear, as described: > > > > > > NB DB: CMS writes, OVN reads > > > SB DB: OVN-internal > > > Status DB: CMS reads, OVN writes > > > > > > It also seems to enable a lot of room for extra information about the > > > runtime to be added in the future (as mentioned in Mark's email, LB > > > health status, LSP packet counts etc...). > > > > > > The con is that it's yet another database that needs to be deployed > > > and maintained and another connection for CMS to maintain if they need > > > that sort of information. > > > > This is indeed an interesting idea and it seems not easy to conclude > > whether it's ultimately beneficial or detrimental. On one hand it does make > > the role for each DB clear and also helpful for scaling, but on the other > > hand it sounds too heavy for an extra DB to maintain (both from OVN > > development and operation perspective). > > I think for the use case like the ovn-bgp-agent, this is probably > > unnecessarily complex than it really needs to be, but for the new > > requirement of the telemetry counters brought up by Mark, it seems > > necessary because the counters may need to be constantly updated to the DB > > and either NB or SB would scale for that purpose. So I would propose > > something that may satisfy the telemetry requirement but avoid the burden > > of core OVN maintaining a 3rd DB. > > - ovn-controller can update information to the node local OVSDB (some > > information may already be there, such as packet counters for VIFs, some > > information may need to be calculated with ovn-controller's knowledge for > > the OVS flows, such as counters for north-south trafic of a VIF). > > - a separate component, either from CMS, or a new component of core OVN (if > > this is something commonly required by different CMSes), can export the > > counters to an externally maintain data source that is designed for > > telemetry data, such as a prometheus exporter. > > > > What do you think? > > > > This is indeed an interesting idea, and it's not easy to determine whether > > it's ultimately beneficial or detrimental. On one hand, it clearly defines > > the role of each database and can be helpful for scaling, but on the other > > hand, it might be too heavy to maintain an extra database from both OVN > > development and operational perspectives. > > > > For use cases like the ovn-bgp-agent, this approach may be more complex > > than necessary. However, for the new telemetry counter requirements brought > > up by Mark, it seems appropriate, as the counters may need constant > > updating in the database, and neither the NB nor the SB databases may scale > > well for that purpose. I am also worrying even if we introduce a thrid > > OVSDB raft cluster it is not going to scale well, simply because OVSDB is > > not designed for that purpose at all, even with all the recent > > optimizations. As a potential compromise, I propose a solution that could > > satisfy the telemetry requirements while avoiding the burden of core OVN > > maintaining a third database: > > > > - ovn-controller could provide the counters in the local node OVSDB (some > > information might already be present, such as packet counters for VIFs, > > while other information might require ovn-controller's knowledge of OVS > > flows, such as counters for north-south traffic of a VIF). > > - A separate component, either from the CMS or a new core OVN component (if > > this functionality is commonly required by various CMSes), could export the > > counters to an externally maintained data source designed for telemetry > > data, such as a Prometheus exporter. > > > > What are your thoughts on this approach? > > > > Yeah definitely the status DB is an overkill if the only use case we > have is the ovn-bgp-agent one. The idea of using the local OVSDB is > something that I think worth exploring. > > > Regards, > > Han > > > > > > > > ... > > > > > > Is there any preference of which approach the core OVN team would prefer > > > here ? > > > > > > Cheers, > > > Lucas > > > > > > > > > On Thu, Apr 13, 2023 at 6:11 PM Mark Michelson <[email protected]> > > > wrote: > > > > > > > > Hi all, > > > > > > > > I just caught up on this discussion and wanted to complicate things > > > > further by suggesting another idea. I think the Red Hat folks have heard > > > > this before, but I'm not sure if it has been brought up on this list > > > > before. > > > > > > > > Aside from this issue, there is also this high-priority issue from Red > > > > Hat Openstack: https://bugzilla.redhat.com/show_bug.cgi?id=2123176 . > > > > > > > > IMO, this all converges on the idea of introducing a third database to > > > > OVN. We can refer to this as the "Status" DB. > > > > > > > > The Status DB would be a place for state information generated by > > > > OVN/OVS to be stored. Some ideas for existing things that could go in > > > > the Status DB would be: > > > > * Logical port up/down state. > > > > * Logical switch port dynamic addresses (maybe, this is more > > > > complicated) > > > > * BFD status > > > > * Logical port installation status and installation timestamp. > > > > > > > > In addition to these existing items, the Status DB would be a place for > > > > additional items that do not exist yet, such as > > > > * Load balancer health check status > > > > * Logical port packet/byte counts > > > > * Gateway port bound chassis > > > > > > > > With the implementation of the Status DB, it would cement a relationship > > > > between the DBs as such: > > > > > > > > NB DB: CMS writes, OVN reads > > > > SB DB: OVN-internal > > > > Status DB: CMS reads, OVN writes > > > > > > > > It may be tempting to get this patch merged as-is, with the intention of > > > > migrating this to the new DB once it gets implemented. I don't think > > > > this is a good idea. Between this issue and the one I linked, I think > > > > the implementation of a Status DB is a good idea, and one that should be > > > > implemented very soon. > > > > > > > > Since this particular problem is already worked around by OpenStack, I > > > > think it makes more sense to implement this feature in a way that will > > > > be easier to maintain long-term than to get it in quickly. If we merge > > > > this as-is, then we are on the hook for supporting this status in the NB > > > > DB for quite a long time since we would need to take time to deprecate > > > > it properly. If we instead treat this as the impetus to write the Status > > > > DB, then I think this lightweight use-case would give us a good starting > > > > point towards adding the other items we're interested in. > > > > > > > > What do you think? > > > > > > > > On 4/13/23 09:32, Lucas Martins wrote: > > > > > Hi Han, Dumitru and Luis, > > > > > > > > > > Thanks for the discussion and ideas so far. My reply is inline: > > > > > > > > > > On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar > > > > > <[email protected]> wrote: > > > > >> > > > > >> > > > > >> > > > > >> On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> > > > > >> wrote: > > > > >>> > > > > >>> On 4/12/23 23:07, Han Zhou wrote: > > > > >>>> On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote: > > > > >>>>> > > > > >>>>> From: Lucas Alvares Gomes <[email protected]> > > > > >>>>> > > > > >>>>> In order for the CMS to know which Chassis a distributed gateway > > > > >>>>> port > > > > >>>>> is bond to, this patch updates the ovn-northd daemon to populate > > > > >>>>> the > > > > >>>>> Logical_Router_Port table with that information. > > > > >>>>> > > > > >>>>> To avoid changing the database schema, ovn-northd is setting a > > > > >>>>> new key > > > > >>>>> called "hosting-chassis" in the options column from the LRP > > > > >>>>> table. This > > > > >>>>> key value points to the name of the Chassis that is currently > > > > >>>>> hosting > > > > >>>>> the distributed port. > > > > >>>>> > > > > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515 > > > > >>>>> Signed-off-by: Lucas Alvares Gomes <[email protected]> > > > > >>> > > > > >>> Hi, Lucas, Han, > > > > >>> > > > > >>>> > > > > >>>> Thanks Lucas for the patch. However, in my opinion the chassis > > > > >>>> binding > > > > >>>> information belongs to SB and should stay there, otherwise we > > > > >>>> would make it > > > > >>>> consistent for LSPs and update the chassis information for them, > > > > >>>> too, which > > > > >>>> I think is not good in terms of clarity and extra control plane > > > > >>>> load. We'd > > > > >>>> better keep the separation between NB and SB clear and avoid > > > > >>>> propagating > > > > >>>> data between them back-and-forth. > > > > >>>> > > > > >>> > > > > >>> I partially agree with this but it also feels wrong that the CMS > > > > >>> accesses the SB directly. In an ideal world (and I know that's not > > > > >>> the > > > > >>> case today for neutron or ovn-k8s) the CMS should not care about > > > > >>> what's > > > > >>> in the SB; that is internal OVN data. > > > > >> > > > > >> > > > > >> Just to add some extra input in here. As Dumitru mentioned, it is > > > > >> not just a scaling issue, but that accessing the SB has its own > > > > >> problems as things can change there any time (it has already > > > > >> happened) breaking the logic on the CMS about how to react to those > > > > >> changes. If we don't have the information at the NB, that means we > > > > >> need 2 connections, one for the NB (to be as safe as possible from > > > > >> the SB changes), and one for the SB to get the chassis information. > > > > >> > > > > > > > > > > Right. So the idea is to have the CMS to only connect to the > > > > > Northbound database instead of maintaining a connection with both > > > > > databases (helping scalability). I don't know what the consensus is > > > > > but, if we agree that the Southbound database is used to store the > > > > > internal OVN data, I think it would be in everyone's favour if CMS > > > > > only used the Northbound database because as Luis pointed out apart > > > > > from scalability issues, the data structure in the Southbound database > > > > > can change overtime without any backwards compatibility and it will > > > > > break us (it already happened). > > > > > > > > > >> Also, note there is already chassis information on the > > > > >> logical_switch_ports at the NB DB, so adding that for the cr-lrps > > > > >> should not be that different. Adding the active chassis to the > > > > >> HA_Chassis_group also sounds good > > > > > > > > > > So I believe this is the option "requested-chassis" that Neutron sets > > > > > in the LSP. The difference is that this option is set by the CMS and > > > > > the new option "hosting-chassis" from my patch is set by northd > > > > > instead. But, there are still similarities because it's also the CMS > > > > > that sets the ha_chassis_group (or gateway_chassis) for a port to make > > > > > it HA. The proposed "hosting-chassis" option is just a way for northd > > > > > to give the CMS a feedback about which chassis from the group that > > > > > port ended up binding to. > > > > > > > > > >>> > > > > >>> > > > > >>> I suggest a different approach if we want to go ahead and propagate > > > > >>> such > > > > >>> information to the NB: can't we store the "active chassis" > > > > >>> information > > > > >>> per Gateway_chassis/HA_Chassis_group instead? That's > > > > >>> O(number-of-chassis) records that we need to update on chassis > > > > >>> failover. > > > > >>> We might even skip this for Gateway_chassis as I understand that > > > > >>> this > > > > >>> is the "old" way of configuring things (*). > > > > >>> > > > > > > > > > > That makes sense for me as well. So in the HA_Chassis_Group we would > > > > > have a column with the current active chassis name ? That would be > > > > > good because we can't really rely on the "priority" order because if > > > > > there is a fallback to another chassis, the CMS is blind to it. > > > > > > > > > >>> (*) Should we deprecate Gateway_chassis? > > > > >>> > > > > > > > > > > I think Neutron still uses it but, with my core OVN hat on I think it > > > > > is already time. Right now in the Northbound database we have > > > > > HA_Chassis_Group and Gateway_Chassis doing the same thing. I believe > > > > > that in the Southbound everything becomes a HA_Chassis_Group. So it's > > > > > fair to get rid of the Gateway_Chassis way already. > > > > > > > > > >>>> For the problem mentioned in the bugzilla, it seems to me already > > > > >>>> a scale > > > > >>>> challenge that something other than ovn-controller is connecting > > > > >>>> to OVN SB > > > > >>>> from every node (if I understand correctly). Moving all these > > > > >>>> connections > > > > >>>> from SB to NB may just make it much worse, because NB DB is > > > > >>>> usually more > > > > >>>> heavily/frequently updated by the CMS. (For small scale, this may > > > > >>>> not > > > > >>>> matter, even if the agent connects to both NB and SB.) > > > > >>>> > > > > >>> > > > > >>> An alternative to address the scale issue without changing OVN > > > > >>> could be > > > > >>> to use a dedicated SB relay to which all external (non-OVN) agents > > > > >>> that > > > > >>> need access to SB information can connect. Would that help? > > > > >>> > > > > > > > > > > The problem with it is that, more often than not we actually need to > > > > > connect to both databases (as stated above) and there's no backward > > > > > compatibility regards the data structure in the Southbound database > > > > > because it is supposed to be internal OVN data. That's why having the > > > > > CMS to only connect to the Northbound is a plus. > > > > > > > > > > Cheers, > > > > > Lucas > > > > > > > > > > > > > > > On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar > > > > > <[email protected]> wrote: > > > > >> > > > > >> > > > > >> > > > > >> On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> > > > > >> wrote: > > > > >>> > > > > >>> On 4/12/23 23:07, Han Zhou wrote: > > > > >>>> On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote: > > > > >>>>> > > > > >>>>> From: Lucas Alvares Gomes <[email protected]> > > > > >>>>> > > > > >>>>> In order for the CMS to know which Chassis a distributed gateway > > > > >>>>> port > > > > >>>>> is bond to, this patch updates the ovn-northd daemon to populate > > > > >>>>> the > > > > >>>>> Logical_Router_Port table with that information. > > > > >>>>> > > > > >>>>> To avoid changing the database schema, ovn-northd is setting a > > > > >>>>> new key > > > > >>>>> called "hosting-chassis" in the options column from the LRP > > > > >>>>> table. This > > > > >>>>> key value points to the name of the Chassis that is currently > > > > >>>>> hosting > > > > >>>>> the distributed port. > > > > >>>>> > > > > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515 > > > > >>>>> Signed-off-by: Lucas Alvares Gomes <[email protected]> > > > > >>> > > > > >>> Hi, Lucas, Han, > > > > >>> > > > > >>>> > > > > >>>> Thanks Lucas for the patch. However, in my opinion the chassis > > > > >>>> binding > > > > >>>> information belongs to SB and should stay there, otherwise we > > > > >>>> would make it > > > > >>>> consistent for LSPs and update the chassis information for them, > > > > >>>> too, which > > > > >>>> I think is not good in terms of clarity and extra control plane > > > > >>>> load. We'd > > > > >>>> better keep the separation between NB and SB clear and avoid > > > > >>>> propagating > > > > >>>> data between them back-and-forth. > > > > >>>> > > > > >>> > > > > >>> I partially agree with this but it also feels wrong that the CMS > > > > >>> accesses the SB directly. In an ideal world (and I know that's not > > > > >>> the > > > > >>> case today for neutron or ovn-k8s) the CMS should not care about > > > > >>> what's > > > > >>> in the SB; that is internal OVN data. > > > > >> > > > > >> > > > > >> Just to add some extra input in here. As Dumitru mentioned, it is > > > > >> not just a scaling issue, but that accessing the SB has its own > > > > >> problems as things can change there any time (it has already > > > > >> happened) breaking the logic on the CMS about how to react to those > > > > >> changes. If we don't have the information at the NB, that means we > > > > >> need 2 connections, one for the NB (to be as safe as possible from > > > > >> the SB changes), and one for the SB to get the chassis information. > > > > >> > > > > >> Also, note there is already chassis information on the > > > > >> logical_switch_ports at the NB DB, so adding that for the cr-lrps > > > > >> should not be that different. Adding the active chassis to the > > > > >> HA_Chassis_group also sounds good > > > > >>> > > > > >>> > > > > >>> I suggest a different approach if we want to go ahead and propagate > > > > >>> such > > > > >>> information to the NB: can't we store the "active chassis" > > > > >>> information > > > > >>> per Gateway_chassis/HA_Chassis_group instead? That's > > > > >>> O(number-of-chassis) records that we need to update on chassis > > > > >>> failover. > > > > >>> We might even skip this for Gateway_chassis as I understand that > > > > >>> this > > > > >>> is the "old" way of configuring things (*). > > > > >>> > > > > >>> (*) Should we deprecate Gateway_chassis? > > > > >>> > > > > >>>> For the problem mentioned in the bugzilla, it seems to me already > > > > >>>> a scale > > > > >>>> challenge that something other than ovn-controller is connecting > > > > >>>> to OVN SB > > > > >>>> from every node (if I understand correctly). Moving all these > > > > >>>> connections > > > > >>>> from SB to NB may just make it much worse, because NB DB is > > > > >>>> usually more > > > > >>>> heavily/frequently updated by the CMS. (For small scale, this may > > > > >>>> not > > > > >>>> matter, even if the agent connects to both NB and SB.) > > > > >>>> > > > > >>> > > > > >>> An alternative to address the scale issue without changing OVN > > > > >>> could be > > > > >>> to use a dedicated SB relay to which all external (non-OVN) agents > > > > >>> that > > > > >>> need access to SB information can connect. Would that help? > > > > >>> > > > > >>> Regards, > > > > >>> Dumitru > > > > >>> > > > > >>> _______________________________________________ > > > > >>> dev mailing list > > > > >>> [email protected] > > > > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > >> > > > > >> > > > > >> > > > > >> -- > > > > >> LUIS TOMÁS BOLÍVAR > > > > >> Principal Software Engineer > > > > >> Red Hat > > > > >> Madrid, Spain > > > > >> [email protected] > > > > >> > > > > > > > > > > _______________________________________________ > > > > > dev mailing list > > > > > [email protected] > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
