Re: [ovs-dev] [PATCH ovn] Expose distributed gateway port information in NB DB

Ihar Hrachyshka Thu, 27 Apr 2023 10:16:03 -0700

On Wed, Apr 19, 2023 at 5:17 AM Lucas Martins <[email protected]> wrote:
>
> Hi Han, all
>
> On Mon, Apr 17, 2023 at 8:02 PM Han Zhou <[email protected]> wrote:
> >
> >
> >
> > On Mon, Apr 17, 2023 at 7:18 AM Lucas Martins <[email protected]> wrote:
> > >
> > > Thanks all for the discussion and all the ideas here.
> > >
> > > After reading the emails, I think it boils down to two proposed 
> > > approaches:
> > >
> > > 1) CMS to continue to connect to the Southbound database if they need
> > > information about the physical location of the resources. That would
> > > avoid the inefficiency of having to copy data back-and-forth from the
> > > Northbound and Southbound database.
> > >
> > > I guess the downside of this approach is that CMS will have to
> > > maintain a connection with both databases (which is already the case
> > > today).
> > >
> > > If we go with this approach, it would be good to have consensus from
> > > the core OVN team where some tables in the Southbound must be kept
> > > stable with backward compatibility in case of changes. Tables such
> > > Chassis, Chassis_Private and Port_Binding at least will need that. I
> > > guess that makes part of the Southbound database to not be considered
> > > OVN internal only.
> >
> > It doesn't look very clean to expose the SB DB to CMS, but in practice I 
> > think it is not a real problem for doing so for keeping tables backward 
> > compatible, because even without considering CMS, OVN itself needs to keep 
> > the compatibility for upgrading.
> > I am still open to the idea of keeping things in NB DB only, but at least 
> > one issue not addressed so far in this discussion (even with the status DB) 
> > is how to manage the orphan chassis if SB is not exposed to CMS. It seems 
> > still more practical to me to let CMS connect to SB directly instead of 
> > introducing a copy of Chassis table in NB and ovn-northd doing the 
> > back-and-forth data sync. So I am not sure if it should be a goal to remove 
> > all the SB access from CMS.
> >
>
> Yeah the Chassis situation in OVN is a strange one, indeed. Given that
> ovn-controller is the one creating it, not the CMS, shouldn't it be up
> to OVN to actually manage that table ? Perhaps we need to introduce
> some health check where ovn-controller could ping his own entry in the
> Chassis table from time to time to indicate that it's alive. And,
> ovn-northd could clean up the orphan entries if they are not updated
> in X amount of time.


(Random thought) NB could have an API (ChassisDeprovisionRequest?)
that would be used by CMS to request cleanup for a chassis by name.
Northd could then update the object with the status of the request, or
delete it once it's processed.

>
> Or piggybacking on the idea of the Status DB, maybe those health
> checks could be done in that DB instead ?
>
> > On the other hand, giving another thought, it doesn't sound too much extra 
> > cost for propagating the "hosting-chassis" information for chassis-redirect 
> > ports back to NB, considering that the number of DGP/LRP records is 
> > relatively small, and we already update "UP" back for LSPs (which are far 
> > more than LRPs) when they bind to a chassis. For code complexity, we can 
> > make sure they are handled in similar ways (today only as part of 
> > ovnsb_db_run(), but I am working on improvements with incremental 
> > processing).
> >
>
> Right, plus this is only applicable to chassisredirect LRP.
>
> > If we do that, I'd rather not use the "options" column, because this is not 
> > a configuration, but a status. Maybe we should introduce a new 
> > "status/details" column which is extensible for more information in the 
> > future.
> >
>
> Absolutely, I actually thought of having a separated column while I
> was working on the patch. But, I used the "options" column only
> because I didn't want to do a schema update. But indeed, it does make
> more sense to have its own column for this.
>
> > But again, we should only do this if it is really the best option for the 
> > CMS (see below).
> >
> > I think my real concern here is in fact connecting to SB or NB from every 
> > node, like the example use case of the ovn-bgp-agent. There are below 
> > options worth considering:
> > - The local OVSDB on each node is also considered as an interface between 
> > OVN and CMS. It is possible for ovn-controller to expose some information 
> > to the local OVSDB (e.g. through external-ids of br-int bridge or just the 
> > openvswitch table). The benefit of this approach is that it reuses existing 
> > SB connection of ovn-controller and avoids extra connections, but I am not 
> > sure if this is a better or worse interface for the use case.
> > - Deploy a dedicated OVSDB relay cluster (for SB, or NB if it is decided to 
> > propagate "hosting-chassis" to NB) for the per-node agents if scale becomes 
> > an issue. This would introduce some operational complexity of course.
> > - A centralized CMS component replicates such information from SB/NB to 
> > CMS's own DB/API, and the per-node component reads through CMS's DB/API 
> > instead directly from OVN SB/NB. This leaves the scale problem to CMS, and 
> > for my understanding this is probably more applicable for K8S (thanks to 
> > the k8s api-server framework) but challenging for OpenStack.
> > - The "status DB" proposed by Mark (see my comments below).
>
> Thanks for the ideas, perhaps we can explore some of them and see if
> it would fit for our use case. I particularly like the idea of
> potentially using the local OVSDB, something to explore.
>
> >
> > >
> > > 2) The second approach is what Mark has described below, having a
> > > separated database for the runtime information.
> > >
> > > One thing I like about this approach is that it keep the role and
> > > permissions of each database quite clear, as described:
> > >
> > > NB DB: CMS writes, OVN reads
> > > SB DB: OVN-internal
> > > Status DB: CMS reads, OVN writes
> > >
> > > It also seems to enable a lot of room for extra information about the
> > > runtime to be added in the future (as mentioned in Mark's email, LB
> > > health status, LSP packet counts etc...).
> > >
> > > The con is that it's yet another database that needs to be deployed
> > > and maintained and another connection for CMS to maintain if they need
> > > that sort of information.
> >
> > This is indeed an interesting idea and it seems not easy to conclude 
> > whether it's ultimately beneficial or detrimental. On one hand it does make 
> > the role for each DB clear and also helpful for scaling, but on the other 
> > hand it sounds too heavy for an extra DB to maintain (both from OVN 
> > development and operation perspective).
> > I think for the use case like the ovn-bgp-agent, this is probably 
> > unnecessarily complex than it really needs to be, but for the new 
> > requirement of the telemetry counters brought up by Mark, it seems 
> > necessary because the counters may need to be constantly updated to the DB 
> > and either NB or SB would scale for that purpose. So I would propose 
> > something that may satisfy the telemetry requirement but avoid the burden 
> > of core OVN maintaining a 3rd DB.
> > - ovn-controller can update information to the node local OVSDB (some 
> > information may already be there, such as packet counters for VIFs, some 
> > information may need to be calculated with ovn-controller's knowledge for 
> > the OVS flows, such as counters for north-south trafic of a VIF).
> > - a separate component, either from CMS, or a new component of core OVN (if 
> > this is something commonly required by different CMSes), can export the 
> > counters to an externally maintain data source that is designed for 
> > telemetry data, such as a prometheus exporter.
> >
> > What do you think?
> >
> > This is indeed an interesting idea, and it's not easy to determine whether 
> > it's ultimately beneficial or detrimental. On one hand, it clearly defines 
> > the role of each database and can be helpful for scaling, but on the other 
> > hand, it might be too heavy to maintain an extra database from both OVN 
> > development and operational perspectives.
> >
> > For use cases like the ovn-bgp-agent, this approach may be more complex 
> > than necessary. However, for the new telemetry counter requirements brought 
> > up by Mark, it seems appropriate, as the counters may need constant 
> > updating in the database, and neither the NB nor the SB databases may scale 
> > well for that purpose. I am also worrying even if we introduce a thrid 
> > OVSDB raft cluster it is not going to scale well, simply because OVSDB is 
> > not designed for that purpose at all, even with all the recent 
> > optimizations. As a potential compromise, I propose a solution that could 
> > satisfy the telemetry requirements while avoiding the burden of core OVN 
> > maintaining a third database:
> >
> > - ovn-controller could provide the counters in the local node OVSDB (some 
> > information might already be present, such as packet counters for VIFs, 
> > while other information might require ovn-controller's knowledge of OVS 
> > flows, such as counters for north-south traffic of a VIF).
> > - A separate component, either from the CMS or a new core OVN component (if 
> > this functionality is commonly required by various CMSes), could export the 
> > counters to an externally maintained data source designed for telemetry 
> > data, such as a Prometheus exporter.
> >
> > What are your thoughts on this approach?
> >
>
> Yeah definitely the status DB is an overkill if the only use case we
> have is the ovn-bgp-agent one. The idea of using the local OVSDB is
> something that I think worth exploring.
>
> > Regards,
> > Han
> >
> > >
> > > ...
> > >
> > > Is there any preference of which approach the core OVN team would prefer 
> > > here ?
> > >
> > > Cheers,
> > > Lucas
> > >
> > >
> > > On Thu, Apr 13, 2023 at 6:11 PM Mark Michelson <[email protected]> 
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I just caught up on this discussion and wanted to complicate things
> > > > further by suggesting another idea. I think the Red Hat folks have heard
> > > > this before, but I'm not sure if it has been brought up on this list 
> > > > before.
> > > >
> > > > Aside from this issue, there is also this high-priority issue from Red
> > > > Hat Openstack: https://bugzilla.redhat.com/show_bug.cgi?id=2123176 .
> > > >
> > > > IMO, this all converges on the idea of introducing a third database to
> > > > OVN. We can refer to this as the "Status" DB.
> > > >
> > > > The Status DB would be a place for state information generated by
> > > > OVN/OVS to be stored. Some ideas for existing things that could go in
> > > > the Status DB would be:
> > > > * Logical port up/down state.
> > > > * Logical switch port dynamic addresses (maybe, this is more 
> > > > complicated)
> > > > * BFD status
> > > > * Logical port installation status and installation timestamp.
> > > >
> > > > In addition to these existing items, the Status DB would be a place for
> > > > additional items that do not exist yet, such as
> > > > * Load balancer health check status
> > > > * Logical port packet/byte counts
> > > > * Gateway port bound chassis
> > > >
> > > > With the implementation of the Status DB, it would cement a relationship
> > > > between the DBs as such:
> > > >
> > > > NB DB: CMS writes, OVN reads
> > > > SB DB: OVN-internal
> > > > Status DB: CMS reads, OVN writes
> > > >
> > > > It may be tempting to get this patch merged as-is, with the intention of
> > > > migrating this to the new DB once it gets implemented. I don't think
> > > > this is a good idea. Between this issue and the one I linked, I think
> > > > the implementation of a Status DB is a good idea, and one that should be
> > > > implemented very soon.
> > > >
> > > > Since this particular problem is already worked around by OpenStack, I
> > > > think it makes more sense to implement this feature in a way that will
> > > > be easier to maintain long-term than to get it in quickly. If we merge
> > > > this as-is, then we are on the hook for supporting this status in the NB
> > > > DB for quite a long time since we would need to take time to deprecate
> > > > it properly. If we instead treat this as the impetus to write the Status
> > > > DB, then I think this lightweight use-case would give us a good starting
> > > > point towards adding the other items we're interested in.
> > > >
> > > > What do you think?
> > > >
> > > > On 4/13/23 09:32, Lucas Martins wrote:
> > > > > Hi Han, Dumitru and Luis,
> > > > >
> > > > > Thanks for the discussion and ideas so far. My reply is inline:
> > > > >
> > > > > On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar 
> > > > > <[email protected]> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> 
> > > > >> wrote:
> > > > >>>
> > > > >>> On 4/12/23 23:07, Han Zhou wrote:
> > > > >>>> On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>> From: Lucas Alvares Gomes <[email protected]>
> > > > >>>>>
> > > > >>>>> In order for the CMS to know which Chassis a distributed gateway 
> > > > >>>>> port
> > > > >>>>> is bond to, this patch updates the ovn-northd daemon to populate 
> > > > >>>>> the
> > > > >>>>> Logical_Router_Port table with that information.
> > > > >>>>>
> > > > >>>>> To avoid changing the database schema, ovn-northd is setting a 
> > > > >>>>> new key
> > > > >>>>> called "hosting-chassis" in the options column from the LRP 
> > > > >>>>> table. This
> > > > >>>>> key value points to the name of the Chassis that is currently 
> > > > >>>>> hosting
> > > > >>>>> the distributed port.
> > > > >>>>>
> > > > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
> > > > >>>>> Signed-off-by: Lucas Alvares Gomes <[email protected]>
> > > > >>>
> > > > >>> Hi, Lucas, Han,
> > > > >>>
> > > > >>>>
> > > > >>>> Thanks Lucas for the patch. However, in my opinion the chassis 
> > > > >>>> binding
> > > > >>>> information belongs to SB and should stay there, otherwise we 
> > > > >>>> would make it
> > > > >>>> consistent for LSPs and update the chassis information for them, 
> > > > >>>> too, which
> > > > >>>> I think is not good in terms of clarity and extra control plane 
> > > > >>>> load. We'd
> > > > >>>> better keep the separation between NB and SB clear and avoid 
> > > > >>>> propagating
> > > > >>>> data between them back-and-forth.
> > > > >>>>
> > > > >>>
> > > > >>> I partially agree with this but it also feels wrong that the CMS
> > > > >>> accesses the SB directly.  In an ideal world (and I know that's not 
> > > > >>> the
> > > > >>> case today for neutron or ovn-k8s) the CMS should not care about 
> > > > >>> what's
> > > > >>> in the SB; that is internal OVN data.
> > > > >>
> > > > >>
> > > > >> Just to add some extra input in here. As Dumitru mentioned, it is 
> > > > >> not just a scaling issue, but that accessing the SB has its own 
> > > > >> problems as things can change there any time (it has already 
> > > > >> happened) breaking the logic on the CMS about how to react to those 
> > > > >> changes. If we don't have the information at the NB, that means we 
> > > > >> need 2 connections, one for the NB (to be as safe as possible from 
> > > > >> the SB changes), and one for the SB to get the chassis information.
> > > > >>
> > > > >
> > > > > Right. So the idea is to have the CMS to only connect to the
> > > > > Northbound database instead of maintaining a connection with both
> > > > > databases (helping scalability). I don't know what the consensus is
> > > > > but, if we agree that the Southbound database is used to store the
> > > > > internal OVN data, I think it would be in everyone's favour if CMS
> > > > > only used the Northbound database because as Luis pointed out apart
> > > > > from scalability issues, the data structure in the Southbound database
> > > > > can change overtime without any backwards compatibility and it will
> > > > > break us (it already happened).
> > > > >
> > > > >> Also, note there is already chassis information on the 
> > > > >> logical_switch_ports at the NB DB, so adding that for the cr-lrps 
> > > > >> should not be that different. Adding the active chassis to the 
> > > > >> HA_Chassis_group also sounds good
> > > > >
> > > > > So I believe this is the option "requested-chassis" that Neutron sets
> > > > > in the LSP. The difference is that this option is set by the CMS and
> > > > > the new option "hosting-chassis" from my patch is set by northd
> > > > > instead. But, there are still similarities because it's also the CMS
> > > > > that sets the ha_chassis_group (or gateway_chassis) for a port to make
> > > > > it HA. The proposed "hosting-chassis" option is just a way for northd
> > > > > to give the CMS a feedback about which chassis from the group that
> > > > > port ended up binding to.
> > > > >
> > > > >>>
> > > > >>>
> > > > >>> I suggest a different approach if we want to go ahead and propagate 
> > > > >>> such
> > > > >>> information to the NB: can't we store the "active chassis" 
> > > > >>> information
> > > > >>> per Gateway_chassis/HA_Chassis_group instead?  That's
> > > > >>> O(number-of-chassis) records that we need to update on chassis 
> > > > >>> failover.
> > > > >>>   We might even skip this for Gateway_chassis as I understand that 
> > > > >>> this
> > > > >>> is the "old" way of configuring things (*).
> > > > >>>
> > > > >
> > > > > That makes sense for me as well. So in the HA_Chassis_Group we would
> > > > > have a column with the current active chassis name ? That would be
> > > > > good because we can't really rely on the "priority" order because if
> > > > > there is a fallback to another chassis, the CMS is blind to it.
> > > > >
> > > > >>> (*) Should we deprecate Gateway_chassis?
> > > > >>>
> > > > >
> > > > > I think Neutron still uses it but, with my core OVN hat on I think it
> > > > > is already time. Right now in the Northbound database we have
> > > > > HA_Chassis_Group and Gateway_Chassis doing the same thing. I believe
> > > > > that in the Southbound everything becomes a HA_Chassis_Group. So it's
> > > > > fair to get rid of the Gateway_Chassis way already.
> > > > >
> > > > >>>> For the problem mentioned in the bugzilla, it seems to me already 
> > > > >>>> a scale
> > > > >>>> challenge that something other than ovn-controller is connecting 
> > > > >>>> to OVN SB
> > > > >>>> from every node (if I understand correctly). Moving all these 
> > > > >>>> connections
> > > > >>>> from SB to NB may just make it much worse, because NB DB is 
> > > > >>>> usually more
> > > > >>>> heavily/frequently updated by the CMS. (For small scale, this may 
> > > > >>>> not
> > > > >>>> matter, even if the agent connects to both NB and SB.)
> > > > >>>>
> > > > >>>
> > > > >>> An alternative to address the scale issue without changing OVN 
> > > > >>> could be
> > > > >>> to use a dedicated SB relay to which all external (non-OVN) agents 
> > > > >>> that
> > > > >>> need access to SB information can connect.  Would that help?
> > > > >>>
> > > > >
> > > > > The problem with it is that, more often than not we actually need to
> > > > > connect to both databases (as stated above) and there's no backward
> > > > > compatibility regards the data structure in the Southbound database
> > > > > because it is supposed to be internal OVN data. That's why having the
> > > > > CMS to only connect to the Northbound is a plus.
> > > > >
> > > > > Cheers,
> > > > > Lucas
> > > > >
> > > > >
> > > > > On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar 
> > > > > <[email protected]> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> 
> > > > >> wrote:
> > > > >>>
> > > > >>> On 4/12/23 23:07, Han Zhou wrote:
> > > > >>>> On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>> From: Lucas Alvares Gomes <[email protected]>
> > > > >>>>>
> > > > >>>>> In order for the CMS to know which Chassis a distributed gateway 
> > > > >>>>> port
> > > > >>>>> is bond to, this patch updates the ovn-northd daemon to populate 
> > > > >>>>> the
> > > > >>>>> Logical_Router_Port table with that information.
> > > > >>>>>
> > > > >>>>> To avoid changing the database schema, ovn-northd is setting a 
> > > > >>>>> new key
> > > > >>>>> called "hosting-chassis" in the options column from the LRP 
> > > > >>>>> table. This
> > > > >>>>> key value points to the name of the Chassis that is currently 
> > > > >>>>> hosting
> > > > >>>>> the distributed port.
> > > > >>>>>
> > > > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
> > > > >>>>> Signed-off-by: Lucas Alvares Gomes <[email protected]>
> > > > >>>
> > > > >>> Hi, Lucas, Han,
> > > > >>>
> > > > >>>>
> > > > >>>> Thanks Lucas for the patch. However, in my opinion the chassis 
> > > > >>>> binding
> > > > >>>> information belongs to SB and should stay there, otherwise we 
> > > > >>>> would make it
> > > > >>>> consistent for LSPs and update the chassis information for them, 
> > > > >>>> too, which
> > > > >>>> I think is not good in terms of clarity and extra control plane 
> > > > >>>> load. We'd
> > > > >>>> better keep the separation between NB and SB clear and avoid 
> > > > >>>> propagating
> > > > >>>> data between them back-and-forth.
> > > > >>>>
> > > > >>>
> > > > >>> I partially agree with this but it also feels wrong that the CMS
> > > > >>> accesses the SB directly.  In an ideal world (and I know that's not 
> > > > >>> the
> > > > >>> case today for neutron or ovn-k8s) the CMS should not care about 
> > > > >>> what's
> > > > >>> in the SB; that is internal OVN data.
> > > > >>
> > > > >>
> > > > >> Just to add some extra input in here. As Dumitru mentioned, it is 
> > > > >> not just a scaling issue, but that accessing the SB has its own 
> > > > >> problems as things can change there any time (it has already 
> > > > >> happened) breaking the logic on the CMS about how to react to those 
> > > > >> changes. If we don't have the information at the NB, that means we 
> > > > >> need 2 connections, one for the NB (to be as safe as possible from 
> > > > >> the SB changes), and one for the SB to get the chassis information.
> > > > >>
> > > > >> Also, note there is already chassis information on the 
> > > > >> logical_switch_ports at the NB DB, so adding that for the cr-lrps 
> > > > >> should not be that different. Adding the active chassis to the 
> > > > >> HA_Chassis_group also sounds good
> > > > >>>
> > > > >>>
> > > > >>> I suggest a different approach if we want to go ahead and propagate 
> > > > >>> such
> > > > >>> information to the NB: can't we store the "active chassis" 
> > > > >>> information
> > > > >>> per Gateway_chassis/HA_Chassis_group instead?  That's
> > > > >>> O(number-of-chassis) records that we need to update on chassis 
> > > > >>> failover.
> > > > >>>   We might even skip this for Gateway_chassis as I understand that 
> > > > >>> this
> > > > >>> is the "old" way of configuring things (*).
> > > > >>>
> > > > >>> (*) Should we deprecate Gateway_chassis?
> > > > >>>
> > > > >>>> For the problem mentioned in the bugzilla, it seems to me already 
> > > > >>>> a scale
> > > > >>>> challenge that something other than ovn-controller is connecting 
> > > > >>>> to OVN SB
> > > > >>>> from every node (if I understand correctly). Moving all these 
> > > > >>>> connections
> > > > >>>> from SB to NB may just make it much worse, because NB DB is 
> > > > >>>> usually more
> > > > >>>> heavily/frequently updated by the CMS. (For small scale, this may 
> > > > >>>> not
> > > > >>>> matter, even if the agent connects to both NB and SB.)
> > > > >>>>
> > > > >>>
> > > > >>> An alternative to address the scale issue without changing OVN 
> > > > >>> could be
> > > > >>> to use a dedicated SB relay to which all external (non-OVN) agents 
> > > > >>> that
> > > > >>> need access to SB information can connect.  Would that help?
> > > > >>>
> > > > >>> Regards,
> > > > >>> Dumitru
> > > > >>>
> > > > >>> _______________________________________________
> > > > >>> dev mailing list
> > > > >>> [email protected]
> > > > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> LUIS TOMÁS BOLÍVAR
> > > > >> Principal Software Engineer
> > > > >> Red Hat
> > > > >> Madrid, Spain
> > > > >> [email protected]
> > > > >>
> > > > >
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > [email protected]
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > >
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] Expose distributed gateway port information in NB DB

Reply via email to