Re: [ovs-dev] [PATCH ovn] Expose distributed gateway port information in NB DB

Han Zhou Mon, 17 Apr 2023 12:11:57 -0700

(Sorry that my previous reply included redundant text. Please ignore that
one and use this version :D)


On Mon, Apr 17, 2023 at 7:18 AM Lucas Martins <[email protected]> wrote:
>
> Thanks all for the discussion and all the ideas here.
>
> After reading the emails, I think it boils down to two proposed
> approaches:
>
> 1) CMS to continue to connect to the Southbound database if they need
> information about the physical location of the resources. That would
> avoid the inefficiency of having to copy data back-and-forth from the
> Northbound and Southbound database.
>
> I guess the downside of this approach is that CMS will have to
> maintain a connection with both databases (which is already the case
> today).
>
> If we go with this approach, it would be good to have consensus from
> the core OVN team that some tables in the Southbound must be kept
> stable, with backward compatibility in case of changes. Tables such as
> Chassis, Chassis_Private and Port_Binding at least will need that. I
> guess that means part of the Southbound database would no longer be
> considered OVN-internal only.

It doesn't look very clean to expose the SB DB to the CMS, but in practice
I don't think keeping those tables backward compatible is a real problem,
because even without considering the CMS, OVN itself needs to keep that
compatibility for upgrades.
I am still open to the idea of keeping things in the NB DB only, but at
least one issue has not been addressed so far in this discussion (even with
the status DB): how to manage orphan chassis if the SB is not exposed to
the CMS. It still seems more practical to me to let the CMS connect to the
SB directly instead of introducing a copy of the Chassis table in the NB
and having ovn-northd do the back-and-forth data sync. So I am not sure
that removing all SB access from the CMS should be a goal.

On the other hand, on second thought, propagating the "hosting-chassis"
information for chassis-redirect ports back to the NB doesn't sound like
too much extra cost, considering that the number of DGP/LRP records is
relatively small, and we already propagate "up" back for LSPs (which far
outnumber LRPs) when they bind to a chassis. For code complexity, we can
make sure they are handled in similar ways (today only as part of
ovnsb_db_run(), but I am working on improvements with incremental
processing).

If we do that, I'd rather not use the "options" column, because this is not
a configuration, but a status. Maybe we should introduce a new
"status/details" column which is extensible for more information in the
future.
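
To make the shape of that concrete, here is a minimal sketch (Python, not
northd code) of deriving the value from SB Port_Binding rows and only
writing it back when it changes. The function name and the dict-based rows
are made up for illustration, and keeping "hosting-chassis" in a per-LRP
status map is the assumption discussed above, not an existing column. It
relies on the chassis-redirect port in the SB being named "cr-" plus the
LRP name.

```python
from typing import Dict, Iterable, Mapping, Optional

CR_PREFIX = "cr-"  # SB names the chassis-redirect port "cr-<LRP name>"


def hosting_chassis_updates(
    port_bindings: Iterable[Mapping[str, Optional[str]]],
    nb_status: Mapping[str, Mapping[str, str]],
) -> Dict[str, Optional[str]]:
    """Return {lrp_name: chassis_or_None} only for LRPs whose status changed.

    port_bindings: simplified SB Port_Binding rows (illustrative dicts).
    nb_status:     hypothetical per-LRP status map currently stored in NB.
    """
    updates: Dict[str, Optional[str]] = {}
    for pb in port_bindings:
        if pb.get("type") != "chassisredirect":
            continue  # only chassis-redirect ports matter here
        name = pb.get("logical_port") or ""
        if not name.startswith(CR_PREFIX):
            continue
        lrp = name[len(CR_PREFIX):]
        chassis = pb.get("chassis")  # None while the port is unbound
        current = nb_status.get(lrp, {}).get("hosting-chassis")
        if chassis != current:
            updates[lrp] = chassis  # skip unchanged rows to avoid NB churn
    return updates
```

The point of returning only the differences is that a failover touches one
NB row per affected DGP, which is the same pattern as the existing "up"
propagation for LSPs.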

But again, we should only do this if it is really the best option for the
CMS (see below).

I think my real concern here is in fact connecting to the SB or NB from
every node, as in the example use case of the ovn-bgp-agent. The following
options are worth considering:
- The local OVSDB on each node can also be considered an interface between
OVN and the CMS. ovn-controller could expose some information in the local
OVSDB (e.g. through external-ids of the br-int bridge or of the
Open_vSwitch table). The benefit of this approach is that it reuses the
existing SB connection of ovn-controller and avoids extra connections, but
I am not sure whether this is a better or worse interface for the use case.
- Deploy a dedicated OVSDB relay cluster (for the SB, or the NB if it is
decided to propagate "hosting-chassis" to the NB) for the per-node agents
if scale becomes an issue. This would of course introduce some operational
complexity.
- A centralized CMS component replicates such information from the SB/NB
to the CMS's own DB/API, and the per-node component reads through the
CMS's DB/API instead of directly from the OVN SB/NB. This leaves the scale
problem to the CMS, and to my understanding this is probably more
applicable to K8S (thanks to the k8s api-server framework) but challenging
for OpenStack.
- The "status DB" proposed by Mark (see my comments below).
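
For the first option, a per-node agent could read the value with ovs-vsctl
instead of opening its own SB/NB connection. The sketch below (Python)
assumes ovn-controller publishes a key in external_ids of br-int; the key
name and the helper names are hypothetical, and nothing writes such a key
today.

```python
import subprocess
from typing import List


def build_get_cmd(bridge: str, key: str) -> List[str]:
    """argv for reading one external_ids key of a Bridge record."""
    return ["ovs-vsctl", "get", "Bridge", bridge, f"external_ids:{key}"]


def parse_value(raw: str) -> str:
    """Strip the newline and the double quotes ovs-vsctl may print."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value


def read_hosting_chassis(lrp: str, bridge: str = "br-int") -> str:
    # Hypothetical key name; ovn-controller does not write it today.
    key = f"ovn-hosting-chassis-{lrp}"
    # check=True raises CalledProcessError if ovs-vsctl fails,
    # for example when the key is not present.
    out = subprocess.run(build_get_cmd(bridge, key), check=True,
                         capture_output=True, text=True).stdout
    return parse_value(out)
```

This only talks to the local ovsdb-server, so the number of SB/NB
connections stays the same as today regardless of how many nodes run the
agent.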

>
> 2) The second approach is what Mark has described below, having a
> separated database for the runtime information.
>
> One thing I like about this approach is that it keeps the role and
> permissions of each database quite clear, as described:
>
> NB DB: CMS writes, OVN reads
> SB DB: OVN-internal
> Status DB: CMS reads, OVN writes
>
> It also seems to enable a lot of room for extra information about the
> runtime to be added in the future (as mentioned in Mark's email, LB
> health status, LSP packet counts etc...).
>
> The con is that it's yet another database that needs to be deployed
> and maintained and another connection for CMS to maintain if they need
> that sort of information.
>

This is indeed an interesting idea, and it's not easy to determine whether
it's ultimately beneficial or detrimental. On one hand, it clearly defines
the role of each database and can be helpful for scaling, but on the other
hand, it might be too heavy to maintain an extra database from both OVN
development and operational perspectives.

For use cases like the ovn-bgp-agent, this approach may be more complex
than necessary. However, for the new telemetry counter requirements brought
up by Mark, it seems appropriate, as the counters may need constant
updating in the database, and neither the NB nor the SB database is likely
to scale well for that purpose. I am also worried that even a third OVSDB
raft cluster is not going to scale well, simply because OVSDB is not
designed for that purpose at all, even with all the recent optimizations.
As a potential compromise, I propose a solution that could satisfy the
telemetry requirements while avoiding the burden of core OVN maintaining a
third database:

- ovn-controller could provide the counters in the local node OVSDB (some
information might already be present, such as packet counters for VIFs,
while other information might require ovn-controller's knowledge of OVS
flows, such as counters for north-south traffic of a VIF).
- A separate component, either from the CMS or a new core OVN component (if
this functionality is commonly required by various CMSes), could export the
counters to an externally maintained data source designed for telemetry
data, such as a Prometheus exporter.
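
As a rough illustration of the second bullet, the exporting component could
render whatever counters it collects from the local OVSDB in the Prometheus
text exposition format. This is only a sketch in Python: the metric name,
the label names and the in-memory samples are assumptions for illustration,
and how the counters are actually read from OVS is left out.

```python
from typing import Dict, List, Tuple


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counters(
    metric: str, help_text: str, samples: Dict[Tuple[str, str], int]
) -> str:
    """Render {(chassis, port): value} as one Prometheus counter family."""
    lines: List[str] = [
        f"# HELP {metric} {help_text}",
        f"# TYPE {metric} counter",
    ]
    for (chassis, port), value in sorted(samples.items()):
        labels = f'chassis="{escape_label(chassis)}",port="{escape_label(port)}"'
        lines.append(f"{metric}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name and sample data.
    print(render_counters(
        "ovn_vif_rx_packets_total",
        "Packets received per VIF.",
        {("hv-1", "vm1"): 10, ("hv-1", "vm2"): 7},
    ), end="")
```

With something like this the constantly changing values never go through
an OVSDB raft cluster; they are scraped from each node by the telemetry
system.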

What are your thoughts on this approach?

Regards,
Han

> ...
>
> Which approach would the core OVN team prefer here?
>
> Cheers,
> Lucas
>
>
> On Thu, Apr 13, 2023 at 6:11 PM Mark Michelson <[email protected]> wrote:
> >
> > Hi all,
> >
> > I just caught up on this discussion and wanted to complicate things
> > further by suggesting another idea. I think the Red Hat folks have heard
> > this before, but I'm not sure if it has been brought up on this list
> > before.
> >
> > Aside from this issue, there is also this high-priority issue from Red
> > Hat Openstack: https://bugzilla.redhat.com/show_bug.cgi?id=2123176 .
> >
> > IMO, this all converges on the idea of introducing a third database to
> > OVN. We can refer to this as the "Status" DB.
> >
> > The Status DB would be a place for state information generated by
> > OVN/OVS to be stored. Some ideas for existing things that could go in
> > the Status DB would be:
> > * Logical port up/down state.
> > * Logical switch port dynamic addresses (maybe, this is more complicated)
> > * BFD status
> > * Logical port installation status and installation timestamp.
> >
> > In addition to these existing items, the Status DB would be a place for
> > additional items that do not exist yet, such as
> > * Load balancer health check status
> > * Logical port packet/byte counts
> > * Gateway port bound chassis
> >
> > With the implementation of the Status DB, it would cement a relationship
> > between the DBs as such:
> >
> > NB DB: CMS writes, OVN reads
> > SB DB: OVN-internal
> > Status DB: CMS reads, OVN writes
> >
> > It may be tempting to get this patch merged as-is, with the intention of
> > migrating this to the new DB once it gets implemented. I don't think
> > this is a good idea. Between this issue and the one I linked, I think
> > the implementation of a Status DB is a good idea, and one that should be
> > implemented very soon.
> >
> > Since this particular problem is already worked around by OpenStack, I
> > think it makes more sense to implement this feature in a way that will
> > be easier to maintain long-term than to get it in quickly. If we merge
> > this as-is, then we are on the hook for supporting this status in the NB
> > DB for quite a long time since we would need to take time to deprecate
> > it properly. If we instead treat this as the impetus to write the Status
> > DB, then I think this lightweight use-case would give us a good starting
> > point towards adding the other items we're interested in.
> >
> > What do you think?
> >
> > On 4/13/23 09:32, Lucas Martins wrote:
> > > Hi Han, Dumitru and Luis,
> > >
> > > Thanks for the discussion and ideas so far. My reply is inline:
> > >
> > > On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >> On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> wrote:
> > >>>
> > >>> On 4/12/23 23:07, Han Zhou wrote:
> > >>>> On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote:
> > >>>>>
> > >>>>> From: Lucas Alvares Gomes <[email protected]>
> > >>>>>
> > >>>>> In order for the CMS to know which Chassis a distributed gateway
> > >>>>> port is bound to, this patch updates the ovn-northd daemon to
> > >>>>> populate the Logical_Router_Port table with that information.
> > >>>>>
> > >>>>> To avoid changing the database schema, ovn-northd is setting a new
> > >>>>> key called "hosting-chassis" in the options column from the LRP
> > >>>>> table. This key value points to the name of the Chassis that is
> > >>>>> currently hosting the distributed port.
> > >>>>>
> > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
> > >>>>> Signed-off-by: Lucas Alvares Gomes <[email protected]>
> > >>>
> > >>> Hi, Lucas, Han,
> > >>>
> > >>>>
> > >>>> Thanks Lucas for the patch. However, in my opinion the chassis
> > >>>> binding information belongs to SB and should stay there, otherwise
> > >>>> we would make it consistent for LSPs and update the chassis
> > >>>> information for them, too, which I think is not good in terms of
> > >>>> clarity and extra control plane load. We'd better keep the
> > >>>> separation between NB and SB clear and avoid propagating data
> > >>>> between them back-and-forth.
> > >>>>
> > >>>
> > >>> I partially agree with this but it also feels wrong that the CMS
> > >>> accesses the SB directly.  In an ideal world (and I know that's not
> > >>> the case today for neutron or ovn-k8s) the CMS should not care about
> > >>> what's in the SB; that is internal OVN data.
> > >>
> > >>
> > >> Just to add some extra input in here. As Dumitru mentioned, it is
> > >> not just a scaling issue, but that accessing the SB has its own
> > >> problems as things can change there any time (it has already
> > >> happened) breaking the logic on the CMS about how to react to those
> > >> changes. If we don't have the information at the NB, that means we
> > >> need 2 connections, one for the NB (to be as safe as possible from
> > >> the SB changes), and one for the SB to get the chassis information.
> > >>
> > >
> > > Right. So the idea is to have the CMS only connect to the
> > > Northbound database instead of maintaining a connection with both
> > > databases (helping scalability). I don't know what the consensus is
> > > but, if we agree that the Southbound database is used to store the
> > > internal OVN data, I think it would be in everyone's favour if CMS
> > > only used the Northbound database because, as Luis pointed out, apart
> > > from scalability issues, the data structure in the Southbound database
> > > can change over time without any backwards compatibility and it will
> > > break us (it already happened).
> > >
> > >> Also, note there is already chassis information on the
> > >> logical_switch_ports at the NB DB, so adding that for the cr-lrps
> > >> should not be that different. Adding the active chassis to the
> > >> HA_Chassis_group also sounds good
> > >
> > > So I believe this is the option "requested-chassis" that Neutron sets
> > > in the LSP. The difference is that this option is set by the CMS and
> > > the new option "hosting-chassis" from my patch is set by northd
> > > instead. But, there are still similarities because it's also the CMS
> > > that sets the ha_chassis_group (or gateway_chassis) for a port to make
> > > it HA. The proposed "hosting-chassis" option is just a way for northd
> > > to give the CMS feedback about which chassis from the group that
> > > port ended up binding to.
> > >
> > >>>
> > >>>
> > >>> I suggest a different approach if we want to go ahead and propagate
> > >>> such information to the NB: can't we store the "active chassis"
> > >>> information per Gateway_chassis/HA_Chassis_group instead?  That's
> > >>> O(number-of-chassis) records that we need to update on chassis
> > >>> failover.  We might even skip this for Gateway_chassis as I
> > >>> understand that this is the "old" way of configuring things (*).
> > >>>
> > >
> > > That makes sense for me as well. So in the HA_Chassis_Group we would
> > > have a column with the current active chassis name ? That would be
> > > good because we can't really rely on the "priority" order because if
> > > there is a fallback to another chassis, the CMS is blind to it.
> > >
> > >>> (*) Should we deprecate Gateway_chassis?
> > >>>
> > >
> > > I think Neutron still uses it but, with my core OVN hat on I think it
> > > is already time. Right now in the Northbound database we have
> > > HA_Chassis_Group and Gateway_Chassis doing the same thing. I believe
> > > that in the Southbound everything becomes a HA_Chassis_Group. So it's
> > > fair to get rid of the Gateway_Chassis way already.
> > >
> > >>>> For the problem mentioned in the bugzilla, it seems to me already a
> > >>>> scale challenge that something other than ovn-controller is
> > >>>> connecting to OVN SB from every node (if I understand correctly).
> > >>>> Moving all these connections from SB to NB may just make it much
> > >>>> worse, because NB DB is usually more heavily/frequently updated by
> > >>>> the CMS. (For small scale, this may not matter, even if the agent
> > >>>> connects to both NB and SB.)
> > >>>>
> > >>>
> > >>> An alternative to address the scale issue without changing OVN
> > >>> could be to use a dedicated SB relay to which all external (non-OVN)
> > >>> agents that need access to SB information can connect.  Would that
> > >>> help?
> > >>>
> > >
> > > The problem with it is that, more often than not, we actually need to
> > > connect to both databases (as stated above) and there's no backward
> > > compatibility regarding the data structure in the Southbound database
> > > because it is supposed to be internal OVN data. That's why having the
> > > CMS only connect to the Northbound is a plus.
> > >
> > > Cheers,
> > > Lucas
> > >
> > >>> _______________________________________________
> > >>> dev mailing list
> > >>> [email protected]
> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >>
> > >>
> > >>
> > >> --
> > >> LUIS TOMÁS BOLÍVAR
> > >> Principal Software Engineer
> > >> Red Hat
> > >> Madrid, Spain
> > >> [email protected]
> > >>
> > >
> >
>
