Re: [ovs-dev] [PATCH ovn] Expose distributed gateway port information in NB DB

Han Zhou Thu, 13 Apr 2023 09:26:35 -0700

On Thu, Apr 13, 2023 at 6:33 AM Lucas Martins <[email protected]> wrote:
>
> Hi Han, Dumitru and Luis,
>
> Thanks for the discussion and ideas so far. My reply is inline:
>
> On Thu, Apr 13, 2023 at 10:45 AM Luis Tomas Bolivar <[email protected]> wrote:
> >
> >
> >
> > On Thu, Apr 13, 2023 at 9:33 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 4/12/23 23:07, Han Zhou wrote:
> >> > On Wed, Apr 12, 2023 at 8:01 AM <[email protected]> wrote:
> >> >>
> >> >> From: Lucas Alvares Gomes <[email protected]>
> >> >>
> >> >> In order for the CMS to know which Chassis a distributed gateway port
> >> >> is bound to, this patch updates the ovn-northd daemon to populate the
> >> >> Logical_Router_Port table with that information.
> >> >>
> >> >> To avoid changing the database schema, ovn-northd is setting a new key
> >> >> called "hosting-chassis" in the options column from the LRP table. This
> >> >> key value points to the name of the Chassis that is currently hosting
> >> >> the distributed port.
> >> >>
> >> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2107515
> >> >> Signed-off-by: Lucas Alvares Gomes <[email protected]>
> >>
> >> Hi, Lucas, Han,
> >>
> >> >
> >> > Thanks Lucas for the patch. However, in my opinion the chassis binding
> >> > information belongs to SB and should stay there, otherwise we would make it
> >> > consistent for LSPs and update the chassis information for them, too, which
> >> > I think is not good in terms of clarity and extra control plane load. We'd
> >> > better keep the separation between NB and SB clear and avoid propagating
> >> > data between them back-and-forth.
> >> >
> >>
> >> I partially agree with this but it also feels wrong that the CMS
> >> accesses the SB directly.  In an ideal world (and I know that's not the
> >> case today for neutron or ovn-k8s) the CMS should not care about what's
> >> in the SB; that is internal OVN data.
> >
> >
> > Just to add some extra input in here. As Dumitru mentioned, it is not
> > just a scaling issue, but that accessing the SB has its own problems as
> > things can change there any time (it has already happened) breaking the
> > logic on the CMS about how to react to those changes. If we don't have the
> > information at the NB, that means we need 2 connections, one for the NB (to
> > be as safe as possible from the SB changes), and one for the SB to get the
> > chassis information.
> >
>
> Right. So the idea is to have the CMS to only connect to the
> Northbound database instead of maintaining a connection with both
> databases (helping scalability). I don't know what the consensus is
> but, if we agree that the Southbound database is used to store the
> internal OVN data, I think it would be in everyone's favour if CMS
> only used the Northbound database because as Luis pointed out apart
> from scalability issues, the data structure in the Southbound database
> can change over time without any backwards compatibility and it will
> break us (it already happened).


I think there is no simple solution here and it has to be a trade-off.
I agree that in an ideal world NB is the only interface to the CMS, but from
a different angle, NB is about the logical topology, and SB is the physical
realization. When the CMS needs to access/manage data related to the physical
deployment, it has to access SB. One typical example is when the CMS needs to
clean up chassis records for deleted nodes. A node may go down ungracefully,
leaving ovn-controller no chance to remove its own record from SB, so the
CMS has to do the cleanup.
There are other examples, though they may be less common. There were similar
discussions before, such as for the static mac-binding. We ended up with the
table in NB, copied to SB, merely to avoid the CMS accessing SB. If we
continue down this path, we may end up copying everything back and forth,
which is extremely inefficient - think about the chassis table case. If the
only concern is the stability of the SB schema, I don't think that is a
critical problem, because we can keep the parts that are agreed to be used
by the CMS stable, such as the chassis and port-binding tables, which have
been quite stable already.

I am not saying we can't populate any information back to NB. We have at
least one such case, the "UP" state for LSPs, which is probably also the
only one. We can populate more status information, but only if it is really
necessary. However, I don't think it is a good idea to do that for the
binding chassis information.

For the ovn-bgp-agent implementation in this case, it is even more
intrusive, because the agent runs on every node. I wonder if it is possible
to avoid accessing the OVN DBs at all. For example, the required SB
information could be exported by ovn-controller to the local OVSDB, so that
the agent can just read it from there.

>
> > Also, note there is already chassis information on the
> > logical_switch_ports at the NB DB, so adding that for the cr-lrps should
> > not be that different. Adding the active chassis to the HA_Chassis_group
> > also sounds good
>
> So I believe this is the option "requested-chassis" that Neutron sets
> in the LSP. The difference is that this option is set by the CMS and
> the new option "hosting-chassis" from my patch is set by northd
> instead. But, there are still similarities because it's also the CMS
> that sets the ha_chassis_group (or gateway_chassis) for a port to make
> it HA. The proposed "hosting-chassis" option is just a way for northd
> to give the CMS a feedback about which chassis from the group that
> port ended up binding to.
>

The "requested-chassis" concept is totally different. It is specified by the
CMS as a requirement, not a status. As mentioned before, the only relevant
example of status is the "UP" state of LSPs.
For the same reason, I think it is also wrong to use the "options" column for
the purpose of this patch.

> >>
> >>
> >> I suggest a different approach if we want to go ahead and propagate such
> >> information to the NB: can't we store the "active chassis" information
> >> per Gateway_chassis/HA_Chassis_group instead?  That's
> >> O(number-of-chassis) records that we need to update on chassis failover.
> >>  We might even skip this for Gateway_chassis as I understand that this
> >> is the "old" way of configuring things (*).
> >>

What do you mean by O(number-of-chassis) here? If a chassis fails over, we
should update O(number-of-ports-failed-over-from-the-failed-chassis) records,
right?
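
To make the two counts concrete, here is a minimal sketch. It is not OVN
code: the port, group and chassis names are made up, and it assumes every
port in a group shares that group's active chassis. It compares a per-port
record (one update per port that moves) with a per-group record (one update
per group that had ports on the failed chassis), which is only one possible
reading of the suggestion above.

```python
# Hypothetical snapshot: the chassis currently hosting each chassis-redirect
# port, and the HA chassis group each port uses. All names are illustrative.
active_chassis = {
    "cr-lrp1": "ch1",
    "cr-lrp2": "ch1",
    "cr-lrp3": "ch1",
    "cr-lrp4": "ch2",
    "cr-lrp5": "ch1",
}
port_group = {
    "cr-lrp1": "grp-a",
    "cr-lrp2": "grp-a",
    "cr-lrp3": "grp-a",
    "cr-lrp4": "grp-b",
    "cr-lrp5": "grp-c",
}


def per_port_updates(failed):
    """Ports hosted on the failed chassis: one record update per port."""
    return sorted(p for p, ch in active_chassis.items() if ch == failed)


def per_group_updates(failed):
    """Groups with at least one port on the failed chassis: one update each."""
    return sorted({port_group[p] for p in per_port_updates(failed)})


if __name__ == "__main__":
    print(len(per_port_updates("ch1")), "per-port updates")
    print(len(per_group_updates("ch1")), "per-group updates")
```

With this made-up data, the per-port count grows with the number of ports
that were active on the failed chassis, while the per-group count grows with
the number of affected groups. Neither is bounded by the number of chassis.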

>
> That makes sense for me as well. So in the HA_Chassis_Group we would
> have a column with the current active chassis name ? That would be
> good because we can't really rely on the "priority" order because if
> there is a fallback to another chassis, the CMS is blind to it.
>
> >> (*) Should we deprecate Gateway_chassis?
> >>
>
> I think Neutron still uses it but, with my core OVN hat on I think it
> is already time. Right now in the Northbound database we have
> HA_Chassis_Group and Gateway_Chassis doing the same thing. I believe
> that in the Southbound everything becomes a HA_Chassis_Group. So it's
> fair to get rid of the Gateway_Chassis way already.
>

Neutron uses it, and ovn-k8s also uses it. I think Gateway_Chassis is more
straightforward and convenient to configure from the CMS point of view, but
I don't have a very strong opinion here.

> >> > For the problem mentioned in the bugzilla, it seems to me already a scale
> >> > challenge that something other than ovn-controller is connecting to OVN SB
> >> > from every node (if I understand correctly). Moving all these connections
> >> > from SB to NB may just make it much worse, because NB DB is usually more
> >> > heavily/frequently updated by the CMS. (For small scale, this may not
> >> > matter, even if the agent connects to both NB and SB.)
> >> >
> >>
> >> An alternative to address the scale issue without changing OVN could be
> >> to use a dedicated SB relay to which all external (non-OVN) agents that
> >> need access to SB information can connect.  Would that help?
> >>
>
> The problem with it is that, more often than not we actually need to
> connect to both databases (as stated above) and there's no backward
> compatibility regarding the data structure in the Southbound database
> because it is supposed to be internal OVN data. That's why having the
> CMS to only connect to the Northbound is a plus.

I think in general we should avoid connecting to SB or NB from every node.
For SB, it would be better to use ovn-controller (which already connects to
SB) to expose the information required by the CMS to the local OVSDB.
For NB, it would be better to access it only from a central location, or to
use ovsdb-relay to create a read-only layer dedicated to the distributed CMS
component.

Let's give this some more thought.

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
