On Tue, Jul 6, 2021 at 8:11 PM Han Zhou <[email protected]> wrote: > > > > On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl <[email protected]> > wrote: > > > > On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <[email protected]> wrote: > > > > > > On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl > > > <[email protected]> wrote: > > > > > > > > On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique <[email protected]> wrote: > > > > > > > > > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <[email protected]> > > > > > > wrote: > > > > > > > > > > > > > > On 6/10/21 8:36 AM, Han Zhou wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl > > > > > > > > <[email protected] > > > > > > > > <mailto:[email protected]>> wrote: > > > > > > > >> > > > > > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets > > > > > > > >> <[email protected] <mailto:[email protected]>> wrote: > > > > > > > >> > > > > > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote: > > > > > > > >> > > Introduce plugging module that adds and removes ports on > > > > > > > >> > > the > > > > > > > >> > > integration bridge, as directed by Port_Binding options. > > > > > > > >> > > > > > > > > > >> > > Traditionally it has been the CMSs responsibility to > > > > > > > >> > > create Virtual > > > > > > > >> > > Interfaces (VIFs) as part of instance (Container, Pod, > > > > > > > >> > > Virtual > > > > > > > >> > > Machine etc.) life cycle, and subsequently manage > > > > > > > >> > > plug/unplug > > > > > > > >> > > operations on the Open vSwitch integration bridge. > > > > > > > >> > > > > > > > > > >> > > With the advent of NICs connected to multiple distinct > > > > > > > >> > > CPUs we can > > > > > > > >> > > have a topology where the instance runs on one host and > > > > > > > >> > > Open > > > > > > > >> > > vSwitch and OVN runs on a different host, the smartnic CPU. > > > > > > > >> > > > > > > > > > >> > > The act of plugging and unplugging the representor port in > > > > > > > >> > > Open > > > > > > > >> > > vSwitch running on the smartnic host CPU would be the same > > > > > > > >> > > for > > > > > > > >> > > every smartnic variant (thanks to the devlink-port[0][1] > > > > > > > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, > > > > > > > >> > > etc.). > > > > > > > >> > > As such it is natural to extend OVN to provide this common > > > > > > > >> > > functionality through its CMS facing API. > > > > > > > >> > > > > > > > > >> > Hi, Frode. Thanks for putting this together, but it doesn't > > > > > > > >> > look > > > > > > > >> > natural to me. OVN, AFAIK, never touched physical devices or > > > > > > > >> > interacted with the kernel directly. This change introduces > > > > > > > >> > completely > > > > > > > >> > new functionality inside OVN. With the same effect we can > > > > > > > >> > run a fully > > > > > > > >> > separate service on these smartnic CPUs that will do plugging > > > > > > > >> > and configuration job for CMS. You may even make it > > > > > > > >> > independent > > > > > > > >> > from a particular CMS by creating a REST API for it or > > > > > > > >> > whatever. > > > > > > > >> > This will additionally allow using same service for non-OVN > > > > > > > >> > setups. > > > > > > > >> > > > > > > > >> Ilya, > > > > > > > >> > > > > > > > >> Thank you for taking the time to comment, much appreciated. > > > > > > > >> > > > > > > > >> Yes, this is new functionality, NICs with separate control > > > > > > > >> plane CPUs > > > > > > > >> and isolation from the host are also new, so this is one > > > > > > > >> proposal for > > > > > > > >> how we could go about to enable the use of them. > > > > > > > >> > > > > > > > >> The OVN controller does today get pretty close to the physical > > > > > > > >> realm > > > > > > > >> by maintaining patch ports in Open vSwitch based on bridge > > > > > > > >> mapping > > > > > > > >> configuration and presence of bridges to physical interfaces. > > > > > > > >> It also > > > > > > > >> does react to events of physical interfaces being plugged into > > > > > > > >> the > > > > > > > >> Open vSwitch instance it manages, albeit to date some other > > > > > > > >> entity has > > > > > > > >> been doing the act of adding the port into the bridge. > > > > > > > >> > > > > > > > >> The rationale for proposing to use the OVN database for > > > > > > > >> coordinating > > > > > > > >> this is that the information about which ports to bind, and > > > > > > > >> where to > > > > > > > >> bind them is already there. The timing of the information flow > > > > > > > >> from > > > > > > > >> the CMS is also suitable for the task. > > > > > > > >> > > > > > > > >> OVN relies on OVS library code, and all the necessary > > > > > > > >> libraries for > > > > > > > >> interfacing with the kernel through netlink and friends are > > > > > > > >> there or > > > > > > > >> would be easy to add. The rationale for using the > > > > > > > >> netlink-devlink > > > > > > > >> interface is that it provides a generic infrastructure for > > > > > > > >> these types > > > > > > > >> of NICs. So by using this interface we should be able to > > > > > > > >> support most > > > > > > > >> if not all of the variants of these cards. > > > > > > > >> > > > > > > > >> > > > > > > > >> Providing a separate OVN service to do the task could work, > > > > > > > >> but would > > > > > > > >> have the cost of an extra SB DB connection, IDL and monitors. > > > > > > > > > > > > > > IMHO, CMS should never connect to Southbound DB. It's just > > > > > > > because the > > > > > > > Southbound DB is not meant to be a public interface, it just > > > > > > > happened > > > > > > > to be available for connections. I know that OpenStack has > > > > > > > metadata > > > > > > > agents that connects to Sb DB, but if it's really required for > > > > > > > them, I > > > > > > > think, there should be a different way to get/set required > > > > > > > information > > > > > > > without connection to the Southbound. > > > > > > > > > > > > The CMS-facing API is the Northbound DB, I was not suggesting direct > > > > > > use of the Southbound DB by external to OVN services. My suggestion > > > > > > was to have a separate OVN process do this if your objection was to > > > > > > handle it as part of the ovn-controller process. > > > > > > > > > > > > > >> > > > > > > > >> I fear it would be quite hard to build a whole separate > > > > > > > >> project with > > > > > > > >> its own API, feels like a lot of duplicated effort when the > > > > > > > >> flow of > > > > > > > >> data and APIs in OVN already align so well with CMSs > > > > > > > >> interested in > > > > > > > >> using this? > > > > > > > >> > > > > > > > >> > Interactions with physical devices also makes OVN > > > > > > > >> > linux-dependent > > > > > > > >> > at least for this use case, IIUC. > > > > > > > >> > > > > > > > >> This specific bit would be linux-specific in the first > > > > > > > >> iteration, yes. > > > > > > > >> But the vendors manufacturing and distributing the hardware do > > > > > > > >> often > > > > > > > >> have drivers for other platforms, I am sure the necessary > > > > > > > >> infrastructure will become available there too over time, if > > > > > > > >> it is not > > > > > > > >> there already. > > > > > > > >> > > > > > > > >> We do currently have platform specific macros in the OVN build > > > > > > > >> system, > > > > > > > >> so we could enable the functionality when built on a compatible > > > > > > > >> platform. > > > > > > > >> > > > > > > > >> > Maybe, others has different opinions. > > > > > > > >> > > > > > > > >> I appreciate your opinion, and enjoy discussing this topic. > > > > > > > >> > > > > > > > >> > Another though is that there is, obviously, a network > > > > > > > >> > connection > > > > > > > >> > between the host and smartnic system. Maybe it's possible > > > > > > > >> > to just > > > > > > > >> > add an extra remote to the local ovsdb-server so CMS daemon > > > > > > > >> > on the > > > > > > > >> > host system could just add interfaces over the network > > > > > > > >> > connection? > > > > > > > >> > > > > > > > >> There are a few issues with such an approach. One of the main > > > > > > > >> goals > > > > > > > >> with providing and using a NIC with control plane CPUs is > > > > > > > >> having an > > > > > > > >> extra layer of security and isolation which is separate from > > > > > > > >> the > > > > > > > >> hypervisor host the card happens to share a PCI complex with > > > > > > > >> and draw > > > > > > > >> power from. Requiring a connection between the two for > > > > > > > >> operation would > > > > > > > >> defy this purpose. > > > > > > > >> > > > > > > > >> In addition to that, this class of cards provide visibility > > > > > > > >> into > > > > > > > >> kernel interfaces, enumeration of representor ports etc. only > > > > > > > >> from the > > > > > > > >> NIC control plane CPU side of the PCI complex, this > > > > > > > >> information is not > > > > > > > >> provided to the host. So if a hypervisor host CMS agent were > > > > > > > >> to do the > > > > > > > >> plugging through a remote ovsdb connection, it would have to > > > > > > > >> communicate with something else running on the NIC control > > > > > > > >> plane CPU > > > > > > > >> to retrieve the information it needs before it can know what > > > > > > > >> to relay > > > > > > > >> back over the ovsdb connection. > > > > > > > >> > > > > > > > >> -- > > > > > > > >> Frode Nordahl > > > > > > > >> > > > > > > > >> > Best regards, Ilya Maximets. > > > > > > > > > > > > > > > > Here are my 2 cents. > > > > > > > > > > > > > > > > Initially I had similar concerns to Ilya, and it seems OVN > > > > > > > > should stay away from the physical interface plugging. As a > > > > > > > > reference, here is how ovn-kubernetes is doing it without > > > > > > > > adding anything to OVN: > > > > > > > > https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing > > > > > > > > > > > > > > > > <https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing> > > > > > > > > > > > > > > AFAICT, a big part of the work is already done on the ovn-k8s > > > > > > > side: > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2005 > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2042 > > > > > > > > > > > > I am aware of the on-going effort to implement support for this in > > > > > > ovn-kubernetes directly. What we have identified is that there are > > > > > > other CMSs that want this functionality, and with that we have an > > > > > > opportunity to generalise and provide an abstraction in a common > > > > > > place > > > > > > that all the consuming CMSes can benefit from. > > > > > > > > > > > > > > > > > > > > > > However, thinking more about it, the proposed approach in this > > > > > > > > patch just expands the way how OVN can bind ports, utilizing > > > > > > > > the communication channel of OVN (OVSDB connections). If all > > > > > > > > the information regarding port binding can be specified by the > > > > > > > > CMS from NB, then it is not unnatural for ovn-controller to > > > > > > > > perform interface binding directly (instead of passively > > > > > > > > accepting what is attached by CMS). This kind of information > > > > > > > > already existed to some extent - the "requested_chassis" option > > > > > > > > in OpenStack. Now it seems this idea is just expanding it to a > > > > > > > > specific interface. The difference is that "requested_chassis" > > > > > > > > is used for validation only, but now we want to directly apply > > > > > > > > it. So I think at least I don't have a strong opinion on the > > > > > > > > idea. > > > > > > > > > > > > > > While it's, probably, OK for OVN to add port to the OVSDB, in many > > > > > > > cases these ports will require a lot of extra configuration which > > > > > > > is typically done by os-vif or CNI/device plugins. Imagine that > > > > > > > OVS > > > > > > > is running with userspace datapath and you need to plug-in some > > > > > > > DPDK > > > > > > > ports, where you have to specify the port type, DPDK port config, > > > > > > > porbably also number of rx/tx queues, number of descriptors in > > > > > > > these > > > > > > > queues. You may also want to configure affinity for these queues > > > > > > > per > > > > > > > PMD thread in OVS. For kernel interfaces it might be easier, but > > > > > > > they > > > > > > > also might require some extra configuration that OVN will have to > > > > > > > think about now. This is a typical job for CMS to configure this > > > > > > > kind of stuff, and that is why projects like os-vif or large > > > > > > > variety > > > > > > > of CNI/device plugins exists. > > > > > > > > > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. And both > > > > > > of > > > > > > them get their information from the CMS. Providing support for > > > > > > userspace datapath and DPDK would require more information, some of > > > > > > which is available through devlink, some fit well in key/value > > > > > > options. Our initial target would be to support the kernel > > > > > > representor > > > > > > port workflow. > > > > > > > > > > > > > > > > > > > > > > There are some benefits: > > > > > > > > 1) The mechanism can be reused by different CMSes, which may > > > > > > > > simplify CMS implementation. > > > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's > > > > > > > > communication channel, which avoids an extra CMS communication > > > > > > > > channel on the smart NIC side. (of course this can be achieved > > > > > > > > by a connection between the BM and smart NIC with *restricted* > > > > > > > > API just to convey the necessary information) > > > > > > > > > > > > > > The problem I see is that at least ovn-k8s, AFAIU, will require > > > > > > > the daemon on the Smart NIC anyways to monitor and configure > > > > > > > OVN components, e.g. to configure ovn-remote for ovn-controller > > > > > > > or run management appctl commands if required. > > > > > > > So, I don't see the point why it can't do plugging if it's already > > > > > > > there. > > > > > > > > > > > > This pattern is Kubernetes specific and is not the case for other > > > > > > CMSes. The current proposal for enabling Smart NIC with control > > > > > > plane > > > > > > CPUs for ovn-kubernetes could be simplified if the networking > > > > > > platform > > > > > > provided means for coordinating more of the network related bits. > > > > > > > > > > > > > > > > > > > > > > As to the negative side, it would increase OVN's complexity, > > > > > > > > and as mentioned by Ilya potentially breaks OVN's platform > > > > > > > > independence. To avoid this, I think the *plugging* module > > > > > > > > itself needs to be independent and pluggable. It can be > > > > > > > > extended as independent plugins. The plugin would need to > > > > > > > > define what information is needed in LSP's "options", and then > > > > > > > > implement corresponding drivers. With this approach, even the > > > > > > > > regular VIFs can be attached by ovn-controller if CMS can tell > > > > > > > > the interface name. Anyway, this is just my brief thinking. > > > > > > > > > > > > > > Aside from plugging, > > > > > > > I also don't see the reason to have devlink code in OVN just > > > > > > > because it runs once on the init stage, IIUC. So, I don't > > > > > > > understand why this information about the hardware cannot be > > > > > > > retrieved during the host provisioning and stored somewhere > > > > > > > (Nova config?). Isn't hardware inventory a job for tripleo > > > > > > > or something? > > > > > > > > > > > > As noted in one of the TODOs in the commit message of the RFC one of > > > > > > the work items to further develop this is to monitor devlink for > > > > > > changes, the end product will not do one-time initialization. > > > > > > > > > > > > While I agree with you that the current SR-IOV acceleration workflow > > > > > > configuration is pretty static and can be done at deploy time, this > > > > > > proposal prepares for the next generation subfunction workflow where > > > > > > you will have a much higher density, and run-time configuration and > > > > > > discovery of representor ports. There is a paper about it from > > > > > > netdev > > > > > > conf[2], and all of this is part of the devlink infrastructure > > > > > > (Linux > > > > > > kernel ABI). > > > > > > > > > > > > 2: > > > > > > https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf > > > > > > > > > My 2 cents. > > > > > > > > Thank you for weighing in with your opinion, Numan. > > > > > > > > > I agree with Ilya. It doesn't seem natural to me for OVN to create > > > > > OVS ports and also > > > > > to use devlink infrastructure. I think an external entity can do all > > > > > these. > > > > > > > > No doubt an external entity could do this, the challenge is that due > > > > to how the SmartNIC with separate control plane CPUs are laid out > > > > existing CMS tooling cannot do it, and we are working out a way to > > > > efficiently and securely coordinate how and when this external entity > > > > does the plugging. At the end of the day the data being passed around > > > > is networking related, and I believe it is possible to create a > > > > solution that would be consumable by all OVN-enabled CMS's. We'd be > > > > happy to prove this out outside of the ovn-controller first if that is > > > > more palatable, see below. > > > > > > > > > The other downside I see with this RFC is that it will be now > > > > > monitoring all the port binding > > > > > rows without any condition because when a port binding is created the > > > > > chassis column will > > > > > be NULL. Correct me If I'm wrong here. > > > > > > > > The intention of the condition is to consider unbound ports for > > > > plugging, as soon as they have been plugged and bound there will be a > > > > value in the chassis column and at that point only the respective > > > > chassis will monitor them. I guess this could be a problem if there is > > > > a large number of unbound ports in a deployment, from your experience > > > > would that be the case? (When I think of it I do remember the load > > > > balancer vip perhaps being a consumer of unbound ports). > > > > > > > > We considered using a separate LSP type, but apart from the plugging > > > > bit these LSP's will behave just like regular VIFs from OVN Port > > > > Binding code POV, and if all your hypervisors have SmartNICs with > > > > control plane CPUs there would be just as many of those ports.
> > > In my opinion having a separate type would be better. At the least it > > > will not impact > > > normal or existing deployments. > > > > > > Let's say we call a type as "representor" (just picking a name for > > > now), then each ovn-controller > > > can add a conditional monitor to get updates for these types (just > > > like how we do now for patch or external ports). > > > > > > What do you think of the below approach > > > > > > 1. A new LSP type - representor is added. CMS when creating this > > > lport also sets the requested-chassis option. > > > > Works for me. > > > I think it is better not to add a new LSP type for this purpose. It would be > better to make this "interface plugging" feature generic, so that it can work > for other types as well. It could be supported in the future for a regular > VIF, or for a new LSP type, so I think this behavior is better not to be tied > to a LSP type. > > We could add a new column in the Port_Binding table: plugged_by. > ovn-controller can monitor all port_bindings that have "plugged_by" equals to > current chassis. This way it should be backward compatible and not affecting > the use cases that doesn't need "plugged_by". That is a good idea, during development of the RFC I did make an attempt at creating an IDL index for the options:requested-chassis, but that does not work because it would only match rows with requested-chassis as the *only* option. Creating a new column would resolve this issue! > The details of plugging information can be stored in the options column with > new options defined, such as: "plug_type". For representor port the > "plug_type" can be defined as "representor". For each plug_type there can be > type-specific information defined and also stored in the options column. (Or > we can define a new column "plug_options" just for plugging information). A convention to prefix all plugging related options the same way, let's say: 'plug_', would work as well as a separate column, but I have nothing against a separate column if there is appetite for that. > > > 2. ovn-controller on the requested chassis sees this lport type, > > > creates an OVS port/interface and possibly sets some options or hints > > > in the external_ids column > > > of ovs interface. At this point, it will not do anything. > > > > This could potentially help us avoid the issue of adding more > > processes and IDLs talking to the SB DB. > > > > However, at this point in time, the ovn-controller does not know the > > details of the representor port, such as name/ifindex for the kernel > > representor port, and the interface it creates in OVS has to match the > > kernel representor port, and afict it is not possible to change these > > details after port creation. This information is provided by the > > devlink lookup in the current proposal. > > > > It might work if it was possible to create the port and interface > > separately, but that would cause a referential integrity violation. > > Another alternative could be that ovn-vif created a new interface > > record in 4), attaching that to the port record created by > > ovn-controller replacing a throw-away interface record, but we would > > be approaching hacky territory. > > > > Could we do some sort of local IPC perhaps through a Unix domain socket? > > I think it would be good to have separate *modules* to handle interface > plugging (one module per plug_type), but there is no need to have a separate > *process* for it, which increases the complexity and the IPC cost. The > modules can be invoked as callbacks registered for each plug_type, and > responsible for explaining the options defined for its type, which is > directly retrieved from the port_binding record from SB DB. This way the core > ovn-controller components are kept untouched (except calling the plugging > modules at some point, but not worrying about the details), without > introducing a separate process. (A separate process is needed only if there > are performance concerns or blocking I/Os). > > It is similar to how different dpif providers (or different dev types) are > supported in OVS. Yes, I like this. The currently proposed devlink informed plugging would have no issues with blocking I/O, and we could document a "charter" for the ovn-vif project that anything going into it would have to operate as if it was native OVN asynchronous code. > > > > > 3. An external entity (similar to your proposed pluggable.c) running > > > on the chassis, would have built an shash of "devlink_ports" using > > > the devlink sockets. > > > > > > 4. This entity will also be monitoring the ovs interfaces. When it > > > sees a new interface with the proper hints, will set the > > > external_ids:iface-id=<lport_name> > > > > I think setting iface-id (if still necessary) shall be done together with the > port creation, by the plugging module (through a callback). +1 > > > 5. ovn-controller when it sees the external_ids:iface-id, will bind > > > the port and add flows. > > > > > > > > > With (2) we now add the support in ovn-controller to create OVS ports, > > > but at the same time OVN will not bother about the internal details > > > of SMART nics > > > or bother about the devlink sockets. > > > > > > I'm not sure if the above approach seems hacky. The external entity > > > could be called "ovn-vif". > > > > > > What do you think ? > > > > This is an interesting idea and it could work if we figure out a way > > for ovn-controller and ovn-vif to exchange information before the > > port/interface is created. > > > > I'd really prefer not to introduce such extra IPC. Would the plug_type > mechanism be generic enough and avoid such external entities? I agree, optionally pulling the code in at compile time would make things much simpler and more effective. -- Frode Nordahl > Thanks, > Han > > > > > > > > > > Perhaps there can be a separate project called - ovn-vif which does > > > > > this ? > > > > > > > > Creating a separate ovn-org project called ovn-vif would allow us to > > > > move forward, we would of course have to document the optional CMS API > > > > in core OVN pointing to an optional external dependency for their use. > > > > Would this be an acceptable compromise? > > > > > > I personally do not have any objections to that. Maybe others have > > > different opinions. > > > > Great. > > > > Cheers! > > > > -- > > Frode Nordahl > > > > > > > > > > Thanks > > > Numan > > > > > > > > > > > -- > > > > Frode Nordahl > > > > > > > > > Thanks > > > > > Numan > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Frode Nordahl > > > > > > > > > > > > > Best regards, Ilya Maximets. > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > [email protected] > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > _______________________________________________ > > > > dev mailing list > > > > [email protected] > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > > > -- > > Frode Nordahl > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
