As we discussed in the IRC meeting on Thu, July 15th, I'm working on a new RFC patch series that incorporates the discussion so far, and I'll propose it to the list when done.
For anyone following along I'll be staging it over here: https://github.com/fnordahl/ovn/commits/pluggingv2 -- Frode Nordahl On Tue, Jul 13, 2021 at 10:47 AM Frode Nordahl <[email protected]> wrote: > > On Fri, Jul 9, 2021 at 9:04 AM Frode Nordahl > <[email protected]> wrote: > > > > On Thu, Jul 8, 2021 at 11:58 AM Frode Nordahl > > <[email protected]> wrote: > > > > > > On Wed, Jul 7, 2021 at 8:30 AM Frode Nordahl > > > <[email protected]> wrote: > > > > > > > > On Tue, Jul 6, 2021 at 8:11 PM Han Zhou <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <[email protected]> > > > > > > wrote: > > > > > > > > > > > > > > On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl > > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets > > > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > > > > On 6/10/21 8:36 AM, Han Zhou wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl > > > > > > > > > > > > <[email protected] > > > > > > > > > > > > <mailto:[email protected]>> wrote: > > > > > > > > > > > >> > > > > > > > > > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets > > > > > > > > > > > >> <[email protected] <mailto:[email protected]>> wrote: > > > > > > > > > > > >> > > > > > > > > > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote: > > > > > > > > > > > >> > > Introduce plugging module that adds and removes > > > > > > > > > > > >> > > ports on the > > > > > > > > > > > >> > > integration bridge, as directed by Port_Binding > > > > > > > > > > > >> > > options. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Traditionally it has been the CMSs responsibility > > > > > > > > > > > >> > > to create Virtual > > > > > > > > > > > >> > > Interfaces (VIFs) as part of instance (Container, > > > > > > > > > > > >> > > Pod, Virtual > > > > > > > > > > > >> > > Machine etc.) life cycle, and subsequently manage > > > > > > > > > > > >> > > plug/unplug > > > > > > > > > > > >> > > operations on the Open vSwitch integration bridge. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > With the advent of NICs connected to multiple > > > > > > > > > > > >> > > distinct CPUs we can > > > > > > > > > > > >> > > have a topology where the instance runs on one > > > > > > > > > > > >> > > host and Open > > > > > > > > > > > >> > > vSwitch and OVN runs on a different host, the > > > > > > > > > > > >> > > smartnic CPU. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > The act of plugging and unplugging the representor > > > > > > > > > > > >> > > port in Open > > > > > > > > > > > >> > > vSwitch running on the smartnic host CPU would be > > > > > > > > > > > >> > > the same for > > > > > > > > > > > >> > > every smartnic variant (thanks to the > > > > > > > > > > > >> > > devlink-port[0][1] > > > > > > > > > > > >> > > infrastructure) and every CMS (Kubernetes, LXD, > > > > > > > > > > > >> > > OpenStack, etc.). 
> > > > > > > > > > > >> > > As such it is natural to extend OVN to provide > > > > > > > > > > > >> > > this common > > > > > > > > > > > >> > > functionality through its CMS facing API. > > > > > > > > > > > >> > > > > > > > > > > > > >> > Hi, Frode. Thanks for putting this together, but it > > > > > > > > > > > >> > doesn't look > > > > > > > > > > > >> > natural to me. OVN, AFAIK, never touched physical > > > > > > > > > > > >> > devices or > > > > > > > > > > > >> > interacted with the kernel directly. This change > > > > > > > > > > > >> > introduces completely > > > > > > > > > > > >> > new functionality inside OVN. With the same effect > > > > > > > > > > > >> > we can run a fully > > > > > > > > > > > >> > separate service on these smartnic CPUs that will do > > > > > > > > > > > >> > plugging > > > > > > > > > > > >> > and configuration job for CMS. You may even make it > > > > > > > > > > > >> > independent > > > > > > > > > > > >> > from a particular CMS by creating a REST API for it > > > > > > > > > > > >> > or whatever. > > > > > > > > > > > >> > This will additionally allow using same service for > > > > > > > > > > > >> > non-OVN setups. > > > > > > > > > > > >> > > > > > > > > > > > >> Ilya, > > > > > > > > > > > >> > > > > > > > > > > > >> Thank you for taking the time to comment, much > > > > > > > > > > > >> appreciated. > > > > > > > > > > > >> > > > > > > > > > > > >> Yes, this is new functionality, NICs with separate > > > > > > > > > > > >> control plane CPUs > > > > > > > > > > > >> and isolation from the host are also new, so this is > > > > > > > > > > > >> one proposal for > > > > > > > > > > > >> how we could go about to enable the use of them. > > > > > > > > > > > >> > > > > > > > > > > > >> The OVN controller does today get pretty close to the > > > > > > > > > > > >> physical realm > > > > > > > > > > > >> by maintaining patch ports in Open vSwitch based on > > > > > > > > > > > >> bridge mapping > > > > > > > > > > > >> configuration and presence of bridges to physical > > > > > > > > > > > >> interfaces. It also > > > > > > > > > > > >> does react to events of physical interfaces being > > > > > > > > > > > >> plugged into the > > > > > > > > > > > >> Open vSwitch instance it manages, albeit to date some > > > > > > > > > > > >> other entity has > > > > > > > > > > > >> been doing the act of adding the port into the bridge. > > > > > > > > > > > >> > > > > > > > > > > > >> The rationale for proposing to use the OVN database > > > > > > > > > > > >> for coordinating > > > > > > > > > > > >> this is that the information about which ports to > > > > > > > > > > > >> bind, and where to > > > > > > > > > > > >> bind them is already there. The timing of the > > > > > > > > > > > >> information flow from > > > > > > > > > > > >> the CMS is also suitable for the task. > > > > > > > > > > > >> > > > > > > > > > > > >> OVN relies on OVS library code, and all the necessary > > > > > > > > > > > >> libraries for > > > > > > > > > > > >> interfacing with the kernel through netlink and > > > > > > > > > > > >> friends are there or > > > > > > > > > > > >> would be easy to add. The rationale for using the > > > > > > > > > > > >> netlink-devlink > > > > > > > > > > > >> interface is that it provides a generic infrastructure > > > > > > > > > > > >> for these types > > > > > > > > > > > >> of NICs. So by using this interface we should be able > > > > > > > > > > > >> to support most > > > > > > > > > > > >> if not all of the variants of these cards. 
> > > > > > > > > > > >> > > > > > > > > > > > >> Providing a separate OVN service to do the task could > > > > > > > > > > > >> work, but would > > > > > > > > > > > >> have the cost of an extra SB DB connection, IDL and > > > > > > > > > > > >> monitors. > > > > > > > > > > > > > > > > > > > > > > IMHO, CMS should never connect to Southbound DB. It's > > > > > > > > > > > just because the > > > > > > > > > > > Southbound DB is not meant to be a public interface, it > > > > > > > > > > > just happened > > > > > > > > > > > to be available for connections. I know that OpenStack > > > > > > > > > > > has metadata > > > > > > > > > > > agents that connect to Sb DB, but if it's really > > > > > > > > > > > required for them, I > > > > > > > > > > > think, there should be a different way to get/set > > > > > > > > > > > required information > > > > > > > > > > > without connection to the Southbound. > > > > > > > > > > > > > > > > > > > > The CMS-facing API is the Northbound DB; I was not > > > > > > > > > > suggesting direct > > > > > > > > > > use of the Southbound DB by services external to OVN. My > > > > > > > > > > suggestion > > > > > > > > > > was to have a separate OVN process do this if your > > > > > > > > > > objection was to > > > > > > > > > > handle it as part of the ovn-controller process. > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > >> I fear it would be quite hard to build a whole > > > > > > > > > > > >> separate project with > > > > > > > > > > > >> its own API, it feels like a lot of duplicated effort > > > > > > > > > > > >> when the flow of > > > > > > > > > > > >> data and APIs in OVN already align so well with CMSs > > > > > > > > > > > >> interested in > > > > > > > > > > > >> using this? > > > > > > > > > > > >> > > > > > > > > > > > >> > Interactions with physical devices also make OVN > > > > > > > > > > > >> > linux-dependent > > > > > > > > > > > >> > at least for this use case, IIUC. > > > > > > > > > > > >> > > > > > > > > > > > >> This specific bit would be linux-specific in the first > > > > > > > > > > > >> iteration, yes. > > > > > > > > > > > >> But the vendors manufacturing and distributing the > > > > > > > > > > > >> hardware do often > > > > > > > > > > > >> have drivers for other platforms; I am sure the > > > > > > > > > > > >> necessary > > > > > > > > > > > >> infrastructure will become available there too over > > > > > > > > > > > >> time, if it is not > > > > > > > > > > > >> there already. > > > > > > > > > > > >> > > > > > > > > > > > >> We do currently have platform-specific macros in the > > > > > > > > > > > >> OVN build system, > > > > > > > > > > > >> so we could enable the functionality when built on a > > > > > > > > > > > >> compatible > > > > > > > > > > > >> platform. > > > > > > > > > > > >> > > > > > > > > > > > >> > Maybe others have different opinions. > > > > > > > > > > > >> > > > > > > > > > > > >> I appreciate your opinion, and enjoy discussing this > > > > > > > > > > > >> topic. > > > > > > > > > > > >> > > > > > > > > > > > >> > Another thought is that there is, obviously, a > > > > > > > > > > > >> > network connection > > > > > > > > > > > >> > between the host and smartnic system. Maybe it's > > > > > > > > > > > >> > possible to just > > > > > > > > > > > >> > add an extra remote to the local ovsdb-server so a CMS > > > > > > > > > > > >> > daemon on the > > > > > > > > > > > >> > host system could just add interfaces over the > > > > > > > > > > > >> > network connection? 
> > > > > > > > > > > >> > > > > > > > > > > > >> There are a few issues with such an approach. One of > > > > > > > > > > > >> the main goals > > > > > > > > > > > >> with providing and using a NIC with control plane CPUs > > > > > > > > > > > >> is having an > > > > > > > > > > > >> extra layer of security and isolation which is > > > > > > > > > > > >> separate from the > > > > > > > > > > > >> hypervisor host the card happens to share a PCI > > > > > > > > > > > >> complex with and draw > > > > > > > > > > > >> power from. Requiring a connection between the two for > > > > > > > > > > > >> operation would > > > > > > > > > > > >> defeat this purpose. > > > > > > > > > > > >> > > > > > > > > > > > >> In addition to that, this class of cards provides > > > > > > > > > > > >> visibility into > > > > > > > > > > > >> kernel interfaces, enumeration of representor ports > > > > > > > > > > > >> etc. only from the > > > > > > > > > > > >> NIC control plane CPU side of the PCI complex; this > > > > > > > > > > > >> information is not > > > > > > > > > > > >> provided to the host. So if a hypervisor host CMS > > > > > > > > > > > >> agent were to do the > > > > > > > > > > > >> plugging through a remote ovsdb connection, it would > > > > > > > > > > > >> have to > > > > > > > > > > > >> communicate with something else running on the NIC > > > > > > > > > > > >> control plane CPU > > > > > > > > > > > >> to retrieve the information it needs before it can > > > > > > > > > > > >> know what to relay > > > > > > > > > > > >> back over the ovsdb connection. > > > > > > > > > > > >> > > > > > > > > > > > >> -- > > > > > > > > > > > >> Frode Nordahl > > > > > > > > > > > >> > > > > > > > > > > > >> > Best regards, Ilya Maximets. > > > > > > > > > > > > > > > > > > > > > > > > Here are my 2 cents. > > > > > > > > > > > > > > > > > > > > > > > > Initially I had similar concerns to Ilya, and it seems > > > > > > > > > > > > OVN should stay away from the physical interface > > > > > > > > > > > > plugging. As a reference, here is how ovn-kubernetes is > > > > > > > > > > > > doing it without adding anything to OVN: > > > > > > > > > > > > https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing > > > > > > > > > > > > > > > > > > > > > > AFAICT, a big part of the work is already done on the > > > > > > > > > > > ovn-k8s side: > > > > > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2005 > > > > > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2042 > > > > > > > > > > I am aware of the ongoing effort to implement support for > > > > > > > > > > this in > > > > > > > > > > ovn-kubernetes directly. What we have identified is that > > > > > > > > > > there are > > > > > > > > > > other CMSs that want this functionality, and with that we > > > > > > > > > > have an > > > > > > > > > > opportunity to generalise and provide an abstraction in a > > > > > > > > > > common place > > > > > > > > > > that all the consuming CMSs can benefit from. > > > > > > > > > > > > > > > > > > > > > > However, thinking more about it, the proposed approach > > > > > > > > > > > > in this patch just expands the way OVN can bind > > > > > > > > > > > > ports, utilizing the communication channel of OVN > > > > > > > > > > > > (OVSDB connections). 
If all the information regarding > > > > > > > > > > > > port binding can be specified by the CMS from NB, then > > > > > > > > > > > > it is not unnatural for ovn-controller to perform > > > > > > > > > > > > interface binding directly (instead of passively > > > > > > > > > > > > accepting what is attached by CMS). This kind of > > > > > > > > > > > > information already existed to some extent - the > > > > > > > > > > > > "requested_chassis" option in OpenStack. Now it seems > > > > > > > > > > > > this idea is just expanding it to a specific interface. > > > > > > > > > > > > The difference is that "requested_chassis" is used for > > > > > > > > > > > > validation only, but now we want to directly apply it. > > > > > > > > > > > > So I think at least I don't have a strong opinion on > > > > > > > > > > > > the idea. > > > > > > > > > > > While it's, probably, OK for OVN to add a port to the > > > > > > > > > > > OVSDB, in many > > > > > > > > > > > cases these ports will require a lot of extra > > > > > > > > > > > configuration which > > > > > > > > > > > is typically done by os-vif or CNI/device plugins. > > > > > > > > > > > Imagine that OVS > > > > > > > > > > > is running with userspace datapath and you need to > > > > > > > > > > > plug in some DPDK > > > > > > > > > > > ports, where you have to specify the port type, DPDK port > > > > > > > > > > > config, > > > > > > > > > > > probably also number of rx/tx queues, number of > > > > > > > > > > > descriptors in these > > > > > > > > > > > queues. You may also want to configure affinity for > > > > > > > > > > > these queues per > > > > > > > > > > > PMD thread in OVS. For kernel interfaces it might be > > > > > > > > > > > easier, but they > > > > > > > > > > > also might require some extra configuration that OVN will > > > > > > > > > > > have to > > > > > > > > > > > think about now. This is a typical job for the CMS to > > > > > > > > > > > configure this > > > > > > > > > > > kind of stuff, and that is why projects like os-vif or > > > > > > > > > > > a large variety > > > > > > > > > > > of CNI/device plugins exist. > > > > > > > > > > CNI is Kubernetes-specific, os-vif is OpenStack-specific. > > > > > > > > > > And both of > > > > > > > > > > them get their information from the CMS. Providing support > > > > > > > > > > for > > > > > > > > > > userspace datapath and DPDK would require more information, > > > > > > > > > > some of > > > > > > > > > > which is available through devlink, some fit well in > > > > > > > > > > key/value > > > > > > > > > > options. Our initial target would be to support the kernel > > > > > > > > > > representor > > > > > > > > > > port workflow. > > Circling back to the DPDK argument, I had a look at what this would > look like in practice. For the DPDK representor port workflow, the > information required from the CMS would be which PF and a logical VF > number. In the current proposal the CMS provides the MAC address of the PF > and a VF number; with that information the plugging library can look > up a PCI address and fill in `options:dpdk-devargs`, for example > '0000:03:00.0,representor=[0]'. > > When OVN is running on a SmartNIC, there will be no vHost sockets as > the workload would be running on a different host, at the other end of > the host-facing PF+VF pair. 
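For illustration, here is a minimal sketch of how a plugging implementation could turn the information above (PF PCI address resolved from the CMS-provided PF MAC, plus the logical VF number) into the DPDK representor interface options. This is not code from the RFC: fill_dpdk_representor_options() is a hypothetical helper, the PCI lookup is assumed to have happened already, and only smap_add_format() is an existing OVS helper from lib/smap.h.

/* Hypothetical sketch only: builds the dpdk-devargs option for the DPDK
 * representor case discussed above, producing e.g.
 * dpdk-devargs="0000:03:00.0,representor=[0]".  smap_add_format() is the
 * existing OVS helper declared in lib/smap.h. */
#include "smap.h"

static void
fill_dpdk_representor_options(struct smap *iface_options,
                              const char *pf_pci_addr, /* e.g. "0000:03:00.0",
                                                        * looked up from the
                                                        * PF MAC address. */
                              unsigned int vf_num)     /* logical VF number
                                                        * provided by the CMS. */
{
    smap_add_format(iface_options, "dpdk-devargs",
                    "%s,representor=[%u]", pf_pci_addr, vf_num);
    /* The Interface record would also need type=dpdk; that is a column
     * rather than an option and would be set when the port is created. */
}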
> > As for other tuning parameters such as queues, descriptors and queue > affinity, it appears to me CMS systems are generally unaware of this > tuning today and that blanket configuration is applied through other > means. When OVN is running on a SmartNIC any queue affinity options > would be relative to control plane CPU cores and have no relation to > which host CPU cores the workload is running on at the other end of > the host facing PF+VF pair. We could choose to replicate this behavior > by having the plugging library have its own defaults, possibly > influenced by local per chassis configuration and keep this out of the > CMS API. > > Does this address your concerns wrt. DPDK support or do you have other > examples of configuration we would need to consider? > > > > > > > > > > > > > > > > > > > > > > > > > There are some benefits: > > > > > > > > > > > > 1) The mechanism can be reused by different CMSes, > > > > > > > > > > > > which may simplify CMS implementation. > > > > > > > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's > > > > > > > > > > > > communication channel, which avoids an extra CMS > > > > > > > > > > > > communication channel on the smart NIC side. (of course > > > > > > > > > > > > this can be achieved by a connection between the BM and > > > > > > > > > > > > smart NIC with *restricted* API just to convey the > > > > > > > > > > > > necessary information) > > > > > > > > > > > > > > > > > > > > > > The problem I see is that at least ovn-k8s, AFAIU, will > > > > > > > > > > > require > > > > > > > > > > > the daemon on the Smart NIC anyways to monitor and > > > > > > > > > > > configure > > > > > > > > > > > OVN components, e.g. to configure ovn-remote for > > > > > > > > > > > ovn-controller > > > > > > > > > > > or run management appctl commands if required. > > > > > > > > > > > So, I don't see the point why it can't do plugging if > > > > > > > > > > > it's already > > > > > > > > > > > there. > > > > > > > > > > > > > > > > > > > > This pattern is Kubernetes specific and is not the case for > > > > > > > > > > other > > > > > > > > > > CMSes. The current proposal for enabling Smart NIC with > > > > > > > > > > control plane > > > > > > > > > > CPUs for ovn-kubernetes could be simplified if the > > > > > > > > > > networking platform > > > > > > > > > > provided means for coordinating more of the network related > > > > > > > > > > bits. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As to the negative side, it would increase OVN's > > > > > > > > > > > > complexity, and as mentioned by Ilya potentially breaks > > > > > > > > > > > > OVN's platform independence. To avoid this, I think the > > > > > > > > > > > > *plugging* module itself needs to be independent and > > > > > > > > > > > > pluggable. It can be extended as independent plugins. > > > > > > > > > > > > The plugin would need to define what information is > > > > > > > > > > > > needed in LSP's "options", and then implement > > > > > > > > > > > > corresponding drivers. With this approach, even the > > > > > > > > > > > > regular VIFs can be attached by ovn-controller if CMS > > > > > > > > > > > > can tell the interface name. Anyway, this is just my > > > > > > > > > > > > brief thinking. > > > > > > > > > > > > > > > > > > > > > > Aside from plugging, > > > > > > > > > > > I also don't see the reason to have devlink code in OVN > > > > > > > > > > > just > > > > > > > > > > > because it runs once on the init stage, IIUC. 
So, I don't > > > > > > > > > > > understand why this information about the hardware cannot > > > > > > > > > > > be > > > > > > > > > > > retrieved during the host provisioning and stored > > > > > > > > > > > somewhere > > > > > > > > > > > (Nova config?). Isn't hardware inventory a job for > > > > > > > > > > > TripleO > > > > > > > > > > > or something? > > > > > > > > > > As noted in one of the TODOs in the commit message of the > > > > > > > > > > RFC, one of > > > > > > > > > > the work items to further develop this is to monitor > > > > > > > > > > devlink for > > > > > > > > > > changes; the end product will not do one-time > > > > > > > > > > initialization only. > > > > > > > > > > While I agree with you that the current SR-IOV acceleration > > > > > > > > > > workflow > > > > > > > > > > configuration is pretty static and can be done at deploy > > > > > > > > > > time, this > > > > > > > > > > proposal prepares for the next-generation subfunction > > > > > > > > > > workflow where > > > > > > > > > > you will have a much higher density, and run-time > > > > > > > > > > configuration and > > > > > > > > > > discovery of representor ports. There is a paper about it > > > > > > > > > > from netdev > > > > > > > > > > conf[2], and all of this is part of the devlink > > > > > > > > > > infrastructure (Linux > > > > > > > > > > kernel ABI). > > > > > > > > > > 2: > > > > > > > > > > https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf > > > > > > > My 2 cents. > > > > > > > > Thank you for weighing in with your opinion, Numan. > > > > > > > > > I agree with Ilya. It doesn't seem natural to me for OVN to > > > > > > > > > create > > > > > > > > > OVS ports and also > > > > > > > > > to use devlink infrastructure. I think an external entity > > > > > > > > > can do all of this. > > > > > > > > No doubt an external entity could do this, the challenge is > > > > > > > > that due > > > > > > > > to how SmartNICs with separate control plane CPUs are laid > > > > > > > > out, > > > > > > > > existing CMS tooling cannot do it, and we are working out a way > > > > > > > > to > > > > > > > > efficiently and securely coordinate how and when this external > > > > > > > > entity > > > > > > > > does the plugging. At the end of the day the data being passed > > > > > > > > around > > > > > > > > is networking-related, and I believe it is possible to create a > > > > > > > > solution that would be consumable by all OVN-enabled CMSs. > > > > > > > > We'd be > > > > > > > > happy to prove this out outside of the ovn-controller first if > > > > > > > > that is > > > > > > > > more palatable, see below. > > > > > > > > > The other downside I see with this RFC is that it will now be > > > > > > > > > monitoring all the port binding > > > > > > > > > rows without any condition because when a port binding is > > > > > > > > > created the > > > > > > > > > chassis column will > > > > > > > > > be NULL. Correct me if I'm wrong here. > > > > > > > > The intention of the condition is to consider unbound ports for > > > > > > > > plugging; as soon as they have been plugged and bound there > > > > > > > > will be a > > > > > > > > value in the chassis column and at that point only the > > > > > > > > respective > > > > > > > > chassis will monitor them. 
I guess this could be a problem if > > > > > > > > there is > > > > > > > > a large number of unbound ports in a deployment; from your > > > > > > > > experience, > > > > > > > > would that be the case? (When I think of it I do remember the > > > > > > > > load > > > > > > > > balancer VIP perhaps being a consumer of unbound ports). > > > > > > > > We considered using a separate LSP type, but apart from the > > > > > > > > plugging > > > > > > > > bit these LSPs will behave just like regular VIFs from OVN Port > > > > > > > > Binding code POV, and if all your hypervisors have SmartNICs > > > > > > > > with > > > > > > > > control plane CPUs there would be just as many of those ports. > > > > > > > In my opinion having a separate type would be better. At the > > > > > > > least it > > > > > > > will not impact > > > > > > > normal or existing deployments. > > > > > > > Let's say we call the type "representor" (just picking a name > > > > > > > for > > > > > > > now), then each ovn-controller > > > > > > > can add a conditional monitor to get updates for these types (just > > > > > > > like how we do now for patch or external ports). > > > > > > > What do you think of the below approach? > > > > > > > 1. A new LSP type - representor - is added. The CMS, when creating this > > > > > > > lport, also sets the requested-chassis option. > > > > > > Works for me. > > > > > I think it is better not to add a new LSP type for this purpose. It > > > > > would be better to make this "interface plugging" feature generic, so > > > > > that it can work for other types as well. It could be supported in > > > > > the future for a regular VIF, or for a new LSP type, so I think this > > > > > behavior is better not to be tied to an LSP type. > > > > > We could add a new column in the Port_Binding table: plugged_by. > > > > > ovn-controller can monitor all port_bindings that have "plugged_by" > > > > > equal to the current chassis. This way it should be backward compatible > > > > > and not affect the use cases that don't need "plugged_by". > > > > That is a good idea; during development of the RFC I did make an > > > > attempt at creating an IDL index for the options:requested-chassis, > > > > but that does not work because it would only match rows with > > > > requested-chassis as the *only* option. > > > > Creating a new column would resolve this issue! > > > Looking into how this bit could be implemented, I see that northd is > > > already performing lookup of chassis by name in several places, so I'm > > > thinking we could accomplish this without any change to the NB schema. > > > With the following change to the SB schema, we could have northd check for > > > presence of Logical_Switch_Port options:plug_type and > > > options:requested-chassis, and if both are there, look up the chassis UUID by > > > name in options:requested-chassis and insert it into the SB Port_Binding > > > plugged_by column. 
> > > > > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > > > index bbf60781d..0dee8e155 100644 > > > --- a/ovn-sb.ovsschema > > > +++ b/ovn-sb.ovsschema > > > @@ -229,7 +229,11 @@ > > > "external_ids": {"type": {"key": "string", > > > "value": "string", > > > "min": 0, > > > - "max": "unlimited"}}}, > > > + "max": "unlimited"}}, > > > + "plugged_by": {"type": {"key": {"type": "uuid", > > > + "refTable": "Chassis", > > > + "refType": "weak"}, > > > + "min": 0, "max": 1}}}, > > > "indexes": [["datapath", "tunnel_key"], ["logical_port"]], > > > "isRoot": true}, > > > "MAC_Binding": { > > > > > > What do you think? > > > > > > -- > > > Frode Nordahl > > > > > > > > The details of plugging information can be stored in the options > > > > > column with new options defined, such as: "plug_type". For > > > > > representor ports the "plug_type" can be defined as "representor". For > > > > > each plug_type there can be type-specific information defined and > > > > > also stored in the options column. (Or we can define a new column > > > > > "plug_options" just for plugging information). > > > > A convention to prefix all plugging-related options the same way, > > > > let's say: 'plug_', would work as well as a separate column, but I > > > > have nothing against a separate column if there is appetite for that. > > > > > > > 2. ovn-controller on the requested chassis sees this lport type, > > > > > > > creates an OVS port/interface and possibly sets some options or > > > > > > > hints > > > > > > > in the external_ids column > > > > > > > of the OVS interface. At this point, it will not do anything. > > > > > > This could potentially help us avoid the issue of adding more > > > > > > processes and IDLs talking to the SB DB. > > > > > > However, at this point in time, the ovn-controller does not know the > > > > > > details of the representor port, such as name/ifindex for the kernel > > > > > > representor port, and the interface it creates in OVS has to match > > > > > > the > > > > > > kernel representor port, and AFAICT it is not possible to change > > > > > > these > > > > > > details after port creation. This information is provided by the > > > > > > devlink lookup in the current proposal. > > > > > > It might work if it were possible to create the port and interface > > > > > > separately, but that would cause a referential integrity violation. > > > > > > Another alternative could be that ovn-vif created a new interface > > > > > > record in 4), attaching that to the port record created by > > > > > > ovn-controller, replacing a throw-away interface record, but we would > > > > > > be approaching hacky territory. > > > > > > Could we do some sort of local IPC, perhaps through a Unix domain > > > > > > socket? > > > > > I think it would be good to have separate *modules* to handle > > > > > interface plugging (one module per plug_type), but there is no need > > > > > to have a separate *process* for it, which increases the complexity > > > > > and the IPC cost. The modules can be invoked as callbacks registered > > > > > for each plug_type, and responsible for interpreting the options > > > > > defined for its type, which are directly retrieved from the > > > > > port_binding record in the SB DB. This way the core ovn-controller > > > > > components are kept untouched (except calling the plugging modules at > > > > > some point, but not worrying about the details), without introducing > > > > > a separate process. 
(A separate process is needed only if there are > > > > > performance concerns or blocking I/Os). > > > > > > > > > > It is similar to how different dpif providers (or different dev > > > > > types) are supported in OVS. > > > > Yes, I like this. The currently proposed devlink-informed plugging > > > > would have no issues with blocking I/O, and we could document a > > > > "charter" for the ovn-vif project that anything going into it would > > > > have to operate as if it were native OVN asynchronous code. > > > > This bit could possibly be implemented with an interface like this > > (note that this is draft/pseudocode at this point): > > diff --git a/controller/plug-provider.h b/controller/plug-provider.h > > new file mode 100644 > > index 000000000..9b7a2e722 > > --- /dev/null > > +++ b/controller/plug-provider.h > > @@ -0,0 +1,45 @@ > > +struct plug { > > + const struct plug_class *plug_class; > > +}; > > + > > +struct plug_iface { > > + char *iface_name; > > + struct smap iface_external_ids; > > + struct smap iface_options; > > + struct smap iface_other_config; > > +}; > > + > > +struct plug_class { > > + /* Type of plugger in this class. */ > > + const char *type; > > + > > + /* Called when the plug provider is registered, typically at program > > + * startup. */ > > + int (*init)(void); > > + > > + /* Performs periodic work needed by plugger, if any is necessary. */ > > + bool (*run)(struct plug *plug); > > + > > + /* Called when the plug provider is removed, typically at program > > exit. */ > > + int (*destroy)(struct plug *plug); > > + > > + /* Pass lport name and options to plugger to prepare for port creation. > > + * > > + * The plug implementation can perform lookups or any per-port > > + * initialization and should fill plug_iface with data required for > > + * port/interface creation. > > + * > > + * OVN owns the contents of the plug_iface struct and is responsible for > > freeing > > + * any data allocated when done with it. */ > > + bool (*plug_prepare)(const char *lport_name, > > + const struct smap *lport_options, > > + struct plug_iface *); > > + > > + /* Notify plugger that port was created. */ > > + bool (*plug_port_created)(const char *lport_name, > > + const struct smap *lport_options); > > + > > + /* Notify plugger that port was removed. */ > > + bool (*plug_port_removed)(const char *lport_name, > > + const struct smap *lport_options); > > +}; > > We should probably have a struct to keep the values the plugging > library needs to do its processing instead of multiple arguments, > something like: > > struct plug_port_ctx { > bool dpdk_initialized; /* from local Open_vSwitch table */ > const char *datapath_type; /* from local integration bridge */ > /* alternatively the above two could be collapsed into a single > * dpdk_enabled / use_dpdk bool */ > const char *lport_name; > const struct smap *lport_options; > }; > > -- > Frode Nordahl > > > With that the callbacks could be enabled and the infrastructure tested > > in core OVN, and we could host the platform-dependent bits externally, > > which would be enabled by the plug implementation calling a register > > function like with the OVS dpif framework (see the registration sketch > > appended at the end of this thread). > > > > -- > > Frode Nordahl > > > > > > > > > > > > > > > 3. An external entity (similar to your proposed pluggable.c) > > > > > > > running > > > > > > > on the chassis would have built a shash of "devlink_ports" > > > > > > > using > > > > > > > the devlink sockets. > > > > > > > > > > > > > > 4. This entity will also be monitoring the OVS interfaces. 
When > > > > > > > it > > > > > > > sees a new interface with the proper hints, it will set the > > > > > > > external_ids:iface-id=<lport_name> > > > > > > > > > > > I think setting iface-id (if still necessary) shall be done together > > > > > with the port creation, by the plugging module (through a callback). > > > > +1 > > > > > > > 5. ovn-controller, when it sees the external_ids:iface-id, will > > > > > > > bind > > > > > > > the port and add flows. > > > > > > > > > > > > > > With (2) we now add the support in ovn-controller to create OVS > > > > > > > ports, > > > > > > > but at the same time OVN will not bother about the internal > > > > > > > details > > > > > > > of SmartNICs > > > > > > > or bother about the devlink sockets. > > > > > > > > > > > > > > I'm not sure if the above approach seems hacky. The external > > > > > > > entity > > > > > > > could be called "ovn-vif". > > > > > > > > > > > > > > What do you think? > > > > > > This is an interesting idea and it could work if we figure out a way > > > > > > for ovn-controller and ovn-vif to exchange information before the > > > > > > port/interface is created. > > > > > > > > > > I'd really prefer not to introduce such extra IPC. Would the > > > > > plug_type mechanism be generic enough and avoid such external > > > > > entities? > > > > I agree, optionally pulling the code in at compile time would make > > > > things much simpler and more effective. > > > > > > > > -- > > > > Frode Nordahl > > > > > Thanks, > > > > > Han > > > > > > > > > > > > > > > > Perhaps there can be a separate project called ovn-vif > > > > > > > > > which does this? > > > > > > > > Creating a separate ovn-org project called ovn-vif would allow > > > > > > > > us to > > > > > > > > move forward; we would of course have to document the optional > > > > > > > > CMS API > > > > > > > > in core OVN pointing to an optional external dependency for > > > > > > > > their use. > > > > > > > > Would this be an acceptable compromise? > > > > > > > I personally do not have any objections to that. Maybe others > > > > > > > have > > > > > > > different opinions. > > > > > > Great. > > > > > > Cheers! > > > > > > -- > > > > > > Frode Nordahl > > > > > > > > > > > > > > > Thanks > > > > > > > Numan > > > > > > > > > > > > > > > > -- > > > > > > > > Frode Nordahl > > > > > > > > > Thanks > > > > > > > > > Numan > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Frode Nordahl > > > > > > > > > > > Best regards, Ilya Maximets. 
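Building on the draft plug-provider.h and the dpif-style registration idea discussed above, here is a minimal sketch of what registering a built-in "representor" plug provider could look like. Everything below is hypothetical: plug_register_provider() is an assumed entry point in core ovn-controller, the plug_representor_* functions are illustrative stubs, and "plug-provider.h" refers to the draft header quoted earlier in this thread.

/* Hypothetical sketch of the callback/registration pattern discussed
 * above, modeled on how OVS registers dpif/netdev classes.  None of
 * these symbols exist in OVN today. */
#include <stdbool.h>
#include "smap.h"           /* OVS lib/smap.h, needed by the draft header. */
#include "plug-provider.h"  /* Draft header from earlier in this thread. */

/* Assumed to be provided by core OVN: records the class so Port_Binding
 * rows whose options:plug_type matches class->type are dispatched to
 * these callbacks. */
extern void plug_register_provider(const struct plug_class *);

static int
plug_representor_init(void)
{
    /* Open the devlink socket and build the initial cache of
     * representor ports here.  No-op in this sketch. */
    return 0;
}

static bool
plug_representor_prepare(const char *lport_name,
                         const struct smap *lport_options,
                         struct plug_iface *iface)
{
    /* Resolve the plug_-prefixed options (e.g. PF MAC + VF number) to a
     * representor netdev and fill 'iface' so ovn-controller can create
     * the matching OVS port and interface records.  Stubbed out here. */
    (void) lport_name;
    (void) lport_options;
    (void) iface;
    return false;
}

static const struct plug_class plug_representor_class = {
    .type = "representor",
    .init = plug_representor_init,
    .plug_prepare = plug_representor_prepare,
    /* Remaining callbacks left NULL in this sketch. */
};

void
plug_representor_register(void)
{
    /* Called once at ovn-controller startup, or from an external ovn-vif
     * module pulled in at compile time. */
    plug_register_provider(&plug_representor_class);
}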
