As we discussed in the IRC meeting on Thu, July 15th I'm working on a
new RFC patch series which incorporates the discussion so far and I'll
propose that to the list when done.

For anyone following along I'll be staging it over here:
https://github.com/fnordahl/ovn/commits/pluggingv2

-- 
Frode Nordahl

On Tue, Jul 13, 2021 at 10:47 AM Frode Nordahl
<[email protected]> wrote:
>
> On Fri, Jul 9, 2021 at 9:04 AM Frode Nordahl
> <[email protected]> wrote:
> >
> > On Thu, Jul 8, 2021 at 11:58 AM Frode Nordahl
> > <[email protected]> wrote:
> > >
> > > On Wed, Jul 7, 2021 at 8:30 AM Frode Nordahl
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Jul 6, 2021 at 8:11 PM Han Zhou <[email protected]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl 
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <[email protected]> 
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique 
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets 
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 6/10/21 8:36 AM, Han Zhou wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl 
> > > > > > > > > > > > <[email protected] 
> > > > > > > > > > > > <mailto:[email protected]>> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets 
> > > > > > > > > > > >> <[email protected] <mailto:[email protected]>> wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote:
> > > > > > > > > > > >> > > Introduce plugging module that adds and removes 
> > > > > > > > > > > >> > > ports on the
> > > > > > > > > > > >> > > integration bridge, as directed by Port_Binding 
> > > > > > > > > > > >> > > options.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Traditionally it has been the CMSs responsibility 
> > > > > > > > > > > >> > > to create Virtual
> > > > > > > > > > > >> > > Interfaces (VIFs) as part of instance (Container, 
> > > > > > > > > > > >> > > Pod, Virtual
> > > > > > > > > > > >> > > Machine etc.) life cycle, and subsequently manage 
> > > > > > > > > > > >> > > plug/unplug
> > > > > > > > > > > >> > > operations on the Open vSwitch integration bridge.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > With the advent of NICs connected to multiple 
> > > > > > > > > > > >> > > distinct CPUs we can
> > > > > > > > > > > >> > > have a topology where the instance runs on one 
> > > > > > > > > > > >> > > host and Open
> > > > > > > > > > > >> > > vSwitch and OVN runs on a different host, the 
> > > > > > > > > > > >> > > smartnic CPU.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > The act of plugging and unplugging the representor 
> > > > > > > > > > > >> > > port in Open
> > > > > > > > > > > >> > > vSwitch running on the smartnic host CPU would be 
> > > > > > > > > > > >> > > the same for
> > > > > > > > > > > >> > > every smartnic variant (thanks to the 
> > > > > > > > > > > >> > > devlink-port[0][1]
> > > > > > > > > > > >> > > infrastructure) and every CMS (Kubernetes, LXD, 
> > > > > > > > > > > >> > > OpenStack, etc.).
> > > > > > > > > > > >> > > As such it is natural to extend OVN to provide 
> > > > > > > > > > > >> > > this common
> > > > > > > > > > > >> > > functionality through its CMS facing API.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Hi, Frode.  Thanks for putting this together, but it 
> > > > > > > > > > > >> > doesn't look
> > > > > > > > > > > >> > natural to me.  OVN, AFAIK, never touched physical 
> > > > > > > > > > > >> > devices or
> > > > > > > > > > > >> > interacted with the kernel directly.  This change 
> > > > > > > > > > > >> > introduces completely
> > > > > > > > > > > >> > new functionality inside OVN.  With the same effect 
> > > > > > > > > > > >> > we can run a fully
> > > > > > > > > > > >> > separate service on these smartnic CPUs that will do 
> > > > > > > > > > > >> > plugging
> > > > > > > > > > > >> > and configuration job for CMS.  You may even make it 
> > > > > > > > > > > >> > independent
> > > > > > > > > > > >> > from a particular CMS by creating a REST API for it 
> > > > > > > > > > > >> > or whatever.
> > > > > > > > > > > >> > This will additionally allow using same service for 
> > > > > > > > > > > >> > non-OVN setups.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Ilya,
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thank you for taking the time to comment, much 
> > > > > > > > > > > >> appreciated.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Yes, this is new functionality, NICs with separate 
> > > > > > > > > > > >> control plane CPUs
> > > > > > > > > > > >> and isolation from the host are also new, so this is 
> > > > > > > > > > > >> one proposal for
> > > > > > > > > > > >> how we could go about to enable the use of them.
> > > > > > > > > > > >>
> > > > > > > > > > > >> The OVN controller does today get pretty close to the 
> > > > > > > > > > > >> physical realm
> > > > > > > > > > > >> by maintaining patch ports in Open vSwitch based on 
> > > > > > > > > > > >> bridge mapping
> > > > > > > > > > > >> configuration and presence of bridges to physical 
> > > > > > > > > > > >> interfaces. It also
> > > > > > > > > > > >> does react to events of physical interfaces being 
> > > > > > > > > > > >> plugged into the
> > > > > > > > > > > >> Open vSwitch instance it manages, albeit to date some 
> > > > > > > > > > > >> other entity has
> > > > > > > > > > > >> been doing the act of adding the port into the bridge.
> > > > > > > > > > > >>
> > > > > > > > > > > >> The rationale for proposing to use the OVN database 
> > > > > > > > > > > >> for coordinating
> > > > > > > > > > > >> this is that the information about which ports to 
> > > > > > > > > > > >> bind, and where to
> > > > > > > > > > > >> bind them is already there. The timing of the 
> > > > > > > > > > > >> information flow from
> > > > > > > > > > > >> the CMS is also suitable for the task.
> > > > > > > > > > > >>
> > > > > > > > > > > >> OVN relies on OVS library code, and all the necessary 
> > > > > > > > > > > >> libraries for
> > > > > > > > > > > >> interfacing with the kernel through netlink and 
> > > > > > > > > > > >> friends are there or
> > > > > > > > > > > >> would be easy to add. The rationale for using the 
> > > > > > > > > > > >> netlink-devlink
> > > > > > > > > > > >> interface is that it provides a generic infrastructure 
> > > > > > > > > > > >> for these types
> > > > > > > > > > > >> of NICs. So by using this interface we should be able 
> > > > > > > > > > > >> to support most
> > > > > > > > > > > >> if not all of the variants of these cards.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> Providing a separate OVN service to do the task could 
> > > > > > > > > > > >> work, but would
> > > > > > > > > > > >> have the cost of an extra SB DB connection, IDL and 
> > > > > > > > > > > >> monitors.
> > > > > > > > > > >
> > > > > > > > > > > IMHO, CMS should never connect to Southbound DB.  It's 
> > > > > > > > > > > just because the
> > > > > > > > > > > Southbound DB is not meant to be a public interface, it 
> > > > > > > > > > > just happened
> > > > > > > > > > > to be available for connections.  I know that OpenStack 
> > > > > > > > > > > has metadata
> > > > > > > > > > > agents that connects to Sb DB, but if it's really 
> > > > > > > > > > > required for them, I
> > > > > > > > > > > think, there should be a different way to get/set 
> > > > > > > > > > > required information
> > > > > > > > > > > without connection to the Southbound.
> > > > > > > > > >
> > > > > > > > > > The CMS-facing API is the Northbound DB, I was not 
> > > > > > > > > > suggesting direct
> > > > > > > > > > use of the Southbound DB by external to OVN services. My 
> > > > > > > > > > suggestion
> > > > > > > > > > was to have a separate OVN process do this if your 
> > > > > > > > > > objection was to
> > > > > > > > > > handle it as part of the ovn-controller process.
> > > > > > > > > >
> > > > > > > > > > > >>
> > > > > > > > > > > >> I fear it would be quite hard to build a whole 
> > > > > > > > > > > >> separate project with
> > > > > > > > > > > >> its own API, feels like a lot of duplicated effort 
> > > > > > > > > > > >> when the flow of
> > > > > > > > > > > >> data and APIs in OVN already align so well with CMSs 
> > > > > > > > > > > >> interested in
> > > > > > > > > > > >> using this?
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Interactions with physical devices also makes OVN 
> > > > > > > > > > > >> > linux-dependent
> > > > > > > > > > > >> > at least for this use case, IIUC.
> > > > > > > > > > > >>
> > > > > > > > > > > >> This specific bit would be linux-specific in the first 
> > > > > > > > > > > >> iteration, yes.
> > > > > > > > > > > >> But the vendors manufacturing and distributing the 
> > > > > > > > > > > >> hardware do often
> > > > > > > > > > > >> have drivers for other platforms, I am sure the 
> > > > > > > > > > > >> necessary
> > > > > > > > > > > >> infrastructure will become available there too over 
> > > > > > > > > > > >> time, if it is not
> > > > > > > > > > > >> there already.
> > > > > > > > > > > >>
> > > > > > > > > > > >> We do currently have platform specific macros in the 
> > > > > > > > > > > >> OVN build system,
> > > > > > > > > > > >> so we could enable the functionality when built on a 
> > > > > > > > > > > >> compatible
> > > > > > > > > > > >> platform.
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Maybe, others has different opinions.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I appreciate your opinion, and enjoy discussing this 
> > > > > > > > > > > >> topic.
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Another though is that there is, obviously, a 
> > > > > > > > > > > >> > network connection
> > > > > > > > > > > >> > between the host and smartnic system.  Maybe it's 
> > > > > > > > > > > >> > possible to just
> > > > > > > > > > > >> > add an extra remote to the local ovsdb-server so CMS 
> > > > > > > > > > > >> > daemon on the
> > > > > > > > > > > >> > host system could just add interfaces over the 
> > > > > > > > > > > >> > network connection?
> > > > > > > > > > > >>
> > > > > > > > > > > >> There are a few issues with such an approach. One of 
> > > > > > > > > > > >> the main goals
> > > > > > > > > > > >> with providing and using a NIC with control plane CPUs 
> > > > > > > > > > > >> is having an
> > > > > > > > > > > >> extra layer of security and isolation which is 
> > > > > > > > > > > >> separate from the
> > > > > > > > > > > >> hypervisor host the card happens to share a PCI 
> > > > > > > > > > > >> complex with and draw
> > > > > > > > > > > >> power from. Requiring a connection between the two for 
> > > > > > > > > > > >> operation would
> > > > > > > > > > > >> defy this purpose.
> > > > > > > > > > > >>
> > > > > > > > > > > >> In addition to that, this class of cards provide 
> > > > > > > > > > > >> visibility into
> > > > > > > > > > > >> kernel interfaces, enumeration of representor ports 
> > > > > > > > > > > >> etc. only from the
> > > > > > > > > > > >> NIC control plane CPU side of the PCI complex, this 
> > > > > > > > > > > >> information is not
> > > > > > > > > > > >> provided to the host. So if a hypervisor host CMS 
> > > > > > > > > > > >> agent were to do the
> > > > > > > > > > > >> plugging through a remote ovsdb connection, it would 
> > > > > > > > > > > >> have to
> > > > > > > > > > > >> communicate with something else running on the NIC 
> > > > > > > > > > > >> control plane CPU
> > > > > > > > > > > >> to retrieve the information it needs before it can 
> > > > > > > > > > > >> know what to relay
> > > > > > > > > > > >> back over the ovsdb connection.
> > > > > > > > > > > >>
> > > > > > > > > > > >> --
> > > > > > > > > > > >> Frode Nordahl
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Best regards, Ilya Maximets.
> > > > > > > > > > > >
> > > > > > > > > > > > Here are my 2 cents.
> > > > > > > > > > > >
> > > > > > > > > > > > Initially I had similar concerns to Ilya, and it seems 
> > > > > > > > > > > > OVN should stay away from the physical interface 
> > > > > > > > > > > > plugging. As a reference, here is how ovn-kubernetes is 
> > > > > > > > > > > > doing it without adding anything to OVN: 
> > > > > > > > > > > > https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing
> > > > > > > > > > > >  
> > > > > > > > > > > > <https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing>
> > > > > > > > > > >
> > > > > > > > > > > AFAICT, a big part of the work is already done on the 
> > > > > > > > > > > ovn-k8s side:
> > > > > > > > > > >   https://github.com/ovn-org/ovn-kubernetes/pull/2005
> > > > > > > > > > >   https://github.com/ovn-org/ovn-kubernetes/pull/2042
> > > > > > > > > >
> > > > > > > > > > I am aware of the on-going effort to implement support for 
> > > > > > > > > > this in
> > > > > > > > > > ovn-kubernetes directly. What we have identified is that 
> > > > > > > > > > there are
> > > > > > > > > > other CMSs that want this functionality, and with that we 
> > > > > > > > > > have an
> > > > > > > > > > opportunity to generalise and provide an abstraction in a 
> > > > > > > > > > common place
> > > > > > > > > > that all the consuming CMSes can benefit from.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > However, thinking more about it, the proposed approach 
> > > > > > > > > > > > in this patch just expands the way how OVN can bind 
> > > > > > > > > > > > ports, utilizing the communication channel of OVN 
> > > > > > > > > > > > (OVSDB connections). If all the information regarding 
> > > > > > > > > > > > port binding can be specified by the CMS from NB, then 
> > > > > > > > > > > > it is not unnatural for ovn-controller to perform 
> > > > > > > > > > > > interface binding directly (instead of passively 
> > > > > > > > > > > > accepting what is attached by CMS). This kind of 
> > > > > > > > > > > > information already existed to some extent - the 
> > > > > > > > > > > > "requested_chassis" option in OpenStack. Now it seems 
> > > > > > > > > > > > this idea is just expanding it to a specific interface. 
> > > > > > > > > > > > The difference is that "requested_chassis" is used for 
> > > > > > > > > > > > validation only, but now we want to directly apply it. 
> > > > > > > > > > > > So I think at least I don't have a strong opinion on 
> > > > > > > > > > > > the idea.
> > > > > > > > > > >
>
> > > > > > > > > > > While it's, probably, OK for OVN to add port to the 
> > > > > > > > > > > OVSDB, in many
> > > > > > > > > > > cases these ports will require a lot of extra 
> > > > > > > > > > > configuration which
> > > > > > > > > > > is typically done by os-vif or CNI/device plugins.  
> > > > > > > > > > > Imagine that OVS
> > > > > > > > > > > is running with userspace datapath and you need to 
> > > > > > > > > > > plug-in some DPDK
> > > > > > > > > > > ports, where you have to specify the port type, DPDK port 
> > > > > > > > > > > config,
> > > > > > > > > > > porbably also number of rx/tx queues, number of 
> > > > > > > > > > > descriptors in these
> > > > > > > > > > > queues.  You may also want to configure affinity for 
> > > > > > > > > > > these queues per
> > > > > > > > > > > PMD thread in OVS.  For kernel interfaces it might be 
> > > > > > > > > > > easier, but they
> > > > > > > > > > > also might require some extra configuration that OVN will 
> > > > > > > > > > > have to
> > > > > > > > > > > think about now.  This is a typical job for CMS to 
> > > > > > > > > > > configure this
> > > > > > > > > > > kind of stuff, and that is why projects like os-vif or 
> > > > > > > > > > > large variety
> > > > > > > > > > > of CNI/device plugins exists.
> > > > > > > > > >
> > > > > > > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. 
> > > > > > > > > > And both of
> > > > > > > > > > them get their information from the CMS. Providing support 
> > > > > > > > > > for
> > > > > > > > > > userspace datapath and DPDK would require more information, 
> > > > > > > > > > some of
> > > > > > > > > > which is available through devlink, some fit well in 
> > > > > > > > > > key/value
> > > > > > > > > > options. Our initial target would be to support the kernel 
> > > > > > > > > > representor
> > > > > > > > > > port workflow.
>
> Circling back to the DPDK argument, I had a look at what this would
> look like in practice. For the DPDK representor port workflow, the
> information required from CMS would be which PF and a logical VF
> number. In the current proposal the CMS provides a MAC address of PF
> and a VF number, with that information the plugging library can look
> up a PCI address and fill the `options:dpdk-devargs` for example
> '0000:03:00.0,representor=[0]'.
>
> When OVN is running on a SmartNIC, there will be no vHost sockets as
> the workload would be running on a different host, at the other end of
> the host facing PF+VF pair.
>
> As for other tuning parameters such as queues, descriptors and queue
> affinity, it appears to me CMS systems are generally unaware of this
> tuning today and that blanket configuration is applied through other
> means. When OVN is running on a SmartNIC any queue affinity options
> would be relative to control plane CPU cores and have no relation to
> which host CPU cores the workload is running on at the other end of
> the host facing PF+VF pair. We could choose to replicate this behavior
> by having the plugging library have its own defaults, possibly
> influenced by local per chassis configuration and keep this out of the
> CMS API.
>
> Does this address your concerns wrt. DPDK support or do you have other
> examples of configuration we would need to consider?
>
> > > > > > > > > > > >
> > > > > > > > > > > > There are some benefits:
> > > > > > > > > > > > 1) The mechanism can be reused by different CMSes, 
> > > > > > > > > > > > which may simplify CMS implementation.
> > > > > > > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's 
> > > > > > > > > > > > communication channel, which avoids an extra CMS 
> > > > > > > > > > > > communication channel on the smart NIC side. (of course 
> > > > > > > > > > > > this can be achieved by a connection between the BM and 
> > > > > > > > > > > > smart NIC with *restricted* API just to convey the 
> > > > > > > > > > > > necessary information)
> > > > > > > > > > >
> > > > > > > > > > > The problem I see is that at least ovn-k8s, AFAIU, will 
> > > > > > > > > > > require
> > > > > > > > > > > the daemon on the Smart NIC anyways to monitor and 
> > > > > > > > > > > configure
> > > > > > > > > > > OVN components, e.g. to configure ovn-remote for 
> > > > > > > > > > > ovn-controller
> > > > > > > > > > > or run management appctl commands if required.
> > > > > > > > > > > So, I don't see the point why it can't do plugging if 
> > > > > > > > > > > it's already
> > > > > > > > > > > there.
> > > > > > > > > >
> > > > > > > > > > This pattern is Kubernetes specific and is not the case for 
> > > > > > > > > > other
> > > > > > > > > > CMSes. The current proposal for enabling Smart NIC with 
> > > > > > > > > > control plane
> > > > > > > > > > CPUs for ovn-kubernetes could be simplified if the 
> > > > > > > > > > networking platform
> > > > > > > > > > provided means for coordinating more of the network related 
> > > > > > > > > > bits.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > As to the negative side, it would increase OVN's 
> > > > > > > > > > > > complexity, and as mentioned by Ilya potentially breaks 
> > > > > > > > > > > > OVN's platform independence. To avoid this, I think the 
> > > > > > > > > > > > *plugging* module itself needs to be independent and 
> > > > > > > > > > > > pluggable. It can be extended as independent plugins. 
> > > > > > > > > > > > The plugin would need to define what information is 
> > > > > > > > > > > > needed in LSP's "options", and then implement 
> > > > > > > > > > > > corresponding drivers. With this approach, even the 
> > > > > > > > > > > > regular VIFs can be attached by ovn-controller if CMS 
> > > > > > > > > > > > can tell the interface name. Anyway, this is just my 
> > > > > > > > > > > > brief thinking.
> > > > > > > > > > >
> > > > > > > > > > > Aside from plugging,
> > > > > > > > > > > I also don't see the reason to have devlink code in OVN 
> > > > > > > > > > > just
> > > > > > > > > > > because it runs once on the init stage, IIUC.  So, I don't
> > > > > > > > > > > understand why this information about the hardware cannot 
> > > > > > > > > > > be
> > > > > > > > > > > retrieved during the host provisioning and stored 
> > > > > > > > > > > somewhere
> > > > > > > > > > > (Nova config?).  Isn't hardware inventory a job for 
> > > > > > > > > > > tripleo
> > > > > > > > > > > or something?
> > > > > > > > > >
> > > > > > > > > > As noted in one of the TODOs in the commit message of the 
> > > > > > > > > > RFC one of
> > > > > > > > > > the work items to further develop this is to monitor 
> > > > > > > > > > devlink for
> > > > > > > > > > changes, the end product will not do one-time 
> > > > > > > > > > initialization.
> > > > > > > > > >
> > > > > > > > > > While I agree with you that the current SR-IOV acceleration 
> > > > > > > > > > workflow
> > > > > > > > > > configuration is pretty static and can be done at deploy 
> > > > > > > > > > time, this
> > > > > > > > > > proposal prepares for the next generation subfunction 
> > > > > > > > > > workflow where
> > > > > > > > > > you will have a much higher density, and run-time 
> > > > > > > > > > configuration and
> > > > > > > > > > discovery of representor ports. There is a paper about it 
> > > > > > > > > > from netdev
> > > > > > > > > > conf[2], and all of this is part of the devlink 
> > > > > > > > > > infrastructure (Linux
> > > > > > > > > > kernel ABI).
> > > > > > > > > >
> > > > > > > > > > 2: 
> > > > > > > > > > https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
> > > > > > > >
> > > > > > > > > My 2 cents.
> > > > > > > >
> > > > > > > > Thank you for weighing in with your opinion, Numan.
> > > > > > > >
> > > > > > > > > I agree with Ilya.  It doesn't seem natural to me for OVN to 
> > > > > > > > > create
> > > > > > > > > OVS ports and also
> > > > > > > > > to use devlink infrastructure.  I think an external entity 
> > > > > > > > > can do all these.
> > > > > > > >
> > > > > > > > No doubt an external entity could do this, the challenge is 
> > > > > > > > that due
> > > > > > > > to how the SmartNIC with separate control plane CPUs are laid 
> > > > > > > > out
> > > > > > > > existing CMS tooling cannot do it, and we are working out a way 
> > > > > > > > to
> > > > > > > > efficiently and securely coordinate how and when this external 
> > > > > > > > entity
> > > > > > > > does the plugging. At the end of the day the data being passed 
> > > > > > > > around
> > > > > > > > is networking related, and I believe it is possible to create a
> > > > > > > > solution that would be consumable by all OVN-enabled CMS's. 
> > > > > > > > We'd be
> > > > > > > > happy to prove this out outside of the ovn-controller first if 
> > > > > > > > that is
> > > > > > > > more palatable, see below.
> > > > > > > >
> > > > > > > > > The other downside I see with this RFC is that it will be now
> > > > > > > > > monitoring all the port binding
> > > > > > > > > rows without any condition because when a port binding is 
> > > > > > > > > created the
> > > > > > > > > chassis column will
> > > > > > > > > be NULL.  Correct me If I'm wrong here.
> > > > > > > >
> > > > > > > > The intention of the condition is to consider unbound ports for
> > > > > > > > plugging, as soon as they have been plugged and bound there 
> > > > > > > > will be a
> > > > > > > > value in the chassis column and at that point only the 
> > > > > > > > respective
> > > > > > > > chassis will monitor them. I guess this could be a problem if 
> > > > > > > > there is
> > > > > > > > a large number of unbound ports in a deployment, from your 
> > > > > > > > experience
> > > > > > > > would that be the case? (When I think of it I do remember the 
> > > > > > > > load
> > > > > > > > balancer vip perhaps being a consumer of unbound ports).
> > > > > > > >
> > > > > > > > We considered using a separate LSP type, but apart from the 
> > > > > > > > plugging
> > > > > > > > bit these LSP's will behave just like regular VIFs from OVN Port
> > > > > > > > Binding code POV, and if all your hypervisors have SmartNICs 
> > > > > > > > with
> > > > > > > > control plane CPUs there would be just as many of those ports.
> > > >
> > > >
> > > > > > > In my opinion having a separate type would be better.  At the 
> > > > > > > least it
> > > > > > > will not impact
> > > > > > > normal or existing deployments.
> > > > > > >
> > > > > > > Let's say we call a type as "representor"  (just picking a name 
> > > > > > > for
> > > > > > > now), then each ovn-controller
> > > > > > > can add a conditional monitor to get updates for these types (just
> > > > > > > like how we do now for patch or external ports).
> > > > > > >
> > >
> > >
> > > > > > > What do you think of the below approach
> > > > > > >
> > > > > > > 1.  A new LSP type - representor is added.  CMS when creating this
> > > > > > > lport also sets the requested-chassis option.
> > > > > >
> > > > > > Works for me.
> > > > > >
> > > > > I think it is better not to add a new LSP type for this purpose. It 
> > > > > would be better to make this "interface plugging" feature generic, so 
> > > > > that it can work for other types as well. It could be supported in 
> > > > > the future for a regular VIF, or for a new LSP type, so I think this 
> > > > > behavior is better not to be tied to a LSP type.
> > > > >
> > > > > We could add a new column in the Port_Binding table: plugged_by. 
> > > > > ovn-controller can monitor all port_bindings that have "plugged_by" 
> > > > > equals to current chassis. This way it should be backward compatible 
> > > > > and not affecting the use cases that doesn't need "plugged_by".
> > > >
> > > > That is a good idea, during development of the RFC I did make an
> > > > attempt at creating an IDL index for the options:requested-chassis,
> > > > but that does not work because it would only match rows with
> > > > requested-chassis as the *only* option.
> > > >
> > > > Creating a new column would resolve this issue!
> > >
> > > Looking into how this bit could be implemented, I see that northd is
> > > already performing lookup of chassis by name in several places, so I'm
> > > thinking we could accomplish this without any change to NB schema.
> > >
> > > With the following change to SB schema, we could have northd check for
> > > presence of Logical_Switch_Port options:plug_type and
> > > options:requested-chassis, if both are there look up chassis UUID by
> > > name in options:requested-chassis and insert into SB Port_Binding
> > > plugged_by column.
> > >
> > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > > index bbf60781d..0dee8e155 100644
> > > --- a/ovn-sb.ovsschema
> > > +++ b/ovn-sb.ovsschema
> > > @@ -229,7 +229,11 @@
> > >                  "external_ids": {"type": {"key": "string",
> > >                                   "value": "string",
> > >                                   "min": 0,
> > > -                                 "max": "unlimited"}}},
> > > +                                 "max": "unlimited"}},
> > > +                "plugged_by": {"type": {"key": {"type": "uuid",
> > > +                                                "refTable": "Chassis",
> > > +                                                "refType": "weak"},
> > > +                                        "min": 0, "max": 1}}},
> > >              "indexes": [["datapath", "tunnel_key"], ["logical_port"]],
> > >              "isRoot": true},
> > >          "MAC_Binding": {
> > >
> > > What do you think?
> > >
> > > --
> > > Frode Nordahl
> > >
> > > > > The details of plugging information can be stored in the options 
> > > > > column with new options defined, such as: "plug_type". For 
> > > > > representor port the "plug_type" can be defined as "representor". For 
> > > > > each plug_type there can be type-specific information defined and 
> > > > > also stored in the options column. (Or we can define a new column 
> > > > > "plug_options" just for plugging information).
> > > >
> > > > A convention to prefix all plugging related options the same way,
> > > > let's say: 'plug_', would work as well as a separate column, but I
> > > > have nothing against a separate column if there is appetite for that.
> > > >
> > > > > > > 2. ovn-controller on the requested chassis sees this lport type,
> > > > > > > creates an OVS port/interface and possibly sets some options or 
> > > > > > > hints
> > > > > > > in the external_ids column
> > > > > > >  of ovs interface.  At this point, it will not do anything.
> > > > > >
> > > > > > This could potentially help us avoid the issue of adding more
> > > > > > processes and IDLs talking to the SB DB.
> > > > > >
> > > > > > However, at this point in time, the ovn-controller does not know the
> > > > > > details of the representor port, such as name/ifindex for the kernel
> > > > > > representor port, and the interface it creates in OVS has to match 
> > > > > > the
> > > > > > kernel representor port, and afict it is not possible to change 
> > > > > > these
> > > > > > details after port creation. This information is provided by the
> > > > > > devlink lookup in the current proposal.
> > > > > >
> > > > > > It might work if it was possible to create the port and interface
> > > > > > separately, but that would cause a referential integrity violation.
> > > > > > Another alternative could be that ovn-vif created a new interface
> > > > > > record in 4), attaching that to the port record created by
> > > > > > ovn-controller replacing a throw-away interface record, but we would
> > > > > > be approaching hacky territory.
> > > > > >
> > > > > > Could we do some sort of local IPC perhaps through a Unix domain 
> > > > > > socket?
> > > > >
> > > > > I think it would be good to have separate *modules* to handle 
> > > > > interface plugging (one module per plug_type), but there is no need 
> > > > > to have a separate *process* for it, which increases the complexity 
> > > > > and the IPC cost. The modules can be invoked as callbacks registered 
> > > > > for each plug_type, and responsible for explaining the options 
> > > > > defined for its type, which is directly retrieved from the 
> > > > > port_binding record from SB DB. This way the core ovn-controller 
> > > > > components are kept untouched (except calling the plugging modules at 
> > > > > some point, but not worrying about the details), without introducing 
> > > > > a separate process. (A separate process is needed only if there are 
> > > > > performance concerns or blocking I/Os).
> > > > >
> > > > > It is similar to how different dpif providers (or different dev 
> > > > > types) are supported in OVS.
> > > >
> > > > Yes, I like this. The currently proposed devlink informed plugging
> > > > would have no issues with blocking I/O, and we could document a
> > > > "charter" for the ovn-vif project that anything going into it would
> > > > have to operate as if it was native OVN asynchronous code.
> > > >
> >
> > This bit could possibly be implemented with an interface like this
> > (note that this is draft/pseudocode at this point):
> > diff --git a/controller/plug-provider.h b/controller/plug-provider.h
> > new file mode 100644
> > index 000000000..9b7a2e722
> > --- /dev/null
> > +++ b/controller/plug-provider.h
> > @@ -0,0 +1,45 @@
> > +struct plug {
> > +    const struct plug_class *plug_class;
> > +};
> > +
> > +struct plug_iface {
> > +    char *iface_name;
> > +    struct smap iface_external_ids;
> > +    struct smap iface_options;
> > +    struct smap iface_other_config;
> > +};
> > +
> > +struct plug_class {
> > +    /* Type of plugger in this class. */
> > +    const char *type;
> > +
> > +    /* Called when the plug provider is registered, typically at program
> > +     * startup. */
> > +    int (*init)(void);
> > +
> > +    /* Performs periodic work needed by plugger, if any is necessary. */
> > +    bool (*run)(struct plug *plug);
> > +
> > +    /* Called when the plug provider is removed, typically at program 
> > exit. */
> > +    int (*destroy)(struct plug *plug);
> > +
> > +    /* Pass lport name and options to plugger to prepare for port creation.
> > +     *
> > +     * The plug implemantation can perform lookup or any per port
> > +     * initialization and should fill plug_iface with data required for
> > +     * port/interface creation.
> > +     *
> > +     * OVN owns the contents of plug_iface struct and is responsible to 
> > free
> > +     * any data allocated when done with it. */
> > +    bool (*plug_prepare)(const char *lport_name,
> > +                         const struct smap *lport_options,
> > +                         struct plug_iface *);
> > +
> > +    /* Notify plugger that port was created. */
> > +    bool (*plug_port_created)(const char *lport_name,
> > +                              const struct smap *lport_options);
> > +
> > +    /* Notify plugger that port was removed. */
> > +    bool (*plug_port_removed)(const char *lport_name,
> > +                              const struct smap *lport_options);
> > +};
>
> We should probably have a struct to keep the values the plugging
> library needs to do its processing instead of multiple arguments,
> something like:
>
> struct plug_port_ctx {
>     bool dpdk_initialized; /* from local Open_vSwitch table */
>     const char *datapath_type; /* from local integration bridge */
>     /* alternatively the above two could be collapsed into a single
>      * dpdk_enabled / use_dpdk bool */
>     const char *lport_name;
>     const struct smap *lport_options;
> };
>
> --
> Frode Nordahl
>
> > With that the callbacks could be enabled and the infrastructure tested
> > in core OVN, and we could host the platform dependent bits externally
> > which would be enabled by the plug implementation calling a register
> > function like with the OVS dpif framework.
> >
> > --
> > Frode Nordahl
> >
> > > > > >
> > > > > > > 3.  An external entity (similar to your proposed pluggable.c) 
> > > > > > > running
> > > > > > > on the chassis, would have built an shash of  "devlink_ports" 
> > > > > > > using
> > > > > > > the devlink sockets.
> > > > > > >
> > > > > > > 4.  This entity will also be monitoring the ovs interfaces.  When 
> > > > > > > it
> > > > > > > sees a new interface with the proper hints, will set the
> > > > > > > external_ids:iface-id=<lport_name>
> > > > > > >
> > > > > I think setting iface-id (if still necessary) shall be done together 
> > > > > with the port creation, by the plugging module (through a callback).
> > > >
> > > > +1
> > > >
> > > > > > > 5. ovn-controller when it sees the external_ids:iface-id,  will 
> > > > > > > bind
> > > > > > > the port and add flows.
> > > > > > >
> > > > > > >
> > > > > > > With (2) we now add the support in ovn-controller to create OVS 
> > > > > > > ports,
> > > > > > >  but at the same time OVN will not bother about the internal 
> > > > > > > details
> > > > > > > of SMART nics
> > > > > > > or bother about the devlink sockets.
> > > > > > >
> > > > > > > I'm not sure if the above approach seems hacky.  The external 
> > > > > > > entity
> > > > > > > could be called "ovn-vif".
> > > > > > >
> > > > > > > What do you think ?
> > > > > >
> > > > > > This is an interesting idea and it could work if we figure out a way
> > > > > > for ovn-controller and ovn-vif to exchange information before the
> > > > > > port/interface is created.
> > > > > >
> > > > >
> > > > > I'd really prefer not to introduce such extra IPC. Would the 
> > > > > plug_type mechanism be generic enough and avoid such external 
> > > > > entities?
> > > >
> > > > I agree, optionally pulling the code in at compile time would make
> > > > things much simpler and more effective.
> > > >
> > > > --
> > > > Frode Nordahl
> > > >
> > > > > Thanks,
> > > > > Han
> > > > >
> > > > > > > >
> > > > > > > > > Perhaps there can be a separate project called - ovn-vif 
> > > > > > > > > which does this ?
> > > > > > > >
> > > > > > > > Creating a separate ovn-org project called ovn-vif would allow 
> > > > > > > > us to
> > > > > > > > move forward, we would of course have to document the optional 
> > > > > > > > CMS API
> > > > > > > > in core OVN pointing to an optional external dependency for 
> > > > > > > > their use.
> > > > > > > > Would this be an acceptable compromise?
> > > > > > >
> > > > > > > I personally do not have any objections to that.  Maybe others 
> > > > > > > have
> > > > > > > different opinions.
> > > > > >
> > > > > > Great.
> > > > > >
> > > > > > Cheers!
> > > > > >
> > > > > > --
> > > > > > Frode Nordahl
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > > Numan
> > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Frode Nordahl
> > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Numan
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Frode Nordahl
> > > > > > > > > >
> > > > > > > > > > > Best regards, Ilya Maximets.
> > > > > > > > > > _______________________________________________
> > > > > > > > > > dev mailing list
> > > > > > > > > > [email protected]
> > > > > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > dev mailing list
> > > > > > > > [email protected]
> > > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Frode Nordahl
> > > > > > _______________________________________________
> > > > > > dev mailing list
> > > > > > [email protected]
> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to