Hi Eric and all,
Following recent discussions [1], we have converged on how Power and other architectures can use Cyborg. Before I update the spec [2], I am setting down some key aspects of the updates, so that we are all aligned.

The accelerator-instance attachment has two parts:

 * The connection between the accelerator and a host-visible attach
   handle, such as a PCI function or a mediated device UUID. We call
   this the Device Half of the attach.
 * The connection between the attach handle and the instance. We name
   this the Instance Half of the attach.

I propose two different extensibility mechanisms:

 * Cyborg drivers deal with device-specific aspects, including
   discovery/enumeration of devices and handling the Device Half of the
   attach: preparing devices/accelerators for attach to an instance,
   doing any post-attach cleanup after a successful attach, releasing
   device/accelerator resources on instance termination or failed
   attach, etc.
 * os-acc plugins deal with hypervisor/system/architecture-specific
   aspects, including handling the Instance Half of the attach (e.g.
   for libvirt with PCI, preparing the XML snippet to be included in
   the domain XML).

When invoked by Nova compute to attach accelerator(s) to an instance, os-acc would call the Cyborg driver to prepare a VAN (Virtual Accelerator Nexus, a handle object for attaching an accelerator to an instance, similar to VIFs for networking). Such preparation may involve configuring the device in some way, including programming for FPGAs. This sets up a VAN object with the data necessary for the attach (e.g. PCI VF, Power DRC index, etc.). os-acc would then call a plugin to perform the hypervisor-specific part of the attach, using that VAN. Finally, os-acc may call the Cyborg driver again to do any post-attach cleanup, if needed.
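
To make the sequence concrete, here is a rough Python sketch of the three steps, not a proposed implementation. The method names prepareVAN() and postplug() are taken from the driver diagram later in this mail; everything else (the VAN fields, FakeDriver, FakePlugin, plug(), attach() and all signatures) is a hypothetical placeholder of mine.

```python
from dataclasses import dataclass, field


@dataclass
class VAN:
    """Hypothetical Virtual Accelerator Nexus: the handle used for the attach."""
    accelerator_id: str
    attach_type: str = ""  # illustrative values: "pci_vf", "power_drc"
    attach_info: dict = field(default_factory=dict)


class FakeDriver:
    """Stand-in for a Cyborg driver (Device Half of the attach)."""

    def prepareVAN(self, accelerator_id):
        # A real driver might configure the device here (e.g. program an FPGA).
        # The PCI address below is a dummy value.
        return VAN(accelerator_id, "pci_vf", {"address": "0000:00:00.0"})

    def postplug(self, van):
        # Optional cleanup after a successful attach.
        van.attach_info["postplug_done"] = True


class FakePlugin:
    """Stand-in for an os-acc plugin (Instance Half of the attach)."""

    def plug(self, instance_id, van):
        # A libvirt plugin would prepare an XML snippet for the domain XML here.
        return f"{instance_id} <- {van.attach_type}:{van.accelerator_id}"


def attach(instance_id, accelerator_id, driver, plugin):
    van = driver.prepareVAN(accelerator_id)  # 1. Device Half
    result = plugin.plug(instance_id, van)   # 2. Instance Half
    driver.postplug(van)                     # 3. post-attach cleanup, if any
    return van, result
```

The point of the split is that attach() never looks inside the VAN; only the driver fills it and only the plugin interprets it.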

A more detailed workflow is here: https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing

Thus, the drivers and plugins are expected to be complementary. For example, for 2 devices of types T1 and T2, there shall be 2 separate Cyborg drivers. Further, we would have separate plugins for, say, x86+KVM systems and Power systems. We could then have four different deployments -- T1 on x86+KVM, T2 on x86+KVM, T1 on Power, T2 on Power -- by suitable combinations of the drivers and plugins.
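
As a toy illustration of that complementarity (the class names are made up, and nothing here reflects how the choice would actually be configured):

```python
from itertools import product

# One Cyborg driver per device type, one os-acc plugin per platform.
# All four class names are hypothetical.
drivers = {"T1": "T1Driver", "T2": "T2Driver"}
plugins = {"x86+KVM": "LibvirtPlugin", "Power": "PowerPlugin"}

# Every deployment is simply one driver paired with one plugin.
deployments = [
    (device, platform, drivers[device], plugins[platform])
    for platform, device in product(plugins, drivers)
]

for device, platform, driver, plugin in deployments:
    print(f"{device} on {platform}: {driver} + {plugin}")
```

Two drivers and two plugins cover the four deployments; adding a third device type would need only one more driver.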

There may be scenarios where the separation of roles between the plugins and the drivers is not so clear-cut. That can be addressed by allowing the plugins to call into Cyborg drivers in the future and/or by other mechanisms.

One secondary detail to note is that Nova compute calls os-acc per instance for all accelerators for that instance, not once for each accelerator. There are two reasons for that:

 * I think this is how Nova deals with os-vif [3].
 * If some accelerators got allocated/configured, and the next
   accelerator configuration fails, a rollback needs to be done. This
   is better done in os-acc than Nova compute.
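
A minimal sketch of the rollback argument, assuming os-acc receives the full list of accelerators for the instance. The function attach_all() and its callable parameters are hypothetical. For brevity it only releases Device Half resources; a real implementation would also have to undo any Instance Half work already done.

```python
def attach_all(instance_id, accelerator_ids, prepare, plug, unprepare):
    """Attach every accelerator for one instance, or none of them.

    prepare, plug and unprepare are caller-supplied callables standing in
    for the driver and plugin operations (assumed signatures).
    """
    prepared = []
    try:
        for acc_id in accelerator_ids:
            van = prepare(acc_id)
            prepared.append(van)
            plug(instance_id, van)
    except Exception:
        # Release what was set up so far, most recent first, then re-raise
        # so the caller (Nova compute) still sees the failure.
        for van in reversed(prepared):
            unprepare(van)
        raise
    return prepared
```

If Nova compute called os-acc once per accelerator instead, this loop and its bookkeeping would have to live in Nova compute.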

Cyborg drivers are invoked both by the Cyborg agent (for discovery/enumeration) and by os-acc (for instance attach). Both shall use Stevedore to locate and load the drivers. A single Python module may implement both sets of interfaces, like this:

+--------------+         +-------+
| Nova Compute |         |Cyborg |
+----+---------+         |Agent  |
     |                   +---+---+
+----v---+                   |
| os-acc |                   |
+----+---+                   |
     |                       |
     |     Cyborg driver     |
+----v----------------+------v-----------+
|UN/PLUG ACCELERATORS |  DISCOVER        |
|FROM INSTANCES       |  ACCELERATORS    |
|                     |                  |
|* can_handle()       |  * get_devices() |
|* prepareVAN()       |                  |
|* postplug()         |                  |
|* unprepareVAN()     |                  |
+---------------------+------------------+
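
In Python terms, a single module covering both halves of the diagram could look roughly like the sketch below. Only the method names come from the diagram; the base-class names, the signatures, the return types and ExampleT1Driver are assumptions of mine. Loading through Stevedore is left out; in a deployment the class would be registered as a setuptools entry point, in a namespace that is yet to be decided.

```python
import abc


class AttachInterface(abc.ABC):
    """Called by os-acc to un/plug accelerators (signatures assumed)."""

    @abc.abstractmethod
    def can_handle(self, accelerator):
        """Return True if this driver manages the given accelerator."""

    @abc.abstractmethod
    def prepareVAN(self, accelerator):
        """Do the Device Half of the attach and return a VAN."""

    @abc.abstractmethod
    def postplug(self, van):
        """Post-attach cleanup, if any."""

    @abc.abstractmethod
    def unprepareVAN(self, van):
        """Release device resources on termination or failed attach."""


class DiscoveryInterface(abc.ABC):
    """Called by the Cyborg agent for discovery/enumeration."""

    @abc.abstractmethod
    def get_devices(self):
        """Return the devices this driver can see on the host."""


class ExampleT1Driver(AttachInterface, DiscoveryInterface):
    """Hypothetical driver for device type T1, implementing both sets."""

    def get_devices(self):
        return [{"type": "T1", "id": "t1-0"}]  # dummy inventory

    def can_handle(self, accelerator):
        return accelerator.get("type") == "T1"

    def prepareVAN(self, accelerator):
        # The VAN is a plain dict here purely for illustration.
        return {"accelerator_id": accelerator["id"], "prepared": True}

    def postplug(self, van):
        van["postplug_done"] = True

    def unprepareVAN(self, van):
        van["prepared"] = False
```

Keeping the two interfaces as separate base classes lets the agent and os-acc each depend only on the half they call.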

If there are no objections to the above, I will update the spec [2].

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-07-30.log.html#t2018-07-30T16:25:41-2
[2] https://review.openstack.org/#/c/577438/
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529

Regards,
Sundar
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
