Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Mooney, Sean K


> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Monday, October 2, 2017 3:53 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] vGPUs support for Nova - Implementation
> 
> >> I also think there is value in exposing vGPU in a generic way,
> irrespective of the underlying implementation (whether it is DEMU,
> mdev, SR-IOV or whatever approach Hyper-V/VMWare use).
> >
> > That is a big ask. To start with, all GPUs are not created equal, and
> > various vGPU functionality as designed by the GPU vendors is not
> > consistent, never mind the quirks added between different hypervisor
> > implementations. So I feel like trying to expose this in a generic
> > manner is, at least asking for problems, and more likely bound for
> > failure.
> 
> I feel the opposite. IMHO, Nova’s role in life is not to expose all the
> quirks of the underlying platform, but rather to provide a useful
> abstraction on top of those things. In spite of them.
[Mooney, Sean K] I have to agree with Dan here.
vGPUs are a great example of where Nova can add value by abstracting
the hypervisor specifics and providing an abstract API to allow requesting
vGPUs without having to encode the semantics of the API provided by the
hypervisor or hardware vendor in what we expose to the tenant.
> 
> > Nova already exposes plenty of hypervisor-specific functionality (or
> > functionality only implemented for one hypervisor), and that's fine.
> 
> And those bits of functionality are some of the most problematic we
> have. Among other reasons, they make it difficult for us to expose
> Thing 2.0, when we’ve encoded Thing 1.0 into our API so rigidly. This
> happens even within one virt driver where Thing 2.0 is significantly
> different than Thing 1.0.
> 
> The vGPU stuff seems well-suited for the generic modeling work that
> we’ve spent the last few years working on, and is a perfect example of
> an area where we can avoid piling on more debt to a not-abstract-enough
> “model” and move forward with the new one. That’s certainly my
> preference, and I think it’s actually less work than the debt-ridden
> way.
> 
> --Dan
[Mooney, Sean K] I also agree that it's likely less work to start fresh with
the correct generic solution now than to try to adapt the PCI passthrough code
we have today to support vGPUs without breaking the current SR-IOV and
passthrough support. How vGPUs are virtualized is GPU-vendor specific, so even
within a single host you may need to support multiple methods (SR-IOV/mdev...)
in a single virt driver. For example, a cloud/host with both AMD and NVIDIA
GPUs which uses libvirt would have to support generating the correct XML for
both solutions.
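
To make that concrete, here is a rough sketch (purely illustrative; the device
dict and the dispatch logic are assumptions for this example, not existing Nova
code) of how the guest device XML differs between the two models:

    # Illustrative sketch only: how guest device XML differs between an
    # mdev-backed vGPU and an SR-IOV VF.  The device dict and dispatch
    # logic are assumptions for this example, not existing Nova code.
    def guest_hostdev_xml(dev):
        if dev['model'] == 'mdev':        # e.g. NVIDIA GRID vGPU, Intel GVT-g
            return ("<hostdev mode='subsystem' type='mdev' model='vfio-pci'>"
                    "<source><address uuid='%s'/></source>"
                    "</hostdev>" % dev['mdev_uuid'])
        if dev['model'] == 'sriov':       # e.g. AMD MxGPU VF
            return ("<hostdev mode='subsystem' type='pci' managed='yes'>"
                    "<source><address domain='0x%(domain)s' bus='0x%(bus)s'"
                    " slot='0x%(slot)s' function='0x%(function)s'/></source>"
                    "</hostdev>" % dev)
        raise ValueError('unknown vGPU model: %s' % dev['model'])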


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Dan Smith
>> I also think there is value in exposing vGPU in a generic way, irrespective 
>> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or 
>> whatever approach Hyper-V/VMWare use).
> 
> That is a big ask. To start with, all GPUs are not created equal, and
> various vGPU functionality as designed by the GPU vendors is not
> consistent, never mind the quirks added between different hypervisor
> implementations. So I feel like trying to expose this in a generic
> manner is, at least asking for problems, and more likely bound for
> failure.

I feel the opposite. IMHO, Nova’s role in life is not to expose all the quirks 
of the underlying platform, but rather to provide a useful abstraction on top 
of those things. In spite of them.

> Nova already exposes plenty of hypervisor-specific functionality (or
> functionality only implemented for one hypervisor), and that's fine.

And those bits of functionality are some of the most problematic we have. Among 
other reasons, they make it difficult for us to expose Thing 2.0, when we’ve 
encoded Thing 1.0 into our API so rigidly. This happens even within one virt 
driver where Thing 2.0 is significantly different than Thing 1.0.

The vGPU stuff seems well-suited for the generic modeling work that we’ve spent 
the last few years working on, and is a perfect example of an area where we can 
avoid piling on more debt to a not-abstract-enough “model” and move forward 
with the new one. That’s certainly my preference, and I think it’s actually 
less work than the debt-ridden way.

--Dan





Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Sahid Orentino Ferdjaoui
On Fri, Sep 29, 2017 at 04:51:10PM +, Bob Ball wrote:
> Hi Sahid,
> 
> > > a second device emulator along-side QEMU.  There is no mdev 
> > > integration.  I'm concerned about how much mdev-specific functionality 
> > > would have to be faked up in the XenServer-specific driver for vGPU to 
> > > be used in this way.
> >
> > What you are referring to with your DEMU is what QEMU/KVM has with its
> > vfio-pci. XenServer is reading through MDEV since the vendors provide
> > drivers on *Linux* using the MDEV framework.
> > MDEV is a kernel layer used to expose hardware; it's not hypervisor
> > specific.
> 
> It is possible that the vendor's userspace libraries use mdev;
> however, DEMU has no concept of mdev at all.  If the vendor's
> userspace libraries do use mdev then this is entirely abstracted
> from XenServer's integration.  While I don't have access to the
> vendor's source for the userspace libraries or the kernel module, my
> understanding was that XenServer's integration has the userspace
> libraries talk to the kernel module via IOCTLs.  My reading of mdev
> implies that /sys/class/mdev_bus should exist for it to be used?  It
> does not exist in XenServer, which to me implies that the vendor's
> driver for XenServer does not use mdev?

I shared our discussion with Alex Williamson; his response:

> Hi Sahid,
>
> XenServer does not use mdev for vGPU support.  The mdev/vfio
> infrastructure was developed in response to DEMU used on XenServer,
> which we felt was not an upstream acceptable solution.  There has
> been cursory interest in porting vfio to Xen, so it's possible that
> they might use the same mechanism some day, but for now they are
> different solutions, the vfio/mdev solution being the only one
> accepted upstream so far. Thanks,
>
> Alex

It's my mistake. It seems clear now that XenServer can't benefit from
the mdev support I have added in the /pci module. The support of vGPUs
for Xen will have to wait for the generic device management, I
guess.

>
> Bob
> 


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Blair Bethwaite
On 29 September 2017 at 22:26, Bob Ball  wrote:
> The concepts of PCI and SR-IOV are, of course, generic, but I think out of 
> principle we should avoid a hypervisor-specific integration for vGPU (indeed 
> Citrix has been clear from the beginning that the vGPU integration we are 
> proposing is intentionally hypervisor agnostic)

To be fair, what this proposal is doing is piggy-backing on Nova's
existing PCI functionality to expose Linux/KVM VFIO mdev; it just so
happens that mdev was created for vGPU, but it was designed to extend
to other devices/things too.

> I also think there is value in exposing vGPU in a generic way, irrespective 
> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or 
> whatever approach Hyper-V/VMWare use).

That is a big ask. To start with, all GPUs are not created equal, and
various vGPU functionality as designed by the GPU vendors is not
consistent, never mind the quirks added between different hypervisor
implementations. So I feel like trying to expose this in a generic
manner is, at least asking for problems, and more likely bound for
failure.

Nova already exposes plenty of hypervisor-specific functionality (or
functionality only implemented for one hypervisor), and that's fine.
Maybe there should be something in OpenStack that would generically
manage vGPU-graphics and/or vGPU-compute etc, but I'm pretty sure it
would never be allowed into Nova :-).

Anyway, take all that with a grain of salt, because frankly I would
love to see this in sooner rather than later - even if it did have a
big "this might change in non-upgradeable ways" sticker on it.

-- 
Cheers,
~Blairo



Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Sahid Orentino Ferdjaoui
On Fri, Sep 29, 2017 at 11:16:43AM -0400, Jay Pipes wrote:
> Hi Sahid, comments inline. :)
> 
> On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote:
> > On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote:
> > > On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:
> > > > Please consider the support of MDEV for the /pci framework which
> > > > provides support for vGPUs [0].
> > > > 
> > > > According to the discussion [1]
> > > > 
> > > > With this first implementation which could be used as a skeleton for
> > > > implementing PCI Devices in Resource Tracker
> > > 
> > > I'm not entirely sure what you're referring to above as "implementing PCI
> > > devices in Resource Tracker". Could you elaborate? The resource tracker
> > > already embeds a PciManager object that manages PCI devices, as you know.
> > > Perhaps you meant "implement PCI devices as Resource Providers"?
> > 
> > A PciManager? I know that we have a field PCI_DEVICE :) - I guess a
> > virt driver can return inventory with total of PCI devices. Talking
> > about manager, not sure.
> 
> I'm referring to this:
> 
> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33
>
> [SNIP]
> 
> It is that piece that Eric and myself have been talking about standardizing
> into a "generic device management" interface that would have an
> update_inventory() method that accepts a ProviderTree object [1]

Jay, all of that looks perfectly sane to me, even if it's not clear what
you want to make so generic. That part of the code is for the virt layers,
and you can't treat GPU or NET devices as just generic pieces; they have
characteristics which are requirements for the virt layers.

In that method 'update_inventory(provider_tree)' which you are going
to introduce for /pci/PciManager, a first step would be to convert the
objects to an understandable dict for the whole logic, right, or do you
have another plan?

In any case, from my POV I don't see any blocker; both pieces of work can
co-exist without any pain. And adding features to the current /pci
module is not going to add heavy work, but it is going to give us a
clear view of what is needed.

> [1]
> https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py
> 
> and would add resource providers corresponding to devices that are made
> available to guests for use.
> 
> > You still have to define "traits". Basically, for physical network
> > devices, users want to select a device according to the physical network,
> > to select a device according to its placement on the host (NUMA), to
> > select a device according to its bandwidth capability... For GPUs it's the
> > same story. *And I have not even mentioned devices which support virtual
> > functions.*
> 
> Yes, the generic device manager would be responsible for associating traits
> to the resource providers it adds to the ProviderTree provided to it in the
> update_inventory() call.
> 
> > So that is what you plan to do for this release :) - Reasonably I
> > don't think we are close to having something ready for production.
> 
> I don't disagree with you that this is a huge amount of refactoring to
> undertake over the next couple releases. :)

Yes, and that is the point. We are going to block the work on the /pci
module during a period where we can see large interest in such
support.

> > Jay, I have a question: why don't you start by exposing NUMA?
> 
> I believe you're asking here why we don't start by modeling NUMA nodes as
> child resource providers of the compute node? Instead of starting by
> modeling PCI devices as child providers of the compute node? If that's not
> what you're asking, please do clarify...
> 
> We're starting with modeling PCI devices as child providers of the compute
> node because they are easier to deal with as a whole than NUMA nodes and we
> have the potential of being able to remove the PciPassthroughFilter from the
> scheduler in Queens.
> 
> I don't see us being able to remove the NUMATopologyFilter from the
> scheduler in Queens because of the complexity involved in how coupled the
> NUMA topology resource handling is to CPU pinning, huge page support, and IO
> emulation thread pinning.
> 
> Hope that answers that question; again, lemme know if that's not the
> question you were asking! :)

Yes, that was the question and you answered it perfectly, thanks. I will
try to be clearer in the future :)

As you have noticed, the support of NUMA will be quite difficult and it
is not in the TODO right now, which leads me to think that we are going to
block development on the /pci module and, on top of that, end up providing
less support (no NUMA awareness). Is that reasonable?

> > > For the record, I have zero confidence in any existing "functional" tests
> > > for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, due
> > > to the fact that these features often require hardware that either the
> > > upstream community CI lacks or that depends on libraries, drivers and kernel
> > > versions that really aren't available to non-bleeding edge users (or users
> > > with very deep pockets).

Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Bob Ball
Hi Sahid,

> > a second device emulator along-side QEMU.  There is no mdev 
> > integration.  I'm concerned about how much mdev-specific functionality 
> > would have to be faked up in the XenServer-specific driver for vGPU to 
> > be used in this way.
>
> What you are refering with your DEMU it's what QEMU/KVM have with its 
> vfio-pci. XenServer is
> reading through MDEV since the vendors provide drivers on *Linux* using the 
> MDEV framework.
> MDEV is a kernel layer, used to expose hardwares, it's not hypervisor 
> specific.

It is possible that the vendor's userspace libraries use mdev; however, DEMU has 
no concept of mdev at all.  If the vendor's userspace libraries do use mdev 
then this is entirely abstracted from XenServer's integration.
While I don't have access to the vendor's source for the userspace libraries or 
the kernel module, my understanding was that XenServer's integration has the 
userspace libraries talk to the kernel module via IOCTLs.

My reading of mdev implies that /sys/class/mdev_bus should exist for it to be 
used?  It does not exist in XenServer, which to me implies that the vendor's 
driver for XenServer does not use mdev?
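
A minimal sketch of that check, assuming the standard mdev sysfs layout from
the kernel's vfio-mediated-device documentation (illustrative only):

    # Minimal sketch, assuming the standard mdev sysfs layout from the
    # kernel's vfio-mediated-device documentation; illustrative only.
    import os

    def host_has_mdev():
        # The mdev bus class shows up once the mdev core module is loaded.
        return os.path.isdir('/sys/class/mdev_bus')

    def mdev_capable_parents():
        # Physical devices registered with mdev expose mdev_supported_types.
        if not host_has_mdev():
            return []
        return [d for d in os.listdir('/sys/class/mdev_bus')
                if os.path.isdir(
                    '/sys/class/mdev_bus/%s/mdev_supported_types' % d)]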

Bob



Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Sahid Orentino Ferdjaoui
On Fri, Sep 29, 2017 at 12:26:07PM +, Bob Ball wrote:
> Hi Sahid,
> 
> > Please consider the support of MDEV for the /pci framework which provides 
> > support for vGPUs [0].
> 

> From my understanding, this MDEV implementation for vGPU would be
> entirely specific to libvirt, is that correct?

No, but Linux-specific, yes. Windows supports SR-IOV.

> XenServer's implementation for vGPU is based on a pooled device
> model (as described in
> http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html)

That thread is referring to something which I guess everyone understands
now - it's basically why I have added support for MDEV in /pci: to make
it work regardless of how the virtual devices are exposed, SR-IOV
or MDEV.

> a second device emulator along-side QEMU.  There is no mdev
> integration.  I'm concerned about how much mdev-specific
> functionality would have to be faked up in the XenServer-specific
> driver for vGPU to be used in this way.

What you are referring to with your DEMU is what QEMU/KVM has with its
vfio-pci. XenServer is reading through MDEV since the vendors provide
drivers on *Linux* using the MDEV framework.

MDEV is a kernel layer used to expose hardware; it's not hypervisor
specific.

> I'm not familiar with mdev, but it looks Linux specific, so would not be 
> usable by Hyper-V?
> I've also not been able to find suggestions that VMWare can make use of mdev, 
> although I don't know the architecture of VMWare's integration.
> 
> The concepts of PCI and SR-IOV are, of course, generic, but I think out of 
> principle we should avoid a hypervisor-specific integration for vGPU (indeed 
> Citrix has been clear from the beginning that the vGPU integration we are 
> proposing is intentionally hypervisor agnostic)
> I also think there is value in exposing vGPU in a generic way, irrespective 
> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or 
> whatever approach Hyper-V/VMWare use).
> 
> It's quite difficult for me to see how this will work for other
> hypervisors.  Do you also have a draft alternate spec where more
> details can be discussed?

I would expect XenServer to provide the MDEV UUID; then it's easy to
ask sysfs if you need to get the NUMA node of the physical device
or the mdev_type.
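
A rough sketch of that lookup, assuming the standard mdev sysfs layout (the
helper below is illustrative, not part of any existing driver):

    # Rough sketch: given an mdev UUID, read the mdev_type and the NUMA
    # node of the parent physical device from sysfs.  Assumes the
    # standard mdev sysfs layout; illustrative only.
    import os

    def mdev_info(mdev_uuid):
        dev = os.path.realpath('/sys/bus/mdev/devices/%s' % mdev_uuid)
        # 'mdev_type' is a symlink into the parent's supported-type directory.
        mdev_type = os.path.basename(
            os.path.realpath(os.path.join(dev, 'mdev_type')))
        # The mdev device lives under its parent (physical) device directory.
        numa_node = None
        numa_path = os.path.join(os.path.dirname(dev), 'numa_node')
        if os.path.exists(numa_path):
            with open(numa_path) as f:
                numa_node = int(f.read().strip())  # -1 means no affinity
        return mdev_type, numa_node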

> Bob


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Jay Pipes

Hi Sahid, comments inline. :)

On 09/29/2017 04:53 AM, Sahid Orentino Ferdjaoui wrote:

On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote:

On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:

Please consider the support of MDEV for the /pci framework which
provides support for vGPUs [0].

According to the discussion [1]

With this first implementation which could be used as a skeleton for
implementing PCI Devices in Resource Tracker


I'm not entirely sure what you're referring to above as "implementing PCI
devices in Resource Tracker". Could you elaborate? The resource tracker
already embeds a PciManager object that manages PCI devices, as you know.
Perhaps you meant "implement PCI devices as Resource Providers"?


A PciManager? I know that we have a field PCI_DEVICE :) - I guess a
virt driver can return inventory with total of PCI devices. Talking
about manager, not sure.


I'm referring to this:

https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L33

The PciDevTracker class is instantiated in the resource tracker when the 
first ComputeNode object managed by the resource tracker is init'd:


https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L578

On initialization, the PciDevTracker inventories the compute node's 
collection of PCI devices by grabbing a list of records from the 
pci_devices table in the cell database:


https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L69

and then comparing those DB records with information the hypervisor 
returns about PCI devices:


https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L160

Each hypervisor returns something different for the list of pci devices, 
as you know. For libvirt, the call that returns PCI device information 
is here:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L842

The results of that are jammed into a "pci_passthrough_devices" key in 
the returned result of the virt driver's get_available_resource() call. 
For libvirt, that's here:


https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5809
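
To make the shape concrete, each entry in that JSON-serialized list looks
roughly like this (a simplified, illustrative example; the field values are
made up):

    # Roughly the shape of one entry in the "pci_passthrough_devices"
    # list (serialized to JSON by the driver); values here are made up.
    example_entry = {
        "address": "0000:81:00.1",      # host PCI address of the function
        "vendor_id": "10de",
        "product_id": "13f2",
        "dev_type": "type-VF",          # type-PCI / type-PF / type-VF
        "parent_addr": "0000:81:00.0",  # the PF, for a VF
        "numa_node": 1,
        "label": "label_10de_13f2",
        "dev_id": "pci_0000_81_00_1",
    }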

It is that piece that Eric and myself have been talking about 
standardizing into a "generic device management" interface that would 
have an update_inventory() method that accepts a ProviderTree object [1]


[1] 
https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py


and would add resource providers corresponding to devices that are made 
available to guests for use.



You still have to define "traits". Basically, for physical network
devices, users want to select a device according to the physical network,
to select a device according to its placement on the host (NUMA), to select a
device according to its bandwidth capability... For GPUs it's the same
story. *And I have not even mentioned devices which support virtual
functions.*


Yes, the generic device manager would be responsible for associating 
traits to the resource providers it adds to the ProviderTree provided to 
it in the update_inventory() call.
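
A rough sketch of what such an interface could look like (the class name,
method names, resource class and trait strings below are hypothetical, for
illustration only; the ProviderTree calls are likewise approximate):

    # Hypothetical sketch of a "generic device management" interface.
    # Class/method names, the VGPU resource class and the trait strings
    # are illustrative assumptions, not an agreed design; the
    # ProviderTree calls shown are approximate.
    class GenericDeviceManager(object):
        def __init__(self, devices):
            # 'devices' would come from the virt driver's enumeration of
            # host devices (SR-IOV VFs, mdev-capable GPUs, ...).
            self.devices = devices

        def update_inventory(self, provider_tree, compute_node_uuid):
            for dev in self.devices:
                name = 'gpu_%s' % dev['address']
                if not provider_tree.exists(name):
                    provider_tree.new_child(name, compute_node_uuid)
                provider_tree.update_inventory(name, {
                    'VGPU': {'total': dev['total_vgpus'],
                             'reserved': 0,
                             'min_unit': 1,
                             'max_unit': dev['total_vgpus'],
                             'step_size': 1,
                             'allocation_ratio': 1.0},
                })
                # Traits could express the vGPU type, physical network,
                # NUMA placement, etc., e.g. 'CUSTOM_VGPU_NVIDIA_10'.
                provider_tree.update_traits(name, dev.get('traits', []))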



So that is what you plan to do for this release :) - Reasonably I
don't think we are close to having something ready for production.


I don't disagree with you that this is a huge amount of refactoring to 
undertake over the next couple releases. :)



Jay, I have a question: why don't you start by exposing NUMA?


I believe you're asking here why we don't start by modeling NUMA nodes 
as child resource providers of the compute node? Instead of starting by 
modeling PCI devices as child providers of the compute node? If that's 
not what you're asking, please do clarify...


We're starting with modeling PCI devices as child providers of the 
compute node because they are easier to deal with as a whole than NUMA 
nodes and we have the potential of being able to remove the 
PciPassthroughFilter from the scheduler in Queens.


I don't see us being able to remove the NUMATopologyFilter from the 
scheduler in Queens because of the complexity involved in how coupled 
the NUMA topology resource handling is to CPU pinning, huge page 
support, and IO emulation thread pinning.


Hope that answers that question; again, lemme know if that's not the 
question you were asking! :)



For the record, I have zero confidence in any existing "functional" tests
for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, due
to the fact that these features often require hardware that either the
upstream community CI lacks or that depends on libraries, drivers and kernel
versions that really aren't available to non-bleeding edge users (or users
with very deep pockets).


It's a good point; if you are not confident, don't you think it's
premature to move forward on implementing a new thing without having
well-trusted functional tests?


Completely agree with you. I would rather see functional integration 
tests that are proven to actually test these complex hardware devices 
*gating* Nova patches before adding any 

Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Dan Smith

The concepts of PCI and SR-IOV are, of course, generic


They are, although the PowerVM guys have already pointed out that they
don't even refer to virtual devices by PCI address and thus anything 
based on that subsystem isn't going to help them.



but I think out of principle we should avoid a hypervisor-specific
integration for vGPU (indeed Citrix has been clear from the beginning
that the vGPU integration we are proposing is intentionally
hypervisor agnostic) I also think there is value in exposing vGPU in
a generic way, irrespective of the underlying implementation (whether
it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use).


I very much agree, of course.

--Dan



Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Bob Ball
Hi Sahid,

> Please consider the support of MDEV for the /pci framework which provides 
> support for vGPUs [0].

From my understanding, this MDEV implementation for vGPU would be entirely 
specific to libvirt, is that correct?

XenServer's implementation for vGPU is based on a pooled device model (as 
described in 
http://lists.openstack.org/pipermail/openstack-dev/2017-September/122702.html) 
and directly interfaces with the card using DEMU ("Discrete EMU") as a second 
device emulator along-side QEMU.  There is no mdev integration.  I'm concerned 
about how much mdev-specific functionality would have to be faked up in the 
XenServer-specific driver for vGPU to be used in this way.

I'm not familiar with mdev, but it looks Linux specific, so would not be usable 
by Hyper-V?
I've also not been able to find suggestions that VMWare can make use of mdev, 
although I don't know the architecture of VMWare's integration.

The concepts of PCI and SR-IOV are, of course, generic, but I think out of 
principle we should avoid a hypervisor-specific integration for vGPU (indeed 
Citrix has been clear from the beginning that the vGPU integration we are 
proposing is intentionally hypervisor agnostic)
I also think there is value in exposing vGPU in a generic way, irrespective of 
the underlying implementation (whether it is DEMU, mdev, SR-IOV or whatever 
approach Hyper-V/VMWare use).

It's quite difficult for me to see how this will work for other hypervisors.  
Do you also have a draft alternate spec where more details can be discussed?

Bob


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Sylvain Bauza
On Fri, Sep 29, 2017 at 2:32 AM, Dan Smith  wrote:

>>> In this series of patches we are generalizing the PCI framework to
>>> handle MDEV devices. Arguably it's a lot of patches, but most of them
>>> are small and the logic behind them is basically to make it understand two
>>> new fields, MDEV_PF and MDEV_VF.
>>>
>>
>> That's not really "generalizing the PCI framework to handle MDEV devices"
>> :) More like it's just changing the /pci module to understand a different
>> device management API, but ok.
>>
>
> Yeah, the series is adding more fields to our PCI structure to allow for
> more variations in the kinds of things we lump into those tables. This is
> my primary complaint with this approach, and has been since the topic first
> came up. I really want to avoid building any more dependency on the
> existing pci-passthrough mechanisms and focus any new effort on using
> resource providers for this. The existing pci-passthrough code is almost
> universally hated, poorly understood and tested, and something we should
> not be further building upon.
>
>>> In this series of patches we make the libvirt driver, as usual,
>>> return resources and attach devices returned by the pci manager. This
>>> part can be reused for Resource Provider.
>>>
>>
>> Perhaps, but the idea behind the resource providers framework is to treat
>> devices as generic things. Placement doesn't need to know about the
>> particular device attachment status.
>>
>
> I quickly went through the patches and left a few comments. The base work
> of pulling some of this out of libvirt is there, but it's all focused on
> the act of populating pci structures from the vgpu information we get from
> libvirt. That code could be made to instead populate a resource inventory,
> but that's about the most of the set that looks applicable to the
> placement-based approach.
>
>
I'll review them too.

As mentioned in IRC and the previous ML discussion, my focus is on the
>> nested resource providers work and reviews, along with the other two
>> top-priority scheduler items (move operations and alternate hosts).
>>
>> I'll do my best to look at your patch series, but please note it's lower
>> priority than a number of other items.
>>
>
> FWIW, I'm not really planning to spend any time reviewing it until/unless
> it is retooled to generate an inventory from the virt driver.
>
> With the two patches that report vgpus and then create guests with them
> when asked converted to resource providers, I think that would be enough to
> have basic vgpu support immediately. No DB migrations, model changes, etc
> required. After that, helping to get the nested-rps and traits work landed
> gets us the ability to expose attributes of different types of those vgpus
> and opens up a lot of possibilities. IMHO, that's work I'm interested in
> reviewing.
>

That's exactly what I would like to provide for Queens, so operators
would have the possibility of flavors asking for vGPU resources in
Queens, even if they couldn't yet ask for a specific vGPU type (or
ask to be in the same NUMA cell as the CPU). The latter definitely
needs nested resource providers, but the former (just having vGPU
resource classes provided by the virt driver) is possible for Queens.



> One thing that would be very useful, Sahid, if you could get with Eric
>> Fried (efried) on IRC and discuss with him the "generic device management"
>> system that was discussed at the PTG. It's likely that the /pci module is
>> going to be overhauled in Rocky and it would be good to have the mdev
>> device management API requirements included in that discussion.
>>
>
> Definitely this.
>

++


> --Dan
>
>


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Sahid Orentino Ferdjaoui
On Thu, Sep 28, 2017 at 05:06:16PM -0400, Jay Pipes wrote:
> On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:
> > Please consider the support of MDEV for the /pci framework which
> > provides support for vGPUs [0].
> > 
> > According to the discussion [1]
> > 
> > With this first implementation which could be used as a skeleton for
> > implementing PCI Devices in Resource Tracker
> 
> I'm not entirely sure what you're referring to above as "implementing PCI
> devices in Resource Tracker". Could you elaborate? The resource tracker
> already embeds a PciManager object that manages PCI devices, as you know.
> Perhaps you meant "implement PCI devices as Resource Providers"?

A PciManager? I know that we have a field PCI_DEVICE :) - I guess a
virt driver can return an inventory with the total of PCI devices. Talking
about a manager, I'm not sure.

You still have to define "traits". Basically, for physical network
devices, users want to select a device according to the physical network,
to select a device according to its placement on the host (NUMA), to select a
device according to its bandwidth capability... For GPUs it's the same
story. *And I have not even mentioned devices which support virtual
functions.*

So that is what you plan to do for this release :) - Reasonably I
don't think we are close to having something ready for production.

Jay, I have a question: why don't you start by exposing NUMA?

> > we provide support for
> > attaching vGPUs to guests, and also provide affinity per NUMA
> > node. Another important point is that this implementation can take
> > advantage of the ongoing specs like PCI NUMA policies.
> > 
> > * The Implementation [0]
> > 
> > [PATCH 01/13] pci: update PciDevice object field 'address' to accept
> > [PATCH 02/13] pci: add for PciDevice object new field mdev
> > [PATCH 03/13] pci: generalize object unit-tests for different
> > [PATCH 04/13] pci: add support for mdev device type request
> > [PATCH 05/13] pci: generalize stats unit-tests for different
> > [PATCH 06/13] pci: add support for mdev devices type devspec
> > [PATCH 07/13] pci: add support for resource pool stats of mdev
> > [PATCH 08/13] pci: make manager to accept handling mdev devices
> > 
> > In this series of patches we are generalizing the PCI framework to
> > handle MDEV devices. Arguably it's a lot of patches, but most of them
> > are small and the logic behind them is basically to make it understand two
> > new fields, MDEV_PF and MDEV_VF.
> 
> That's not really "generalizing the PCI framework to handle MDEV devices" :)
> More like it's just changing the /pci module to understand a different
> device management API, but ok.

If you prefer to call it that :) - the point is that /pci manages
physical devices; it can pass through the whole device or its virtual
functions exposed through SR-IOV or MDEV.

> > [PATCH 09/13] libvirt: update PCI node device to report mdev devices
> > [PATCH 10/13] libvirt: report mdev resources
> > [PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)
> > 
> > In this series of patches we make the libvirt driver, as usual,
> > return resources and attach devices returned by the pci manager. This
> > part can be reused for Resource Provider.
> 
> Perhaps, but the idea behind the resource providers framework is to treat
> devices as generic things. Placement doesn't need to know about the
> particular device attachment status.
> 
> > [PATCH 12/13] functional: rework fakelibvirt host pci devices
> > [PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices
> > 
> > Here we reuse 100/100 of the functional tests used for SR-IOV
> > devices. Again here, this part can be reused for Resource Provider.
> 
> Probably not, but I'll take a look :)
> 
> For the record, I have zero confidence in any existing "functional" tests
> for NUMA, SR-IOV, CPU pinning, huge pages, and the like. Unfortunately, due
> to the fact that these features often require hardware that either the
> upstream community CI lacks or that depends on libraries, drivers and kernel
> versions that really aren't available to non-bleeding edge users (or users
> with very deep pockets).

It's a good point; if you are not confident, don't you think it's
premature to move forward on implementing a new thing without having
well-trusted functional tests?

> > * The Usage
> > 
> > There are no difference between SR-IOV and MDEV, from operators point
> > of view who knows how to expose SR-IOV devices in Nova, they already
> > know how to expose MDEV devices (vGPUs).
> > 
> > Operators will be able to expose MDEV devices in the same manner as
> > they expose SR-IOV:
> > 
> >   1/ Configure whitelist devices
> > 
> >   ['{"vendor_id":"10de"}']
> > 
> >   2/ Create aliases
> > 
> >   [{"vendor_id":"10de", "name":"vGPU"}]
> > 
> >   3/ Configure the flavor
> > 
> >   openstack flavor set --property "pci_passthrough:alias"="vGPU:1"
> > 
> > * Limitations
> > 
> > The mdev does not provide 'product_id' but 'mdev_type' which should be
> > considered to exactly identify which resource users can request e.g:
> > nvidia-10.

Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-28 Thread Dan Smith

In this series of patches we are generalizing the PCI framework to
handle MDEV devices. Arguably it's a lot of patches, but most of them
are small and the logic behind them is basically to make it understand two
new fields, MDEV_PF and MDEV_VF.


That's not really "generalizing the PCI framework to handle MDEV 
devices" :) More like it's just changing the /pci module to understand a 
different device management API, but ok.


Yeah, the series is adding more fields to our PCI structure to allow for 
more variations in the kinds of things we lump into those tables. This 
is my primary complaint with this approach, and has been since the topic 
first came up. I really want to avoid building any more dependency on 
the existing pci-passthrough mechanisms and focus any new effort on 
using resource providers for this. The existing pci-passthrough code is 
almost universally hated, poorly understood and tested, and something we 
should not be further building upon.



In this series of patches we make the libvirt driver, as usual,
return resources and attach devices returned by the pci manager. This
part can be reused for Resource Provider.


Perhaps, but the idea behind the resource providers framework is to 
treat devices as generic things. Placement doesn't need to know about 
the particular device attachment status.


I quickly went through the patches and left a few comments. The base 
work of pulling some of this out of libvirt is there, but it's all 
focused on the act of populating pci structures from the vgpu 
information we get from libvirt. That code could be made to instead 
populate a resource inventory, but that's about the most of the set that 
looks applicable to the placement-based approach.


As mentioned in IRC and the previous ML discussion, my focus is on the 
nested resource providers work and reviews, along with the other two 
top-priority scheduler items (move operations and alternate hosts).


I'll do my best to look at your patch series, but please note it's lower 
priority than a number of other items.


FWIW, I'm not really planning to spend any time reviewing it 
until/unless it is retooled to generate an inventory from the virt driver.


With the two patches that report vgpus and then create guests with them 
when asked converted to resource providers, I think that would be enough 
to have basic vgpu support immediately. No DB migrations, model changes, 
etc required. After that, helping to get the nested-rps and traits work 
landed gets us the ability to expose attributes of different types of 
those vgpus and opens up a lot of possibilities. IMHO, that's work I'm 
interested in reviewing.
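
As a very rough sketch of the reporting half (the VGPU resource class, the
helper and the exact method shape are assumed here for illustration):

    # Very rough sketch: a virt driver reporting a vGPU inventory keyed
    # by resource class.  The VGPU class name, the _count_available_vgpus()
    # helper and the method shape are assumptions for illustration.
    def get_inventory(self, nodename):
        total = self._count_available_vgpus()   # hypothetical helper
        if not total:
            return {}
        return {
            'VGPU': {
                'total': total,
                'reserved': 0,
                'min_unit': 1,
                'max_unit': total,
                'step_size': 1,
                'allocation_ratio': 1.0,
            },
        }

A flavor could then ask for one with something like a resources:VGPU=1 extra
spec.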


One thing that would be very useful, Sahid, if you could get with Eric 
Fried (efried) on IRC and discuss with him the "generic device 
management" system that was discussed at the PTG. It's likely that the 
/pci module is going to be overhauled in Rocky and it would be good to 
have the mdev device management API requirements included in that 
discussion.


Definitely this.

--Dan



Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-28 Thread Jay Pipes

On 09/28/2017 11:37 AM, Sahid Orentino Ferdjaoui wrote:

Please consider the support of MDEV for the /pci framework which
provides support for vGPUs [0].

According to the discussion [1]

With this first implementation which could be used as a skeleton for
implementing PCI Devices in Resource Tracker


I'm not entirely sure what you're referring to above as "implementing 
PCI devices in Resource Tracker". Could you elaborate? The resource 
tracker already embeds a PciManager object that manages PCI devices, as 
you know. Perhaps you meant "implement PCI devices as Resource Providers"?



we provide support for
attaching vGPUs to guests, and also provide affinity per NUMA
node. Another important point is that this implementation can take
advantage of the ongoing specs like PCI NUMA policies.

* The Implementation [0]

[PATCH 01/13] pci: update PciDevice object field 'address' to accept
[PATCH 02/13] pci: add for PciDevice object new field mdev
[PATCH 03/13] pci: generalize object unit-tests for different
[PATCH 04/13] pci: add support for mdev device type request
[PATCH 05/13] pci: generalize stats unit-tests for different
[PATCH 06/13] pci: add support for mdev devices type devspec
[PATCH 07/13] pci: add support for resource pool stats of mdev
[PATCH 08/13] pci: make manager to accept handling mdev devices

In this series of patches we are generalizing the PCI framework to
handle MDEV devices. Arguably it's a lot of patches, but most of them
are small and the logic behind them is basically to make it understand two
new fields, MDEV_PF and MDEV_VF.


That's not really "generalizing the PCI framework to handle MDEV 
devices" :) More like it's just changing the /pci module to understand a 
different device management API, but ok.



[PATCH 09/13] libvirt: update PCI node device to report mdev devices
[PATCH 10/13] libvirt: report mdev resources
[PATCH 11/13] libvirt: add support to start vm with using mdev (vGPU)

In this series of patches we make the libvirt driver, as usual,
return resources and attach devices returned by the pci manager. This
part can be reused for Resource Provider.


Perhaps, but the idea behind the resource providers framework is to 
treat devices as generic things. Placement doesn't need to know about 
the particular device attachment status.



[PATCH 12/13] functional: rework fakelibvirt host pci devices
[PATCH 13/13] libvirt: resuse SRIOV funtional tests for MDEV devices

Here we reuse 100/100 of the functional tests used for SR-IOV
devices. Again here, this part can be reused for Resource Provider.


Probably not, but I'll take a look :)

For the record, I have zero confidence in any existing "functional" 
tests for NUMA, SR-IOV, CPU pinning, huge pages, and the like. 
Unfortunately, due to the fact that these features often require 
hardware that either the upstream community CI lacks or that depends on 
libraries, drivers and kernel versions that really aren't available to 
non-bleeding edge users (or users with very deep pockets).



* The Usage

There are no difference between SR-IOV and MDEV, from operators point
of view who knows how to expose SR-IOV devices in Nova, they already
know how to expose MDEV devices (vGPUs).

Operators will be able to expose MDEV devices in the same manner as
they expose SR-IOV:

  1/ Configure whitelist devices

  ['{"vendor_id":"10de"}']

  2/ Create aliases

  [{"vendor_id":"10de", "name":"vGPU"}]

  3/ Configure the flavor

  openstack flavor set --property "pci_passthrough:alias"="vGPU:1"

* Limitations

The mdev does not provide 'product_id' but 'mdev_type' which should be
considered to exactly identify which resource users can request e.g:
nvidia-10. To provide that support we have to add a new field
'mdev_type' so aliases could be something like:

  {"vendor_id":"10de", mdev_type="nvidia-10" "name":"alias-nvidia-10"}
  {"vendor_id":"10de", mdev_type="nvidia-11" "name":"alias-nvidia-11"}

I do have a plan to add it, but first I need to have support from upstream
to continue that work.


As mentioned in IRC and the previous ML discussion, my focus is on the 
nested resource providers work and reviews, along with the other two 
top-priority scheduler items (move operations and alternate hosts).


I'll do my best to look at your patch series, but please note it's lower 
priority than a number of other items.


One thing that would be very useful, Sahid, if you could get with Eric 
Fried (efried) on IRC and discuss with him the "generic device 
management" system that was discussed at the PTG. It's likely that the 
/pci module is going to be overhauled in Rocky and it would be good to 
have the mdev device management API requirements included in that 
discussion.


Best,
-jay




[0] 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci-mdev-support
[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-September/122591.html


Re: [openstack-dev] vGPUs support for Nova

2017-09-26 Thread Mooney, Sean K


> -Original Message-
> From: Sahid Orentino Ferdjaoui [mailto:sferd...@redhat.com]
> Sent: Tuesday, September 26, 2017 1:46 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] vGPUs support for Nova
> 
> On Mon, Sep 25, 2017 at 04:59:04PM +, Jianghua Wang wrote:
> > Sahid,
> >
> > Just share some background. XenServer doesn't expose vGPUs as mdev or
> > pci devices.
> 
> That does not make any sense. There is physical device (PCI) which
> provides functions (vGPUs). These functions are exposed through mdev
> framework. What you need is the mdev UUID related to a specific vGPU
> and I'm sure that XenServer is going to expose it. Something which
> XenServer may not expose is the NUMA node where the physical device is
> plugged on but in such situation you could still use sysfs.
[Mooney, Sean K] This is implementation specific. AMD supports virtualizing
their GPUs using SR-IOV
(http://www.amd.com/Documents/Multiuser-GPU-White-Paper.pdf); in that case you
can use the existing PCI passthrough support without any modification. For
Intel and NVIDIA GPUs we need specific hypervisor support, as the device
partitioning is done in the host GPU driver rather than via SR-IOV. There are
two levels of abstraction that we must keep separate:
1. how the hardware supports configuration and enumeration of the virtualized
resources (AMD in hardware via SR-IOV, Intel/NVIDIA via a driver/software
manager);
2. how the hypervisor reports the vGPUs to OpenStack and other clients.

In the AMD case I would not expect any hypervisor to have mdevs associated with
the SR-IOV VF, as that is not the virtualization model they have implemented.
In the Intel GVT case, yes, you will have mdevs, but the virtual GPUs are not
represented on the PCI bus, so we should not model them as PCI devices.
Some more comments below.
> 
> > I proposed a spec about one year ago to make fake pci devices so that
> > we can use the existing PCI mechanism to cover vGPUs. But that's not
> a
> > good design and got strongly objection. After that, we switched to
> use
> > the resource providers by following the advice from the core team.
> >
> > Regards,
> > Jianghua
> >
> > -Original Message-
> > From: Sahid Orentino Ferdjaoui [mailto:sferd...@redhat.com]
> > Sent: Monday, September 25, 2017 11:01 PM
> > To: OpenStack Development Mailing List (not for usage questions)
> > <openstack-dev@lists.openstack.org>
> > Subject: Re: [openstack-dev] vGPUs support for Nova
> >
> > On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> > > On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > > > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > > > There is a desire to expose the vGPUs resources on top of
> > > > > Resource Provider which is probably the path we should be going
> > > > > in the long term. I was not there for the last PTG and you
> > > > > probably already made a decision about moving in that direction
> > > > > anyway. My personal feeling is that it is premature.
> > > > >
> > > > > The nested Resource Provider work is not yet feature-complete
> > > > > and requires more reviewer attention. If we continue in the
> > > > > direction of Resource Provider, it will need at least 2 more
> > > > > releases to expose the vGPUs feature and that without the
> > > > > support of NUMA, and with the feeling of pushing something
> which is not stable/production-ready.
[Mooney, Sean K] Not all GPUs have NUMA affinity. Intel integrated GPUs do
not: they have dedicated eDRAM on the processor die, so their memory accesses
never leave the processor package and they do not have NUMA affinity. I would
assume the same is true for AMD integrated GPUs, so only discrete GPUs will
have NUMA affinity.
> > > > >
> > > > > It's seems safer to first have the Resource Provider work well
> > > > > finalized/stabilized to be production-ready. Then on top of
> > > > > something stable we could start to migrate our current virt
> > > > > specific features like NUMA, CPU Pinning, Huge Pages and
> finally PCI devices.
> > > > >
> > > > > I'm talking about PCI devices in general because I think we
> > > > > should implement the vGPU on top of our /pci framework which is
> > > > > production ready and provides the support of NUMA.
> > > > >
> > > > > The hardware vendors building their drivers using mdev and the
This is vendor specific.

Re: [openstack-dev] vGPUs support for Nova

2017-09-26 Thread Sahid Orentino Ferdjaoui
On Mon, Sep 25, 2017 at 04:59:04PM +, Jianghua Wang wrote:
> Sahid,
> 
> Just share some background. XenServer doesn't expose vGPUs as mdev
> or pci devices.

That does not make any sense. There is a physical device (PCI) which
provides functions (vGPUs). These functions are exposed through the mdev
framework. What you need is the mdev UUID related to a specific vGPU,
and I'm sure that XenServer is going to expose it. Something which
XenServer may not expose is the NUMA node where the physical device is
plugged in, but in that situation you could still use sysfs.

> I proposed a spec about one year ago to make fake pci devices so
> that we can use the existing PCI mechanism to cover vGPUs. But
> that's not a good design and got strongly objection. After that, we
> switched to use the resource providers by following the advice from
> the core team.
>
> Regards,
> Jianghua
> 
> -Original Message-
> From: Sahid Orentino Ferdjaoui [mailto:sferd...@redhat.com] 
> Sent: Monday, September 25, 2017 11:01 PM
> To: OpenStack Development Mailing List (not for usage questions) 
> <openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] vGPUs support for Nova
> 
> On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> > On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > > There is a desire to expose the vGPUs resources on top of Resource 
> > > > Provider which is probably the path we should be going in the long 
> > > > term. I was not there for the last PTG and you probably already 
> > > > made a decision about moving in that direction anyway. My personal 
> > > > feeling is that it is premature.
> > > > 
> > > > The nested Resource Provider work is not yet feature-complete and 
> > > > requires more reviewer attention. If we continue in the direction 
> > > > of Resource Provider, it will need at least 2 more releases to 
> > > > expose the vGPUs feature and that without the support of NUMA, and 
> > > > with the feeling of pushing something which is not 
> > > > stable/production-ready.
> > > > 
> > > > It's seems safer to first have the Resource Provider work well 
> > > > finalized/stabilized to be production-ready. Then on top of 
> > > > something stable we could start to migrate our current virt 
> > > > specific features like NUMA, CPU Pinning, Huge Pages and finally PCI 
> > > > devices.
> > > > 
> > > > I'm talking about PCI devices in general because I think we should 
> > > > implement the vGPU on top of our /pci framework which is 
> > > > production ready and provides the support of NUMA.
> > > > 
> > > > The hardware vendors building their drivers using mdev and the 
> > > > /pci framework currently understand only SRIOV but on a quick 
> > > > glance it does not seem complicated to make it support mdev.
> > > > 
> > > > In the /pci framework we will have to:
> > > > 
> > > > * Update the PciDevice object fields to accept NULL value for
> > > >    'address' and add new field 'uuid'
> > > > * Update PciRequest to handle a new tag like 'vgpu_types'
> > > > * Update PciDeviceStats to also maintain pool of vGPUs
> > > > 
> > > > The operators will have to create alias(-es) and configure 
> > > > flavors. Basically most of the logic is already implemented and 
> > > > the method 'consume_request' is going to select the right vGPUs 
> > > > according the request.
> > > > 
> > > > In /virt we will have to:
> > > > 
> > > > * Update the field 'pci_passthrough_devices' to also include GPUs
> > > >    devices.
> > > > * Update attach/detach PCI device to handle vGPUs
> > > > 
> > > > We have a few people interested in working on it, so we could 
> > > > certainly make this feature available for Queen.
> > > > 
> > > > I can take the lead updating/implementing the PCI and libvirt 
> > > > driver part, I'm sure Jianghua Wang will be happy to take the lead 
> > > > for the virt XenServer part.
> > > > 
> > > > And I trust Jay, Stephen and Sylvain to follow the developments.
> > > 
> > > I understand the desire to get something in to Nova to support 
> > > vGPUs, and I understand that the existing /pci modules represent the 
> > > fastest/cheapest way to get there.
> > > 

Re: [openstack-dev] vGPUs support for Nova

2017-09-25 Thread Jianghua Wang
Sahid,

   Just to share some background: XenServer doesn't expose vGPUs as mdev or PCI
devices. I proposed a spec about one year ago to make fake PCI devices so that
we could use the existing PCI mechanism to cover vGPUs. But that's not a good
design and got strong objections. After that, we switched to using resource
providers, following the advice from the core team.

Regards,
Jianghua

-Original Message-
From: Sahid Orentino Ferdjaoui [mailto:sferd...@redhat.com] 
Sent: Monday, September 25, 2017 11:01 PM
To: OpenStack Development Mailing List (not for usage questions) 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] vGPUs support for Nova

On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > There is a desire to expose the vGPUs resources on top of Resource 
> > > Provider which is probably the path we should be going in the long 
> > > term. I was not there for the last PTG and you probably already 
> > > made a decision about moving in that direction anyway. My personal 
> > > feeling is that it is premature.
> > > 
> > > The nested Resource Provider work is not yet feature-complete and 
> > > requires more reviewer attention. If we continue in the direction 
> > > of Resource Provider, it will need at least 2 more releases to 
> > > expose the vGPUs feature and that without the support of NUMA, and 
> > > with the feeling of pushing something which is not 
> > > stable/production-ready.
> > > 
> > > It's seems safer to first have the Resource Provider work well 
> > > finalized/stabilized to be production-ready. Then on top of 
> > > something stable we could start to migrate our current virt 
> > > specific features like NUMA, CPU Pinning, Huge Pages and finally PCI 
> > > devices.
> > > 
> > > I'm talking about PCI devices in general because I think we should 
> > > implement the vGPU on top of our /pci framework which is 
> > > production ready and provides the support of NUMA.
> > > 
> > > The hardware vendors building their drivers using mdev and the 
> > > /pci framework currently understand only SRIOV but on a quick 
> > > glance it does not seem complicated to make it support mdev.
> > > 
> > > In the /pci framework we will have to:
> > > 
> > > * Update the PciDevice object fields to accept NULL value for
> > >    'address' and add new field 'uuid'
> > > * Update PciRequest to handle a new tag like 'vgpu_types'
> > > * Update PciDeviceStats to also maintain pool of vGPUs
> > > 
> > > The operators will have to create alias(-es) and configure 
> > > flavors. Basically most of the logic is already implemented and 
> > > the method 'consume_request' is going to select the right vGPUs 
> > > according the request.
> > > 
> > > In /virt we will have to:
> > > 
> > > * Update the field 'pci_passthrough_devices' to also include GPUs
> > >    devices.
> > > * Update attach/detach PCI device to handle vGPUs
> > > 
> > > We have a few people interested in working on it, so we could 
> > > certainly make this feature available for Queen.
> > > 
> > > I can take the lead updating/implementing the PCI and libvirt 
> > > driver part, I'm sure Jianghua Wang will be happy to take the lead 
> > > for the virt XenServer part.
> > > 
> > > And I trust Jay, Stephen and Sylvain to follow the developments.
> > 
> > I understand the desire to get something in to Nova to support 
> > vGPUs, and I understand that the existing /pci modules represent the 
> > fastest/cheapest way to get there.
> > 
> > I won't block you from making any of the above changes, Sahid. I'll 
> > even do my best to review them. However, I will be primarily 
> > focusing this cycle on getting the nested resource providers work 
> > feature-complete for (at least) SR-IOV PF/VF devices.
> > 
> > The decision of whether to allow an approach that adds more to the 
> > existing /pci module is ultimately Matt's.
> > 
> > Best,
> > -jay
> > 
> > 
> > __ OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: 
> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> Nested resource providers is not merged or

Re: [openstack-dev] vGPUs support for Nova

2017-09-25 Thread Sahid Orentino Ferdjaoui
On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:
> On 9/25/2017 5:40 AM, Jay Pipes wrote:
> > On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:
> > > There is a desire to expose vGPU resources on top of Resource
> > > Providers, which is probably the path we should be going down in the
> > > long term. I was not there for the last PTG and you probably already
> > > made a decision about moving in that direction anyway. My personal
> > > feeling is that it is premature.
> > > 
> > > The nested Resource Provider work is not yet feature-complete and
> > > requires more reviewer attention. If we continue in the direction of
> > > Resource Providers, it will need at least 2 more releases to expose
> > > the vGPU feature, and that without NUMA support, and with the feeling
> > > of pushing something which is not stable/production-ready.
> > > 
> > > It seems safer to first have the Resource Provider work properly
> > > finalized/stabilized so that it is production-ready. Then, on top of
> > > something stable, we could start to migrate our current virt-specific
> > > features like NUMA, CPU Pinning, Huge Pages and finally PCI devices.
> > > 
> > > I'm talking about PCI devices in general because I think we should
> > > implement vGPU support on top of our /pci framework, which is
> > > production-ready and provides NUMA support.
> > > 
> > > The hardware vendors are building their drivers using mdev, while
> > > the /pci framework currently understands only SR-IOV, but at a quick
> > > glance it does not seem complicated to make it support mdev as well.
> > > 
> > > In the /pci framework we will have to:
> > > 
> > > * Update the PciDevice object fields to accept NULL value for
> > >    'address' and add new field 'uuid'
> > > * Update PciRequest to handle a new tag like 'vgpu_types'
> > > * Update PciDeviceStats to also maintain pool of vGPUs
> > > 
> > > The operators will have to create alias(-es) and configure
> > > flavors. Basically most of the logic is already implemented and the
> > > method 'consume_request' is going to select the right vGPUs according
> > > to the request.
> > > 
> > > In /virt we will have to:
> > > 
> > > * Update the field 'pci_passthrough_devices' to also include GPUs
> > >    devices.
> > > * Update attach/detach PCI device to handle vGPUs
> > > 
> > > We have a few people interested in working on it, so we could
> > > certainly make this feature available for Queens.
> > > 
> > > I can take the lead updating/implementing the PCI and libvirt driver
> > > part, I'm sure Jianghua Wang will be happy to take the lead for the
> > > virt XenServer part.
> > > 
> > > And I trust Jay, Stephen and Sylvain to follow the developments.
> > 
> > I understand the desire to get something in to Nova to support vGPUs,
> > and I understand that the existing /pci modules represent the
> > fastest/cheapest way to get there.
> > 
> > I won't block you from making any of the above changes, Sahid. I'll even
> > do my best to review them. However, I will be primarily focusing this
> > cycle on getting the nested resource providers work feature-complete for
> > (at least) SR-IOV PF/VF devices.
> > 
> > The decision of whether to allow an approach that adds more to the
> > existing /pci module is ultimately Matt's.
> > 
> > Best,
> > -jay
> > 
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> Nested resource providers is not merged or production ready because we
> haven't made it a priority. We've certainly talked about it and Jay has had
> patches proposed for several releases now though.
> 
> Building vGPU support into the existing framework, which only a couple of
> people understand (certainly not me), might be a short-term gain, but it is
> just more technical debt we have to pay off later, and it delays any focus
> on nested resource providers for the wider team.
> 
> At the Queens PTG it was abundantly clear that many features are dependent
> on nested resource providers, including several networking-related features
> like bandwidth-based scheduling.
> 
> The priorities for placement/scheduler in Queens are:
> 
> 1. Dan Smith's migration allocations cleanup.
> 2. Alternative hosts for reschedules with cells v2.
> 3. Nested resource providers.
> 
> All of these are in progress and need review.
> 
> I personally don't think we should abandon the plan to implement vGPU
> support with nested resource providers without first seeing any code changes
> for it as a proof of concept. It also sounds like we have a pretty simple
> staggered plan for rolling out vGPU support so it's not very detailed to
> start. The virt driver reports vGPU inventory and we decorate the details
> later with traits (which Alex Xu is working on and needs review).
> 
> Sahid, you could certainly 

Re: [openstack-dev] vGPUs support for Nova

2017-09-25 Thread Matt Riedemann

On 9/25/2017 5:40 AM, Jay Pipes wrote:

On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:

There is a desire to expose vGPU resources on top of Resource
Providers, which is probably the path we should be going down in the
long term. I was not there for the last PTG and you probably already made a
decision about moving in that direction anyway. My personal feeling is
that it is premature.

The nested Resource Provider work is not yet feature-complete and
requires more reviewer attention. If we continue in the direction of
Resource Providers, it will need at least 2 more releases to expose
the vGPU feature, and that without NUMA support, and with the feeling
of pushing something which is not stable/production-ready.

It seems safer to first have the Resource Provider work properly
finalized/stabilized so that it is production-ready. Then, on top of
something stable, we could start to migrate our current virt-specific
features like NUMA, CPU Pinning, Huge Pages and finally PCI devices.

I'm talking about PCI devices in general because I think we should
implement vGPU support on top of our /pci framework, which is
production-ready and provides NUMA support.

The hardware vendors are building their drivers using mdev, while
the /pci framework currently understands only SR-IOV, but at a quick
glance it does not seem complicated to make it support mdev as well.

In the /pci framework we will have to:

* Update the PciDevice object fields to accept NULL value for
   'address' and add new field 'uuid'
* Update PciRequest to handle a new tag like 'vgpu_types'
* Update PciDeviceStats to also maintain pool of vGPUs
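
A rough, standalone sketch of the shape those object changes could take
(illustration only, not actual Nova code; the real objects live in
nova/objects/pci_device.py and nova/objects/pci_request.py and use
oslo.versionedobjects fields, and every name below other than 'address',
'uuid' and 'vgpu_types' is an assumption):

    # Illustration only: plain dataclasses standing in for the Nova objects.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PciDeviceSketch:
        # 'address' becomes nullable: an mdev-backed vGPU has no PCI address
        # of its own, so a new 'uuid' identifies the mediated device instead.
        address: Optional[str] = None
        uuid: Optional[str] = None
        vendor_id: str = ''
        product_id: str = ''

    @dataclass
    class PciRequestSketch:
        # the request spec grows a new tag such as 'vgpu_types' (name assumed)
        count: int = 1
        spec: List[dict] = field(default_factory=lambda: [
            {'vendor_id': '10de', 'vgpu_types': ['nvidia-35']}])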

The operators will have to create alias(-es) and configure
flavors. Basically most of the logic is already implemented and the
method 'consume_request' is going to select the right vGPUs according
to the request.
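
To make the pool idea concrete, here is a small self-contained sketch of
pool-based selection in the spirit of PciDeviceStats.consume_request; the
real method in nova/pci/stats.py is considerably more involved (NUMA
affinity, parent devices, etc.), and the 'vgpu_type' key is an assumption,
not an agreed tag:

    # Sketch only: pools group identical devices; a request is satisfied
    # from the first pool whose properties match the request spec.
    def consume_request(pools, request):
        for pool in pools:
            if pool['count'] < request['count']:
                continue
            if all(pool.get(k) == v for k, v in request['spec'].items()):
                pool['count'] -= request['count']
                return [dict(pool, count=1) for _ in range(request['count'])]
        return None   # no pool can satisfy the request

    # A hypothetical vGPU pool and a request built from a flavor whose
    # alias asks for one 'nvidia-35' vGPU:
    pools = [{'vendor_id': '10de', 'vgpu_type': 'nvidia-35', 'count': 4}]
    request = {'count': 1,
               'spec': {'vendor_id': '10de', 'vgpu_type': 'nvidia-35'}}
    print(consume_request(pools, request))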

In /virt we will have to:

* Update the field 'pci_passthrough_devices' to also include GPUs
   devices.
* Update attach/detach PCI device to handle vGPUs
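
Purely as an illustration of what the /virt side might hand over, an
mdev-backed vGPU entry reported alongside today's PCI passthrough devices
could look something like the dict below; the keys mirror the existing PCI
device dicts, but the exact values and the 'type-VGPU' dev_type are
assumptions rather than an agreed format:

    # Hypothetical example of one mdev-backed vGPU as a virt driver might
    # report it in 'pci_passthrough_devices' under this proposal.
    vgpu_device = {
        'address': None,                # no PCI address for an mdev instance
        'uuid': '4b20d080-1b54-4048-85b3-a6a62d165c01',
        'vendor_id': '10de',
        'product_id': '13f2',
        'dev_type': 'type-VGPU',        # assumed new device type
        'parent_addr': '0000:84:00.0',  # the physical GPU hosting the mdev
        'vgpu_type': 'nvidia-35',
        'numa_node': 1,
    }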

We have a few people interested in working on it, so we could
certainly make this feature available for Queens.

I can take the lead updating/implementing the PCI and libvirt driver
part, I'm sure Jianghua Wang will be happy to take the lead for the
virt XenServer part.

And I trust Jay, Stephen and Sylvain to follow the developments.


I understand the desire to get something in to Nova to support vGPUs, 
and I understand that the existing /pci modules represent the 
fastest/cheapest way to get there.


I won't block you from making any of the above changes, Sahid. I'll even 
do my best to review them. However, I will be primarily focusing this 
cycle on getting the nested resource providers work feature-complete for 
(at least) SR-IOV PF/VF devices.


The decision of whether to allow an approach that adds more to the 
existing /pci module is ultimately Matt's.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Nested resource providers is not merged or production ready because we 
haven't made it a priority. We've certainly talked about it and Jay has 
had patches proposed for several releases now though.


Building vGPU support into the existing framework, which only a couple 
of people understand (certainly not me), might be a short-term gain, but 
it is just more technical debt we have to pay off later, and it delays 
any focus on nested resource providers for the wider team.


At the Queens PTG it was abundantly clear that many features are 
dependent on nested resource providers, including several 
networking-related features like bandwidth-based scheduling.


The priorities for placement/scheduler in Queens are:

1. Dan Smith's migration allocations cleanup.
2. Alternative hosts for reschedules with cells v2.
3. Nested resource providers.

All of these are in progress and need review.

I personally don't think we should abandon the plan to implement vGPU 
support with nested resource providers without first seeing any code 
changes for it as a proof of concept. It also sounds like we have a 
pretty simple staggered plan for rolling out vGPU support so it's not 
very detailed to start. The virt driver reports vGPU inventory and we 
decorate the details later with traits (which Alex Xu is working on and 
needs review).
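
As a rough sketch of that staggered approach on the placement side (names
here are assumptions, not a settled interface), the virt driver would
report how many vGPUs it can supply as inventory of a VGPU resource class
on a resource provider, and the qualitative details would arrive later as
traits, e.g.:

    # Illustration only, not the actual virt driver interface.
    inventory = {
        'VGPU': {
            'total': 16,          # vGPU instances this GPU can expose
            'reserved': 0,
            'min_unit': 1,
            'max_unit': 16,
            'step_size': 1,
            'allocation_ratio': 1.0,
        },
    }
    traits = ['CUSTOM_VGPU_NVIDIA_35']   # hypothetical trait name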


Sahid, you could certainly implement a separate proof of concept and 
make that available if the nested resource providers-based change hits 
major issues or goes far too long and has too much risk, then we have a 
contingency plan at least. But I don't expect that to get review 
priority and you'd have to accept that it might not get merged since we 
want to use nested resource providers.


Either way we are going to need solid functional testing and that 
functional testing should be written against the API 

Re: [openstack-dev] vGPUs support for Nova

2017-09-25 Thread Jay Pipes

On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:

There is a desire to expose vGPU resources on top of Resource
Providers, which is probably the path we should be going down in the
long term. I was not there for the last PTG and you probably already made a
decision about moving in that direction anyway. My personal feeling is
that it is premature.

The nested Resource Provider work is not yet feature-complete and
requires more reviewer attention. If we continue in the direction of
Resource Providers, it will need at least 2 more releases to expose
the vGPU feature, and that without NUMA support, and with the feeling
of pushing something which is not stable/production-ready.

It seems safer to first have the Resource Provider work properly
finalized/stabilized so that it is production-ready. Then, on top of
something stable, we could start to migrate our current virt-specific
features like NUMA, CPU Pinning, Huge Pages and finally PCI devices.

I'm talking about PCI devices in general because I think we should
implement vGPU support on top of our /pci framework, which is
production-ready and provides NUMA support.

The hardware vendors are building their drivers using mdev, while
the /pci framework currently understands only SR-IOV, but at a quick
glance it does not seem complicated to make it support mdev as well.

In the /pci framework we will have to:

* Update the PciDevice object fields to accept NULL value for
   'address' and add new field 'uuid'
* Update PciRequest to handle a new tag like 'vgpu_types'
* Update PciDeviceStats to also maintain pool of vGPUs

The operators will have to create alias(-es) and configure
flavors. Basically most of the logic is already implemented and the
method 'consume_request' is going to select the right vGPUs according
to the request.

In /virt we will have to:

* Update the field 'pci_passthrough_devices' to also include GPUs
   devices.
* Update attach/detach PCI device to handle vGPUs

We have a few people interested in working on it, so we could
certainly make this feature available for Queens.

I can take the lead updating/implementing the PCI and libvirt driver
part, I'm sure Jianghua Wang will be happy to take the lead for the
virt XenServer part.

And I trust Jay, Stephen and Sylvain to follow the developments.


I understand the desire to get something in to Nova to support vGPUs, 
and I understand that the existing /pci modules represent the 
fastest/cheapest way to get there.


I won't block you from making any of the above changes, Sahid. I'll even 
do my best to review them. However, I will be primarily focusing this 
cycle on getting the nested resource providers work feature-complete for 
(at least) SR-IOV PF/VF devices.


The decision of whether to allow an approach that adds more to the 
existing /pci module is ultimately Matt's.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev