Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Alex Xu
2018-06-05 22:53 GMT+08:00 Eric Fried:

> Alex-
>
> Allocations for an instance are pulled down by the compute manager
> and
> passed into the virt driver's spawn method since [1].  An allocation
> comprises a consumer, provider, resource class, and amount.  Once we can
> schedule to trees, the allocations pulled down by the compute manager
> will span the tree as appropriate.  So in that sense, yes, nova-compute
> knows which amounts of which resource classes come from which providers.
>

Eric, thanks, that is the piece I missed. Initially I thought we would
return the allocations from the scheduler down to the compute manager. I
see we already pull the allocations in the compute manager now.


>
> However, if you're asking about the situation where we have two
> different allocations of the same resource class coming from two
> separate providers: Yes, we can still tell which RCxAMOUNT is associated
> with which provider; but No, we still have no inherent way to correlate
> a specific one of those allocations with the part of the *request* it
> came from.  If just the provider UUID isn't enough for the virt driver
> to figure out what to do, it may have to figure it out by looking at the
> flavor (and/or image metadata), inspecting the traits on the providers
> associated with the allocations, etc.  (The theory here is that, if the
> virt driver can't tell the difference at that point, then it actually
> doesn't matter.)
>
> [1] https://review.openstack.org/#/c/511879/
>
> On 06/05/2018 09:05 AM, Alex Xu wrote:
> > Maybe I missed something. Is there any way the nova-compute can know
> > which child resource provider the resources are allocated from? For
> > example, the host has two PFs. The request asks for one VF, so the
> > nova-compute needs to know which PF (resource provider) the VF is
> > allocated from. As I understand it, currently we only return a list of
> > alternative resource providers to the nova-compute, and those
> > alternatives are root resource providers.
> >
> > 2018-06-05 21:29 GMT+08:00 Jay Pipes:
> >
> > On 06/05/2018 08:50 AM, Stephen Finucane wrote:
> >
> > I thought nested resource providers were already supported by
> > placement? To the best of my knowledge, what is /not/ supported
> > is virt drivers using these to report NUMA topologies but I
> > doubt that affects you. The placement guys will need to weigh in
> > on this as I could be missing something but it sounds like you
> > can start using this functionality right now.
> >
> >
> > To be clear, this is what placement and nova *currently* support
> > with regards to nested resource providers:
> >
> > 1) When creating a resource provider in placement, you can specify a
> > parent_provider_uuid and thus create trees of providers. This was
> > placement API microversion 1.14. Also included in this microversion
> > was support for displaying the parent and root provider UUID for
> > resource providers.
> >
> > 2) The nova "scheduler report client" (terrible name, it's mostly
> > just the placement client at this point) understands how to call
> > placement API 1.14 and create resource providers with a parent
> provider.
> >
> > 3) The nova scheduler report client uses a ProviderTree object [1]
> > to cache information about the hierarchy of providers that it knows
> > about. For nova-compute workers managing hypervisors, that means the
> > ProviderTree object contained in the report client is rooted in a
> > resource provider that represents the compute node itself (the
> > hypervisor). For nova-compute workers managing baremetal, that means
> > the ProviderTree object contains many root providers, each
> > representing an Ironic baremetal node.
> >
> > 4) The placement API's GET /allocation_candidates endpoint now
> > understands the concept of granular request groups [2]. Granular
> > request groups are only relevant when a user wants to specify that
> > child providers in a provider tree should be used to satisfy part of
> > an overall scheduling request. However, this support is yet
> > incomplete -- see #5 below.
> >
> > The following parts of the nested resource providers modeling are
> > *NOT* yet complete, however:
> >
> > 5) GET /allocation_candidates does not currently return *results*
> > when granular request groups are specified. So, while the placement
> > service understands the *request* for granular groups, it doesn't
> > yet have the ability to constrain the returned candidates
> > appropriately. Tetsuro is actively working on this functionality in
> > this patch series:
> >
> > https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Eric Fried
Alex-

Allocations for an instance are pulled down by the compute manager and
passed into the virt driver's spawn method since [1].  An allocation
comprises a consumer, provider, resource class, and amount.  Once we can
schedule to trees, the allocations pulled down by the compute manager
will span the tree as appropriate.  So in that sense, yes, nova-compute
knows which amounts of which resource classes come from which providers.

However, if you're asking about the situation where we have two
different allocations of the same resource class coming from two
separate providers: Yes, we can still tell which RCxAMOUNT is associated
with which provider; but No, we still have no inherent way to correlate
a specific one of those allocations with the part of the *request* it
came from.  If just the provider UUID isn't enough for the virt driver
to figure out what to do, it may have to figure it out by looking at the
flavor (and/or image metadata), inspecting the traits on the providers
associated with the allocations, etc.  (The theory here is that, if the
virt driver can't tell the difference at that point, then it actually
doesn't matter.)

[1] https://review.openstack.org/#/c/511879/
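To make the point above concrete, here is a minimal sketch of how a virt driver might correlate two same-class allocations with parts of the request via provider traits. The data shapes, UUIDs, and trait names are illustrative, not Nova's actual internal structures:

```python
# Hypothetical shape of allocations pulled down by the compute manager:
# keyed by resource provider UUID, each mapping resource classes to amounts.
allocations = {
    "pf1-uuid": {"resources": {"SRIOV_NET_VF": 1}},
    "pf2-uuid": {"resources": {"SRIOV_NET_VF": 1}},
}

# Traits reported for each provider (e.g. as fetched from placement).
provider_traits = {
    "pf1-uuid": {"CUSTOM_PHYSNET_PUBLIC"},
    "pf2-uuid": {"CUSTOM_PHYSNET_PRIVATE"},
}

def providers_for_trait(allocations, provider_traits, trait):
    """Pick the allocated providers whose traits match a requested trait."""
    return [
        rp_uuid for rp_uuid in allocations
        if trait in provider_traits.get(rp_uuid, set())
    ]

# The virt driver can disambiguate two same-class allocations by trait:
print(providers_for_trait(allocations, provider_traits,
                          "CUSTOM_PHYSNET_PUBLIC"))  # ['pf1-uuid']
```

If the traits don't distinguish the providers either, then (per the theory above) the choice genuinely doesn't matter.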

On 06/05/2018 09:05 AM, Alex Xu wrote:
> Maybe I missed something. Is there any way the nova-compute can know
> which child resource provider the resources are allocated from? For
> example, the host has two PFs. The request asks for one VF, so the
> nova-compute needs to know which PF (resource provider) the VF is
> allocated from. As I understand it, currently we only return a list of
> alternative resource providers to the nova-compute, and those
> alternatives are root resource providers.
> 
> 2018-06-05 21:29 GMT+08:00 Jay Pipes:
> 
> On 06/05/2018 08:50 AM, Stephen Finucane wrote:
> 
> I thought nested resource providers were already supported by
> placement? To the best of my knowledge, what is /not/ supported
> is virt drivers using these to report NUMA topologies but I
> doubt that affects you. The placement guys will need to weigh in
> on this as I could be missing something but it sounds like you
> can start using this functionality right now.
> 
> 
> To be clear, this is what placement and nova *currently* support
> with regards to nested resource providers:
> 
> 1) When creating a resource provider in placement, you can specify a
> parent_provider_uuid and thus create trees of providers. This was
> placement API microversion 1.14. Also included in this microversion
> was support for displaying the parent and root provider UUID for
> resource providers.
> 
> 2) The nova "scheduler report client" (terrible name, it's mostly
> just the placement client at this point) understands how to call
> placement API 1.14 and create resource providers with a parent provider.
> 
> 3) The nova scheduler report client uses a ProviderTree object [1]
> to cache information about the hierarchy of providers that it knows
> about. For nova-compute workers managing hypervisors, that means the
> ProviderTree object contained in the report client is rooted in a
> resource provider that represents the compute node itself (the
> hypervisor). For nova-compute workers managing baremetal, that means
> the ProviderTree object contains many root providers, each
> representing an Ironic baremetal node.
> 
> 4) The placement API's GET /allocation_candidates endpoint now
> understands the concept of granular request groups [2]. Granular
> request groups are only relevant when a user wants to specify that
> child providers in a provider tree should be used to satisfy part of
> an overall scheduling request. However, this support is yet
> incomplete -- see #5 below.
> 
> The following parts of the nested resource providers modeling are
> *NOT* yet complete, however:
> 
> 5) GET /allocation_candidates does not currently return *results*
> when granular request groups are specified. So, while the placement
> service understands the *request* for granular groups, it doesn't
> yet have the ability to constrain the returned candidates
> appropriately. Tetsuro is actively working on this functionality in
> this patch series:
> 
> 
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates
> 
> 
> 
> 6) The virt drivers need to implement the update_provider_tree()
> interface [3] and construct the tree of resource providers along
> with appropriate inventory records for each child provider in the
> tree. Both libvirt and XenAPI virt drivers have patch series up that
> begin to take advantage of the nested provider modeling.

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Alex Xu
Maybe I missed something. Is there any way the nova-compute can know which
child resource provider the resources are allocated from? For example, the
host has two PFs. The request asks for one VF, so the nova-compute needs to
know which PF (resource provider) the VF is allocated from. As I understand
it, currently we only return a list of alternative resource providers to
the nova-compute, and those alternatives are root resource providers.

2018-06-05 21:29 GMT+08:00 Jay Pipes:

> On 06/05/2018 08:50 AM, Stephen Finucane wrote:
>
>> I thought nested resource providers were already supported by placement?
>> To the best of my knowledge, what is /not/ supported is virt drivers using
>> these to report NUMA topologies but I doubt that affects you. The placement
>> guys will need to weigh in on this as I could be missing something but it
>> sounds like you can start using this functionality right now.
>>
>
> To be clear, this is what placement and nova *currently* support with
> regards to nested resource providers:
>
> 1) When creating a resource provider in placement, you can specify a
> parent_provider_uuid and thus create trees of providers. This was placement
> API microversion 1.14. Also included in this microversion was support for
> displaying the parent and root provider UUID for resource providers.
>
> 2) The nova "scheduler report client" (terrible name, it's mostly just the
> placement client at this point) understands how to call placement API 1.14
> and create resource providers with a parent provider.
>
> 3) The nova scheduler report client uses a ProviderTree object [1] to
> cache information about the hierarchy of providers that it knows about. For
> nova-compute workers managing hypervisors, that means the ProviderTree
> object contained in the report client is rooted in a resource provider that
> represents the compute node itself (the hypervisor). For nova-compute
> workers managing baremetal, that means the ProviderTree object contains
> many root providers, each representing an Ironic baremetal node.
>
> 4) The placement API's GET /allocation_candidates endpoint now understands
> the concept of granular request groups [2]. Granular request groups are
> only relevant when a user wants to specify that child providers in a
> provider tree should be used to satisfy part of an overall scheduling
> request. However, this support is yet incomplete -- see #5 below.
>
> The following parts of the nested resource providers modeling are *NOT*
> yet complete, however:
>
> 5) GET /allocation_candidates does not currently return *results* when
> granular request groups are specified. So, while the placement service
> understands the *request* for granular groups, it doesn't yet have the
> ability to constrain the returned candidates appropriately. Tetsuro is
> actively working on this functionality in this patch series:
>
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates
>
> 6) The virt drivers need to implement the update_provider_tree() interface
> [3] and construct the tree of resource providers along with appropriate
> inventory records for each child provider in the tree. Both libvirt and
> XenAPI virt drivers have patch series up that begin to take advantage of
> the nested provider modeling. However, a number of concerns [4] about
> in-place nova-compute upgrades when moving from a single resource provider
> to a nested provider tree model were raised, and we have begun
> brainstorming how to handle the migration of existing data in the
> single-provider model to the nested provider model. [5] We are blocking any
> reviews on patch series that modify the local provider modeling until these
> migration concerns are fully resolved.
>
> 7) The scheduler does not currently pass granular request groups to
> placement. Once #5 and #6 are resolved, and once the migration/upgrade path
> is resolved, clearly we will need to have the scheduler start making
> requests to placement that represent the granular request groups and have
> the scheduler pass the resulting allocation candidates to its filters and
> weighers.
>
> Hope this helps highlight where we currently are and the work still left
> to do (in Rocky) on nested resource providers.
>
> Best,
> -jay
>
>
> [1] https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py
>
> [2] https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html
>
> [3] https://github.com/openstack/nova/blob/f902e0d5d87fb05207e4a7aca73d185775d43df2/nova/virt/driver.py#L833
>
> [4] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html
>
> [5] https://etherpad.openstack.org/p/placement-making-the-(up)grade
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Eric Fried
To summarize: cyborg could model things nested-wise, but there would be
no way to schedule them yet.

A couple of clarifications inline.

On 06/05/2018 08:29 AM, Jay Pipes wrote:
> On 06/05/2018 08:50 AM, Stephen Finucane wrote:
>> I thought nested resource providers were already supported by
>> placement? To the best of my knowledge, what is /not/ supported is
>> virt drivers using these to report NUMA topologies but I doubt that
>> affects you. The placement guys will need to weigh in on this as I
>> could be missing something but it sounds like you can start using this
>> functionality right now.
> 
> To be clear, this is what placement and nova *currently* support with
> regards to nested resource providers:
> 
> 1) When creating a resource provider in placement, you can specify a
> parent_provider_uuid and thus create trees of providers. This was
> placement API microversion 1.14. Also included in this microversion was
> support for displaying the parent and root provider UUID for resource
> providers.
> 
> 2) The nova "scheduler report client" (terrible name, it's mostly just
> the placement client at this point) understands how to call placement
> API 1.14 and create resource providers with a parent provider.
> 
> 3) The nova scheduler report client uses a ProviderTree object [1] to
> cache information about the hierarchy of providers that it knows about.
> For nova-compute workers managing hypervisors, that means the
> ProviderTree object contained in the report client is rooted in a
> resource provider that represents the compute node itself (the
> hypervisor). For nova-compute workers managing baremetal, that means the
> ProviderTree object contains many root providers, each representing an
> Ironic baremetal node.
> 
> 4) The placement API's GET /allocation_candidates endpoint now
> understands the concept of granular request groups [2]. Granular request
> groups are only relevant when a user wants to specify that child
> providers in a provider tree should be used to satisfy part of an
> overall scheduling request. However, this support is yet incomplete --
> see #5 below.

Granular request groups are also usable/useful when sharing providers
are in play. That functionality is complete on both the placement side
and the report client side (see below).

> The following parts of the nested resource providers modeling are *NOT*
> yet complete, however:
> 
> 5) GET /allocation_candidates does not currently return *results* when
> granular request groups are specified. So, while the placement service
> understands the *request* for granular groups, it doesn't yet have the
> ability to constrain the returned candidates appropriately. Tetsuro is
> actively working on this functionality in this patch series:
> 
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates
> 
> 
> 6) The virt drivers need to implement the update_provider_tree()
> interface [3] and construct the tree of resource providers along with
> appropriate inventory records for each child provider in the tree. Both
> libvirt and XenAPI virt drivers have patch series up that begin to take
> advantage of the nested provider modeling. However, a number of concerns
> [4] about in-place nova-compute upgrades when moving from a single
> resource provider to a nested provider tree model were raised, and we
> have begun brainstorming how to handle the migration of existing data in
> the single-provider model to the nested provider model. [5] We are
> blocking any reviews on patch series that modify the local provider
> modeling until these migration concerns are fully resolved.
> 
> 7) The scheduler does not currently pass granular request groups to
> placement.

The code is in place to do this [6] - so the scheduler *will* pass
granular request groups to placement if your flavor specifies them.  As
noted above, such flavors will be limited to exploiting sharing
providers until Tetsuro's series merges.  But no further code work is
required on the scheduler side.

[6] https://review.openstack.org/#/c/515811/

> Once #5 and #6 are resolved, and once the migration/upgrade
> path is resolved, clearly we will need to have the scheduler start
> making requests to placement that represent the granular request groups
> and have the scheduler pass the resulting allocation candidates to its
> filters and weighers.
> 
> Hope this helps highlight where we currently are and the work still left
> to do (in Rocky) on nested resource providers.
> 
> Best,
> -jay
> 
> 
> [1]
> https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py
> 
> [2]
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html
> 
> 
> [3]
> https://github.com/openstack/nova/blob/f902e0d5d87fb05207e4a7aca73d185775d43df2/nova/virt/driver.py#L833
> 
> 
> [4] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html
> 
> [5] https://etherpad.openstack.org/p/placement-making-the-(up)grade

Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Jay Pipes

On 06/05/2018 08:50 AM, Stephen Finucane wrote:
I thought nested resource providers were already supported by placement? 
To the best of my knowledge, what is /not/ supported is virt drivers 
using these to report NUMA topologies but I doubt that affects you. The 
placement guys will need to weigh in on this as I could be missing 
something but it sounds like you can start using this functionality 
right now.


To be clear, this is what placement and nova *currently* support with 
regards to nested resource providers:


1) When creating a resource provider in placement, you can specify a 
parent_provider_uuid and thus create trees of providers. This was 
placement API microversion 1.14. Also included in this microversion was 
support for displaying the parent and root provider UUID for resource 
providers.


2) The nova "scheduler report client" (terrible name, it's mostly just 
the placement client at this point) understands how to call placement 
API 1.14 and create resource providers with a parent provider.
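As an illustration of item 2, the request a client makes at microversion 1.14 to create a child provider can be sketched as below. The helper function and the example names/UUIDs are hypothetical; the `parent_provider_uuid` field and the `OpenStack-API-Version` header are the actual API elements being exercised:

```python
def make_create_rp_request(name, uuid, parent_uuid=None):
    """Build the (headers, body) a client would send as
    POST /resource_providers to create a provider, optionally as a
    child of an existing provider (requires placement microversion 1.14)."""
    body = {"name": name, "uuid": uuid}
    if parent_uuid:
        body["parent_provider_uuid"] = parent_uuid
    headers = {"OpenStack-API-Version": "placement 1.14"}
    return headers, body

# E.g. a PF modeled as a child of the compute node provider:
headers, body = make_create_rp_request(
    "compute1_pf1", "pf1-uuid", parent_uuid="cn-uuid")
```

Omitting `parent_provider_uuid` creates a root provider, which is what pre-1.14 clients effectively always did.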


3) The nova scheduler report client uses a ProviderTree object [1] to 
cache information about the hierarchy of providers that it knows about. 
For nova-compute workers managing hypervisors, that means the 
ProviderTree object contained in the report client is rooted in a 
resource provider that represents the compute node itself (the 
hypervisor). For nova-compute workers managing baremetal, that means the 
ProviderTree object contains many root providers, each representing an 
Ironic baremetal node.


4) The placement API's GET /allocation_candidates endpoint now 
understands the concept of granular request groups [2]. Granular request 
groups are only relevant when a user wants to specify that child 
providers in a provider tree should be used to satisfy part of an 
overall scheduling request. However, this support is yet incomplete -- 
see #5 below.
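A rough sketch of how numbered (granular) request groups from item 4 map onto `GET /allocation_candidates` query parameters, per the granular-resource-requests spec [2]. The helper and the resource/trait values are illustrative:

```python
from urllib.parse import urlencode

def granular_query(groups):
    """Build the query string for GET /allocation_candidates from a list
    of numbered request groups, e.g. resources1=..., required1=...,
    resources2=..., as described in the granular request groups spec."""
    params = {}
    for i, group in enumerate(groups, start=1):
        params["resources%d" % i] = ",".join(
            "%s:%d" % (rc, amount)
            for rc, amount in sorted(group["resources"].items()))
        if group.get("required"):
            params["required%d" % i] = ",".join(sorted(group["required"]))
    return urlencode(params)

# Two VFs from two different PFs, distinguished by (illustrative) traits:
qs = granular_query([
    {"resources": {"SRIOV_NET_VF": 1},
     "required": ["CUSTOM_PHYSNET_PUBLIC"]},
    {"resources": {"SRIOV_NET_VF": 1},
     "required": ["CUSTOM_PHYSNET_PRIVATE"]},
])
```

Each numbered group constrains which providers within a tree (or among sharing providers) may satisfy that part of the request.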


The following parts of the nested resource providers modeling are *NOT* 
yet complete, however:


5) GET /allocation_candidates does not currently return *results* when 
granular request groups are specified. So, while the placement service 
understands the *request* for granular groups, it doesn't yet have the 
ability to constrain the returned candidates appropriately. Tetsuro is 
actively working on this functionality in this patch series:


https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nested-resource-providers-allocation-candidates

6) The virt drivers need to implement the update_provider_tree() 
interface [3] and construct the tree of resource providers along with 
appropriate inventory records for each child provider in the tree. Both 
libvirt and XenAPI virt drivers have patch series up that begin to take 
advantage of the nested provider modeling. However, a number of concerns 
[4] about in-place nova-compute upgrades when moving from a single 
resource provider to a nested provider tree model were raised, and we 
have begun brainstorming how to handle the migration of existing data in 
the single-provider model to the nested provider model. [5] We are 
blocking any reviews on patch series that modify the local provider 
modeling until these migration concerns are fully resolved.
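As a hedged sketch of what item 6 asks of virt drivers: populate the provider tree with child providers and their inventories. The stand-in tree class below only mimics the general shape of the real ProviderTree interface in nova/compute/provider_tree.py [1], whose actual method signatures differ; resource class and totals are illustrative:

```python
class FakeProviderTree:
    """Tiny stand-in for nova's ProviderTree, for illustration only."""
    def __init__(self, root_name):
        self.inventories = {root_name: {}}
        self.children = {}

    def new_child(self, name, parent):
        # Record a child provider under its parent.
        self.children.setdefault(parent, []).append(name)
        self.inventories[name] = {}

    def update_inventory(self, name, inventory):
        self.inventories[name] = inventory

def update_provider_tree(provider_tree, nodename):
    """Sketch of a driver reporting two PFs as children of the compute
    node, each with VF inventory (names and amounts are invented)."""
    for pf in ("%s_pf1" % nodename, "%s_pf2" % nodename):
        provider_tree.new_child(pf, nodename)
        provider_tree.update_inventory(
            pf, {"SRIOV_NET_VF": {"total": 8, "reserved": 0}})

tree = FakeProviderTree("compute1")
update_provider_tree(tree, "compute1")
```

The report client then diffs this tree against placement's view and issues the necessary create/update calls.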


7) The scheduler does not currently pass granular request groups to 
placement. Once #5 and #6 are resolved, and once the migration/upgrade 
path is resolved, clearly we will need to have the scheduler start 
making requests to placement that represent the granular request groups 
and have the scheduler pass the resulting allocation candidates to its 
filters and weighers.


Hope this helps highlight where we currently are and the work still left 
to do (in Rocky) on nested resource providers.


Best,
-jay


[1] 
https://github.com/openstack/nova/blob/master/nova/compute/provider_tree.py


[2] 
https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html


[3] 
https://github.com/openstack/nova/blob/f902e0d5d87fb05207e4a7aca73d185775d43df2/nova/virt/driver.py#L833


[4] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html

[5] https://etherpad.openstack.org/p/placement-making-the-(up)grade



Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-05 Thread Stephen Finucane
On Mon, 2018-06-04 at 10:49 -0700, Nadathur, Sundar wrote:
> Hi,
>  Cyborg needs to create RCs and traits for accelerators. The
> original plan was to do that with nested RPs. To avoid rushing the Nova
> developers, I had proposed that Cyborg could start by applying the
> traits to the compute node RP, and accept the resulting caveats for
> Rocky, till we get nested RP support. That proposal did not find many
> takers, and Cyborg has essentially been in waiting mode.
> 
> Since it is June already, and there is a risk of not delivering
> anything meaningful in Rocky, I am reviving my older proposal, which is
> summarized as below:
> 
>   * Cyborg shall create the RCs and traits as per spec
>     (https://review.openstack.org/#/c/554717/), both in Rocky and
>     beyond. Only the RPs will change post-Rocky.
>   * In Rocky:
>       o Cyborg will not create nested RPs. It shall apply the device
>         traits to the compute node RP.
>       o Cyborg will document the resulting caveat, i.e., all devices
>         in the same host should have the same traits. In particular,
>         we cannot have a GPU and an FPGA, or 2 FPGAs of different
>         types, in the same host.
>       o Cyborg will document that upgrades to post-Rocky releases will
>         require operator intervention (as described below).
>   * For upgrade to a post-Rocky world with nested RPs:
>       o The operator needs to stop all running instances that use an
>         accelerator.
>       o The operator needs to run a script that removes the Cyborg
>         traits and the inventory for Cyborg RCs from compute node RPs.
>       o The operator can then perform the upgrade. The new Cyborg
>         agent/driver(s) shall create nested RPs and publish
>         inventory/traits as specified.
> 
> IMHO, it is acceptable for Cyborg to do this because it is new and we
> can set expectations for the (lack of) upgrade plan. The alternative
> is that potentially no meaningful use cases get addressed in Rocky for
> Cyborg.
> 
> Please LMK what you think.

I thought nested resource providers were already supported by
placement? To the best of my knowledge, what is not supported is virt
drivers using these to report NUMA topologies but I doubt that affects
you. The placement guys will need to weigh in on this as I could be
missing something but it sounds like you can start using this
functionality right now.

Stephen

> 
> Regards,
> Sundar



Re: [openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-04 Thread Eric Fried
Sundar-

We've been discussing the upgrade path on another thread [1] and are
working toward a solution [2][3] that would not require downtime or
special scripts (other than whatever's normally required for an upgrade).

We still hope to have all of that ready for Rocky, but if you're
concerned about timing, this work should make it a viable option for you
to start out modeling everything in the compute RP as you say, and then
move it over later.

Thanks,
Eric

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130783.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131045.html
[3] https://etherpad.openstack.org/p/placement-migrate-operations

On 06/04/2018 12:49 PM, Nadathur, Sundar wrote:
> Hi,
>  Cyborg needs to create RCs and traits for accelerators. The
> original plan was to do that with nested RPs. To avoid rushing the Nova
> developers, I had proposed that Cyborg could start by applying the
> traits to the compute node RP, and accept the resulting caveats for
> Rocky, till we get nested RP support. That proposal did not find many
> takers, and Cyborg has essentially been in waiting mode.
> 
> Since it is June already, and there is a risk of not delivering anything
> meaningful in Rocky, I am reviving my older proposal, which is
> summarized as below:
> 
>   * Cyborg shall create the RCs and traits as per spec
> (https://review.openstack.org/#/c/554717/), both in Rocky and
> beyond. Only the RPs will change post-Rocky.
>   * In Rocky:
>   o Cyborg will not create nested RPs. It shall apply the device
> traits to the compute node RP.
>   o Cyborg will document the resulting caveat, i.e., all devices in
> the same host should have the same traits. In particular, we
> cannot have a GPU and an FPGA, or 2 FPGAs of different types, in
> the same host.
>   o Cyborg will document that upgrades to post-Rocky releases will
> require operator intervention (as described below).
>   *  For upgrade to a post-Rocky world with nested RPs:
>   o The operator needs to stop all running instances that use an
> accelerator.
>   o The operator needs to run a script that removes the Cyborg
> traits and the inventory for Cyborg RCs from compute node RPs.
>   o The operator can then perform the upgrade. The new Cyborg
> agent/driver(s) shall create nested RPs and publish
> inventory/traits as specified.
> 
> IMHO, it is acceptable for Cyborg to do this because it is new and we
> can set expectations for the (lack of) upgrade plan. The alternative is
> that potentially no meaningful use cases get addressed in Rocky for Cyborg.
> 
> Please LMK what you think.
> 
> Regards,
> Sundar
> 
> 



[openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-04 Thread Nadathur, Sundar

Hi,
 Cyborg needs to create RCs and traits for accelerators. The 
original plan was to do that with nested RPs. To avoid rushing the Nova 
developers, I had proposed that Cyborg could start by applying the 
traits to the compute node RP, and accept the resulting caveats for 
Rocky, till we get nested RP support. That proposal did not find many 
takers, and Cyborg has essentially been in waiting mode.


Since it is June already, and there is a risk of not delivering anything 
meaningful in Rocky, I am reviving my older proposal, which is 
summarized as below:


 * Cyborg shall create the RCs and traits as per spec
   (https://review.openstack.org/#/c/554717/), both in Rocky and
   beyond. Only the RPs will change post-Rocky.
 * In Rocky:
 o Cyborg will not create nested RPs. It shall apply the device
   traits to the compute node RP.
 o Cyborg will document the resulting caveat, i.e., all devices in
   the same host should have the same traits. In particular, we
   cannot have a GPU and an FPGA, or 2 FPGAs of different types, in
   the same host.
 o Cyborg will document that upgrades to post-Rocky releases will
   require operator intervention (as described below).
 *   For upgrade to a post-Rocky world with nested RPs:
 o The operator needs to stop all running instances that use an
   accelerator.
 o The operator needs to run a script that removes the Cyborg
   traits and the inventory for Cyborg RCs from compute node RPs.
 o The operator can then perform the upgrade. The new Cyborg
   agent/driver(s) shall create nested RPs and publish
   inventory/traits as specified.
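The core of the proposed operator cleanup script could be sketched as below: compute the placement `PUT` bodies that drop Cyborg-owned traits and resource-class inventory from a compute node RP. The trait prefix and RC names are invented for illustration; a real script would also handle auth, fetching current state, and generation conflicts:

```python
# Illustrative markers for "Cyborg-owned" entries; the real names come
# from the Cyborg spec (https://review.openstack.org/#/c/554717/).
CYBORG_TRAIT_PREFIX = "CUSTOM_FPGA"
CYBORG_RCS = {"CUSTOM_FPGA_XILINX", "CUSTOM_FPGA_INTEL"}

def cleanup_bodies(current_traits, current_inventories, generation):
    """Build the bodies for PUT .../traits and PUT .../inventories that
    keep everything except the Cyborg-owned entries."""
    kept_traits = [t for t in current_traits
                   if not t.startswith(CYBORG_TRAIT_PREFIX)]
    kept_inv = {rc: inv for rc, inv in current_inventories.items()
                if rc not in CYBORG_RCS}
    traits_body = {"traits": kept_traits,
                   "resource_provider_generation": generation}
    inv_body = {"inventories": kept_inv,
                "resource_provider_generation": generation}
    return traits_body, inv_body

# Example: strip the FPGA trait and inventory, keep everything else.
traits_body, inv_body = cleanup_bodies(
    ["HW_CPU_X86_AVX2", "CUSTOM_FPGA_INTEL_ARRIA10"],
    {"VCPU": {"total": 16}, "CUSTOM_FPGA_XILINX": {"total": 2}},
    generation=5)
```

Passing the provider generation back lets placement reject the update if the RP changed underneath the script.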

IMHO, it is acceptable for Cyborg to do this because it is new and we 
can set expectations for the (lack of) upgrade plan. The alternative is 
that potentially no meaningful use cases get addressed in Rocky for Cyborg.


Please LMK what you think.

Regards,
Sundar