Re: [openstack-dev] [NOVA] nova GPU support and find GPU type

2018-10-31 Thread Sylvain Bauza
On Tue, Oct 30, 2018 at 12:21 AM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:

> Dear Nova community,
>
>
>
> This is the first time I work with GPUs.
>
>
>
> I have a Dell C4140 with 4x NVIDIA Tesla V100 SXM2 16GB that I would like
> to set up on OpenStack Rocky.
>
>
>
> I checked the documentation and I have 2 questions I would like to ask:
>
>
>
> 1.   Docs (1) says *As of the Queens release, there is no upstream
> continuous integration testing with a hardware environment that has virtual
> GPUs and therefore this feature is considered experimental*. Does it
> mean nova will stop supporting GPUs? Is GPU support being transferred to a
> different project?
>

No. We called it "experimental" because, without a CI verifying the
feature, we couldn't be sure operators wouldn't hit a lot of bugs. After
two cycles we haven't seen a lot of bugs and some operators are using it,
so we could drop the "experimental" label.

> 2.   I installed
> cuda-repo-rhel7-10-0-local-10.0.130-410.48-1.0-1.x86_64 on the physical
> host but I can’t find the type of GPUs installed (2) (/sys/class/mdev_bus
> doesn’t exist). What should I do then? What should I put in
> devices.enabled_vgpu_types ?
>
>
Make sure you use the correct GRID server driver from NVIDIA and then
follow these steps:
https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#install-vgpu-package-generic-linux-kvm

Once the driver is installed (make sure you first remove the nouveau
driver, as said above) and the system is rebooted, you should see the
/sys/class/mdev_bus sysfs directory mentioned above.
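Once that driver is loaded, a quick way to list the supported vGPU types is
to walk that sysfs tree. A rough sketch of the lookup (the paths follow the
generic mdev layout described in the Nova doc you linked; the nvidia-35
value at the end is only an example, your types will differ):

    # minimal sketch, assuming the NVIDIA vGPU (mdev) driver is loaded
    import os

    MDEV_BUS = '/sys/class/mdev_bus'

    # every pGPU registered on the mdev bus exposes its supported types
    for pci_addr in sorted(os.listdir(MDEV_BUS)):
        types_dir = os.path.join(MDEV_BUS, pci_addr, 'mdev_supported_types')
        for vgpu_type in sorted(os.listdir(types_dir)):
            # the human-readable name (e.g. "GRID V100-4Q") is in 'name'
            with open(os.path.join(types_dir, vgpu_type, 'name')) as f:
                print(pci_addr, vgpu_type, f.read().strip())

    # pick one of the printed type ids (e.g. nvidia-35) and set it in
    # nova.conf on the compute node:
    #   [devices]
    #   enabled_vgpu_types = nvidia-35

Then restart nova-compute so the VGPU inventory gets reported to Placement.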
HTH,
-Sylvain


>
> (1) - https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html
>
> (2)-
> https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html#how-to-discover-a-gpu-type
>
>
>
> Thank you very much
>
>
>
> *Manuel Sopena Ballesteros *| Big data Engineer
> *Garvan Institute of Medical Research *
> The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
> *T:* + 61 (0)2 9355 5760 | *F:* +61 (0)2 9295 8507 | *E:*
> manuel...@garvan.org.au
>
>
> NOTICE
> Please consider the environment before printing this email. This message
> and any attachments are intended for the addressee named and may contain
> legally privileged/confidential/copyright information. If you are not the
> intended recipient, you should not read, use, disclose, copy or distribute
> this communication. If you have received this message in error please
> notify us at once by return email and then delete both messages. We accept
> no liability for the distribution of viruses or similar in electronic
> communications. This notice should not be removed.


Re: [openstack-dev] [nova] shall we do a spec review day next tuesday oct 23?

2018-10-15 Thread Sylvain Bauza
On Mon, Oct 15, 2018 at 19:07, melanie witt wrote:

> Hey all,
>
> Milestone s-1 is coming up next week on Thursday Oct 25 [1] and I was
> thinking it would be a good idea to have a spec review day next week on
> Tuesday Oct 23 to spend some focus on spec reviews together.
>
> Spec freeze is s-2 Jan 10, so the review day isn't related to any
> deadlines, but would just be a way to organize and make sure we have
> initial review on the specs that have been proposed so far.
>
> How does Tuesday Oct 23 work for everyone? Let me know if another day
> works better.
>
> So far, efried and mriedem are on board when I asked in the
> #openstack-nova channel. I'm sending this mail to gather more responses
> asynchronously.
>

I'll only be available during the European morning, but I can certainly
still help around that date. A spec review day is always a good idea :-)


> Cheers,
> -melanie
>
> [1] https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule
>


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Sylvain Bauza
On Wed, Oct 10, 2018 at 12:32, Balázs Gibizer wrote:

> Hi,
>
> Thanks for all the feedback. I feel the following consensus is forming:
>
> 1) remove the force flag in a new microversion. I've proposed a spec
> about that API change [1]
>
Thanks, will look at it.


> 2) in the old microversions change the blind allocation copy to gather
> every resource from a nested source RPs too and try to allocate that
> from the destination root RP. In nested allocation cases putting this
> allocation to placement will fail and nova will fail the migration /
> evacuation. However it will succeed if the server does not need nested
> allocation neither on the source nor on the destination host (a.k.a the
> legacy case). Or if the server has nested allocation on the source host
> but does not need nested allocation on the destination host (for
> example the dest host does not have nested RP tree yet).
>
>
Cool with me.
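For illustration, a rough sketch of what that "gather everything from the
nested source RPs and retry against the destination root RP" could look
like (purely illustrative, not the actual nova code):

    import collections

    def collapse_to_destination_root(source_allocations, dest_root_rp_uuid):
        # source_allocations has the shape placement returns for a consumer:
        #   {source_root_uuid: {'resources': {'VCPU': 4, 'MEMORY_MB': 2048}},
        #    source_child_uuid: {'resources': {'VGPU': 1}}}
        totals = collections.defaultdict(int)
        for rp_uuid, alloc in source_allocations.items():
            for rc, amount in alloc['resources'].items():
                totals[rc] += amount
        # PUT-ing this single-provider allocation to placement will fail if
        # the destination actually needs a nested allocation, which is the
        # intended fail-the-migration behaviour described above
        return {dest_root_rp_uuid: {'resources': dict(totals)}}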


> I will start implementing #2) as part of the
> use-nested-allocation-candidate bp soon and will continue with #1)
> later in the cycle.
>
> Nothing is set in stone yet so feedback is still very appreciated.
>
> Cheers,
> gibi
>
> [1] https://review.openstack.org/#/c/609330/
>
> On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer
>  wrote:
> > Hi,
> >
> > Setup
> > -
> >
> > nested allocation: an allocation that contains resources from one or
> > more nested RPs. (if you have better term for this then please
> > suggest).
> >
> > If an instance has nested allocation it means that the compute, it
> > allocates from, has a nested RP tree. BUT if a compute has a nested
> > RP tree it does not automatically means that the instance, allocating
> > from that compute, has a nested allocation (e.g. bandwidth inventory
> > will be on a nested RPs but not every instance will require bandwidth)
> >
> > Afaiu, as soon as we have NUMA modelling in place the most trivial
> > servers will have nested allocations as CPU and MEMORY inventory
> > will be moved to the nested NUMA RPs. But NUMA is still in the future.
> >
> > Sidenote: there is an edge case reported by bauzas when an instance
> > allocates _only_ from nested RPs. This was discussed on last Friday
> > and it resulted in a new patch[0] but I would like to keep that
> > discussion separate from this if possible.
> >
> > Sidenote: the current problem is somewhat related to not just nested RPs
> > but to sharing RPs as well. However I'm not aiming to implement
> > sharing support in Nova right now so I also try to keep the sharing
> > discussion separated if possible.
> >
> > There was already some discussion on the Monday's scheduler meeting
> > but I could not attend.
> >
> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> >
> >
> > The meat
> > 
> >
> > Both live-migrate[1] and evacuate[2] has an optional force flag on
> > the nova REST API. The documentation says: "Force  by not
> > verifying the provided destination host by the scheduler."
> >
> > Nova implements this statement by not calling the scheduler if
> > force=True BUT still try to manage allocations in placement.
> >
> > To have allocation on the destination host Nova blindly copies the
> > instance allocation from the source host to the destination host
> > during these operations. Nova can do that as 1) the whole allocation
> > is against a single RP (the compute RP) and 2) Nova knows both the
> > source compute RP and the destination compute RP.
> >
> > However as soon as we bring nested allocations into the picture that
> > blind copy will not be feasible. Possible cases
> > 0) The instance has non-nested allocation on the source and would
> > need non-nested allocation on the destination. This works with blind
> > copy today.
> > 1) The instance has a nested allocation on the source and would need
> > a nested allocation on the destination as well.
> > 2) The instance has a non-nested allocation on the source and would
> > need a nested allocation on the destination.
> > 3) The instance has a nested allocation on the source and would need
> > a non nested allocation on the destination.
> >
> > Nova cannot generate nested allocations easily without reimplementing
> > some of the placement allocation candidate (a_c) code. However I
> > don't like the idea of duplicating some of the a_c code in Nova.
> >
> > Nova cannot detect what kind of allocation (nested or non-nested) an
> > instance would need on the destination without calling placement a_c.
> > So knowing when to call placement is a chicken and egg problem.
> >
> > Possible solutions:
> > A) fail fast
> > 
> > 0) Nova can detect that the source allocation is non-nested and try
> > the blind copy and it will succeed.
> > 1) Nova can detect that the source allocation is nested and fail the
> > operation
> > 2) Nova only sees a non nested source allocation. Even if the dest RP
> > tree is nested it does not mean that the allocation will be nested.
> > We 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
> Shit, I forgot to add openstack-operators@...
> Operators, see my question for you here :
>
>
>> On Tue, Oct 9, 2018 at 16:39, Eric Fried wrote:
>>
>>> IIUC, the primary thing the force flag was intended to do - allow an
>>> instance to land on the requested destination even if that means
>>> oversubscription of the host's resources - doesn't happen anymore since
>>> we started making the destination claim in placement.
>>>
>>> IOW, since pike, you don't actually see a difference in behavior by
>>> using the force flag or not. (If you do, it's more likely a bug than
>>> what you were expecting.)
>>>
>>> So there's no reason to keep it around. We can remove it in a new
>>> microversion (or not); but even in the current microversion we need not
>>> continue making convoluted attempts to observe it.
>>>
>>> What that means is that we should simplify everything down to ignore the
>>> force flag and always call GET /a_c. Problem solved - for nested and/or
>>> sharing, NUMA or not, root resources or no, on the source and/or
>>> destination.
>>>
>>>
>>
>> While I tend to agree with Eric here (and I commented on the review
>> accordingly by saying we should signal the new behaviour by a
>> microversion), I still think we need to properly advertise this, adding
>> openstack-operators@ accordingly.
>> Disclaimer : since we have gaps on OSC, the current OSC behaviour when
>> you "openstack server live-migrate " is to *force* the destination
>> by not calling the scheduler. Yeah, it sucks.
>>
>> Operators, what are the exact cases (for those running clouds newer than
>> Mitaka, ie. Newton and above) when you make use of the --force option for
>> live migration with a microversion newer or equal 2.29 ?
>> In general, even in the case of an emergency, you still want to make sure
>> you don't throw your compute under the bus by massively migrating instances
>> that would create an undetected snowball effect by having this compute
>> refusing new instances. Or are you disabling the target compute service
>> first and throw your pet instances up there ?
>>
>> -Sylvain
>>
>>
>>
>> -efried
>>>
>>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>>> > Hi,
>>> >
>>> > Setup
>>> > -
>>> >
>>> > nested allocation: an allocation that contains resources from one or
>>> > more nested RPs. (if you have better term for this then please
>>> suggest).
>>> >
>>> > If an instance has nested allocation it means that the compute, it
>>> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
>>> > tree it does not automatically means that the instance, allocating
>>> from
>>> > that compute, has a nested allocation (e.g. bandwidth inventory will
>>> be
>>> > on a nested RPs but not every instance will require bandwidth)
>>> >
>>> > Afaiu, as soon as we have NUMA modelling in place the most trivial
>>> > servers will have nested allocations as CPU and MEMORY inverntory will
>>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>>> >
>>> > Sidenote: there is an edge case reported by bauzas when an instance
>>> > allocates _only_ from nested RPs. This was discussed on last Friday
>>> and
>>> > it resulted in a new patch[0] but I would like to keep that discussion
>>> > separate from this if possible.
>>> >
>>> > Sidenote: the current problem somewhat related to not just nested PRs
>>> > but to sharing RPs as well. However I'm not aiming to implement
>>> sharing
>>> > support in Nova right now so I also try to keep the sharing
>>> disscussion
>>> > separated if possible.
>>> >
>>> > There was already some discussion on the Monday's scheduler meeting
>>> but
>>> > I could not attend.
>>> >
>>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>> >
>>> >
>>> > The meat
>>> > 
>>> >
>>> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
>>> > nova REST API. The documentation says: "Force  by not
>>> > verifying the provided destination host by the scheduler."
>>> >
>>> > Nova implements this statement by not calling the scheduler if
>>> > force=True BUT still try to manage allocations in placement.
>>> >
>>> > To have allocation on the destination host Nova blindly copies the
>>> > instance allocation from the source host to the destination host
>>> during
>>> > these operations. Nova can do that as 1) the whole allocation is
>>> > against a single RP (the compute RP) and 2) Nova knows both the source
>>> > compute RP and the destination compute RP.
>>> >
>>> > However as soon as we bring nested allocations into the picture that
>>> > blind copy will not be feasible. Possible cases
>>> > 0) The instance has non-nested allocation on the source and would need
>>> > non nested allocation on the destination. This works with blindy copy
>>> > today.
>>> > 1) The instance has a nested allocation on the source and would need a
>>> > nested allocation on the destination as well.
>>> > 2) The instance has a non-nested 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
Shit, I forgot to add openstack-operators@...
Operators, see my question for you here :


> On Tue, Oct 9, 2018 at 16:39, Eric Fried wrote:
>
>> IIUC, the primary thing the force flag was intended to do - allow an
>> instance to land on the requested destination even if that means
>> oversubscription of the host's resources - doesn't happen anymore since
>> we started making the destination claim in placement.
>>
>> IOW, since pike, you don't actually see a difference in behavior by
>> using the force flag or not. (If you do, it's more likely a bug than
>> what you were expecting.)
>>
>> So there's no reason to keep it around. We can remove it in a new
>> microversion (or not); but even in the current microversion we need not
>> continue making convoluted attempts to observe it.
>>
>> What that means is that we should simplify everything down to ignore the
>> force flag and always call GET /a_c. Problem solved - for nested and/or
>> sharing, NUMA or not, root resources or no, on the source and/or
>> destination.
>>
>>
>
> While I tend to agree with Eric here (and I commented on the review
> accordingly by saying we should signal the new behaviour by a
> microversion), I still think we need to properly advertise this, adding
> openstack-operators@ accordingly.
> Disclaimer : since we have gaps on OSC, the current OSC behaviour when you
> "openstack server live-migrate " is to *force* the destination by
> not calling the scheduler. Yeah, it sucks.
>
> Operators, what are the exact cases (for those running clouds newer than
> Mitaka, ie. Newton and above) when you make use of the --force option for
> live migration with a microversion newer or equal 2.29 ?
> In general, even in the case of an emergency, you still want to make sure
> you don't throw your compute under the bus by massively migrating instances
> that would create an undetected snowball effect by having this compute
> refusing new instances. Or are you disabling the target compute service
> first and throw your pet instances up there ?
>
> -Sylvain
>
>
>
> -efried
>>
>> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
>> > Hi,
>> >
>> > Setup
>> > -
>> >
>> > nested allocation: an allocation that contains resources from one or
>> > more nested RPs. (if you have better term for this then please suggest).
>> >
>> > If an instance has nested allocation it means that the compute, it
>> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
>> > tree it does not automatically means that the instance, allocating from
>> > that compute, has a nested allocation (e.g. bandwidth inventory will be
>> > on a nested RPs but not every instance will require bandwidth)
>> >
>> > Afaiu, as soon as we have NUMA modelling in place the most trivial
>> > servers will have nested allocations as CPU and MEMORY inverntory will
>> > be moved to the nested NUMA RPs. But NUMA is still in the future.
>> >
>> > Sidenote: there is an edge case reported by bauzas when an instance
>> > allocates _only_ from nested RPs. This was discussed on last Friday and
>> > it resulted in a new patch[0] but I would like to keep that discussion
>> > separate from this if possible.
>> >
>> > Sidenote: the current problem somewhat related to not just nested PRs
>> > but to sharing RPs as well. However I'm not aiming to implement sharing
>> > support in Nova right now so I also try to keep the sharing disscussion
>> > separated if possible.
>> >
>> > There was already some discussion on the Monday's scheduler meeting but
>> > I could not attend.
>> >
>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>> >
>> >
>> > The meat
>> > 
>> >
>> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
>> > nova REST API. The documentation says: "Force  by not
>> > verifying the provided destination host by the scheduler."
>> >
>> > Nova implements this statement by not calling the scheduler if
>> > force=True BUT still try to manage allocations in placement.
>> >
>> > To have allocation on the destination host Nova blindly copies the
>> > instance allocation from the source host to the destination host during
>> > these operations. Nova can do that as 1) the whole allocation is
>> > against a single RP (the compute RP) and 2) Nova knows both the source
>> > compute RP and the destination compute RP.
>> >
>> > However as soon as we bring nested allocations into the picture that
>> > blind copy will not be feasible. Possible cases
>> > 0) The instance has non-nested allocation on the source and would need
>> > non nested allocation on the destination. This works with blindy copy
>> > today.
>> > 1) The instance has a nested allocation on the source and would need a
>> > nested allocation on the destination as well.
>> > 2) The instance has a non-nested allocation on the source and would
>> > need a nested allocation on the destination.
>> > 3) The instance has a nested allocation on the source and 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
On Tue, Oct 9, 2018 at 17:09, Balázs Gibizer wrote:

>
>
> On Tue, Oct 9, 2018 at 4:56 PM, Sylvain Bauza 
> wrote:
> >
> >
> > On Tue, Oct 9, 2018 at 16:39, Eric Fried wrote:
> >> IIUC, the primary thing the force flag was intended to do - allow an
> >> instance to land on the requested destination even if that means
> >> oversubscription of the host's resources - doesn't happen anymore
> >> since
> >> we started making the destination claim in placement.
> >>
> >> IOW, since pike, you don't actually see a difference in behavior by
> >> using the force flag or not. (If you do, it's more likely a bug than
> >> what you were expecting.)
> >>
> >> So there's no reason to keep it around. We can remove it in a new
> >> microversion (or not); but even in the current microversion we need
> >> not
> >> continue making convoluted attempts to observe it.
> >>
> >> What that means is that we should simplify everything down to ignore
> >> the
> >> force flag and always call GET /a_c. Problem solved - for nested
> >> and/or
> >> sharing, NUMA or not, root resources or no, on the source and/or
> >> destination.
> >>
> >
> >
> > While I tend to agree with Eric here (and I commented on the review
> > accordingly by saying we should signal the new behaviour by a
> > microversion), I still think we need to properly advertise this,
> > adding openstack-operators@ accordingly.
>
> Question for you as well: if we remove (or change) the force flag in a
> new microversion then how should the old microversions behave when
> nested allocations would be required?
>
>
In that case (i.e. old microversions with either "force=None and a target"
or "force=True"), IMHO we should not create any allocation for the
migration.
Thoughts?


> Cheers,
> gibi
>
> > Disclaimer : since we have gaps on OSC, the current OSC behaviour
> > when you "openstack server live-migrate " is to *force* the
> > destination by not calling the scheduler. Yeah, it sucks.
> >
> > Operators, what are the exact cases (for those running clouds newer
> > than Mitaka, ie. Newton and above) when you make use of the --force
> > option for live migration with a microversion newer or equal 2.29 ?
> > In general, even in the case of an emergency, you still want to make
> > sure you don't throw your compute under the bus by massively
> > migrating instances that would create an undetected snowball effect
> > by having this compute refusing new instances. Or are you disabling
> > the target compute service first and throw your pet instances up
> > there ?
> >
> > -Sylvain
> >
> >
> >
> >> -efried
> >>
> >> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> >> > Hi,
> >> >
> >> > Setup
> >> > -
> >> >
> >> > nested allocation: an allocation that contains resources from one
> >> or
> >> > more nested RPs. (if you have better term for this then please
> >> suggest).
> >> >
> >> > If an instance has nested allocation it means that the compute, it
> >> > allocates from, has a nested RP tree. BUT if a compute has a
> >> nested RP
> >> > tree it does not automatically means that the instance, allocating
> >> from
> >> > that compute, has a nested allocation (e.g. bandwidth inventory
> >> will be
> >> > on a nested RPs but not every instance will require bandwidth)
> >> >
> >> > Afaiu, as soon as we have NUMA modelling in place the most trivial
> >> > servers will have nested allocations as CPU and MEMORY inverntory
> >> will
> >> > be moved to the nested NUMA RPs. But NUMA is still in the future.
> >> >
> >> > Sidenote: there is an edge case reported by bauzas when an instance
> >> > allocates _only_ from nested RPs. This was discussed on last
> >> Friday and
> >> > it resulted in a new patch[0] but I would like to keep that
> >> discussion
> >> > separate from this if possible.
> >> >
> >> > Sidenote: the current problem somewhat related to not just nested
> >> PRs
> >> > but to sharing RPs as well. However I'm not aiming to implement
> >> sharing
> >> > support in Nova right now so I also try to keep the sharing
> >> disscussion
> >> > separated if possible.
> >> >
> >> > There was already some discussi

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Sylvain Bauza
On Tue, Oct 9, 2018 at 16:39, Eric Fried wrote:

> IIUC, the primary thing the force flag was intended to do - allow an
> instance to land on the requested destination even if that means
> oversubscription of the host's resources - doesn't happen anymore since
> we started making the destination claim in placement.
>
> IOW, since pike, you don't actually see a difference in behavior by
> using the force flag or not. (If you do, it's more likely a bug than
> what you were expecting.)
>
> So there's no reason to keep it around. We can remove it in a new
> microversion (or not); but even in the current microversion we need not
> continue making convoluted attempts to observe it.
>
> What that means is that we should simplify everything down to ignore the
> force flag and always call GET /a_c. Problem solved - for nested and/or
> sharing, NUMA or not, root resources or no, on the source and/or
> destination.
>
>

While I tend to agree with Eric here (and I commented on the review
accordingly, saying we should signal the new behaviour with a
microversion), I still think we need to properly advertise this, hence
adding openstack-operators@.
Disclaimer: since we have gaps in OSC, the current OSC behaviour when you
run "openstack server live-migrate <host>" is to *force* the destination
by not calling the scheduler. Yeah, it sucks.
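For reference, at the REST API level the difference is only in the request
body of the live-migration action. A rough sketch of the call (field names
and microversions from memory, please double-check the api-ref; with
force=False the scheduler still verifies the requested host):

    import json
    import requests

    def live_migrate(nova_url, token, server_id, host, force=False):
        body = {'os-migrateLive': {
            'host': host,               # requested destination host
            'block_migration': 'auto',
            'force': force,             # only exists on recent microversions
                                        # (the 2.29/2.30 era)
        }}
        return requests.post(
            '%s/servers/%s/action' % (nova_url, server_id),
            headers={'X-Auth-Token': token,
                     'OpenStack-API-Version': 'compute 2.30',
                     'Content-Type': 'application/json'},
            data=json.dumps(body))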

Operators, what are the exact cases (for those running clouds newer than
Mitaka, i.e. Newton and above) where you make use of the --force option for
live migration with a microversion of 2.29 or newer?
In general, even in an emergency, you still want to make sure you don't
throw your target compute under the bus by massively migrating instances
onto it and creating an undetected snowball effect where that compute
starts refusing new instances. Or do you disable the target compute service
first and then throw your pet instances up there?

-Sylvain



-efried
>
> On 10/09/2018 04:40 AM, Balázs Gibizer wrote:
> > Hi,
> >
> > Setup
> > -
> >
> > nested allocation: an allocation that contains resources from one or
> > more nested RPs. (if you have better term for this then please suggest).
> >
> > If an instance has nested allocation it means that the compute, it
> > allocates from, has a nested RP tree. BUT if a compute has a nested RP
> > tree it does not automatically means that the instance, allocating from
> > that compute, has a nested allocation (e.g. bandwidth inventory will be
> > on a nested RPs but not every instance will require bandwidth)
> >
> > Afaiu, as soon as we have NUMA modelling in place the most trivial
> > servers will have nested allocations as CPU and MEMORY inverntory will
> > be moved to the nested NUMA RPs. But NUMA is still in the future.
> >
> > Sidenote: there is an edge case reported by bauzas when an instance
> > allocates _only_ from nested RPs. This was discussed on last Friday and
> > it resulted in a new patch[0] but I would like to keep that discussion
> > separate from this if possible.
> >
> > Sidenote: the current problem somewhat related to not just nested PRs
> > but to sharing RPs as well. However I'm not aiming to implement sharing
> > support in Nova right now so I also try to keep the sharing disscussion
> > separated if possible.
> >
> > There was already some discussion on the Monday's scheduler meeting but
> > I could not attend.
> >
> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
> >
> >
> > The meat
> > 
> >
> > Both live-migrate[1] and evacuate[2] has an optional force flag on the
> > nova REST API. The documentation says: "Force  by not
> > verifying the provided destination host by the scheduler."
> >
> > Nova implements this statement by not calling the scheduler if
> > force=True BUT still try to manage allocations in placement.
> >
> > To have allocation on the destination host Nova blindly copies the
> > instance allocation from the source host to the destination host during
> > these operations. Nova can do that as 1) the whole allocation is
> > against a single RP (the compute RP) and 2) Nova knows both the source
> > compute RP and the destination compute RP.
> >
> > However as soon as we bring nested allocations into the picture that
> > blind copy will not be feasible. Possible cases
> > 0) The instance has non-nested allocation on the source and would need
> > non nested allocation on the destination. This works with blindy copy
> > today.
> > 1) The instance has a nested allocation on the source and would need a
> > nested allocation on the destination as well.
> > 2) The instance has a non-nested allocation on the source and would
> > need a nested allocation on the destination.
> > 3) The instance has a nested allocation on the source and would need a
> > non nested allocation on the destination.
> >
> > Nova cannot generate nested allocations easily without reimplementing
> > some of the placement allocation candidate (a_c) code. However I don't
> > 

Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-28 Thread Sylvain Bauza
On Fri, Sep 28, 2018 at 12:50 AM melanie witt  wrote:

> On Thu, 27 Sep 2018 17:23:26 -0500, Matt Riedemann wrote:
> > On 9/27/2018 3:02 PM, Jay Pipes wrote:
> >> A great example of this would be the proposed "deploy template" from
> >> [2]. This is nothing more than abusing the placement traits API in order
> >> to allow passthrough of instance configuration data from the nova flavor
> >> extra spec directly into the nodes.instance_info field in the Ironic
> >> database. It's a hack that is abusing the entire concept of the
> >> placement traits concept, IMHO.
> >>
> >> We should have a way *in Nova* of allowing instance configuration
> >> key/value information to be passed through to the virt driver's spawn()
> >> method, much the same way we provide for user_data that gets exposed
> >> after boot to the guest instance via configdrive or the metadata service
> >> API. What this deploy template thing is is just a hack to get around the
> >> fact that nova doesn't have a basic way of passing through some collated
> >> instance configuration key/value information, which is a darn shame and
> >> I'm really kind of annoyed with myself for not noticing this sooner. :(
> >
> > We talked about this in Dublin through right? We said a good thing to do
> > would be to have some kind of template/profile/config/whatever stored
> > off in glare where schema could be registered on that thing, and then
> > you pass a handle (ID reference) to that to nova when creating the
> > (baremetal) server, nova pulls it down from glare and hands it off to
> > the virt driver. It's just that no one is doing that work.
>
> If I understood correctly, that discussion was around adding a way to
> pass a desired hardware configuration to nova when booting an ironic
> instance. And that it's something that isn't yet possible to do using
> the existing ComputeCapabilitiesFilter. Someone please correct me if I'm
> wrong there.
>
> That said, I still don't understand why we are talking about deprecating
> the ComputeCapabilitiesFilter if there's no supported way to replace it
> yet. If boolean traits are not enough to replace it, then we need to
> hold off on deprecating it, right? Would the
> template/profile/config/whatever in glare approach replace what the
> ComputeCapabilitiesFilter is doing or no? Sorry, I'm just not clearly
> understanding this yet.
>
>
I just feel that, like Jay said, some new traits have to be defined, and
some work has to be done on the Ironic side to make sure those are exposed
as traits and not via the old mechanism.
That still leaves a question though: does Ironic support custom
capabilities? If so, that leads to Jay's point about key/value information
that's not intended for traits. If we all agree that traits shouldn't be
used for key/value pairs, could we somehow imagine Ironic changing the
customization mechanism to be boolean-only?

Also, I'm not sure whether operators make use of Ironic capabilities for
fancy operational queries, like the ones we support in
https://github.com/openstack/nova/blob/3716752/nova/scheduler/filters/extra_specs_ops.py#L24-L35
and whether Ironic correctly documents how to turn such things into traits
(e.g. CUSTOM_I_HAVE_MORE_THAN_2_GPUS).
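To make that concrete, here is the kind of flavor extra specs I have in
mind; the first style is what ComputeCapabilitiesFilter matches today with
the operators linked above, the second is the boolean-trait equivalent (the
capability and trait names are made up):

    # old style: free-form capabilities matched by ComputeCapabilitiesFilter
    capabilities_extra_specs = {
        'capabilities:gpu_count': '>= 2',
        'capabilities:raid_level': '<in> 1 10',
    }

    # trait style: boolean only, so the comparison has to be baked into the
    # trait name itself
    trait_extra_specs = {
        'trait:CUSTOM_I_HAVE_MORE_THAN_2_GPUS': 'required',
    }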

All of the above makes me a bit worried about a possible
ComputeCapabilitiesFilter deprecation if we aren't yet able to provide a
clear upgrade path for our users.

-Sylvain

> -melanie
>
>
>
>


Re: [openstack-dev] [nova] Stein PTG summary

2018-09-27 Thread Sylvain Bauza
On Thu, Sep 27, 2018 at 2:46 AM Matt Riedemann  wrote:

> On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
> > So, during this day, we also discussed about NUMA affinity and we said
> > that we could possibly use nested resource providers for NUMA cells in
> > Stein, but given we don't have yet a specific Placement API query, NUMA
> > affinity should still be using the NUMATopologyFilter.
> > That said, when looking about how to use this filter for vGPUs, it looks
> > to me that I'd need to provide a new version for the NUMACell object and
> > modify the virt.hardware module. Are we also accepting this (given it's
> > a temporary question), or should we need to wait for the Placement API
> > support ?
> >
> > Folks, what are you thoughts ?
>
> I'm pretty sure we've said several times already that modeling NUMA in
> Placement is not something for which we're holding up the extraction.
>
>
It's not an extraction question. It's just about knowing whether the Nova
folks would accept us modifying some o.vo object and module for a temporary
period until the Placement API has some new query parameter.
Whether Placement is extracted or not isn't really the problem; it's more
about the time it will take for this query parameter ("numbered request
groups have to be in the same subtree") to be implemented in the Placement
API.
The real problem we have with vGPUs is that without NUMA affinity, the
performance would be around 10% worse for vGPUs (if the pGPU isn't on the
same NUMA cell as the pCPU). Not sure large operators would accept that :(
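To illustrate the missing piece, this is roughly the shape of the GET
/allocation_candidates request we would need once NUMA cells and pGPUs are
modelled as nested RPs. Granular (numbered) request groups already exist in
Placement, but the subtree constraint in the last parameter is purely
hypothetical, it doesn't exist in the Placement API today:

    from urllib.parse import urlencode

    params = [
        ('resources1', 'VCPU:4,MEMORY_MB:8192'),  # group 1: from a NUMA cell RP
        ('resources2', 'VGPU:1'),                 # group 2: from a pGPU RP
        ('group_policy', 'none'),
        # hypothetical parameter: ask placement to satisfy both numbered
        # groups from providers under the same NUMA-cell subtree
        ('same_subtree_for', '1,2'),
    ]
    print('GET /allocation_candidates?' + urlencode(params))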

-Sylvain

> --
>
> Thanks,
>
> Matt
>


Re: [openstack-dev] [nova] Stein PTG summary

2018-09-26 Thread Sylvain Bauza
Thanks for the recap email, Mel. Just a question inline for all the people
who were in the room on Wednesday.

On Thu, Sep 27, 2018 at 00:10, melanie witt wrote:

> Hello everybody,
>
> I've written up a high level summary of the discussions we had at the
> PTG -- please feel free to reply to this thread to fill in anything I've
> missed.
>
> We used our PTG etherpad:
>
> https://etherpad.openstack.org/p/nova-ptg-stein
>
> as an agenda and each topic we discussed was filled in with agreements,
> todos, and action items during the discussion. Please check out the
> etherpad to find notes relevant to your topics of interest, and reach
> out to us on IRC in #openstack-nova, on this mailing list with the
> [nova] tag, or by email to me if you have any questions.
>
> Now, onto the high level summary:
>
> Rocky retrospective
> ===
> We began Wednesday morning with a retro on the Rocky cycle and captured
> notes on this etherpad:
>
> https://etherpad.openstack.org/p/nova-rocky-retrospective
>
> The runways review process was seen as overall positive and helped get
> some blueprint implementations merged that had languished in previous
> cycles. We agreed to continue with the runways process as-is in Stein
> and use it for approved blueprints. We did note that we could do better
> at queuing important approved work into runways, such as
> placement-related efforts that were not added to runways last cycle.
>
> We discussed whether or not to move the spec freeze deadline back to
> milestone 1 (we used milestone 2 in Rocky). I have an action item to dig
> into whether or not the late breaking regressions we found at RC time:
>
> https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo
>
> were related to the later spec freeze at milestone 2. The question we
> want to answer is: did a later spec freeze lead to implementations
> landing later and resulting in the late detection of regressions at
> release candidate time?
>
> Finally, we discussed a lot of things around project management,
> end-to-end themes for a cycle, and people generally not feeling they had
> clarity throughout the cycle about which efforts and blueprints were
> most important, aside from runways. We got a lot of work done in Rocky,
> but not as much of it materialized into user-facing features and
> improvements as it did in Queens. Last cycle, we had thought runways
> would capture what is a priority at any given time, but looking back, we
> determined it would be helpful if we still had over-arching
> goals/efforts/features written down for people to refer to throughout
> the cycle. We dove deeper into that discussion on Friday during the hour
> before lunch, where we came up with user-facing themes we aim to
> accomplish in the Stein cycle:
>
> https://etherpad.openstack.org/p/nova-ptg-stein-priorities
>
> Note that these are _not_ meant to preempt anything in runways, these
> are just 1) for my use as a project manager and 2) for everyone's use to
> keep a bigger picture of our goals for the cycle in their heads, to aid
> in their work and review outside of runways.
>
> Themes
> ==
> With that, I'll briefly mention the themes we came up with for the cycle:
>
> * Compute nodes capable to upgrade and exist with nested resource
> providers for multiple GPU types
>
> * Multi-cell operational enhancements: resilience to "down" or
> poor-performing cells and cross-cell instance migration
>
> * Volume-backed user experience and API hardening: ability to specify
> volume type during boot-from-volume, detach/attach of root volume, and
> volume-backed rebuild
>
> These are the user-visible features and functionality we aim to deliver
> and we'll keep tabs on these efforts throughout the cycle to keep them
> making progress.
>
> Placement
> =
> As usual, we had a lot of discussions on placement-related topics, so
> I'll try to highlight the main things that stand out to me. Please see
> the "Placement" section of our PTG etherpad for all the details and
> additional topics we discussed.
>
> We discussed the regression in behavior that happened when we removed
> the Aggregate[Core|Ram|Disk]Filters from the scheduler filters -- these
> filters allowed operators to set overcommit allocation ratios per
> aggregate instead of per host. We agreed on the importance of restoring
> this functionality and hashed out a concrete plan, with two specs needed
> to move forward:
>
> https://review.openstack.org/552105
> https://review.openstack.org/544683
>
> The other standout discussions were around the placement extraction and
> closing the gaps in nested resource providers. For the placement
> extraction, we are focusing on full support of an upgrade from
> integrated placement => extracted placement, including assisting with
> making sure deployment tools like OpenStack-Ansible and TripleO are able
> to support the upgrade. For closing the gaps in nested resource
> providers, there are many parts to it that are 

Re: [openstack-dev] Forum Topic Submission Period

2018-09-19 Thread Sylvain Bauza
On Wed, Sep 19, 2018 at 00:41, Jimmy McArthur wrote:

> Hey Matt,
>
>
> Matt Riedemann wrote:
> >
> > Just a process question.
>
> Good question.
> > I submitted a presentation for the normal marketing blitz part of the
> > summit which wasn't accepted (I'm still dealing with this emotionally,
> > btw...)
>


Same here :-) On an unrelated point, for the first time in all the Summits
I can remember, I wasn't able to find out who the track chairs were for a
specific track. Ideally, I'd love to reach out to them to understand what
they disliked about my proposal.



> If there's anything I can do...
> > but when I look at the CFP link for Forum topics, my thing shows up
> > there as "Received" so does that mean my non-Forum-at-all submission
> > is now automatically a candidate for the Forum because that would not
> > be my intended audience (only suits and big wigs please).
> Forum Submissions would be considered separate and non-Forum submissions
> will not be considered for the Forum. The submission process is based on
> the track you submit to and, in the case of the Forum, we separate this
> track out from the rest of the submission process.
>
> If you think there is still something funky, send me a note via
> speakersupp...@openstack.org or ji...@openstack.org and I'll work
> through it with you.
>
>
I have another question: do you know why we can't propose a Forum session
with multiple speakers? Is this a bug or expected behaviour? In general
there is only one moderator for a Forum session, but in the past I clearly
remember some sessions having multiple moderators (for various reasons).

-Sylvain


> Cheers,
> Jimmy
>
>
>
>


Re: [openstack-dev] [Openstack-sigs] [Openstack-operators] [tc]Global Reachout Proposal

2018-09-18 Thread Sylvain Bauza
On Tue, Sep 18, 2018 at 16:00, Thierry Carrez wrote:

> Sylvain Bauza wrote:
> >
> >
> > On Tue, Sep 18, 2018 at 14:41, Jeremy Stanley <fu...@yuggoth.org> wrote:
> >
> > On 2018-09-18 11:26:57 +0900 (+0900), Ghanshyam Mann wrote:
> > [...]
> >  > I can understand that IRC cannot be used in China which is very
> >  > painful and mostly it is used weChat.
> > [...]
> >
> > I have yet to hear anyone provide first-hand confirmation that
> > access to Freenode's IRC servers is explicitly blocked by the
> > mainland Chinese government. There has been a lot of speculation
> > that the usual draconian corporate firewall policies (surprise, the
> > rest of the World gets to struggle with those too, it's not just a
> > problem in China) are blocking a variety of messaging protocols from
> > workplace networks and the people who encounter this can't tell the
> > difference because they're already accustomed to much of their other
> > communications being blocked at the border. I too have heard from
> > someone who's heard from someone that "IRC can't be used in China"
> > but the concrete reasons why continue to be missing from these
> > discussions.
> >
> > Thanks fungi, that's the crux of the problem I'd like to see discussed
> > in the governance change.
> > In this change, it states the non-use of existing and official
> > communication tools as to be "cumbersome". See my comment on PS1, I
> > thought the original concern was technical.
> >
> > Why are we discussing about WeChat now ? Is that because a large set of
> > our contributors *can't* access IRC or because they *prefer* any other ?
> > In the past, we made clear for a couple of times why IRC is our
> > communication channel. I don't see those reasons to be invalid now, but
> > I'm still open to understand the problems about why our community
> > becomes de facto fragmented.
>
> Agreed, I'm still trying to grasp the issue we are trying to solve here.
>
> We really need to differentiate between technical blockers (firewall),
> cultural blockers (language) and network effect preferences (preferred
> platform).
>
> We should definitely try to address technical blockers, as we don't want
> to exclude anyone. We can also allow for a bit of flexibility in the
> tools used in our community, to accommodate cultural blockers as much as
> we possibly can (keeping in mind that in the end, the code has to be
> written, proposed and discussed in a single language). We can even
> encourage community members to reach out on local social networks... But
> I'm reluctant to pass an official resolution to recommend that TC
> members engage on specific platforms because "everyone is there".
>
>
I second your opinion on this. Before voting on a TC resolution, we at
least need to understand the problem first.
Like I said previously, stating "cumbersome" in the proposed resolution
doesn't imply a technical issue, hence my jumping straight to the third
possibility you mentioned, which is "by convenience".

In that case, the TC should rather reinforce the message that, as a whole
community, we try to avoid silos, and that contributors should be strongly
encouraged to stop discussing on channels other than the official ones.
Having the First Contact SIG be the first line for helping those people
migrate to IRC (by helping them understand how it works, how to use it, and
which kind of setup is preferable (bouncers)) seems like a great idea.

-Sylvain

> --
> Thierry Carrez (ttx)
>


Re: [openstack-dev] [Openstack-sigs] [Openstack-operators] [tc]Global Reachout Proposal

2018-09-18 Thread Sylvain Bauza
On Tue, Sep 18, 2018 at 14:41, Jeremy Stanley wrote:

> On 2018-09-18 11:26:57 +0900 (+0900), Ghanshyam Mann wrote:
> [...]
> > I can understand that IRC cannot be used in China which is very
> > painful and mostly it is used weChat.
> [...]
>
> I have yet to hear anyone provide first-hand confirmation that
> access to Freenode's IRC servers is explicitly blocked by the
> mainland Chinese government. There has been a lot of speculation
> that the usual draconian corporate firewall policies (surprise, the
> rest of the World gets to struggle with those too, it's not just a
> problem in China) are blocking a variety of messaging protocols from
> workplace networks and the people who encounter this can't tell the
> difference because they're already accustomed to much of their other
> communications being blocked at the border. I too have heard from
> someone who's heard from someone that "IRC can't be used in China"
> but the concrete reasons why continue to be missing from these
> discussions.
>


Thanks fungi, that's the crux of the problem I'd like to see discussed in
the governance change.
In this change, the non-use of the existing official communication tools is
described as "cumbersome". See my comment on PS1: I thought the original
concern was technical.

Why are we discussing WeChat now? Is it because a large set of our
contributors *can't* access IRC, or because they *prefer* another tool?
In the past, we have made clear a couple of times why IRC is our
communication channel. I don't see those reasons as invalid now, but I'm
still open to understanding why our community is becoming de facto
fragmented.

-Sylvain





> --
> Jeremy Stanley


Re: [openstack-dev] [election][tc]Question for candidates about global reachout

2018-09-17 Thread Sylvain Bauza
On Mon, Sep 17, 2018 at 15:32, Jeremy Stanley wrote:

> On 2018-09-16 14:14:41 +0200 (+0200), Jean-philippe Evrard wrote:
> [...]
> > - What is the problem joining Wechat will solve (keeping in mind the
> > language barrier)?
>
> As I understand it, the suggestion is that mere presence of project
> leadership in venues where this emerging subset of our community
> gathers would provide a strong signal that we support them and care
> about their experience with the software.
>
> > - Isn't this problem already solved for other languages with
> > existing initiatives like local ambassadors and i18n team? Why
> > aren't these relevant?
> [...]
>
> It seems like there are at least couple of factors at play here:
> first the significant number of users and contributors within
> mainland China compared to other regions (analysis suggests there
> were nearly as many contributors to the Rocky release from China as
> the USA), but second there may be facets of Chinese culture which
> make this sort of demonstrative presence a much stronger signal than
> it would be in other cultures.
>
> > - Pardon my ignorance here, what is the problem with email? (I
> > understand some chat systems might be blocked, I thought emails
> > would be fine, and the lowest common denominator).
>
> Someone in the TC room (forgive me, I don't recall who now, maybe
> Rico?) asserted that Chinese contributors generally only read the
> first message in any given thread (perhaps just looking for possible
> announcements?) and that if they _do_ attempt to read through some
> of the longer threads they don't participate in them because the
> discussion is presumed to be over and decisions final by the time
> they "reach the end" (I guess not realizing that it's perfectly fine
> to reply to a month-old discussion and try to help alter course on
> things if you have an actual concern?).
>
>
While I understand the technical issues there may be with using IRC in
China, I still don't get why opening the gates and declaring WeChat yet
another official channel would prevent our community from fragmenting.

Admittedly the usage of IRC is questionable, but if we have multiple ways
to discuss, I doubt we could keep ourselves from siloing along our personal
preferences.
Either we consider the new channels as being only for southbound
communication, or we envisage the possibility, as a community, of migrating
from IRC to somewhere else (I'm particularly not a fan of the latter, so I
would challenge it, but I can understand the reasons).

-Sylvain

> I also have technical questions about 'wechat' (like how do you
> > use it without a smartphone?) and the relevance of tools we
> > currently use, but this will open Pandora's box, and I'd rather
> > not spend my energy on closing that box right now :D
>
> Not that I was planning on running it myself, but I did look into
> the logistics. Apparently there is at least one free/libre open
> source wechat client under active development but you still need to
> use a separate mobile device to authenticate your client's
> connection to wechat's central communication service. By design, it
> appears this is so that you can't avoid reporting your physical
> location (it's been suggested this is to comply with government
> requirements for tracking citizens participating in potentially
> illegal discussions). They also go to lengths to prevent you from
> running the required mobile app within an emulator, since that would
> provide a possible workaround to avoid being tracked. Further, there
> is some history of backdoors getting included in the software, so
> you need to use it with the expectation that you're basically
> handing over all communications and content for which you use that
> mobile device to wechat developers/service operators and, by proxy,
> the Chinese government.
> --
> Jeremy Stanley


Re: [openstack-dev] About microversion setting to enable nested resource provider

2018-09-17 Thread Sylvain Bauza
On Mon, Sep 17, 2018 at 6:43 AM, Jay Pipes  wrote:

> On 09/16/2018 09:28 PM, Naichuan Sun wrote:
>
>> Hi, Sylvain,
>>
>> In truth I’m worrying about the old root rp which include the vgpu
>> inventory. There is no field in the inventory which can display which
>> GPU/GPUG it belong to, right? Anyway,  will discuss it after you come back.
>>
>
> As Sylvain mentions below, you will need to have some mechanism in the
> XenAPI virt driver which creates child resource providers under the
> existing root provider (which is the compute node resource provider). You
> will need to have the virt driver persist the mapping between your internal
> physical GPU group name and the UUID of the resource provider record that
> the virt driver creates for that PGPU group.
>

AFAICT, we don't even need to persist the mapping. As we only support one
GPU type (or group for Xen) in Rocky, you just have to know what the
original type was when upgrading to Stein and then look at the related
resource provider. That's why I wrote an upgrade impact section in my
multiple-types spec (see below) saying that in Stein you need to make sure
you only accept one type until the reshape is fully done.

-Sylvain


> So, for example, let's say you have two PGPU groups on the host. They are
> named PGPU_A and PGPU_B. The XenAPI virt driver will need to ask the
> ProviderTree object it receives in the update_provider_tree() virt driver
> method whether there is a resource provider named "PGPU_A" in the tree. If
> not, the virt driver needs to create a new child resource provider with the
> name "PGPU_A" with a parent provider pointing to the root compute node
> provider. The ProviderTree.new_child() method is used to create new child
> providers:
>
> https://github.com/openstack/nova/blob/82270cc261f6c1d9d2cc3
> 86f1fb445dd66023f75/nova/compute/provider_tree.py#L411
>
> Hope that makes sense,
> -jay
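For what it's worth, a very rough sketch of what Jay describes, using the
ProviderTree methods from the provider_tree.py module linked above (the
PGPU group names and the inventory numbers are only examples):

    def update_provider_tree(provider_tree, nodename):
        # sketch of what the XenAPI driver's update_provider_tree() could do
        compute_rp_uuid = provider_tree.data(nodename).uuid

        for pgpu_group in ('PGPU_A', 'PGPU_B'):
            if not provider_tree.exists(pgpu_group):
                # create the child RP under the compute node root RP
                provider_tree.new_child(pgpu_group, compute_rp_uuid)
            # report the VGPU inventory on the child, not on the root RP
            provider_tree.update_inventory(
                pgpu_group,
                {'VGPU': {'total': 16, 'reserved': 0, 'min_unit': 1,
                          'max_unit': 16, 'step_size': 1,
                          'allocation_ratio': 1.0}})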
>
> Thank very much.
>>
>> BR.
>>
>> Naichuan Sun
>>
>> *From:*Sylvain Bauza [mailto:sba...@redhat.com]
>> *Sent:* Friday, September 14, 2018 9:34 PM
>> *To:* OpenStack Development Mailing List (not for usage questions) <
>> openstack-dev@lists.openstack.org>
>> *Subject:* Re: [openstack-dev] About microversion setting to enable
>> nested resource provider
>>
>> On Thu, Sep 13, 2018 at 19:29, Naichuan Sun <naichuan@citrix.com> wrote:
>>
>> Hi, Sylvain,
>>
>> Thank you very much for the information. It is pity that I can’t
>> attend the meeting.
>>
>> I have a concern about reshaper in multi-type vgpu support.
>>
>> In the old vgpu support, we only have one vgpu inventory in root
>> resource provider, which means we only support one vgpu type. When
>> do reshape, placement will send allocations(which include just one
>> vgpu resource allocation information) to the driver, if the host
>> have more than one pgpu/pgpug(which support different vgpu type),
>> how do we know which pgpu/pgpug own the allocation information? Do
>> we need to communicate with hypervisor the confirm that?
>>
>> The reshape will actually move the existing allocations for a VGPU
>> resource class to the inventory for this class that is on the child
>> resource provider now with the reshape.
>>
>> Since we agreed on keeping consistent naming, there is no need to guess
>> which is which. That said, you raise a point that was discussed during the
>> PTG and we all agreed there was an upgrade impact as multiple vGPUs
>> shouldn't be allowed until the reshape is done.
>>
>> Accordingly, see my spec I reproposed for Stein which describes the
>> upgrade impact https://review.openstack.org/#/c/602474/
>>
>> Since I'm at the PTG, we have huge time difference between you and me,
>> but we can discuss on that point next week when I'm back (my mornings match
>> then your afternoons)
>>
>> -Sylvain
>>
>> Thank you very much.
>>
>> BR.
>>
>> Naichuan Sun
>>
>> *From:*Sylvain Bauza [mailto:sba...@redhat.com
>> <mailto:sba...@redhat.com>]
>> *Sent:* Thursday, September 13, 2018 11:47 PM
>> *To:* OpenStack Development Mailing List (not for usage questions)
>> > <mailto:openstack-dev@lists.openstack.org>>
>> *Subject:* Re: [openstack-dev] About microversion setting to enable
>> nested resource provider
>>
>> Hey Naichuan,
>>
>> FWIW, we discussed on the missing pieces for nested resource
>> providers. See the (currently-in-use

Re: [openstack-dev] About microversion setting to enable nested resource provider

2018-09-14 Thread Sylvain Bauza
On Thu, Sep 13, 2018 at 19:29, Naichuan Sun  wrote:

> Hi, Sylvain,
>
>
>
> Thank you very much for the information. It is a pity that I can’t attend
> the meeting.
>
> I have a concern about the reshaper in multi-type vGPU support.
>
> In the old vGPU support, we only have one VGPU inventory on the root resource
> provider, which means we only support one vGPU type. When doing the reshape,
> placement will send allocations (which include just one VGPU resource
> allocation) to the driver; if the host has more than one pGPU/pGPU group
> (which support different vGPU types), how do we know which pGPU/pGPU group
> owns that allocation information? Do we need to communicate with the
> hypervisor to confirm that?
>

The reshape will actually move the existing allocations for the VGPU resource
class over to the inventory for that class, which after the reshape lives on
the child resource provider.
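To make that concrete, here is a rough before/after sketch (plain Python dicts
for illustration only, not the actual reshaper payload; the RP names and the
'nvidia-35' type are made up):

```
# Conceptual illustration of what the reshape changes; not the real API schema.
before = {
    'inventories': {'compute1': {'VGPU': {'total': 8}}},        # all on the root RP
    'allocations': {'instance_X': {'compute1': {'VGPU': 1}}},
}

after = {
    'inventories': {
        'compute1': {},                                          # VGPU moved off the root
        'compute1_nvidia-35': {'VGPU': {'total': 8}},            # one child RP per vGPU type
    },
    'allocations': {'instance_X': {'compute1_nvidia-35': {'VGPU': 1}}},
}
```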

Since we agreed on keeping consistent naming, there is no need to guess
which is which. That said, you raise a point that was discussed during the
PTG, and we all agreed there is an upgrade impact: multiple vGPU types
shouldn't be allowed until the reshape is done.

Accordingly, see the spec I reproposed for Stein, which describes the
upgrade impact: https://review.openstack.org/#/c/602474/

Since I'm at the PTG there is a huge time difference between us, but we can
discuss that point next week when I'm back (my mornings will then match your
afternoons).

-Sylvain

>
>
> Thank you very much.
>
>
>
> BR.
>
> Naichuan Sun
>
>
>
> *From:* Sylvain Bauza [mailto:sba...@redhat.com]
> *Sent:* Thursday, September 13, 2018 11:47 PM
> *To:* OpenStack Development Mailing List (not for usage questions) <
> openstack-dev@lists.openstack.org>
> *Subject:* Re: [openstack-dev] About microversion setting to enable
> nested resource provider
>
>
>
> Hey Naichuan,
>
> FWIW, we discussed on the missing pieces for nested resource providers.
> See the (currently-in-use) etherpad
> https://etherpad.openstack.org/p/nova-ptg-stein and lookup for "closing
> the gap on nested resource providers" (L144 while I speak)
>
>
>
> The fact that we are not able to schedule yet is a critical piece that we
> said we're going to work on it as soon as we can.
>
>
>
> -Sylvain
>
>
>
> On Thu, Sep 13, 2018 at 9:14 AM, Eric Fried  wrote:
>
> There's a patch series in progress for this:
>
> https://review.openstack.org/#/q/topic:use-nested-allocation-candidates
>
> It needs some TLC. I'm sure gibi and tetsuro would welcome some help...
>
> efried
>
>
> On 09/13/2018 08:31 AM, Naichuan Sun wrote:
> > Thank you very much, Jay.
> > Is there somewhere I could set microversion(some configure file?), Or
> just modify the source code to set it?
> >
> > BR.
> > Naichuan Sun
> >
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: Thursday, September 13, 2018 9:19 PM
> > To: Naichuan Sun ; OpenStack Development
> Mailing List (not for usage questions) 
> > Cc: melanie witt ; efr...@us.ibm.com; Sylvain Bauza
> 
> > Subject: Re: About microversion setting to enable nested resource
> provider
> >
> > On 09/13/2018 06:39 AM, Naichuan Sun wrote:
> >> Hi, guys,
> >>
> >> Looks n-rp is disabled by default because microversion matches 1.29 :
> >> https://github.com/openstack/nova/blob/master/nova/api/openstack/place
> >> ment/handlers/allocation_candidate.py#L252
> >>
> >> Anyone know how to set the microversion to enable n-rp in placement?
> >
> > It is the client which must send the 1.29+ placement API microversion
> header to indicate to the placement API server that the client wants to
> receive nested provider information in the allocation candidates response.
> >
> > Currently, nova-scheduler calls the scheduler reportclient's
> > get_allocation_candidates() method:
> >
> >
> https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/manager.py#L138
> >
> > The scheduler reportclient's get_allocation_candidates() method
> currently passes the 1.25 placement API microversion header:
> >
> >
> https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/client/report.py#L353
> >
> >
> https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/client/report.py#L53
> >
> > In order to get the nested information returned in the allocation
> candidates response, that would need to be upped to 1.29.
> >
> > Best,
> > -jay
>
> >
> __

Re: [openstack-dev] About microversion setting to enable nested resource provider

2018-09-13 Thread Sylvain Bauza
Hey Naichuan,
FWIW, we discussed the missing pieces for nested resource providers. See
the (currently-in-use) etherpad
https://etherpad.openstack.org/p/nova-ptg-stein and look for "closing the
gap on nested resource providers" (L144 as I write this).

The fact that we are not able to schedule against nested providers yet is a
critical piece that we said we're going to work on as soon as we can.
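For context on the microversion gating Jay explains in the quoted discussion
further down: it is purely a matter of which version string the report client
sends. A hypothetical illustration (the constant name and the call are made up
for the sketch, not actual Nova code):

```
# Hypothetical sketch: nested providers only show up in allocation candidates
# when the client requests placement microversion 1.29 or later.
NESTED_ALLOCATION_CANDIDATES_VERSION = '1.29'   # nova currently sends 1.25

resp = report_client.get(
    '/allocation_candidates?%s' % query_string,
    version=NESTED_ALLOCATION_CANDIDATES_VERSION,
)
```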

-Sylvain

On Thu, Sep 13, 2018 at 9:14 AM, Eric Fried  wrote:

> There's a patch series in progress for this:
>
> https://review.openstack.org/#/q/topic:use-nested-allocation-candidates
>
> It needs some TLC. I'm sure gibi and tetsuro would welcome some help...
>
> efried
>
> On 09/13/2018 08:31 AM, Naichuan Sun wrote:
> > Thank you very much, Jay.
> > Is there somewhere I could set microversion(some configure file?), Or
> just modify the source code to set it?
> >
> > BR.
> > Naichuan Sun
> >
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: Thursday, September 13, 2018 9:19 PM
> > To: Naichuan Sun ; OpenStack Development
> Mailing List (not for usage questions) 
> > Cc: melanie witt ; efr...@us.ibm.com; Sylvain Bauza
> 
> > Subject: Re: About microversion setting to enable nested resource
> provider
> >
> > On 09/13/2018 06:39 AM, Naichuan Sun wrote:
> >> Hi, guys,
> >>
> >> Looks n-rp is disabled by default because microversion matches 1.29 :
> >> https://github.com/openstack/nova/blob/master/nova/api/openstack/place
> >> ment/handlers/allocation_candidate.py#L252
> >>
> >> Anyone know how to set the microversion to enable n-rp in placement?
> >
> > It is the client which must send the 1.29+ placement API microversion
> header to indicate to the placement API server that the client wants to
> receive nested provider information in the allocation candidates response.
> >
> > Currently, nova-scheduler calls the scheduler reportclient's
> > get_allocation_candidates() method:
> >
> > https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a
> 534410b5df/nova/scheduler/manager.py#L138
> >
> > The scheduler reportclient's get_allocation_candidates() method
> currently passes the 1.25 placement API microversion header:
> >
> > https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a
> 534410b5df/nova/scheduler/client/report.py#L353
> >
> > https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a
> 534410b5df/nova/scheduler/client/report.py#L53
> >
> > In order to get the nested information returned in the allocation
> candidates response, that would need to be upped to 1.29.
> >
> > Best,
> > -jay
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-05 Thread Sylvain Bauza
On Wed, Sep 5, 2018 at 3:56 PM, Matt Riedemann  wrote:

> On 9/5/2018 8:39 AM, Dan Smith wrote:
>
>> Why not?
>>
>
> Because of the versions table as noted earlier. Up until this point no one
> had mentioned that but it would be an issue.
>
>
>> I think the safest/cleanest thing to do here is renumber placement-related
>> migrations from 1, and provide a script or procedure to dump just the
>> placement-related tables from the nova_api database to the new one (not
>> including the sqlalchemy-migrate versions table).
>>
>
> I'm OK with squashing/trimming/resetting the version to 1. What was not
> mentioned earlier in this thread was (1) an acknowledgement that we'd need
> to drop the versions table to reset the version in the new database and (2)
> any ideas about providing scripts to help with the DB migration.
>
I think it's safe too. Operators could just migrate the tables by using a
read-only slave connection to copy them into a new DB, and then use such a
script to drop the tables that aren't needed.
For people wanting to migrate the tables this way, I think having the
placement migration versions differ is not a problem, given the tables
themselves are the same.
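As a sketch only (the exact table list depends on the release, and the
sqlalchemy-migrate 'migrate_version' table is deliberately left out):

```
# Copy only the placement-related tables from the nova_api database into the
# new placement database, reading from a read-only replica. Table list is
# illustrative and should be checked against your release.
mysqldump --single-transaction nova_api \
    resource_providers resource_provider_aggregates resource_provider_traits \
    inventories allocations resource_classes traits \
    placement_aggregates consumers projects users \
  | mysql placement
```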

-Sylvain

> -- 
>
> Thanks,
>
> Matt
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Freezing placement for extraction

2018-08-31 Thread Sylvain Bauza
On Thu, Aug 30, 2018 at 6:34 PM, Eric Fried  wrote:

> Greetings.
>
> The captains of placement extraction have declared readiness to begin
> the process of seeding the new repository (once [1] has finished
> merging). As such, we are freezing development in the affected portions
> of the openstack/nova repository until this process is completed. We're
> relying on our active placement reviewers noticing any patches that
> touch these "affected portions" and, if that reviewer is not a nova
> core, bringing them to the attention of one, so we can put a -2 on it.
>
>
Apologies for having missed the wide-ranging discussions about placement's
future over the past weeks. I was off, so I only saw the consensus yesterday
evening my time.
With that disclaimer out of the way, can I ask why we are calling the freeze
now rather than waiting for either Stein-2 or Stein-3?

My main concern is that the reshaper series is still being reviewed for
Nova. Some other changes using Placement (like drivers using nested
Resource Providers and the like) are also not yet implemented (or even
uploaded), and I'm a bit afraid of us discovering yet another cross-service
problem (say, two distinct computes running different versions) that would
make the fix harder than just fixing it directly in a single tree.



> Once the extraction is complete [2], any such frozen patches should be
> abandoned and reproposed to the openstack/placement repository.
>
> Since there will be an interval during which placement code will exist
> in both repositories, but before $world has cut over to using
> openstack/placement, it is possible that some crucial fix will still
> need to be merged into the openstack/nova side. In this case, the fix
> must be proposed to *both* repositories, and the justification for its
> existence in openstack/nova made clear.
>
>
We can surely do that for small fixes that don't touch a lot of files.
What I'm a bit afraid of is any large change that would run into merge
conflicts. Sure, we can find ways to handle that too, but again, why
shouldn't we just wait for Stein-2?

-Sylvain (yet again apologies for the late opinion).


For more details on the technical aspects of the extraction process,
> refer to this thread [3].
>
> For information on the procedural/governance process we will be
> following, see [4].
>
> Please let us know if you have any questions or concerns, either via
> this thread or in #openstack-placement.
>
> [1] https://review.openstack.org/#/c/597220/
> [2] meaning that we've merged the initial glut of patches necessary to
> repath everything and get tests passing
> [3]
> http://lists.openstack.org/pipermail/openstack-dev/2018-August/133781.html
> [4] https://docs.openstack.org/infra/manual/creators.html
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [stable][nova] Nominating melwitt for nova stable core

2018-08-30 Thread Sylvain Bauza
On Wed, Aug 29, 2018 at 4:42 AM, Tony Breeds 
wrote:

> On Tue, Aug 28, 2018 at 03:26:02PM -0500, Matt Riedemann wrote:
> > I hereby nominate Melanie Witt for nova stable core. Mel has shown that
> she
> > knows the stable branch policy and is also an active reviewer of nova
> stable
> > changes.
> >
> > +1/-1 comes from the stable-maint-core team [1] and then after a week
> with
> > no negative votes I think it's a done deal. Of course +1/-1 from existing
> > nova-stable-maint [2] is also good feedback.
> >
> > [1] https://review.openstack.org/#/admin/groups/530,members
> > [2] https://review.openstack.org/#/admin/groups/540,members
>
> +1 from me!
>
>
+1 (just working through my email backlog)

> Yours Tony.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-20 Thread Sylvain Bauza
On Tue, Jun 19, 2018 at 9:59 PM, Artom Lifshitz  wrote:

> > Adding
> > claims support later on wouldn't change any on-the-wire messaging, it
> would
> > just make things work more robustly.
>
> I'm not even sure about that. Assuming [1] has at least the right
> idea, it looks like it's an either-or kind of thing: either we use
> resource tracker claims and get the new instance NUMA topology that
> way, or do what was in the spec and have the dest send it to the
> source.
>
> That being said, I still think I'm still in favor of choosing the
> "easy" way out. For instance, [2] should fail because we can't access
> the api db from the compute node. So unless there's a simpler way,
> using RT claims would involve changing the RPC to add parameters to
> check_can_live_migration_destination, which, while not necessarily
> bad, seems like useless complexity for a thing we know will get ripped
> out.
>
When we reviewed the spec, we agreed as a community that we would still
have race conditions once the series is implemented, but that at least it
helps operators.
Quoting :
"It would also be possible for another instance to steal NUMA resources
from a live migrated instance before the latter’s destination compute host
has a chance to claim them. Until NUMA resource providers are implemented
[3]  and allow for an essentially
atomic schedule+claim operation, scheduling and claiming will keep being
done at different times on different nodes. Thus, the potential for races
will continue to exist."
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-aware-live-migration.html#proposed-change

So, my own opinion is that yes, the "easy" way out is better than no way
at all. From what I understand (and let's be honest, I haven't had time to
look at your code yet), your series doesn't diverge from the proposed
implementation, so I don't see a problem here. If, for some reason, you need
to write an alternative, just tell us why (and ideally write a spec
amendment patch so the spec stays consistent with the series).

-Sylvain




[1] https://review.openstack.org/#/c/576222/
> [2] https://review.openstack.org/#/c/576222/3/nova/compute/manager.py@5897
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][osc] Documenting compute API microversion gaps in OSC

2018-06-08 Thread Sylvain Bauza
On Fri, Jun 8, 2018 at 3:35 AM, Matt Riedemann  wrote:

> I've started an etherpad [1] to identify the compute API microversion gaps
> in python-openstackclient.
>
> It's a small start right now so I would appreciate some help on this, even
> just a few people looking at a couple of these per day would get it done
> quickly.
>
> Not all compute API microversions will require explicit changes to OSC,
> for example 2.3 [2] just adds some more fields to some API responses which
> might automatically get dumped in "show" commands. We just need to verify
> that the fields that come back in the response are actually shown by the
> CLI and then mark it in the etherpad.
>
> Once we identify the gaps, we can start talking about actually closing
> those gaps and deprecating the nova CLI, which could be part of a community
> wide goal - but there are other things going on in OSC right now (major
> refactor to use the SDK, core reviewer needs) so we'll have to figure out
> when the time is right.
>
> [1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc
> [2] https://docs.openstack.org/nova/latest/reference/api-microve
> rsion-history.html#maximum-in-kilo
>
>
Good idea, Matt. I think we could maybe discuss this with the First Contact
SIG, because it looks to me like some developers could help us with that,
since it doesn't require being a Nova expert.

I'll also try to see how I can help with this.
-Sylvain

> -- 
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 8:26 PM, Eric Fried  wrote:

> > 1. Make everything perform the pivot on compute node start (which can be
> >re-used by a CLI tool for the offline case)
> > 2. Make everything default to non-nested inventory at first, and provide
> >a way to migrate a compute node and its instances one at a time (in
> >place) to roll through.
>
> I agree that it sure would be nice to do ^ rather than requiring the
> "slide puzzle" thing.
>
> But how would this be accomplished, in light of the current "separation
> of responsibilities" drawn at the virt driver interface, whereby the
> virt driver isn't supposed to talk to placement directly, or know
> anything about allocations?  Here's a first pass:
>
>

What we usually do is implement, either at the compute service level or at
the virt driver level, some init_host() method that reconciles what you want.
For example, we could imagine a non-virt-specific method (and I like that
because it's non-virt-specific), i.e. called by the compute's init_host(),
that would look up the compute root RP inventories and see whether one or
more inventories tied to specific resource classes have to be moved from the
root RP and attached to a child RP.
The only subtlety that would require a virt-specific update is the name of
the child RP (as both Xen and libvirt plan to use the child RP name as the
vGPU type identifier), but that's an implementation detail that a virt
driver update invoked by the resource tracker could reconcile.
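Something along these lines, as a minimal sketch only: it assumes such a
reconcile step gets the same ProviderTree that update_provider_tree() works
on, the child-RP naming is made up, and moving the existing *allocations* is
precisely the part it leaves out (which is the open question in this thread):

```
# Sketch (not actual Nova code) of an init_host()-time reconciliation that
# relocates a resource class from the root RP to a per-type child RP.
VGPU = 'VGPU'

def reconcile_vgpu_inventory(provider_tree, root_rp_name, vgpu_type):
    child_name = '%s_%s' % (root_rp_name, vgpu_type)   # e.g. "compute1_nvidia-35"
    root_inv = provider_tree.data(root_rp_name).inventory

    if VGPU not in root_inv:
        return  # nothing reported on the root RP, nothing to move

    # Create the child RP for this vGPU type if it does not exist yet.
    if not provider_tree.exists(child_name):
        provider_tree.new_child(child_name, root_rp_name)

    # Attach the VGPU inventory to the child...
    child_inv = dict(provider_tree.data(child_name).inventory)
    child_inv[VGPU] = root_inv[VGPU]
    provider_tree.update_inventory(child_name, child_inv)

    # ...and drop it from the root RP.
    new_root_inv = {rc: inv for rc, inv in root_inv.items() if rc != VGPU}
    provider_tree.update_inventory(root_rp_name, new_root_inv)
```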


The virt driver, via the return value from update_provider_tree, tells
> the resource tracker that "inventory of resource class A on provider B
> have moved to provider C" for all applicable AxBxC.  E.g.
>
> [ { 'from_resource_provider': ,
> 'moved_resources': [VGPU: 4],
> 'to_resource_provider': 
>   },
>   { 'from_resource_provider': ,
> 'moved_resources': [VGPU: 4],
> 'to_resource_provider': 
>   },
>   { 'from_resource_provider': ,
> 'moved_resources': [
> SRIOV_NET_VF: 2,
> NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
> NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
> ],
> 'to_resource_provider': 
>   }
> ]
>
> As today, the resource tracker takes the updated provider tree and
> invokes [1] the report client method update_from_provider_tree [2] to
> flush the changes to placement.  But now update_from_provider_tree also
> accepts the return value from update_provider_tree and, for each "move":
>
> - Creates provider C (as described in the provider_tree) if it doesn't
> already exist.
> - Creates/updates provider C's inventory as described in the
> provider_tree (without yet updating provider B's inventory).  This ought
> to create the inventory of resource class A on provider C.
> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
> - Updates provider B's inventory.
>
> (*There's a hole here: if we're splitting a glommed-together inventory
> across multiple new child providers, as the VGPUs in the example, we
> don't know which allocations to put where.  The virt driver should know
> which instances own which specific inventory units, and would be able to
> report that info within the data structure.  That's getting kinda close
> to the virt driver mucking with allocations, but maybe it fits well
> enough into this model to be acceptable?)
>
> Note that the return value from update_provider_tree is optional, and
> only used when the virt driver is indicating a "move" of this ilk.  If
> it's None/[] then the RT/update_from_provider_tree flow is the same as
> it is today.
>
> If we can do it this way, we don't need a migration tool.  In fact, we
> don't even need to restrict provider tree "reshaping" to release
> boundaries.  As long as the virt driver understands its own data model
> migrations and reports them properly via update_provider_tree, it can
> shuffle its tree around whenever it wants.
>
> Thoughts?
>
> -efried
>
> [1]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2f
> c34d7e8e1b/nova/compute/resource_tracker.py#L890
> [2]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2f
> c34d7e8e1b/nova/scheduler/client/report.py#L1341
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 7:44 PM, Chris Dent  wrote:

> On Thu, 31 May 2018, Dan Smith wrote:
>
> I kinda think we need to either:
>>
>> 1. Make everything perform the pivot on compute node start (which can be
>>   re-used by a CLI tool for the offline case)
>>
>
> This sounds effectively like: validate my inventory and allocations
> at compute node start, correcting them as required (including the
> kind of migration stuff related to nested). Is that right?
>
> That's something I'd like to be the norm. It takes us back to a sort
> of self-healing compute node.
>
> Or am I missing something (forgive me, I've been on holiday).
>


I think I understand it the same way you do, and I think it's actually the
best approach. Wow, Dan, you saved my life again. Should I call you Mitch
Buchannon?



>
> I just think that forcing people to take down their data plane to work
>> around our own data model is kinda evil and something we should be
>> avoiding at this level of project maturity. What we're really saying is
>> "we know how to translate A into B, but we require you to move many GBs
>> of data over the network and take some downtime because it's easier for
>> *us* than making it seamless."
>>
>
> If we can do it, I agree that being not evil is good.
>
> --
> Chris Dent   ٩◔̯◔۶   https://anticdent.org/
> freenode: cdent tw: @anticdent
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 7:09 PM, Dan Smith  wrote:

> > My feeling is that we should not attempt to "migrate" any allocations
> > or inventories between root or child providers within a compute node,
> > period.
>
> While I agree this is the simplest approach, it does put a lot of
> responsibility on the operators to do work to sidestep this issue, which
> might not even apply to them (and knowing if it does might be
> difficult).
>
>
Shit, I missed why we were discussing migrations. When you upgrade, you want
to move your workloads so you can upgrade your kernel and the like. Gotcha.
But I assume that's not mandatory for a single upgrade (say Queens > Rocky).
In that case, you just want to upgrade your compute without moving your
instances. Or you notified your users about a maintenance window and you
know you have a minimal period during which you can disrupt them.
In both cases, adding more steps to the upgrade seems a tricky and dangerous
path for operators who are afraid of making a mistake.


> > The virt drivers should simply error out of update_provider_tree() if
> > there are ANY existing VMs on the host AND the virt driver wishes to
> > begin tracking resources with nested providers.
> >
> > The upgrade operation should look like this:
> >
> > 1) Upgrade placement
> > 2) Upgrade nova-scheduler
> > 3) start loop on compute nodes. for each compute node:
> >  3a) disable nova-compute service on node (to take it out of scheduling)
> >  3b) evacuate all existing VMs off of node
>
> You mean s/evacuate/cold migrate/ of course... :)
>
> >  3c) upgrade compute node (on restart, the compute node will see no
> >  VMs running on the node and will construct the provider tree inside
> >  update_provider_tree() with an appropriate set of child providers
> >  and inventories on those child providers)
> >  3d) enable nova-compute service on node
> >
> > Which is virtually identical to the "normal" upgrade process whenever
> > there are significant changes to the compute node -- such as upgrading
> > libvirt or the kernel.
>
> Not necessarily. It's totally legit (and I expect quite common) to just
> reboot the host to take kernel changes, bringing back all the instances
> that were there when it resumes. The "normal" case of moving things
> around slide-puzzle-style applies to live migration (which isn't an
> option here). I think people that can take downtime for the instances
> would rather not have to move things around for no reason if the
> instance has to get shut off anyway.
>
>
Yeah, exactly that. Accepting some downtime is fair, as long as the price
isn't a long list of operations to run during that downtime period.



> > Nested resource tracking is another such significant change and should
> > be dealt with in a similar way, IMHO.
>
> This basically says that for anyone to move to rocky, they will have to
> cold migrate every single instance in order to do that upgrade right? I
> mean, anyone with two socket machines or SRIOV NICs would end up with at
> least one level of nesting, correct? Forcing everyone to move everything
> to do an upgrade seems like a non-starter to me.
>
>
For the moment we aren't modelling NUMA topologies with nested RPs, but
once we do, yeah, that would imply the above, which sounds harsh to hear
from an operator perspective.



> We also need to consider the case where people would be FFU'ing past
> rocky (i.e. never running rocky computes). We've previously said that
> we'd provide a way to push any needed transitions with everything
> offline to facilitate that case, so I think we need to implement that
> method anyway.
>
> I kinda think we need to either:
>
> 1. Make everything perform the pivot on compute node start (which can be
>re-used by a CLI tool for the offline case)
>

That's another alternative I haven't explored yet. Thanks for the idea. We
already reconcile the world when we restart the compute service by checking
whether mediated devices exist, so that could be a good option actually.



> 2. Make everything default to non-nested inventory at first, and provide
>a way to migrate a compute node and its instances one at a time (in
>place) to roll through.
>
>
We could say that Rocky doesn't support multiple vGPU types until you run
the necessary DB migration that creates the child RPs and the like. That's
yet another approach.

> We can also document "or do the cold-migration slide puzzle thing" as an
> alternative for people that feel that's more reasonable.
>
> I just think that forcing people to take down their data plane to work
> around our own data model is kinda evil and something we should be
> avoiding at this level of project maturity. What we're really saying is
> "we know how to translate A into B, but we require you to move many GBs
> of data over the network and take some downtime because it's easier for
> *us* than making it seamless."
>
> --Dan
>
> 

Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 5:00 PM, Jay Pipes  wrote:

> On 05/31/2018 05:10 AM, Sylvain Bauza wrote:
>
>> After considering the whole approach, discussing with a couple of folks
>> over IRC, here is what I feel the best approach for a seamless upgrade :
>>   - VGPU inventory will be kept on root RP (for the first type) in Queens
>> so that a compute service upgrade won't impact the DB
>>   - during Queens, operators can run a DB online migration script (like
>> the ones we currently have in https://github.com/openstack/n
>> ova/blob/c2f42b0/nova/cmd/manage.py#L375) that will create a new
>> resource provider for the first type and move the inventory and allocations
>> to it.
>>   - it's the responsibility of the virt driver code to check whether a
>> child RP with its name being the first type name already exists to know
>> whether to update the inventory against the root RP or the child RP.
>>
>> Does it work for folks ?
>>
>
> No, sorry, that doesn't work for me. It seems overly complex and fragile,
> especially considering that VGPUs are not moveable anyway (no support for
> live migrating them). Same goes for CPU pinning, NUMA topologies, PCI
> passthrough devices, SR-IOV PF/VFs and all the other "must have" features
> that have been added to the virt driver over the last 5 years.
>
> My feeling is that we should not attempt to "migrate" any allocations or
> inventories between root or child providers within a compute node, period.
>
>
I don't understand why you're talking about *moving* an instance. My
concern was about upgrading a compute node to Rocky where some instances
using vGPUs were already running.


> The virt drivers should simply error out of update_provider_tree() if
> there are ANY existing VMs on the host AND the virt driver wishes to begin
> tracking resources with nested providers.
>
> The upgrade operation should look like this:
>
> 1) Upgrade placement
> 2) Upgrade nova-scheduler
> 3) start loop on compute nodes. for each compute node:
>  3a) disable nova-compute service on node (to take it out of scheduling)
>  3b) evacuate all existing VMs off of node
>  3c) upgrade compute node (on restart, the compute node will see no
>  VMs running on the node and will construct the provider tree inside
>  update_provider_tree() with an appropriate set of child providers
>  and inventories on those child providers)
>  3d) enable nova-compute service on node
>
> Which is virtually identical to the "normal" upgrade process whenever
> there are significant changes to the compute node -- such as upgrading
> libvirt or the kernel. Nested resource tracking is another such significant
> change and should be dealt with in a similar way, IMHO.
>
>
Upgrading to Rocky for vGPUs doesn't also require upgrading libvirt or the
kernel. So why should operators need to "evacuate" (I understood that as
"migrate") instances if they don't need to upgrade their host OS?

Best,
> -jay
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 4:34 PM, Jay Pipes  wrote:

> On 05/29/2018 09:12 AM, Sylvain Bauza wrote:
>
>> We could keep the old inventory in the root RP for the previous vGPU type
>> already supported in Queens and just add other inventories for other vGPU
>> types now supported. That looks possibly the simpliest option as the virt
>> driver knows that.
>>
>
> What do you mean by "vGPU type"? Are you referring to the multiple GPU
> types stuff where specific virt drivers know how to handle different vGPU
> vendor types? Or are you referring to a "non-nested VGPU inventory on the
> compute node provider" versus a "VGPU inventory on multiple child
> providers, each representing a different physical GPU (or physical GPU
> group in the case of Xen)"?
>
>
I say "vGPU type" because that's how we agreed to model the multiple
child RPs.
See
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html#proposed-change

For Xen, a vGPU type is a Xen GPU group. For libvirt, it's just an mdev type.
Each pGPU can support multiple types. For the moment, we only support one
type, but my spec ( https://review.openstack.org/#/c/557065/ ) explains
more about that.
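For reference, on a libvirt host the discovery and configuration look
something like this (the PCI address and the 'nvidia-35' type are just
examples; check the actual values on your own host):

```
# List the mdev (vGPU) types a physical GPU advertises once the NVIDIA
# GRID/vGPU driver is loaded (example PCI address):
ls /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types

# Then pick one type per compute node in nova.conf:
[devices]
enabled_vgpu_types = nvidia-35
```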


-jay
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Thu, May 31, 2018 at 3:54 PM, Eric Fried  wrote:

> This seems reasonable, but...
>
> On 05/31/2018 04:34 AM, Balázs Gibizer wrote:
> >
> >
> > On Thu, May 31, 2018 at 11:10 AM, Sylvain Bauza 
> wrote:
> >>>
> >>
> >> After considering the whole approach, discussing with a couple of
> >> folks over IRC, here is what I feel the best approach for a seamless
> >> upgrade :
> >>  - VGPU inventory will be kept on root RP (for the first type) in
> >> Queens so that a compute service upgrade won't impact the DB
> >>  - during Queens, operators can run a DB online migration script (like
> -^^
> Did you mean Rocky?
>


Oops, yeah of course. Queens > Rocky.

>
> >> the ones we currently have in
> >> https://github.com/openstack/nova/blob/c2f42b0/nova/cmd/manage.py#L375)
> that
> >> will create a new resource provider for the first type and move the
> >> inventory and allocations to it.
> >>  - it's the responsibility of the virt driver code to check whether a
> >> child RP with its name being the first type name already exists to
> >> know whether to update the inventory against the root RP or the child
> RP.
> >>
> >> Does it work for folks ?
> >
> > +1 works for me
> > gibi
> >
> >> PS : we already have the plumbing in place in nova-manage and we're
> >> still managing full Nova resources. I know we plan to move Placement
> >> out of the nova tree, but for the Rocky timeframe, I feel we can
> >> consider nova-manage as the best and quickiest approach for the data
> >> upgrade.
> >>
> >> -Sylvain
> >>
> >>
> >
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Sylvain Bauza
On Wed, May 30, 2018 at 1:06 PM, Balázs Gibizer  wrote:

>
>
> On Tue, May 29, 2018 at 3:12 PM, Sylvain Bauza  wrote:
>
>>
>>
>> On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer <
>> balazs.gibi...@ericsson.com> wrote:
>>
>>>
>>>
>>> On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza 
>>> wrote:
>>>
>>>>
>>>>
>>>> Le mar. 29 mai 2018 à 11:02, Balázs Gibizer <
>>>> balazs.gibi...@ericsson.com> a écrit :
>>>>
>>>>>
>>>>>
>>>>> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza 
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> > On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
>>>>> >  wrote
>>>>> >
>>>>> >> > In that situation, say for example with VGPU inventories, that
>>>>> >> would mean
>>>>> >> > that the compute node would stop reporting inventories for its
>>>>> >> root RP, but
>>>>> >> > would rather report inventories for at least one single child RP.
>>>>> >> > In that model, do we reconcile the allocations that were already
>>>>> >> made
>>>>> >> > against the "root RP" inventory ?
>>>>> >>
>>>>> >> It would be nice to see Eric and Jay comment on this,
>>>>> >> but if I'm not mistaken, when the virt driver stops reporting
>>>>> >> inventories for its root RP, placement would try to delete that
>>>>> >> inventory inside and raise InventoryInUse exception if any
>>>>> >> allocations still exist on that resource.
>>>>> >>
>>>>> >> ```
>>>>> >> update_from_provider_tree() (nova/compute/resource_tracker.py)
>>>>> >>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>>>>> >>   + put() - PUT /resource_providers//inventories with
>>>>> >> new inventories (scheduler/client/report.py)
>>>>> >>   + set_inventories() (placement/handler/inventory.py)
>>>>> >>   + _set_inventory()
>>>>> >> (placement/objects/resource_proveider.py)
>>>>> >>   + _delete_inventory_from_provider()
>>>>> >> (placement/objects/resource_proveider.py)
>>>>> >>   -> raise exception.InventoryInUse
>>>>> >> ```
>>>>> >>
>>>>> >> So we need some trick something like deleting VGPU allocations
>>>>> >> before upgrading and set the allocation again for the created new
>>>>> >> child after upgrading?
>>>>> >>
>>>>> >
>>>>> > I wonder if we should keep the existing inventory in the root RP, and
>>>>> > somehow just reserve the left resources (so Placement wouldn't pass
>>>>> > that root RP for queries, but would still have allocations). But
>>>>> > then, where and how to do this ? By the resource tracker ?
>>>>> >
>>>>>
>>>>> AFAIK it is the virt driver that decides to model the VGU resource at a
>>>>> different place in the RP tree so I think it is the responsibility of
>>>>> the same virt driver to move any existing allocation from the old place
>>>>> to the new place during this change.
>>>>>
>>>>> Cheers,
>>>>> gibi
>>>>>
>>>>
>>>> Why not instead not move the allocation but rather have the virt driver
>>>> updating the root RP by modifying the reserved value to the total size?
>>>>
>>>> That way, the virt driver wouldn't need to ask for an allocation but
>>>> rather continue to provide inventories...
>>>>
>>>> Thoughts?
>>>>
>>>
>>> Keeping the old allocaton at the old RP and adding a similar sized
>>> reservation in the new RP feels hackis as those are not really reserved
>>> GPUs but used GPUs just from the old RP. If somebody sums up the total
>>> reported GPUs in this setup via the placement API then she will get more
>>> GPUs in total that what is physically visible for the hypervisor as the
>>> GPUs part of the old allocation reported twice in two different total
>>> va

Re: [openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-30 Thread Sylvain Bauza
On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar  wrote:

> Hi all,
>The Cyborg/Nova scheduling spec [1] details what traits will be applied
> to the resource providers that represent devices like GPUs. Some of the
> traits referred to vendor names. I got feedback that traits must not refer
> to products or specific models of devices. I agree. However, we need some
> reference to device types to enable matching the VM driver with the device.
>
> TL;DR We need some reference to device types, but we don't need product
> names. I will update the spec [1] to clarify that. Rest of this email
> clarifies why we need device types in traits, and what traits we propose to
> include.
>
> In general, an accelerator device is operated by two pieces of software: a
> driver in the kernel (which may discover and handle the PF for SR-IOV
> devices), and a driver/library in the guest (which may handle the assigned
> VF).
>
> The device assigned to the VM must match the driver/library packaged in
> the VM. For this, the request must explicitly state what category of
> devices it needs. For example, if the VM needs a GPU, it needs to say
> whether it needs an AMD GPU or an Nvidia GPU, since it may have the
> driver/libraries for that vendor alone. It may also need to state what
> version of Cuda is needed, if it is a Nvidia GPU. These aspects are
> necessarily vendor-specific.
>
>
FWIW, the vGPU implementation in Nova has the same concern. We want to
provide traits to explicitly say "use this vGPU type", but given a type is
tied to a specific vendor, we can't just say "ask for this frame buffer
size, or for this many display heads"; it's rather "we need a vGPU
accepting a Quadro vDWS license".


> Further, one driver/library version may handle multiple devices. Since a
> new driver version may be backwards compatible, multiple driver versions
> may manage the same device. The development/release of the driver/library
> inside the VM should be independent of the kernel driver for that device.
>
>
I agree.


> For FPGAs, there is an additional twist as the VM may need specific
> bitstream(s), and they match only specific device/region types. The
> bitstream for a device from a vendor will not fit any other device from the
> same vendor, let alone other vendors. IOW, the region type is specific not
> just to a vendor but to a device type within the vendor. So, it is
> essential to identify the device type.
>
> So, the proposed set of RCs and traits are as below. As we learn more
> about actual usages by operators, we may need to evolve this set.
>
>- There is a resource class per device category e.g.
>CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
>- The resource provider that represents a device has the following
>traits:
>   - Vendor/Category trait: e.g. CUSTOM_GPU_AMD, CUSTOM_FPGA_XILINX.
>   - Device type trait which is a refinement of vendor/category trait
>   e.g. CUSTOM_FPGA_XILINX_VU9P.
>
> NOTE: This is not a product or model, at least for FPGAs. Multiple
> products may use the same FPGA chip.
> NOTE: The reason for having both the vendor/category and this one is that
> a flavor may ask for either, depending on the granularity desired. IOW, if
> one driver can handle all devices from a vendor (*eye roll*), the flavor
> can ask for the vendor/category trait alone. If there are separate drivers
> for different device families from the same vendor, the flavor must specify
> the trait for the device family.
> NOTE: The equivalent trait for GPUs may be like CUSTOM_GPU_NVIDIA_P90, but
> I'll let others decide if that is a product or not.
>
>
I was about to propose the same for vGPUs in Nova, i.e. using custom
traits. The only concern is that operators would need to set the traits
directly using osc-placement instead of having Nova magically provide them.
But anyway, given that operators already need to configure the vGPU types
they want, I think that's acceptable.
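For illustration, with the osc-placement plugin that would look something
like the following (the trait name and RP UUID are made up; the traits API
needs placement microversion 1.6 or later):

```
openstack --os-placement-api-version 1.6 trait create CUSTOM_NVIDIA_QUADRO_VDWS

# Note: "trait set" replaces the provider's current traits with the ones given.
openstack --os-placement-api-version 1.6 resource provider trait set \
    --trait CUSTOM_NVIDIA_QUADRO_VDWS 5c5a578f-5a0f-4f2d-9b9c-2f4f3e1e2a11
```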



>
>- For FPGAs, we have additional traits:
>  - Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
>  CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
>  - Region type ID.  e.g. CUSTOM_FPGA_INTEL_REGION_.
>  - Optionally, a function ID, indicating what function is
>  currently programmed in the region RP. e.g. 
> CUSTOM_FPGA_INTEL_FUNCTION_.
>  Not all implementations may provide it. The function trait may 
> change on
>  reprogramming, but it is not expected to be frequent.
>  - Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
>
> [1] https://review.openstack.org/#/c/554717/
>


I'll try to review the spec as soon as I can.

-Sylvain

>
>
> Thanks.
>
> Regards,
> Sundar
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>

Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-29 Thread Sylvain Bauza
On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer  wrote:

>
>
> On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza  wrote:
>
>>
>>
>> Le mar. 29 mai 2018 à 11:02, Balázs Gibizer 
>> a écrit :
>>
>>>
>>>
>>> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza 
>>> wrote:
>>> >
>>> >
>>> > On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
>>> >  wrote
>>> >
>>> >> > In that situation, say for example with VGPU inventories, that
>>> >> would mean
>>> >> > that the compute node would stop reporting inventories for its
>>> >> root RP, but
>>> >> > would rather report inventories for at least one single child RP.
>>> >> > In that model, do we reconcile the allocations that were already
>>> >> made
>>> >> > against the "root RP" inventory ?
>>> >>
>>> >> It would be nice to see Eric and Jay comment on this,
>>> >> but if I'm not mistaken, when the virt driver stops reporting
>>> >> inventories for its root RP, placement would try to delete that
>>> >> inventory inside and raise InventoryInUse exception if any
>>> >> allocations still exist on that resource.
>>> >>
>>> >> ```
>>> >> update_from_provider_tree() (nova/compute/resource_tracker.py)
>>> >>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>>> >>   + put() - PUT /resource_providers//inventories with
>>> >> new inventories (scheduler/client/report.py)
>>> >>   + set_inventories() (placement/handler/inventory.py)
>>> >>   + _set_inventory()
>>> >> (placement/objects/resource_proveider.py)
>>> >>   + _delete_inventory_from_provider()
>>> >> (placement/objects/resource_proveider.py)
>>> >>   -> raise exception.InventoryInUse
>>> >> ```
>>> >>
>>> >> So we need some trick something like deleting VGPU allocations
>>> >> before upgrading and set the allocation again for the created new
>>> >> child after upgrading?
>>> >>
>>> >
>>> > I wonder if we should keep the existing inventory in the root RP, and
>>> > somehow just reserve the left resources (so Placement wouldn't pass
>>> > that root RP for queries, but would still have allocations). But
>>> > then, where and how to do this ? By the resource tracker ?
>>> >
>>>
>>> AFAIK it is the virt driver that decides to model the VGU resource at a
>>> different place in the RP tree so I think it is the responsibility of
>>> the same virt driver to move any existing allocation from the old place
>>> to the new place during this change.
>>>
>>> Cheers,
>>> gibi
>>>
>>
>> Why not instead not move the allocation but rather have the virt driver
>> updating the root RP by modifying the reserved value to the total size?
>>
>> That way, the virt driver wouldn't need to ask for an allocation but
>> rather continue to provide inventories...
>>
>> Thoughts?
>>
>
> Keeping the old allocaton at the old RP and adding a similar sized
> reservation in the new RP feels hackis as those are not really reserved
> GPUs but used GPUs just from the old RP. If somebody sums up the total
> reported GPUs in this setup via the placement API then she will get more
> GPUs in total that what is physically visible for the hypervisor as the
> GPUs part of the old allocation reported twice in two different total
> value. Could we just report less GPU inventories to the new RP until the
> old RP has GPU allocations?
>
>

We could keep the old inventory on the root RP for the vGPU type already
supported in Queens and just add inventories for the other vGPU types now
supported. That is possibly the simplest option, as the virt driver knows
about it.



> Some alternatives from my jetlagged brain:
>
> a) Implement a move inventory/allocation API in placement. Given a
> resource class and a source RP uuid and a destination RP uuid placement
> moves the inventory and allocations of that resource class from the source
> RP to the destination RP. Then the virt drive can call this API to move the
> allocation. This has an impact on the fast forward upgrade as it needs
> running virt driver to do the allocation move.
>
>
Instead of having the virt driver doing that (TBH

Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-29 Thread Sylvain Bauza
On Tue, May 29, 2018 at 11:02, Balázs Gibizer  wrote:

>
>
> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza 
> wrote:
> >
> >
> > On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
> >  wrote
> >
> >> > In that situation, say for example with VGPU inventories, that
> >> would mean
> >> > that the compute node would stop reporting inventories for its
> >> root RP, but
> >> > would rather report inventories for at least one single child RP.
> >> > In that model, do we reconcile the allocations that were already
> >> made
> >> > against the "root RP" inventory ?
> >>
> >> It would be nice to see Eric and Jay comment on this,
> >> but if I'm not mistaken, when the virt driver stops reporting
> >> inventories for its root RP, placement would try to delete that
> >> inventory inside and raise InventoryInUse exception if any
> >> allocations still exist on that resource.
> >>
> >> ```
> >> update_from_provider_tree() (nova/compute/resource_tracker.py)
> >>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
> >>   + put() - PUT /resource_providers//inventories with
> >> new inventories (scheduler/client/report.py)
> >>   + set_inventories() (placement/handler/inventory.py)
> >>   + _set_inventory()
> >> (placement/objects/resource_proveider.py)
> >>   + _delete_inventory_from_provider()
> >> (placement/objects/resource_proveider.py)
> >>   -> raise exception.InventoryInUse
> >> ```
> >>
> >> So we need some trick something like deleting VGPU allocations
> >> before upgrading and set the allocation again for the created new
> >> child after upgrading?
> >>
> >
> > I wonder if we should keep the existing inventory in the root RP, and
> > somehow just reserve the left resources (so Placement wouldn't pass
> > that root RP for queries, but would still have allocations). But
> > then, where and how to do this ? By the resource tracker ?
> >
>
> AFAIK it is the virt driver that decides to model the VGU resource at a
> different place in the RP tree so I think it is the responsibility of
> the same virt driver to move any existing allocation from the old place
> to the new place during this change.
>
> Cheers,
> gibi
>

Why not, instead of moving the allocation, have the virt driver update the
root RP by setting the reserved value to the total size?

That way, the virt driver wouldn't need to touch allocations at all, but
would rather just continue to provide inventories...

Thoughts?
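i.e. something like this for the root RP's VGPU inventory (illustrative
numbers, and it assumes placement accepts reserved == total):

```
# Root RP keeps reporting the VGPU inventory, but with reserved == total so
# the scheduler can no longer pick it, while existing allocations stay valid.
root_vgpu_inventory = {
    'VGPU': {
        'total': 8,
        'reserved': 8,
        'min_unit': 1,
        'max_unit': 8,
        'step_size': 1,
        'allocation_ratio': 1.0,
    }
}
```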


> > -Sylvain
> >
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-29 Thread Sylvain Bauza
2018-05-29 11:01 GMT+02:00 Balázs Gibizer :

>
>
> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza  wrote:
>
>>
>>
>> On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA <
>> nakamura.tets...@lab.ntt.co.jp> wrote
>>
>> > In that situation, say for example with VGPU inventories, that would
>>> mean
>>> > that the compute node would stop reporting inventories for its root
>>> RP, but
>>> > would rather report inventories for at least one single child RP.
>>> > In that model, do we reconcile the allocations that were already made
>>> > against the "root RP" inventory ?
>>>
>>> It would be nice to see Eric and Jay comment on this,
>>> but if I'm not mistaken, when the virt driver stops reporting
>>> inventories for its root RP, placement would try to delete that inventory
>>> inside and raise InventoryInUse exception if any allocations still exist on
>>> that resource.
>>>
>>> ```
>>> update_from_provider_tree() (nova/compute/resource_tracker.py)
>>>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>>>   + put() - PUT /resource_providers//inventories with new
>>> inventories (scheduler/client/report.py)
>>>   + set_inventories() (placement/handler/inventory.py)
>>>   + _set_inventory() (placement/objects/resource_pr
>>> oveider.py)
>>>   + _delete_inventory_from_provider()
>>> (placement/objects/resource_proveider.py)
>>>   -> raise exception.InventoryInUse
>>> ```
>>>
>>> So we need some trick something like deleting VGPU allocations before
>>> upgrading and set the allocation again for the created new child after
>>> upgrading?
>>>
>>>
>> I wonder if we should keep the existing inventory in the root RP, and
>> somehow just reserve the left resources (so Placement wouldn't pass that
>> root RP for queries, but would still have allocations). But then, where and
>> how to do this ? By the resource tracker ?
>>
>>
> AFAIK it is the virt driver that decides to model the VGU resource at a
> different place in the RP tree so I think it is the responsibility of the
> same virt driver to move any existing allocation from the old place to the
> new place during this change.
>
>
No. Allocations are done by the scheduler or by the conductor. Virt drivers
only provide inventories.



> Cheers,
> gibi
>
>
> -Sylvain
>>
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-29 Thread Sylvain Bauza
On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA <
nakamura.tets...@lab.ntt.co.jp> wrote:

> Hi,
>
> > Do I still understand correctly ? If yes, perfect, let's jump to my
> upgrade
> > concern.
>
> Yes, I think. The old microversions look into only root providers and give
> up providing resources if a root provider itself doesn't have enough
> inventories for requested resources. But the new microversion looks into
> the root's descendents also and see if it can provide requested resources
> *collectively* in that tree.
>
> The tests from [1] would help you understand this, where VCPUs come from
> the root(compute host) and SRIOV_NET_VFs from its grandchild.
>
> [1] https://review.openstack.org/#/c/565487/15/nova/tests/functi
> onal/api/openstack/placement/gabbits/allocation-candidates.yaml@362
>
>
Yeah, I had already seen those tests, but I wanted to make sure I was
understanding them correctly.


> > In that situation, say for example with VGPU inventories, that would mean
> > that the compute node would stop reporting inventories for its root RP,
> but
> > would rather report inventories for at least one single child RP.
> > In that model, do we reconcile the allocations that were already made
> > against the "root RP" inventory ?
>
> It would be nice to see Eric and Jay comment on this,
> but if I'm not mistaken, when the virt driver stops reporting inventories
> for its root RP, placement would try to delete that inventory inside and
> raise InventoryInUse exception if any allocations still exist on that
> resource.
>
> ```
> update_from_provider_tree() (nova/compute/resource_tracker.py)
>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>   + put() - PUT /resource_providers//inventories with new
> inventories (scheduler/client/report.py)
>   + set_inventories() (placement/handler/inventory.py)
>   + _set_inventory() (placement/objects/resource_proveider.py)
>   + _delete_inventory_from_provider()
> (placement/objects/resource_proveider.py)
>   -> raise exception.InventoryInUse
> ```
>
> So we need some trick something like deleting VGPU allocations before
> upgrading and set the allocation again for the created new child after
> upgrading?
>
>
I wonder if we should keep the existing inventory in the root RP, and
somehow just reserve the remaining resources (so Placement wouldn't return
that root RP for queries, but would still keep the allocations). But then,
where and how to do this ? By the resource tracker ?

-Sylvain


> On 2018/05/28 23:18, Sylvain Bauza wrote:
>
>> Hi,
>>
>> I already told about that in a separate thread, but let's put it here too
>> for more visibility.
>>
>> tl;dr: I suspect existing allocations are being lost when we upgrade a
>> compute service from Queens to Rocky, if those allocations are made
>> against
>> inventories that are now provided by a child Resource Provider.
>>
>>
>> I started reviewing https://review.openstack.org/#/c/565487/ and bottom
>> patches to understand the logic with querying nested resource providers.
>>
>>> From what I understand, the scheduler will query Placement using the same
>>>
>> query but will get (thanks to a new microversion) not only allocation
>> candidates that are root resource providers but also any possible child.
>>
>> If so, that's great as in a rolling upgrade scenario with mixed computes
>> (both Queens and Rocky), we will still continue to return both old RPs and
>> new child RPs if they both support the same resource classes ask.
>> Accordingly, allocations done by the scheduler will be made against the
>> corresponding Resource Provider, whether it's a root RP (old way) or a
>> child RP (new way).
>>
>> Do I still understand correctly ? If yes, perfect, let's jump to my
>> upgrade
>> concern.
>> Now, consider the Queens->Rocky compute upgrade. If I'm an operator and I
>> start deploying Rocky on one compute node, it will provide to Placement
>> API
>> new inventories that are possibly nested.
>> In that situation, say for example with VGPU inventories, that would mean
>> that the compute node would stop reporting inventories for its root RP,
>> but
>> would rather report inventories for at least one single child RP.
>> In that model, do we reconcile the allocations that were already made
>> against the "root RP" inventory ? I don't think so, hence my question
>> here.
>>
>> Thanks,
>> -Sylvain
>>
>>
>>

[openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-28 Thread Sylvain Bauza
Hi,

I already told about that in a separate thread, but let's put it here too
for more visibility.

tl;dr: I suspect existing allocations are being lost when we upgrade a
compute service from Queens to Rocky, if those allocations are made against
inventories that are now provided by a child Resource Provider.


I started reviewing https://review.openstack.org/#/c/565487/ and bottom
patches to understand the logic with querying nested resource providers.
>From what I understand, the scheduler will query Placement using the same
query but will get (thanks to a new microversion) not only allocation
candidates that are root resource providers but also any possible child.

If so, that's great as in a rolling upgrade scenario with mixed computes
(both Queens and Rocky), we will still continue to return both old RPs and
new child RPs if they both support the same resource classes ask.
Accordingly, allocations done by the scheduler will be made against the
corresponding Resource Provider, whether it's a root RP (old way) or a
child RP (new way).

Do I still understand correctly ? If yes, perfect, let's jump to my upgrade
concern.
Now, consider the Queens->Rocky compute upgrade. If I'm an operator and I
start deploying Rocky on one compute node, it will provide to Placement API
new inventories that are possibly nested.
In that situation, say for example with VGPU inventories, that would mean
that the compute node would stop reporting inventories for its root RP, but
would rather report inventories for at least one single child RP.
In that model, do we reconcile the allocations that were already made
against the "root RP" inventory ? I don't think so, hence my question here.

Thanks,
-Sylvain
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-28 Thread Sylvain Bauza
On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann 
wrote:

> I've written a nova-manage placement heal_allocations CLI [1] which was a
> TODO from the PTG in Dublin as a step toward getting existing
> CachingScheduler users to roll off that (which is deprecated).
>
> During the CERN cells v1 upgrade talk it was pointed out that CERN was
> able to go from placement-per-cell to centralized placement in Ocata
> because the nova-computes in each cell would automatically recreate the
> allocations in Placement in a periodic task, but that code is gone once
> you're upgraded to Pike or later.
>
> In various other talks during the summit this week, we've talked about
> things during upgrades where, for instance, if placement is down for some
> reason during an upgrade, a user deletes an instance and the allocation
> doesn't get cleaned up from placement so it's going to continue counting
> against resource usage on that compute node even though the server instance
> in nova is gone. So this CLI could be expanded to help clean up situations
> like that, e.g. provide it a specific server ID and the CLI can figure out
> if it needs to clean things up in placement.
>
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade/fix allocation issues. It already has
> several things listed in a code comment inline about things to build into
> this later.
>
> My question is, is this good enough for a first iteration or is there
> something severely missing before we can merge this, like the automatic
> marker tracking mentioned in the code (that will probably be a non-trivial
> amount of code to add). I could really use some operator feedback on this
> to just take a look at what it already is capable of and if it's not going
> to be useful in this iteration, let me know what's missing and I can add
> that in to the patch.
>
> [1] https://review.openstack.org/#/c/565886/
>
>

It does sound to me like a good way to help operators.

That said, given I'm now working on using Nested Resource Providers for
VGPU inventories, I wonder about a possible upgrade problem with VGPU
allocations. Given that :
 - in Queens, VGPU inventories are on the root RP (i.e. the compute node
RP), but,
 - in Rocky, VGPU inventories will be on child RPs (i.e. against a specific
vGPU type), then

if we have VGPU allocations in Queens, when upgrading to Rocky, we should
maybe recreate those allocations against the new inventory ?

Do you see the problem with upgrading once we start creating nested RPs ?
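
For the sake of discussion, the kind of reconciliation I mean would be
something like the following REST-level sketch (placement allocations API,
microversion 1.12 dict format; auth and error handling are omitted, the
endpoint and UUID parameters are placeholders, and where this logic should
live, virt driver, resource tracker or a nova-manage command, is precisely the
open question):

```
import requests

PLACEMENT = "http://placement.example.com"  # placeholder endpoint
HEADERS = {"OpenStack-API-Version": "placement 1.12"}

def move_vgpu_allocation(consumer_uuid, root_rp_uuid, child_rp_uuid):
    current = requests.get(
        "%s/allocations/%s" % (PLACEMENT, consumer_uuid), headers=HEADERS
    ).json()
    # Rebuild the allocations dict with only the "resources" keys.
    allocations = {
        rp: {"resources": dict(alloc["resources"])}
        for rp, alloc in current["allocations"].items()
    }
    # Move the VGPU part from the root RP to the new child RP.
    vgpus = allocations[root_rp_uuid]["resources"].pop("VGPU")
    allocations[child_rp_uuid] = {"resources": {"VGPU": vgpus}}
    requests.put(
        "%s/allocations/%s" % (PLACEMENT, consumer_uuid),
        json={
            "allocations": allocations,
            "project_id": current["project_id"],
            "user_id": current["user_id"],
        },
        headers=HEADERS,
    ).raise_for_status()
```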


> --
>
> Thanks,
>
> Matt
>
> ___
> OpenStack-operators mailing list
> openstack-operat...@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-18 Thread Sylvain Bauza
Le ven. 18 mai 2018 à 13:59, Nadathur, Sundar  a
écrit :

> Hi Matt,
> On 5/17/2018 3:18 PM, Matt Riedemann wrote:
>
> On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
>
> This applies only to the resources that Nova handles, IIUC, which does not
> handle accelerators. The generic method that Alex talks about is obviously
> preferable but, if that is not available in Rocky, is the filter an option?
>
>
> If nova isn't creating accelerator resources managed by cyborg, I have no
> idea why nova would be doing quota checks on those types of resources. And
> no, I don't think adding a scheduler filter to nova for checking
> accelerator quota is something we'd add either. I'm not sure that would
> even make sense - the quota for the resource is per tenant, not per host is
> it? The scheduler filters work on a per-host basis.
>
> Can we not extend BaseFilter.filter_all() to get all the hosts in a
> filter?
>
> https://github.com/openstack/nova/blob/master/nova/filters.py#L36
>
> I should have made it clearer that this putative filter will be
> out-of-tree, and needed only till better solutions become available.
>

No, a filter takes exactly two parameters, and changing that would mean a new
paradigm for the FilterScheduler.
If you need a check across all the hosts, it should probably be either a
pre-filter against Placement or a post-filter, but we don't accept out-of-tree
ones for those yet.
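
To show what I mean by two parameters, this is the shape any host filter has
(a skeleton only; the class name is hypothetical and not something we would
take in-tree):

```
from nova.scheduler import filters

class AcceleratorQuotaFilter(filters.BaseHostFilter):
    """Hypothetical out-of-tree filter, shown only to illustrate the shape."""

    def host_passes(self, host_state, spec_obj):
        # Only one host and one request spec are visible here; there is no
        # view of the whole candidate list, so a tenant-wide accelerator
        # quota cannot be enforced correctly at this point.
        return True
```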


> Like any other resource in openstack, the project that manages that
> resource should be in charge of enforcing quota limits for it.
>
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg. So, the earliest check that can be made today is in
> the selected compute node. A simple approach can result in quota violations
> as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and is
> currently using 3. That leaves 2 unused devices, of which the tenant is
> permitted to use only one. But he may submit two concurrent requests, and
> they may land on two different compute nodes. The Cyborg agent in each node
> will see the current tenant usage as 3 and let the request go through,
> resulting in quota violation.
>
> To prevent this, we need some kind of atomic update , like SQLAlchemy's
> with_lockmode():
>
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
> That seems to have issues, as documented in the link above. Also, since
> every compute node does that, it would also serialize the bringup of all
> instances with accelerators, across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
>
> Thanks,
> Sundar
>
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-04-24 Thread Sylvain Bauza
Sorry folks for the late reply, I'll try to also weigh in the Gerrit change.

On Tue, Apr 24, 2018 at 2:55 PM, Jay Pipes  wrote:

> On 04/23/2018 05:51 PM, Arvind N wrote:
>
>> Thanks for the detailed options Matt/eric/jay.
>>
>> Just few of my thoughts,
>>
>> For #1, we can make the explanation very clear that we rejected the
>> request because the original traits specified in the original image and the
>> new traits specified in the new image do not match and hence rebuild is not
>> supported.
>>
>
> I believe I had suggested that on the spec amendment patch. Matt had
> concerns about an error message being a poor user experience (I don't
> necessarily disagree with that) and I had suggested a clearer error message
> to try and make that user experience slightly less sucky.
>
> For #3,
>>
>> Even though it handles the nested provider, there is a potential issue.
>>
>> Lets say a host with two SRIOV nic. One is normal SRIOV nic(VF1), another
>> one with some kind of offload feature(VF2).(Described by alex)
>>
>> Initial instance launch happens with VF:1 allocated, rebuild launches
>> with modified request with traits=HW_NIC_OFFLOAD_X, so basically we want
>> the instance to be allocated VF2.
>>
>> But the original allocation happens against VF1 and since in rebuild the
>> original allocations are not changed, we have wrong allocations.
>>
>
> Yep, that is certainly an issue. The only solution to this that I can see
> would be to have the conductor ask the compute node to do the pre-flight
> check. The compute node already has the entire tree of providers, their
> inventories and traits, along with information about providers that share
> resources with the compute node. It has this information in the
> ProviderTree object in the reportclient that is contained in the compute
> node resource tracker.
>
> The pre-flight check, if run on the compute node, would be able to grab
> the allocation records for the instance and determine if the required
> traits for the new image are present on the actual resource providers
> allocated against for the instance (and not including any child providers
> not allocated against).
>
>
Yup, that. We also have pre-flight checks for move operations like live and
cold migrations, and I'd really like to keep all the conditionals in the
conductor, because it knows better than the scheduler which operation was
requested.
I'm not really happy with adding more logic in the scheduler along the lines
of "yeah, it's a rebuild, so please do something exceptional", and I'm also
not happy with having a filter (that can be disabled) calling the Placement
API.
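
As a sketch of that pre-flight check (plain dicts and sets here for
illustration; in Nova the data would come from the compute node's ProviderTree
and the instance's allocations):

```
def rebuild_satisfies_image_traits(required_traits, allocations, provider_traits):
    """required_traits: set of trait names from the new image.
    allocations: {rp_uuid: {"resources": {...}}} for the instance.
    provider_traits: {rp_uuid: set of trait names} for the compute's tree.
    """
    allocated_traits = set()
    for rp_uuid in allocations:
        allocated_traits |= provider_traits.get(rp_uuid, set())
    # Only providers the instance actually allocates against count; traits on
    # an unallocated sibling (e.g. the other SRIOV VF) must not satisfy the
    # request, which is Arvind's VF1/VF2 case.
    return required_traits <= allocated_traits
```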


> Or... we chalk this up as a "too bad" situation and just either go with
> option #1 or simply don't care about it.


Also, that too. Maybe just returning an error would be enough, no?
Operators, what do you think ? (cross-posting to openstack-operators@)

 -Sylvain


>
> Best,
> -jay
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement][nova] Decision time on granular request groups for like resources

2018-04-19 Thread Sylvain Bauza
2018-04-19 10:38 GMT+02:00 Balázs Gibizer :

>
>
> On Thu, Apr 19, 2018 at 12:45 AM, Eric Fried  wrote:
>
>>  I have a feeling we're just going to go back and forth on this, as we
>>>  have for weeks now, and not reach any conclusion that is satisfactory to
>>>  everyone. And we'll delay, yet again, getting functionality into this
>>>  release that serves 90% of use cases because we are obsessing over the
>>>  0.01% of use cases that may pop up later.
>>>
>>
>> So I vote that, for the Rocky iteration of the granular spec, we add a
>> single `proximity={isolate|any}` qparam, required when any numbered
>> request groups are specified.  I believe this allows us to satisfy the
>> two NUMA use cases we care most about: "forced sharding" and "any fit".
>> And as you demonstrated, it leaves the way open for finer-grained and
>> more powerful semantics to be added in the future.
>>
>
> Can the proximity param specify relationship between the un-numbered and
> the numbered groups as well or only between numbered groups?
> Besides that I'm +1 about proxyimity={isolate|any}
>
>
What's the default behaviour if we don't provide the proximity qparam ?
Isolate or any ?
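
Just to make sure we're talking about the same thing, an example request under
the proposal (illustration only; "proximity" is the qparam being proposed in
this thread, not a released placement parameter):

```
# Two numbered request groups; with proximity=isolate each group must be
# satisfied by a different resource provider.
query = (
    "/allocation_candidates"
    "?resources1=SRIOV_NET_VF:1"
    "&resources2=SRIOV_NET_VF:1"
    "&proximity=isolate"
)
```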


> Cheers,
> gibi
>
>
>
>> -efried
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] placement update 18-14

2018-04-09 Thread Sylvain Bauza
On Fri, Apr 6, 2018 at 2:54 PM, Chris Dent  wrote:

>
> This is "contract" style update. New stuff will not be added to the
> lists.
>
> # Most Important
>
> There doesn't appear to be anything new with regard to most
> important. That which was important remains important. At the
> scheduler team meeting at the start of the week there was talk of
> working out ways to trim the amount of work in progress by using the
> nova priorities tracking etherpad to help sort things out:
>
> https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
>
> Update provider tree and nested allocation candidates remain
> critical basic functionality on which much else is based. With most
> of provider tree done, it's really on nested allocation candidates.
>
> # What's Changed
>
> Quite a bit of provider tree related code has merged.
>
> Some negotiation happened with regard to when/if the fixes for
> shared providers is going to happen. I'm not sure how that resolved,
> if someone can follow up with that, that would be most excellent.
>
> Most of the placement-req-filter series merged.
>
> The spec for error codes in the placement API merged (code is in
> progress and ready for review, see below).
>
> # Questions
>
> * Eric and I discussed earlier in the week that it might be a good
>   time to start an #openstack-placement IRC channel, for two main
>   reasons: break things up so as to limit the crosstalk in the often
>   very busy #openstack-nova channel and to lend a bit of momentum
>   for going in that direction. Is this okay with everyone? If not,
>   please say so, otherwise I'll make it happen soon.
>
>
Fine by me. It's sometimes difficult to follow all the conversations, so
having a separate channel looks good to me, at least for discussing specific
Placement questions.
For Nova-related points (like how to use nested RPs with NUMA, for example),
#openstack-nova probably remains the main IRC channel.


* Shared providers status?
>   (I really think we need to make this go. It was one of the
>   original value propositions of placement: being able to accurate
>   manage shared disk.)
>
> # Bugs
>
> * Placement related bugs not yet in progress:  https://goo.gl/TgiPXb
>15, -1 on last week
> * In progress placement bugs: https://goo.gl/vzGGDQ
>13, +1 on last week
>
> # Specs
>
> These seem to be divided into three classes:
>
> * Normal stuff
> * Old stuff not getting attention or newer stuff that ought to be
>   abandoned because of lack of support
> * Anything related to the client side of using nested providers
>   effectively. This apparently needs a lot of thinking. If there are
>   some general sticking points we can extract and resolve, that
>   might help move the whole thing forward?
>
> * https://review.openstack.org/#/c/549067/
>   VMware: place instances on resource pool
>   (using update_provider_tree)
>
> * https://review.openstack.org/#/c/545057/
>   mirror nova host aggregates to placement API
>
> * https://review.openstack.org/#/c/552924/
>  Proposes NUMA topology with RPs
>
> * https://review.openstack.org/#/c/544683/
>  Account for host agg allocation ratio in placement
>
> * https://review.openstack.org/#/c/552927/
>  Spec for isolating configuration of placement database
>  (This has a strong +2 on it but needs one more.)
>
> * https://review.openstack.org/#/c/552105/
>  Support default allocation ratios
>
> * https://review.openstack.org/#/c/438640/
>  Spec on preemptible servers
>
> * https://review.openstack.org/#/c/556873/
>Handle nested providers for allocation candidates
>
> * https://review.openstack.org/#/c/556971/
>Add Generation to Consumers
>
> * https://review.openstack.org/#/c/557065/
>Proposes Multiple GPU types
>
> * https://review.openstack.org/#/c/555081/
>Standardize CPU resource tracking
>
> * https://review.openstack.org/#/c/502306/
>Network bandwidth resource provider
>
> * https://review.openstack.org/#/c/509042/
>Propose counting quota usage from placement
>
> # Main Themes
>
> ## Update Provider Tree
>
> Most of the main guts of this have merged (huzzah!). What's left are
> some loose end details, and clean handling of aggregates:
>
> https://review.openstack.org/#/q/topic:bp/update-provider-tree
>
> ## Nested providers in allocation candidates
>
> Representing nested provides in the response to GET
> /allocation_candidates is required to actually make use of all the
> topology that update provider tree will report. That work is in
> progress at:
>
> https://review.openstack.org/#/q/topic:bp/nested-resource-providers
> https://review.openstack.org/#/q/topic:bp/nested-resource-pr
> oviders-allocation-candidates
>
> Note that some of this includes the up-for-debate shared handling.
>
> ## Request Filters
>
> As far as I can tell this is mostly done (yay!) but there is a loose
> end: We merged an updated spec to support multiple member_of
> 

Re: [openstack-dev] [nova] Proposing Eric Fried for nova-core

2018-03-27 Thread Sylvain Bauza
+1

On Tue, Mar 27, 2018 at 4:00 AM, melanie witt  wrote:

> Howdy everyone,
>
> I'd like to propose that we add Eric Fried to the nova-core team.
>
> Eric has been instrumental to the placement effort with his work on nested
> resource providers and has been actively contributing to many other areas
> of openstack [0] like project-config, gerritbot, keystoneauth, devstack,
> os-loganalyze, and so on.
>
> He's an active reviewer in nova [1] and elsewhere in openstack and reviews
> in-depth, asking questions and catching issues in patches and working with
> authors to help get code into merge-ready state. These are qualities I look
> for in a potential core reviewer.
>
> In addition to all that, Eric is an active participant in the project in
> general, helping people with questions in the #openstack-nova IRC channel,
> contributing to design discussions, helping to write up outcomes of
> discussions, reporting bugs, fixing bugs, and writing tests. His
> contributions help to maintain and increase the health of our project.
>
> To the existing core team members, please respond with your comments, +1s,
> or objections within one week.
>
> Cheers,
> -melanie
>
> [0] https://review.openstack.org/#/q/owner:efried
> [1] http://stackalytics.com/report/contribution/nova/90
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Rocky spec review day

2018-03-21 Thread Sylvain Bauza
On Wed, Mar 21, 2018 at 2:12 PM, Eric Fried  wrote:

> +1 for the-earlier-the-better, for the additional reason that, if we
> don't finish, we can do another one in time for spec freeze.
>
>
+1 for Wed 27th March.



> And I, for one, wouldn't be offended if we could "officially start
> development" (i.e. focus on patches, start runways, etc.) before the
> mystical but arbitrary spec freeze date.
>
>
Sure, but given we have a lot of specs to review, TBH I'll only be able to
look at implementation patches close to the 1st milestone.



> On 03/20/2018 07:29 PM, Matt Riedemann wrote:
> > On 3/20/2018 6:47 PM, melanie witt wrote:
> >> I was thinking that 2-3 weeks ahead of spec freeze would be
> >> appropriate, so that would be March 27 (next week) or April 3 if we do
> >> it on a Tuesday.
> >
> > It's spring break here on April 3 so I'll be listening to screaming
> > kids, I mean on vacation. Not that my schedule matters, just FYI.
> >
> > But regardless of that, I think the earlier the better to flush out
> > what's already there, since we've already approved quite a few
> > blueprints this cycle (32 to so far).
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Adding Takashi Natsume to python-novaclient core

2018-02-09 Thread Sylvain Bauza
+1, no objections so far.

On Fri, Feb 9, 2018 at 4:01 PM, Matt Riedemann  wrote:

> I'd like to add Takashi to the python-novaclient core team.
>
> python-novaclient doesn't get a ton of activity or review, but Takashi has
> been a solid reviewer and contributor to that project for quite awhile now:
>
> http://stackalytics.com/report/contribution/python-novaclient/180
>
> He's always fast to get new changes up for microversion support and help
> review others that are there to keep moving changes forward.
>
> So unless there are objections, I'll plan on adding Takashi to the
> python-novaclient-core group next week.
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] PTL Election Season

2018-01-23 Thread Sylvain Bauza
On Tue, Jan 23, 2018 at 12:09 AM, Matt Riedemann 
wrote:

> On 1/15/2018 11:04 AM, Kendall Nelson wrote:
>
>> Election details: https://governance.openstack.org/election/
>>
>> Please read the stipulations and timelines for candidates and electorate
>> contained in this governance documentation.
>>
>> Be aware, in the PTL elections if the program only has one candidate,
>> that candidate is acclaimed and there will be no poll. There will only be a
>> poll if there is more than one candidate stepping forward for a program's
>> PTL position.
>>
>> There will be further announcements posted to the mailing list as action
>> is required from the electorate or candidates. This email is for
>> information purposes only.
>>
>> If you have any questions which you feel affect others please reply to
>> this email thread.
>>
>>
> To anyone that cares, I don't plan on running for Nova PTL again for the
> Rocky release. Queens was my fourth tour and it's definitely time for
> someone else to get the opportunity to lead here. I don't plan on going
> anywhere and I'll be here to help with any transition needed assuming
> someone else (or a couple of people hopefully) will run in the election.
> It's been a great experience and I thank everyone that has had to put up
> with me and my obsessive paperwork and process disorder in the meantime.
>
>
Matt, you were a very good PTL. Not only because of your reviews (after all,
you'll still be reviewing changes next cycle ;) ) but also because you were
helping others with their blueprints and answering their questions when they
had any.
Keeping up with the Riedemann !
-S

> --
>
> Thanks,
>
> Matt
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata

2018-01-17 Thread Sylvain Bauza
On Wed, Jan 17, 2018 at 2:22 PM, Jay Pipes  wrote:

> On 01/16/2018 08:19 PM, Zhenyu Zheng wrote:
>
>> Thanks for the info, so it seems we are not going to implement aggregate
>> overcommit ratio in placement at least in the near future?
>>
>
> As @edleafe alluded to, we will not be adding functionality to the
> placement service to associate an overcommit ratio with an aggregate. This
> was/is buggy functionality that we do not wish to bring forward into the
> placement modeling system.
>
> Reasons the current functionality is poorly architected and buggy
> (mentioned in @melwitt's footnote):
>
> 1) If a nova-compute service's CONF.cpu_allocation_ratio is different from
> the host aggregate's cpu_allocation_ratio metadata value, which value
> should be considered by the AggregateCoreFilter filter?
>
> 2) If a nova-compute service is associated with multiple host aggregates,
> and those aggregates contain different values for their
> cpu_allocation_ratio metadata value, which one should be used by the
> AggregateCoreFilter?
>
> The bottom line for me is that the AggregateCoreFilter has been used as a
> crutch to solve a **configuration management problem**.
>
> Instead of the configuration management system (Puppet, etc) setting
> nova-compute service CONF.cpu_allocation_ratio options *correctly*, having
> the admin set the HostAggregate metadata cpu_allocation_ratio value is
> error-prone for the reasons listed above.
>
>
Well, the main reason people started to use AggregateCoreFilter and the
others is that, pre-Newton, it was literally impossible to assign different
allocation ratios between computes unless you grouped them in aggregates and
used those filters.
Now that ratios are per-compute, there is no need to keep those filters,
unless you leave the computes' nova.conf untouched so that they default to the
scheduler ones. The corner usecase would be "I have 1000+ computes and I just
want to apply specific ratios to only one or two", but then I'd second Jay and
say "config management is the solution to your problem".
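
A toy illustration of the ambiguity Jay describes (not Nova code; the numbers
and the min() rule are made up):

```
compute_conf_ratio = 4.0        # cpu_allocation_ratio in the compute's nova.conf
aggregate_ratios = [8.0, 2.0]   # two aggregates the node belongs to, two values

# Whatever rule is applied here (min? max? conf wins?) is an implicit policy
# decision, which is why setting the ratio per compute node through config
# management is the cleaner answer.
effective_ratio = min([compute_conf_ratio] + aggregate_ratios)
```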



> Incidentally, this same design flaw is the reason that availability zones
> are so poorly defined in Nova. There is actually no such thing as an
> availability zone in Nova. Instead, an AZ is merely a metadata tag (or a
> CONF option! :( ) that may or may not exist against a host aggregate.
> There's lots of spaghetti in Nova due to the decision to use host aggregate
> metadata for availability zone information, which should have always been
> the domain of a **configuration management system** to set. [*]
>
>
IMHO, that's not exactly the root cause of the spaghetti code we have for
AZs. I rather like the idea of seeing an availability zone as just a
user-visible aggregate, because it makes things simple to understand.
What the spaghetti code is really due to is that the transitive relationship
between an aggregate, a compute and an instance is misunderstood, and that we
introduced the notion of an "instance AZ", which is a mistake. Instances
shouldn't have a field saying "here is my AZ"; it should rather be a flag
saying "what did the user want as an AZ ? (None being a choice)"


In the Placement service, we have the concept of aggregates, too. However,
> in Placement, an aggregate (note: not "host aggregate") is merely a
> grouping mechanism for resource providers. Placement aggregates do not have
> any attributes themselves -- they merely represent the relationship between
> resource providers. Placement aggregates suffer from neither of the above
> listed design flaws because they are not buckets for metadata.
>
> ok .
>
> Best,
> -jay
>
> [*] Note the assumption on line 97 here:
>
> https://github.com/openstack/nova/blob/master/nova/availabil
> ity_zones.py#L96-L100
>
> On Wed, Jan 17, 2018 at 5:24 AM, melanie witt  melwi...@gmail.com>> wrote:
>>
>> Hello Stackers,
>>
>> This is a heads up to any of you using the AggregateCoreFilter,
>> AggregateRamFilter, and/or AggregateDiskFilter in the filter
>> scheduler. These filters have effectively allowed operators to set
>> overcommit ratios per aggregate rather than per compute node in <=
>> Newton.
>>
>> Beginning in Ocata, there is a behavior change where aggregate-based
>> overcommit ratios will no longer be honored during scheduling.
>> Instead, overcommit values must be set on a per compute node basis
>> in nova.conf.
>>
>> Details: as of Ocata, instead of considering all compute nodes at
>> the start of scheduler filtering, an optimization has been added to
>> query resource capacity from placement and prune the compute node
>> list with the result *before* any filters are applied. Placement
>> tracks resource capacity and usage and does *not* track aggregate
>> metadata [1]. Because of this, placement cannot consider
>> aggregate-based overcommit and will exclude compute nodes that do
>> not have capacity based on per compute node 

Re: [openstack-dev] [all] Switching to longer development cycles

2017-12-14 Thread Sylvain Bauza
On Thu, Dec 14, 2017 at 10:09 AM, Thierry Carrez 
wrote:

> Matt Riedemann wrote:
> > On 12/13/2017 4:15 PM, Thierry Carrez wrote:
> >> Based on several discussions I had with developers working part-time on
> >> OpenStack at various events lately, it sounded like slowing down our
> >> pace could be helpful to them and generally reduce stress in OpenStack
> >> development. I know people who can spend 100% of their time upstream can
> >> cope with our current rhythm. I just observe that we have less and less
> >> of those full-time people and need to attract more of the part-time one.
> >>
> >> If this proposal is not helping developers and making OpenStack
> >> development less painful, I don't think we should do it:)
> >
> > Given I have the luxury of working mostly full time upstream, I've
> > obviously got a skewed perspective on this whole discussion.
> >
> > I am interested in which part time developers are having issues keeping
> > up and how, i.e. are these core team members that don't feel they can be
> > good core reviewers if they aren't around enough to keep up with the
> > changes that are happening? I could definitely see a case like that with
> > some of the complicated stuff going on in nova like the placement and
> > cells v2 work.
>
> There was two kind of feedback.
>
> - People who would like to get more involved (and become core
> developers) but can't keep up with what's happening in their project or
> reach the review activity necessary to become core developers. A number
> of projects are struggling to recruit new core reviewers, so I think
> reducing expectations and slowing down the pace could help there.
>
>

Like others said, I don't see how slowing down the number of releases will
help those people catch up with project updates, since features should in
theory be delivered at the same pace (unless we lose developers' interest in
developing features, which is one of my concerns).
Struggling to manage your day-to-day work really means you're in trouble, and
you need to find ways to prioritize your tasks, IMHO. Be verbose, talk to your
manager, try to find out how you can increase your time for OpenStack. And
yes, it's hard, and I'm personally facing problems too.


- People in projects that are more mature (or packaging projects) where
> the process overhead (coordinated releases, elections, PTG
> preparation...) ends up representing a significant chunk of the work.
> For those it felt like longer cycles would reduce that overhead and give
> them more time to do real work.
>
>
If we do intermediary releases, it doesn't really solve the problem either,
right?
There are actually two separate concerns from what I can read :
 - release coordination (tagging a milestone or branching a release)
 - governance and community coordination (electing a PTL or organizing a
PTG)


The former concern does seem less impactful to me now because, thanks to a
couple of major improvements in Zuul and release management, we depend less on
humans. And, again, doing intermediary releases would still burn people's
time, right ?
The latter is really a specific problem tied to a specific group of people
(namely the PTLs and the election team). Couldn't we rather review what
those people need to do and how we can help them to reduce the burden ?


> If we're talking about part time contributors that are contributing bug
> > fixes here and there, docs patches, random reviews, I'm not sure how
> > this is substantially better for them.
>
> No, I'm more talking about someone who can dedicate one day per week to
> OpenStack development, and who currently struggle to be part of a team
> where everyone has to be >80% to keep up with the rhythm.
>
> > We've said in this thread that project teams are encouraged to still do
> > intermediate releases, often. And we're still going to be working on
> > features, so how does that help slow things down for the part time
> > contributor?
>
> My hope is that by reducing the "coordinated release" pressure, we'd
> encourage project teams to adopt a rhythm that is more natural for them.
> Some teams would still be pretty active with lots of intermediary
> releases, while some others would have an easier time.
>
>

Well, I do feel that if we decide to go with a 1-year timeframe for a
"coordinated" release, it would greatly raise the stakes of that release.
Imagine what it means for someone who desperately wants to integrate a feature
that requires some kernel update and a fancy network driver thingy. If you
miss the target, you're stuck for a year. Of course, you can target an
intermediary release, but since we only "coordinate" yearly, how can you be
sure that packagers will ship your feature from that intermediary release ?

> If *everyone* must slow down then that's going to be a problem I think,
> > unless we do something like alternating intermediate releases where
> > there are new features and then only bug fixes, something like 

Re: [openstack-dev] [ironic] ironic and traits

2017-10-23 Thread Sylvain Bauza
On Mon, Oct 23, 2017 at 2:54 PM, Eric Fried  wrote:

> I agree with Sean.  In general terms:
>
> * A resource provider should be marked with a trait if that feature
>   * Can be turned on or off (whether it's currently on or not); or
>   * Is always on and can't ever be turned off.
>

No, traits are not boolean. If a resource provider stops providing a
capability, then the existing related trait should just be removed, that's
it.
If you see a trait, that just means the related capability is supported by
the Resource Provider, that's it too.
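
In other words, something like this REST-level sketch is all that should
happen when a capability changes (placement traits API; auth is omitted, the
function and capability names are made up, and custom traits must use the
CUSTOM_ prefix):

```
import requests

def sync_capability_traits(placement_url, rp_uuid, generation, capabilities):
    # Recompute the full trait set from what the provider can do right now
    # and PUT it back; a capability that disappeared simply drops its trait.
    traits = sorted("CUSTOM_" + cap.upper() for cap in capabilities)
    resp = requests.put(
        "%s/resource_providers/%s/traits" % (placement_url, rp_uuid),
        json={"resource_provider_generation": generation, "traits": traits},
        headers={"OpenStack-API-Version": "placement 1.6"},
    )
    resp.raise_for_status()
```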

MHO.

-Sylvain



> * A consumer wanting that feature present (doesn't matter whether it's
> on or off) should specify it as a required *trait*.
> * A consumer wanting that feature present and turned on should
>   * Specify it as a required trait; AND
>   * Indicate that it be turned on via some other mechanism (e.g. a
> separate extra_spec).
>
> I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's drive
> for placement purity.
>
> Please invite me to the hangout or whatever.
>
> Thanks,
> Eric
>
> On 10/23/2017 07:22 AM, Mooney, Sean K wrote:
> >
> >
> >
> >
> > *From:*Jay Pipes [mailto:jaypi...@gmail.com]
> > *Sent:* Monday, October 23, 2017 12:20 PM
> > *To:* OpenStack Development Mailing List  openstack.org>
> > *Subject:* Re: [openstack-dev] [ironic] ironic and traits
> >
> >
> >
> > Writing from my phone... May I ask that before you proceed with any plan
> > that uses traits for state information that we have a hangout or
> > videoconference to discuss this? Unfortunately today and tomorrow I'm
> > not able to do a hangout but I can do one on Wednesday any time of the
> day.
> >
> >
> >
> > */[Mooney, Sean K] on the uefi boot topic I did bring up at the ptg that
> > we wanted to standardizes tratis for “verified boot” /*
> >
> > */that included a trait for uefi secure boot enabled and to indicated a
> > hardware root of trust, e.g. intel boot guard or similar/*
> >
> > */we distinctly wanted to be able to tag nova compute hosts with those
> > new traits so we could require that vms that request/*
> >
> > */a host with uefi secure boot enabled and a hardware root of trust are
> > scheduled only to those nodes. /*
> >
> > */ /*
> >
> > */There are many other examples that effect both vms and bare metal such
> > as, ecc/interleaved memory, cluster on die, /*
> >
> > */l3 cache code and data prioritization, vt-d/vt-c, HPET, Hyper
> > threading, power states … all of these feature may be present on the
> > platform/*
> >
> > */but I also need to know if they are turned on. Ruling out state in
> > traits means all of this logic will eventually get pushed to scheduler
> > filters/*
> >
> > */which will be suboptimal long term as more state is tracked. Software
> > defined infrastructure may be the future but hardware defined software/*
> >
> > */is sadly the present…/*
> >
> > */ /*
> >
> > */I do however think there should be a sperateion between asking for a
> > host that provides x with a trait and  asking for x to be configure via/*
> >
> > */A trait. The trait secure_boot_enabled should never result in the
> > feature being enabled It should just find a host with it on. If you
> want/*
> >
> > */To request it to be turned on you would request a host with
> > secure_boot_capable as a trait and have a flavor extra spec or image
> > property to request/*
> >
> > */Ironic to enabled it.  these are two very different request and should
> > not be treated the same. /*
> >
> >
> >
> >
> >
> > Lemme know!
> >
> > -jay
> >
> >
> >
> > On Oct 23, 2017 5:01 AM, "Dmitry Tantsur"  > > wrote:
> >
> > Hi Jay!
> >
> > I appreciate your comments, but I think you're approaching the
> > problem from purely VM point of view. Things simply don't work the
> > same way in bare metal, at least not if we want to provide the same
> > user experience.
> >
> >
> >
> > On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes  > > wrote:
> >
> > Sorry for delay, took a week off before starting a new job.
> > Comments inline.
> >
> > On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
> >
> > Hi all,
> >
> > I promised John to dump my thoughts on traits to the ML, so
> > here we go :)
> >
> > I see two roles of traits (or kinds of traits) for bare
> metal:
> > 1. traits that say what the node can do already (e.g. "the
> > node is
> > doing UEFI boot")
> > 2. traits that say what the node can be *configured* to do
> > (e.g. "the node can
> > boot in UEFI mode")
> >
> >
> > There's only one role for traits. #2 above. #1 is state
> > information. Traits are not for state information. Traits are
> > only for communicating capabilities of a resource provider
> >  

Re: [openstack-dev] [Blazar] Team mascot idea

2017-10-11 Thread Sylvain Bauza
FWIW, the original name for Blazar was "Climate" so what about a weather
frog ? :-)

-Sylvain

2017-10-11 7:29 GMT+02:00 Masahito MUROI :

> Hi Blazar folks,
>
> As we discussed the topic in the last meeting, we're gathering an idea for
> the project mascot[1].
>
> Current ideas are following fours. If you have or come into mind another
> idea, please replay this mail.  We'll decided the candidacy at the next
> meeting.
>
> - house mouse
> - squirrel
> - shrike
> - blazar
>
>
> 1. https://www.openstack.org/project-mascots
>
> best regards,
> Masahito
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Sylvain Bauza
On Fri, Sep 29, 2017 at 2:32 AM, Dan Smith  wrote:

> In this serie of patches we are generalizing the PCI framework to
>>> handle MDEV devices. We arguing it's a lot of patches but most of them
>>> are small and the logic behind is basically to make it understand two
>>> new fields MDEV_PF and MDEV_VF.
>>>
>>
>> That's not really "generalizing the PCI framework to handle MDEV devices"
>> :) More like it's just changing the /pci module to understand a different
>> device management API, but ok.
>>
>
> Yeah, the series is adding more fields to our PCI structure to allow for
> more variations in the kinds of things we lump into those tables. This is
> my primary complaint with this approach, and has been since the topic first
> came up. I really want to avoid building any more dependency on the
> existing pci-passthrough mechanisms and focus any new effort on using
> resource providers for this. The existing pci-passthrough code is almost
> universally hated, poorly understood and tested, and something we should
> not be further building upon.
>
> In this serie of patches we make libvirt driver support, as usually,
>>> return resources and attach devices returned by the pci manager. This
>>> part can be reused for Resource Provider.
>>>
>>
>> Perhaps, but the idea behind the resource providers framework is to treat
>> devices as generic things. Placement doesn't need to know about the
>> particular device attachment status.
>>
>
> I quickly went through the patches and left a few comments. The base work
> of pulling some of this out of libvirt is there, but it's all focused on
> the act of populating pci structures from the vgpu information we get from
> libvirt. That code could be made to instead populate a resource inventory,
> but that's about the most of the set that looks applicable to the
> placement-based approach.
>
>
I'll review them too.

As mentioned in IRC and the previous ML discussion, my focus is on the
>> nested resource providers work and reviews, along with the other two
>> top-priority scheduler items (move operations and alternate hosts).
>>
>> I'll do my best to look at your patch series, but please note it's lower
>> priority than a number of other items.
>>
>
> FWIW, I'm not really planning to spend any time reviewing it until/unless
> it is retooled to generate an inventory from the virt driver.
>
> With the two patches that report vgpus and then create guests with them
> when asked converted to resource providers, I think that would be enough to
> have basic vgpu support immediately. No DB migrations, model changes, etc
> required. After that, helping to get the nested-rps and traits work landed
> gets us the ability to expose attributes of different types of those vgpus
> and opens up a lot of possibilities. IMHO, that's work I'm interested in
> reviewing.
>

That's exactly what I would like to provide for Queens, so operators would
have the possibility to create flavors asking for vGPU resources in Queens,
even if they couldn't yet ask for a specific vGPU type (or ask to be in the
same NUMA cell as the CPU). The latter definitely needs nested resource
providers, but the former (just having vGPU resource classes provided by the
virt driver) is possible for Queens.
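
Concretely, the Queens-level goal is just to let a flavor request the VGPU
resource class, along these lines (the resources:VGPU extra spec is the
standard resource-class override syntax; the client session and the flavor
values are placeholders):

```
from novaclient import client

nova = client.Client("2.60", session=session)  # an authenticated session is assumed
flavor = nova.flavors.create("vgpu.small", ram=8192, vcpus=4, disk=40)
flavor.set_keys({"resources:VGPU": "1"})  # ask for one vGPU, no specific type yet
```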



> One thing that would be very useful, Sahid, if you could get with Eric
>> Fried (efried) on IRC and discuss with him the "generic device management"
>> system that was discussed at the PTG. It's likely that the /pci module is
>> going to be overhauled in Rocky and it would be good to have the mdev
>> device management API requirements included in that discussion.
>>
>
> Definitely this.
>

++


> --Dan
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Forum topics brainstorming

2017-09-29 Thread Sylvain Bauza
2017-09-28 23:45 GMT+02:00 Matt Riedemann :

> On 9/21/2017 4:01 PM, Matt Riedemann wrote:
>
>> So this shouldn't be news now that I've read back through a few emails in
>> the mailing list (I've been distracted with the Pike release, PTG planning,
>> etc) [1][2][3] but we have until Sept 29 to come up with whatever forum
>> sessions we want to propose.
>>
>> There is already an etherpad for Nova [4].
>>
>> The list of proposed topics is here [5]. The good news is we're not the
>> last ones to this party.
>>
>> So let's start throwing things on the etherpad and figure out what we
>> want to propose as forum session topis. If memory serves me, in Pike we
>> were pretty liberal in what we proposed.
>>
>> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-Sept
>> ember/121783.html
>> [2] http://lists.openstack.org/pipermail/openstack-dev/2017-Sept
>> ember/122143.html
>> [3] http://lists.openstack.org/pipermail/openstack-dev/2017-Sept
>> ember/122454.html
>> [4] https://etherpad.openstack.org/p/SYD-nova-brainstorming
>> [5] http://forumtopics.openstack.org/
>>
>>
> The deadline for Queens Forum topic submissions is tomorrow. Based on our
> etherpad:
>
> https://etherpad.openstack.org/p/SYD-nova-brainstorming
>
> I plan to propose something like:
>
> 1. Cells v2 update and direction
>
> This would be an update on what happened in Pike, upgrade impacts, known
> issues, etc and what we're doing in Queens. I think we'd also lump the Pike
> quota behavior changes in here too if possible.
>
> 2. Placement update and direction
>
> Same as the Cells v2 discussion - a Pike update and the focus items for
> Queens. This would also be a place we can mention the Ironic flavor
> migration to custom resource classes that happens in Pike.
>
> 3. Queens development focus and checkpoint
>
> This would be a session to discuss anything in flight for Queens, what
> we're working on, and have a chance to ask questions of operators/users for
> feedback. For example, we plan to add vGPU support but it will be quite
> simple to start, similar with volume multi-attach.
>
> 4. Michael Still had an item in the etherpad about privsep. That could be
> a cross-project educational session on it's own if he's going to give a
> primer on what privsep is again and how it's integrated into projects. This
> session could be lumped into #3 above but is probably better on it's own if
> it's going to include discussion about operational impacts. I'm going to
> ask that mikal runs with this though.
>
> 
>
> There are some other things in the etherpad about hardware acceleration
> features and documentation, and I'll leave it up to others if they want to
> propose those sessions.
>
>

Yup, I provided two proposals :
http://forumtopics.openstack.org/cfp/details/47 for discussing documentation
and release notes
http://forumtopics.openstack.org/cfp/details/48 for talking about how
operators can use OSC and making sure it works for the maximum version (at
least knowing the gaps we have).

-Sylvain



-- 

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Is there any reason to exclude originally failed build hosts during live migration?

2017-09-20 Thread Sylvain Bauza
On Wed, Sep 20, 2017 at 10:15 PM, melanie witt  wrote:

> On Wed, 20 Sep 2017 13:47:18 -0500, Matt Riedemann wrote:
>
>> Presumably there was a good reason why the instance failed to build on a
>> host originally, but that could be for any number of reasons: resource
>> claim failed during a race, configuration issues, etc. Since we don't
>> really know what originally happened, it seems reasonable to not exclude
>> originally attempted build targets since the scheduler filters should still
>> validate them during live migration (this is all assuming you're not using
>> the 'force' flag with live migration - and if you are, all bets are off).
>>
>
> Yeah, I think because an original failure to build could have been a
> failed claim during a race, config issue, or just been a very long time
> ago, we shouldn't continue to exclude those hosts forever.
>
> If people agree with doing this fix, then we also have to consider making
>> a similar fix for other move operations like cold migrate, evacuate and
>> unshelve. However, out of those other move operations, only cold migrate
>> attempts any retries. If evacuate or unshelve fail on the target host,
>> there is no retry.
>>
>
> I agree with doing that fix for all of the move operations.
>
>
Yeah, a host could have been failing when we created that instance a year
ago; that doesn't mean the host won't be available this time.

> -melanie
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Should we add the 'force' option to the cold migrate API too?

2017-08-31 Thread Sylvain Bauza
On Wed, Aug 30, 2017 at 5:09 PM, Matt Riedemann  wrote:

> Given the recent bugs [1][2] due to the force flag in the live migrate and
> evacuate APIs related to Placement, and some other long standing bugs about
> bypassing the scheduler [3], I don't think we should add the force option
> to the cold migrate API, as (re-)proposed in Takashi's cold migrate spec
> here [4].
>
> I'm fine with being able to specify a host during cold migrate/resize, but
> I think the specified host should be validated by the scheduler (and
> placement) so that the instance can actually move to that specified
> destination host.
>
> Since we've built more logic into the scheduler in Pike for integration
> with Placement, bypassing that gets us into maintenance issues with having
> to duplicate code throughout conductor and just in general, seems like a
> bad idea to force a host and bypass the scheduler and potentially break the
> instance. Not to mention the complicated logic of passing the host through
> from the API to conductor to the scheduler is it's own maintenance problem
> [5].
>
> Looking back at when the force flag was added to the APIs, it was from
> this spec [6]. Reading that, before that microversion if a host was
> specified we'd bypass the scheduler, so the force flag was really just
> there for backward compatibility


Indeed. That said, I've heard of some ops wanting to migrate instances to
computes whose resources were possibly not enough to accept the instance,
but where performance problems are preferable to stopped instances.
If you think about the move operations using the force flag (evacuate and
live-migrate), those were used by operators when they had a problem with a
compute node and wanted to *evacuate* instances very quickly.




> I guess in case you wanted the option to break the instance or your
> deployment. :) Otherwise after that microversion if you specify a host but
> not the force flag, then we validate the specified host via the scheduler
> first. Given this, and the fact we don't have any backward compatibility to
> maintain with specifying a host for cold migrate, I don't think we need to
> add a force flag for it, unless people really love that option on the live
> migrate and evacuate APIs, but it just seems overly dangerous to me.
>

While I understand operators wanting to *evacuate* instances (or rebuild
them by using the evacuation API) when they see problems with hosts, I
don't see why we would need a "force" flag for a cold migration if you're
passing a target.
Say:
 - either your compute node is down and you need to recreate your
customers' instances very quickly: then you call "nova evacuate".
 - or your compute node is still alive but you want to migrate quickly
without telling your customers: then you use "nova live-migrate".

I don't see cases where operators (because passing a target requires you
to be an admin) would want to cold migrate instances for their customers
without communicating a specific timeline for the move operation to them,
and so quickly that it would require a force flag to bypass the scheduler.
Maybe I'm wrong, but I'm fine with asking Takashi not to add the force
flag in his implementation of the cold migration API and waiting for
people who want that flag to propose a specific specification describing
the use case.



>
> Finally, if one is going to make the argument, "but this would be
> consistent with the live migrate and evacuate APIs", I can also point out
> that we don't allow you to specify a host (forced or not) during unshelve
> of a shelved offloaded instance - which is basically a move (new build on a
> new host chosen by the scheduler). I'm not advocating that we make unshelve
> more complicated though, because that's already broken in several known
> ways [7][8][9].
>

Well, we don't have consistent APIs anyway. If you think about all the move
operations plus the boot request itself, each of them is *already* very
different from the others from an API perspective. Yay.



>
> [1] https://bugs.launchpad.net/nova/+bug/1712008
> [2] https://bugs.launchpad.net/nova/+bug/1713786
> [3] https://bugs.launchpad.net/nova/+bug/1427772
> [4] https://review.openstack.org/#/c/489031/
> [5] http://lists.openstack.org/pipermail/openstack-dev/2017-Augu
> st/121342.html
> [6] https://specs.openstack.org/openstack/nova-specs/specs/mitak
> a/implemented/check-destination-on-migrations.html
> [7] https://bugs.launchpad.net/nova/+bug/1675791
> [8] https://bugs.launchpad.net/nova/+bug/1627694
> [9] https://bugs.launchpad.net/nova/+bug/1547142
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

Re: [openstack-dev] [nova] RequestSpec questions about force_hosts/nodes and requested_destination

2017-08-30 Thread Sylvain Bauza
2017-08-30 15:43 GMT+02:00 Sylvain Bauza <sylvain.ba...@gmail.com>:

> Still on PTO for one day, but I'll try to answer. Sorry for the delay, was
> out traveling.
>
>
> 2017-08-21 23:09 GMT+02:00 Matt Riedemann <mriede...@gmail.com>:
>
>> I don't dabble in the RequestSpec code much, but in trying to fix bug
>> 1712008 [1] I'm venturing in there and have some questions. This is mostly
>> an email to Sylvain for when he gets back from vacation but I wanted to
>> dump it before moving forward.
>>
>>
> Heh :-)
>
>
>> Mainly, what is the difference between RequestSpec.force_hosts/force_nodes
>> and RequestSpec.requested_destination?
>>
>>
> force_[hosts/nodes] was a former key in the legacy filter_properties
> dictionary that was passed to the scheduler. When I designed the
> RequestSpec object, I just provided the same names for the fields that were
> related to those keys.
> requested_destination is a new field that wasn't previously in the legacy
> dictionaries but was just added when I wrote in Newton a new feature about
> verifying a target when moving.
>
> There is a main behavioural difference between those two RequestSpec
> fields, that is because the behaviours are different between a boot request
> and a move operation.
> When you boot an instance, you can pass (weirdly and terribly from an API
> point of view) a target for that instance by using the availability_zone flag. Some
> request  like nova boot inst1 --availability_zone nova:foo:bar will
> actually call the scheduler to just verify if host "foo" (and node "bar")
> is alive *without* verifying filters.
> When you move that instance by passing a target (something like "nova
> live-migrate inst1 foo2") will rather call the scheduler to verify only
> that host (and not the others) *by running filters*.
>
> TBC, if you 'force' a boot, it won't be a real forced boot because it will
> still call the scheduler but it will dumbly accept the proposed destination.
> If you want to *propose* a destination for a move, it will ask the
> scheduler to *really* verify that destination (or end up the migration if
> not possible). You can still want to *force* a move, but in that case, it
> just bypasses the scheduler.
>
>
>
>> When should one be used over the other? I take it that
>> requested_destination is the newest and coolest thing and we should always
>> use that first, and that's what the nova-api code is using, but I also see
>> the scheduler code checking force_hosts/force_nodes.
>>
>>
> MHO is that force_hosts is just a technical hack based on an unclear API
> contract about guaranteeing a boot request to succeed against a specific
> target. I'd rather propose a new API contract for providing a destination
> exactly based on the same semantics that the ones we have for move
> operations, ie. :
> - if a destination is proposed, run scheduler filters only against that
> destination and error out the boot request if the host can't sustain the
> request.
> - if a destination is proposed and a 'force' flag is used, then just pass
> the RPC request to the target compute service and magically expect
> everything will work.
>
> That said, there is a corner case now we have allocations made by the
> scheduler. If we go straight to the compute service, we won't create the
> allocation if a force operation is made. In that case, something (that has
> to be discussed in the spec mentioning the API change) has to reconcile
> allocations for the compute service. Maybe it's just a technical detail,
> but that has to be written in the spec tho.
>
>

Huh, I hadn't looked at the bug before writing the email. /me facepalms.
Yeah, the bug you filed is exactly the problem I mentioned above...



> Is that all legacy compatibility code? And if so, then why don't we handle
>> requested_destination in RequestSpec routines like
>> reset_forced_destinations() and to_legacy_filter_properties_dict(), i.e.
>> for the latter, if it's a new style RequestSpec with requested_destination
>> set, but we have to backport and call to_legacy_filter_properties_dict(),
>> shouldn't requested_destination be used to set force_hosts/force_nodes on
>> the old style filter properties?
>>
>> Since RequestSpec.requested_destination is the thing that restricts a
>> move operation to a single cell, it seems pretty important to always be
>> using that field when forcing where an instance is moving to. But I'm
>> confused about whether or not both requested_destination *and*
>> force_hosts/force_nodes should be set since the compat code doesn't seem to
>> transform the former into the latter.
>>
>>
> I agree, 

Re: [openstack-dev] [nova] RequestSpec questions about force_hosts/nodes and requested_destination

2017-08-30 Thread Sylvain Bauza
Still on PTO for one day, but I'll try to answer. Sorry for the delay, was
out traveling.


2017-08-21 23:09 GMT+02:00 Matt Riedemann :

> I don't dabble in the RequestSpec code much, but in trying to fix bug
> 1712008 [1] I'm venturing in there and have some questions. This is mostly
> an email to Sylvain for when he gets back from vacation but I wanted to
> dump it before moving forward.
>
>
Heh :-)


> Mainly, what is the difference between RequestSpec.force_hosts/force_nodes
> and RequestSpec.requested_destination?
>
>
force_[hosts/nodes] was a former key in the legacy filter_properties
dictionary that was passed to the scheduler. When I designed the
RequestSpec object, I just kept the same names for the fields that were
related to those keys.
requested_destination is a new field that wasn't previously in the legacy
dictionaries; it was added in Newton when I wrote a new feature for
verifying a target when moving.

There is a main behavioural difference between those two RequestSpec
fields, because the behaviours differ between a boot request and a move
operation.
When you boot an instance, you can pass (weirdly and terribly from an API
point of view) a target for that instance by using the availability_zone
flag. A request like "nova boot inst1 --availability_zone nova:foo:bar"
will actually call the scheduler to just verify whether host "foo" (and
node "bar") is alive, *without* running the filters.
When you move that instance by passing a target (something like "nova
live-migrate inst1 foo2"), the scheduler will instead verify only that
host (and not the others) *by running the filters*.

TBC, if you 'force' a boot, it won't be a real forced boot because it will
still call the scheduler, but the scheduler will dumbly accept the proposed
destination. If you *propose* a destination for a move, the scheduler will
be asked to *really* verify that destination (or fail the migration if
it's not possible). You can still want to *force* a move, but in that case
it just bypasses the scheduler.



> When should one be used over the other? I take it that
> requested_destination is the newest and coolest thing and we should always
> use that first, and that's what the nova-api code is using, but I also see
> the scheduler code checking force_hosts/force_nodes.
>
>
MHO is that force_hosts is just a technical hack based on an unclear API
contract about guaranteeing that a boot request succeeds against a specific
target. I'd rather propose a new API contract for providing a destination,
based on exactly the same semantics as the ones we have for move
operations, i.e.:
- if a destination is proposed, run the scheduler filters only against that
destination and error out the boot request if the host can't sustain the
request.
- if a destination is proposed and a 'force' flag is used, then just pass
the RPC request to the target compute service and magically expect
everything to work.

That said, there is a corner case now that we have allocations made by the
scheduler. If we go straight to the compute service, we won't create the
allocation when a force operation is made. In that case, something (that
has to be discussed in the spec mentioning the API change) has to reconcile
the allocations for the compute service. Maybe it's just a technical
detail, but that has to be written in the spec tho.
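
Just to illustrate what I mean by reconciling, here is a rough sketch only
(the helper name, headers and flavor dict are made up; the payload shape is
the list-based format the Placement API uses today):

    import requests

    def reconcile_allocation(placement_url, headers, instance_uuid,
                             compute_rp_uuid, flavor):
        # Write the allocation the scheduler would normally have created,
        # but didn't because the force flag bypassed it.
        payload = {
            "allocations": [{
                "resource_provider": {"uuid": compute_rp_uuid},
                "resources": {
                    "VCPU": flavor["vcpus"],
                    "MEMORY_MB": flavor["ram"],
                    "DISK_GB": flavor["disk"],
                },
            }],
        }
        resp = requests.put(
            "%s/allocations/%s" % (placement_url, instance_uuid),
            json=payload, headers=headers)
        resp.raise_for_status()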


> Is that all legacy compatibility code? And if so, then why don't we handle
> requested_destination in RequestSpec routines like
> reset_forced_destinations() and to_legacy_filter_properties_dict(), i.e.
> for the latter, if it's a new style RequestSpec with requested_destination
> set, but we have to backport and call to_legacy_filter_properties_dict(),
> shouldn't requested_destination be used to set force_hosts/force_nodes on
> the old style filter properties?
>
> Since RequestSpec.requested_destination is the thing that restricts a
> move operation to a single cell, it seems pretty important to always be
> using that field when forcing where an instance is moving to. But I'm
> confused about whether or not both requested_destination *and*
> force_hosts/force_nodes should be set since the compat code doesn't seem to
> transform the former into the latter.
>
>
I agree; it sounds important to me to have consistent behaviours between
all nova operations that require a placement decision. I've had that in my
pipeline for a very long time (I mean, writing the spec for modifying the
API contract on instance creation), so I'll push it a bit more.


> If this is all transitional code, we should really document the hell out
> of this in the RequestSpec class itself for anyone trying to write new
> client side code with it, like me.
>
>
That sounds like a very good quick win to me.

-Sylvain


> [1] https://bugs.launchpad.net/nova/+bug/1712008
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: 

Re: [openstack-dev] [nova] Proposing Balazs Gibizer for nova-core

2017-08-30 Thread Sylvain Bauza
2017-08-29 18:02 GMT+02:00 Matt Riedemann :

> On 8/22/2017 8:18 PM, Matt Riedemann wrote:
>
>> I'm proposing that we add gibi to the nova core team. He's been around
>> for awhile now and has shown persistence and leadership in the
>> multi-release versioned notifications effort, which also included helping
>> new contributors to Nova get involved which helps grow our contributor base.
>>
>> Beyond that though, gibi has a good understanding of several areas of
>> Nova, gives thoughtful reviews and feedback, which includes -1s on changes
>> to get them in shape before a core reviewer gets to them, something I
>> really value and look for in people doing reviews who aren't yet on the
>> core team. He's also really helpful with not only reporting and triaging
>> bugs, but writing tests to recreate bugs so we know when they are fixed,
>> and also works on fixing them - something I expect from a core maintainer
>> of the project.
>>
>> So to the existing core team members, please respond with a yay/nay and
>> after about a week or so we should have a decision (knowing a few cores are
>> on vacation right now).
>>
>>
> It's been a week and we've had enough +1s so it's a done deal.
>
>
Sorry folks, I was travelling last week so I hadn't seen this email, but
I'm definitely +1 on it.



> Welcome to the nova core team gibi!
>
>
>
Welcome gibi!


> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] bug triage experimentation

2017-06-26 Thread Sylvain Bauza


On 23/06/2017 18:52, Sean Dague wrote:
> The Nova bug backlog is just over 800 open bugs, which while
> historically not terrible, remains too large to be collectively usable
> to figure out where things stand. We've had a few recent issues where we
> just happened to discover upgrade bugs filed 4 months ago that needed
> fixes and backports.
> 
> Historically we've tried to just solve the bug backlog with volunteers.
> We've had many a brave person dive into here, and burn out after 4 - 6
> months. And we're currently without a bug lead. Having done a big giant
> purge in the past
> (http://lists.openstack.org/pipermail/openstack-dev/2014-September/046517.html)
> I know how daunting this all can be.
> 
> I don't think that people can currently solve the bug triage problem at
> the current workload that it creates. We've got to reduce the smart
> human part of that workload.
> 

Thanks for sharing ideas, Sean.

> But, I think that we can also learn some lessons from what active github
> projects do.
> 
> #1 Bot away bad states
> 
> There are known bad states of bugs - In Progress with no open patch,
> Assigned but not In Progress. We can just bot these away with scripts.
> Even better would be to react immediately on bugs like those, that helps
> to train folks how to use our workflow. I've got some starter scripts
> for this up at - https://github.com/sdague/nova-bug-tools
> 

Sometimes, for reasons I couldn't figure out, I noticed the Gerrit hook not
working (i.e. not amending the Launchpad bug with the Gerrit URL), so some
of the bugs I was looking at were actually being actively worked on (and I
had the same experience myself, although my commit message was correctly
tagged AFAIR).

Either way, what you propose sounds reasonable to me. If you care enough
about fixing a bug to make yourself its owner, that also means you commit
to resolving it sooner rather than later (even if I sometimes fail to
apply that to myself...).
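
For illustration, spotting one of those bad states programmatically could
look roughly like this (a minimal sketch assuming launchpadlib; the Gerrit
cross-check for open patches is left out):

    from launchpadlib.launchpad import Launchpad

    # Anonymous read-only access is enough for a report.
    lp = Launchpad.login_anonymously('nova-bug-report', 'production')
    nova = lp.projects['nova']

    # "Assigned but not In Progress" is one of the known bad states.
    for task in nova.searchTasks(status=['New', 'Confirmed', 'Triaged']):
        if task.assignee is not None:
            print('%s assigned to %s but still %s'
                  % (task.web_link, task.assignee.name, task.status))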

> #2 Use tag based workflow
> 
> One lesson from github projects, is the github tracker has no workflow.
> Issues are openned or closed. Workflow has to be invented by every team
> based on a set of tags. Sometimes that's annoying, but often times it's
> super handy, because it allows the tracker to change workflows and not
> try to change the meaning of things like "Confirmed vs. Triaged" in your
> mind.
> 
> We can probably tag for information we know we need at lot easier. I'm
> considering something like
> 
> * needs.system-version
> * needs.openstack-version
> * needs.logs
> * needs.subteam-feedback
> * has.system-version
> * has.openstack-version
> * has.reproduce
> 
> Some of these a bot can process the text on and tell if that info was
> provided, and comment how to provide the updated info. Some of this
> would be human, but with official tags, it would probably help.
> 

The tags you propose seem to me to map to the "Incomplete" vs. "Confirmed"
states of a bug.

If I'm not able to triage a bug because I'm missing information like the
release version or more logs, I mark the bug as Incomplete.
I could add those tags, but I don't see where a programmatic approach
would help us.

If I understand correctly, you're rather trying to identify what's missing
in the bug report to provide a clear path to resolution, so we could mark
the bug as Triaged, right? If so, I wouldn't propose those tags, for the
reason I just gave, but rather other tags like (disclaimer, I suck at
naming things):

 - rootcause.found
 - needs.rootcause.analysis
 - is.regression
 - reproduced.locally


> #3 machine assisted functional tagging
> 
> I'm playing around with some things that might be useful in mapping new
> bugs into existing functional buckets like: libvirt, volumes, etc. We'll
> see how useful it ends up being.
> 

Log parsing could certainly help. If someone is able to provide a clear
stacktrace of the root exception, we can get the functional bucket for
free in 80% of cases.

I'm not a fan of identifying a domain by text recognition (just because
someone mentions libvirt doesn't mean it's a libvirt bug), which is why
I'd rather rely on log analysis as I mentioned.


> #4 reporting on smaller slices
> 
> Build some tooling to report on the status and change over time of bugs
> under various tags. This will help visualize how we are doing
> (hopefully) and where the biggest piles of issues are.
> 
> The intent is the normal unit of interaction would be one of these
> smaller piles. Be they the 76 libvirt bugs, 61 volumes bugs, or 36
> vmware bugs. It would also highlight the rates of change in these piles,
> and what's getting attention and what is not.
> 

I do wonder if Markus already wrote such reporting tools. AFAIR, he had a
couple of very interesting reports (and he also worked hard on the bug
taxonomy), so we could potentially leverage those.

-Sylvain

> 
> This is going to be kind of an ongoing experiment, but as we currently
> have no one spear heading bug triage, it seemed 

Re: [openstack-dev] [ironic][nova] Goodbye^W See you later

2017-06-08 Thread Sylvain Bauza


On 08/06/2017 14:45, Jim Rollenhagen wrote:
> Hey friends,
> 
> I've been mostly missing for the past six weeks while looking for a new
> job, so maybe you've forgotten me already, maybe not. I'm happy to tell
> you I've found one that I think is a great opportunity for me. But, I'm
> sad to tell you that it's totally outside of the OpenStack community.
> 
> The last 3.5 years have been amazing. I'm extremely grateful that I've
> been able to work in this community - I've learned so much and met so
> many awesome people. I'm going to miss the insane(ly awesome) level of
> collaboration, the summits, the PTGs, and even some of the bikeshedding.
> We've built amazing things together, and I'm sure y'all will continue to
> do so without me.
> 
> I'll still be lurking in #openstack-dev and #openstack-ironic for a
> while, if people need me to drop a -2 or dictate old knowledge or
> whatever, feel free to ping me. Or if you just want to chat. :)
> 
> <3 jroll
> 
> P.S. obviously my core permissions should be dropped now :P
> 
> 

I'm both sad and happy for you. Mixed feelings, but I do think you are
definitely someone with very good soft and hard skills.
Best of luck in your next position.

-Sylvain

> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

2017-06-06 Thread Sylvain Bauza


On 06/06/2017 15:03, Edward Leafe wrote:
> On Jun 6, 2017, at 4:14 AM, Sylvain Bauza <sba...@redhat.com> wrote:
>>
>> The Plan A option you mention hides the complexity of the
>> shared/non-shared logic, but at the price that it would make scheduling
>> decisions on those criteria impossible unless you put
>> filtering/weighting logic into Placement, which AFAIK we strongly
>> disagree with.
> 
> Not necessarily. Well, not now, for sure, but that’s why we need Traits
> to be integrated into Flavors as soon as possible so that we can make
> requests with qualitative requirements, not just quantitative. When that
> work is done, we can add traits to differentiate local from shared
> storage, just like we have traits to distinguish HDD from SSD. So if a
> VM with only local disk is needed, that will be in the request, and
> placement will never return hosts with shared storage. 
> 

Well, there is a whole difference between defining constraints in flavors
and making a general constraint on a filter basis that can be opted into
via config.

Operators could argue that they would need to update all their N flavors
in order to achieve a strict separation for not-shared-with resource
providers, which would leak that distinction to users, who would end up
with flavors differing only in that aspect.

I'm not saying it's bad to put traits into flavor extra specs, sometimes
that's perfectly fine, but I do worry about the flavor count explosion if
we begin putting all the filtering logic into extra specs (plus the fact
that it can't be managed by config the way filters are at the moment).

-Sylvain

> -- Ed Leafe
> 
> 
> 
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

2017-06-06 Thread Sylvain Bauza



On 05/06/2017 23:22, Ed Leafe wrote:
> We had a very lively discussion this morning during the Scheduler
> subteam meeting, which was continued in a Google hangout. The
> subject was how to handle claiming resources when the Resource
> Provider is not "simple". By "simple", I mean a compute node that
> provides all of the resources itself, as contrasted with a compute
> node that uses a shared storage for disk space, or which has
> complex nested relationships with things such as PCI devices or
> NUMA nodes. The current situation is as follows:
> 
> a) scheduler gets a request with certain resource requirements
> (RAM, disk, CPU, etc.) b) scheduler passes these resource
> requirements to placement, which returns a list of hosts (compute
> nodes) that can satisfy the request. c) scheduler runs these
> through some filters and weighers to get a list ordered by best
> "fit" d) it then tries to claim the resources, by posting to
> placement allocations for these resources against the selected
> host e) once the allocation succeeds, scheduler returns that host
> to conductor to then have the VM built
> 
> (some details for edge cases left out for clarity of the overall
> process)
> 
> The problem we discussed comes into play when the compute node
> isn't the actual provider of the resources. The easiest example to
> consider is when the computes are associated with a shared storage
> provider. The placement query is smart enough to know that even if
> the compute node doesn't have enough local disk, it will get it
> from the shared storage, so it will return that host in step b)
> above. If the scheduler then chooses that host, when it tries to
> claim it, it will pass the resources and the compute node UUID back
> to placement to make the allocations. This is the point where the
> current code would fall short: somehow, placement needs to know to
> allocate the disk requested against the shared storage provider,
> and not the compute node.
> 
> One proposal is to essentially use the same logic in placement that
> was used to include that host in those matching the requirements.
> In other words, when it tries to allocate the amount of disk, it
> would determine that that host is in a shared storage aggregate,
> and be smart enough to allocate against that provider. This was
> referred to in our discussion as "Plan A".
> 
> Another proposal involved a change to how placement responds to the
> scheduler. Instead of just returning the UUIDs of the compute nodes
> that satisfy the required resources, it would include a whole bunch
> of additional information in a structured response. A straw man
> example of such a response is here:
> https://etherpad.openstack.org/p/placement-allocations-straw-man.
> This was referred to as "Plan B". The main feature of this approach
> is that part of that response would be the JSON dict for the
> allocation call, containing the specific resource provider UUID for
> each resource. This way, when the scheduler selects a host, it
> would simply pass that dict back to the /allocations call, and
> placement would be able to do the allocations directly against that
> information.
> 
> There was another issue raised: simply providing the host UUIDs
> didn't give the scheduler enough information in order to run its
> filters and weighers. Since the scheduler uses those UUIDs to
> construct HostState objects, the specific missing information was
> never completely clarified, so I'm just including this aspect of
> the conversation for completeness. It is orthogonal to the question
> of how to allocate when the resource provider is not "simple".
> 
> My current feeling is that we got ourselves into our existing mess
> of ugly, convoluted code when we tried to add these complex
> relationships into the resource tracker and the scheduler. We set
> out to create the placement engine to bring some sanity back to how
> we think about things we need to virtualize. I would really hate to
> see us make the same mistake again, by adding a good deal of
> complexity to handle a few non-simple cases. What I would like to
> avoid, no matter what the eventual solution chosen, is representing
> this complexity in multiple places. Currently the only two
> candidates for this logic are the placement engine, which knows
> about these relationships already, or the compute service itself,
> which has to handle the management of these complex virtualized
> resources.
> 
> I don't know the answer. I'm hoping that we can have a discussion
> that might uncover a clear approach, or, at the very least, one
> that is less murky than the others.
> 

I wasn't part of either the scheduler meeting or the hangout (hit by a
French holiday), so I don't have all the details in mind and could well
make wrong assumptions; I apologize in advance if I say anything silly.

That said, I still have some opinions and I'll put them here. Thanks
for having brought 

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread Sylvain Bauza


On 19/05/2017 15:14, Chris Dent wrote:
> On Thu, 18 May 2017, Matt Riedemann wrote:
> 
>> We didn't really get into this during the forum session, but there are
>> different opinions within the nova dev team on how to do claims in the
>> controller services (conductor vs scheduler). Sylvain Bauza has a
>> series which uses the conductor service, and Ed Leafe has a series
>> using the scheduler. More on that in the mailing list [3].
> 
> Since we've got multiple threads going on this topic, I put some
> of my concerns in a comment on one of Ed's reviews:
> 
> https://review.openstack.org/#/c/465171/3//COMMIT_MSG@30
> 
> It's a bit left fieldy but tries to ask about some of the long term
> concerns we may need to be thinking about here, with regard to other
> services using placement and maybe them needing a
> scheduler-like-thing too (because placement cannot do everything).
> 

That's actually a good question, which I would translate to:
'Are other projects interested in scheduling things other than just
instances?'

To be honest, that's something I've wondered about for ages, and during
the VM/BM Forum session [1] I tried to ask operators/developers which use
cases they'd like to see for placement, given the priorities they gave.
If you look at the etherpad, you will see a couple of use cases, but none
of them are related to a generic scheduler; they are rather about a
compute scheduler doing multi-project affinity, which is already in our
scope thanks to Placement.

So, while I think it's a reasonable question to ask, it shouldn't divert
our current priority effort, as it can't be called a motivation.
Also, I'm not particularly concerned about the interface between conductor
and scheduler that we have, as that interface is flexible enough not to
block us in the future, should we need to implement a generic scheduler.

-Sylvain

> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread Sylvain Bauza


On 19/05/2017 12:19, John Garbutt wrote:
> On 19 May 2017 at 10:03, Sylvain Bauza <sba...@redhat.com> wrote:
>>
>>
>> On 19/05/2017 10:02, Sylvain Bauza wrote:
>>>
>>>
>>> On 19/05/2017 02:55, Matt Riedemann wrote:
>>>> The etherpad for this session is here [1]. The goal for this session was
>>>> to inform operators and get feedback on the plan for what we're doing
>>>> with moving claims from the computes to the control layer (scheduler or
>>>> conductor).
>>>>
>>>> We mostly talked about retries, which also came up in the cells v2
>>>> session that Dan Smith led [2] and will recap later.
>>>>
>>>> Without getting into too many details, in the cells v2 session we came
>>>> to a compromise on build retries and said that we could pass hosts down
>>>> to the cell so that the cell-level conductor could retry if needed (even
>>>> though we expect doing claims at the top will fix the majority of
>>>> reasons you'd have a reschedule in the first place).
>>>>
>>>
>>> And during that session, we said that given cell-local conductors (when
>>> there is a reschedule) can't upcall the global (for all cells)
>>> schedulers, that's why we agreed to use the conductor to be calling
>>> Placement API for allocations.
>>>
>>>
>>>> During the claims in the scheduler session, a new wrinkle came up which
>>>> is the hosts that the scheduler returns to the top-level conductor may
>>>> be in different cells. So if we have two cells, A and B, with hosts x
>>>> and y in cell A and host z in cell B, we can't send z to A for retries,
>>>> or x or y to B for retries. So we need some kind of post-filter/weigher
>>>> filtering such that hosts are grouped by cell and then they can be sent
>>>> to the cells for retries as necessary.
>>>>
>>>
>>> That's already proposed for reviews in
>>> https://review.openstack.org/#/c/465175/
>>>
>>>
>>>> There was also some side discussion asking if we somehow regressed
>>>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>>>> Smith have the context on this (I think) so I'm hoping they can clarify
>>>> if we really need to fix something in Ocata at this point, or is this
>>>> more of a case of closing a loop-hole?
>>>>
>>>
>>> The problem is that the scheduler doesn't verify the cells when trying
>>> to find a destination for an instance, it's just using weights for packing.
>>>
>>> So, for example, say I have N hosts and 2 cells, the first weighting
>>> host could be in cell1 while the second could be in cell2. Then, even if
>>> the operator uses the weighers for packing, for example a RequestSpec
>>> with num_instances=2 could push one instance in cell1 and the other in
>>> cell2.
>>>
>>> From a scheduler point of view, I think we could possibly add a
>>> CellWeigher that would help to pack instances within the same cell.
>>> Anyway, that's not related to the claims series, so we could possibly
>>> backport it for Ocata hopefully.
>>>
>>
>> Melanie actually made a good point about the current logic based on the
>> `host_subset_size`config option. If you're leaving it defaulted to 1, in
>> theory all instances coming along the scheduler would get a sorted list
>> of hosts by weights and only pick the first one (ie. packing all the
>> instances onto the same host) which is good for that (except of course
>> some user request that fits all the space of the host and where a spread
>> could be better by shuffling between multiple hosts).
>>
>> So, while I began deprecating that option because I thought the race
>> condition would be fixed by conductor claims, I think we should keep it
>> for the time being until we clearly identify whether it's still necessary.
>>
>> All what I said earlier above remains valid tho. In a world where 2
>> hosts are given as the less weighed ones, we could send instances from
>> the same user request onto different cells, but that only ties the
>> problem to a multi-instance boot problem, which is far less impactful.
> 
> FWIW, I think we need to keep this.
> 
> If you have *lots* of contention when picking your host, increasing
> host_subset_size should help reduce that contention (and maybe help
> increase the throughput). I haven't written a simulator to test it
> out, but it feels like we wi

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread Sylvain Bauza


On 19/05/2017 10:02, Sylvain Bauza wrote:
> 
> 
> On 19/05/2017 02:55, Matt Riedemann wrote:
>> The etherpad for this session is here [1]. The goal for this session was
>> to inform operators and get feedback on the plan for what we're doing
>> with moving claims from the computes to the control layer (scheduler or
>> conductor).
>>
>> We mostly talked about retries, which also came up in the cells v2
>> session that Dan Smith led [2] and will recap later.
>>
>> Without getting into too many details, in the cells v2 session we came
>> to a compromise on build retries and said that we could pass hosts down
>> to the cell so that the cell-level conductor could retry if needed (even
>> though we expect doing claims at the top will fix the majority of
>> reasons you'd have a reschedule in the first place).
>>
> 
> And during that session, we said that given cell-local conductors (when
> there is a reschedule) can't upcall the global (for all cells)
> schedulers, that's why we agreed to use the conductor to be calling
> Placement API for allocations.
> 
> 
>> During the claims in the scheduler session, a new wrinkle came up which
>> is the hosts that the scheduler returns to the top-level conductor may
>> be in different cells. So if we have two cells, A and B, with hosts x
>> and y in cell A and host z in cell B, we can't send z to A for retries,
>> or x or y to B for retries. So we need some kind of post-filter/weigher
>> filtering such that hosts are grouped by cell and then they can be sent
>> to the cells for retries as necessary.
>>
> 
> That's already proposed for reviews in
> https://review.openstack.org/#/c/465175/
> 
> 
>> There was also some side discussion asking if we somehow regressed
>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>> Smith have the context on this (I think) so I'm hoping they can clarify
>> if we really need to fix something in Ocata at this point, or is this
>> more of a case of closing a loop-hole?
>>
> 
> The problem is that the scheduler doesn't verify the cells when trying
> to find a destination for an instance, it's just using weights for packing.
> 
> So, for example, say I have N hosts and 2 cells, the first weighting
> host could be in cell1 while the second could be in cell2. Then, even if
> the operator uses the weighers for packing, for example a RequestSpec
> with num_instances=2 could push one instance in cell1 and the other in
> cell2.
> 
> From a scheduler point of view, I think we could possibly add a
> CellWeigher that would help to pack instances within the same cell.
> Anyway, that's not related to the claims series, so we could possibly
> backport it for Ocata hopefully.
> 

Melanie actually made a good point about the current logic based on the
`host_subset_size` config option. If you leave it defaulted to 1, in
theory every instance coming through the scheduler gets a list of hosts
sorted by weight and only picks the first one (i.e. packing all the
instances onto the same host), which is good for packing (except of course
when a user request fills all the remaining space on that host and a
spread across multiple hosts would be better).
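
Roughly, the selection logic amounts to something like this (a simplified
sketch of what the filter scheduler does, not the exact code):

    import random

    def pick_host(weighed_hosts, host_subset_size=1):
        # weighed_hosts is already sorted best-first by the weighers.
        subset = weighed_hosts[:max(1, host_subset_size)]
        # With the default of 1 this always returns the top host (packing);
        # a larger subset trades packing for less contention between
        # concurrent scheduling requests.
        return random.choice(subset)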

So, while I began deprecating that option because I thought the race
condition would be fixed by conductor claims, I think we should keep it
for the time being until we clearly identify whether it's still necessary.

All that I said earlier remains valid though. In a world where the 2
top-weighed hosts are returned, we could send instances from the same user
request to different cells, but that narrows the problem to multi-instance
boot requests, which is far less impactful.



> 
>> We also spent a good chunk of the session talking about overhead
>> calculations for memory_mb and disk_gb which happens in the compute and
>> on a per-hypervisor basis. In the absence of automating ways to adjust
>> for overhead, our solution for now is operators can adjust reserved host
>> resource values (vcpus, memory, disk) via config options and be
>> conservative or aggressive as they see fit. Chris Dent and I also noted
>> that you can adjust those reserved values via the placement REST API but
>> they will be overridden by the config in a periodic task - which may be
>> a bug, if not at least a surprise to an operator.
>>
>> We didn't really get into this during the forum session, but there are
>> different opinions within the nova dev team on how to do claims in the
>> controller services (conductor vs scheduler). Sylvain Bauza has a series
>> which uses the conductor service, and Ed Leafe has a series using the
>> sched

Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread Sylvain Bauza


On 19/05/2017 02:55, Matt Riedemann wrote:
> The etherpad for this session is here [1]. The goal for this session was
> to inform operators and get feedback on the plan for what we're doing
> with moving claims from the computes to the control layer (scheduler or
> conductor).
> 
> We mostly talked about retries, which also came up in the cells v2
> session that Dan Smith led [2] and will recap later.
> 
> Without getting into too many details, in the cells v2 session we came
> to a compromise on build retries and said that we could pass hosts down
> to the cell so that the cell-level conductor could retry if needed (even
> though we expect doing claims at the top will fix the majority of
> reasons you'd have a reschedule in the first place).
> 

And during that session, we said that since cell-local conductors (when
there is a reschedule) can't upcall the global (for all cells) schedulers,
we agreed to have the conductor call the Placement API for the allocations.


> During the claims in the scheduler session, a new wrinkle came up which
> is the hosts that the scheduler returns to the top-level conductor may
> be in different cells. So if we have two cells, A and B, with hosts x
> and y in cell A and host z in cell B, we can't send z to A for retries,
> or x or y to B for retries. So we need some kind of post-filter/weigher
> filtering such that hosts are grouped by cell and then they can be sent
> to the cells for retries as necessary.
> 

That's already proposed for review in
https://review.openstack.org/#/c/465175/


> There was also some side discussion asking if we somehow regressed
> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
> Smith have the context on this (I think) so I'm hoping they can clarify
> if we really need to fix something in Ocata at this point, or is this
> more of a case of closing a loop-hole?
> 

The problem is that the scheduler doesn't check the cells when trying to
find a destination for an instance; it just uses weights for packing.

So, for example, say I have N hosts and 2 cells: the first-weighted host
could be in cell1 while the second is in cell2. Then, even if the operator
uses the weighers for packing, a RequestSpec with num_instances=2 could
push one instance into cell1 and the other into cell2.

From a scheduler point of view, I think we could possibly add a
CellWeigher that would help pack instances within the same cell.
Anyway, that's not related to the claims series, so we could hopefully
backport it to Ocata.


> We also spent a good chunk of the session talking about overhead
> calculations for memory_mb and disk_gb which happens in the compute and
> on a per-hypervisor basis. In the absence of automating ways to adjust
> for overhead, our solution for now is operators can adjust reserved host
> resource values (vcpus, memory, disk) via config options and be
> conservative or aggressive as they see fit. Chris Dent and I also noted
> that you can adjust those reserved values via the placement REST API but
> they will be overridden by the config in a periodic task - which may be
> a bug, if not at least a surprise to an operator.
> 
> We didn't really get into this during the forum session, but there are
> different opinions within the nova dev team on how to do claims in the
> controller services (conductor vs scheduler). Sylvain Bauza has a series
> which uses the conductor service, and Ed Leafe has a series using the
> scheduler. More on that in the mailing list [3].
> 

Sorry, but I do remember we had a consensus on using the conductor, at
least during the cells v2 session.

What I'm a bit afraid of is that we're duplicating effort on a single
blueprint when we all agreed to go that way.

> Next steps are going to be weighing both options between Sylvain and Ed,
> picking a path and moving forward, as we don't have a lot of time to sit
> on this fence if we're going to get it done in Pike.
> 

There are multiple reasons why we chose to use the conductor for that:
 - as I said earlier, conductors can't upcall a global scheduler when
rescheduling, and we agreed not to have (for the moment) second-level
schedulers for cells v2
 - eventually, in 1 or 2 cycles, nova-scheduler will become a library that
conductors can use for filtering/weighting. The idea is to stop doing RPC
calls to a separate service that requires its own HA (and which we know we
have problems with, given schedulers are stateful in memory). Instead, we
should make the scheduler modules stateless so operators would only need
to scale out conductors for performance. In that model, I think conductors
should be the engines responsible for making allocations.
 - the scheduler has no idea whether the instance request is for a move
operation or a boot, but conductors do know that logic.

Re: [openstack-dev] [nova][blazar][scientific] advanced instance scheduling: reservations and preeemption - Forum session

2017-05-01 Thread Sylvain Bauza
You can also count on me to discuss what Blazar was previously and how
Nova could help it ;-)

-Sylvain

On 1 May 2017 21:53, "Jay Pipes"  wrote:

> On 05/01/2017 03:39 PM, Blair Bethwaite wrote:
>
>> Hi all,
>>
>> Following up to the recent thread "[Openstack-operators] [scientific]
>> Resource reservation requirements (Blazar) - Forum session" and adding
>> openstack-dev.
>>
>> This is now a confirmed forum session
>> (https://www.openstack.org/summit/boston-2017/summit-schedul
>> e/events/18781/advanced-instance-scheduling-reservations-and-preemption)
>> to cover any advanced scheduling use-cases people want to talk about,
>> but in particular focusing on reservations and preemption as they are
>> big priorities particularly for scientific deployers.
>>
> >
>
>> Etherpad draft is
>> https://etherpad.openstack.org/p/BOS-forum-advanced-instance-scheduling,
>> please attend and contribute! In particular I'd appreciate background
>> spec and review links added to the etherpad.
>>
>> Jay, would you be able and interested to moderate this from the Nova side?
>>
>
> Masahito Muroi is currently marked as the moderator, but I will indeed be
> there and happy to assist Masahito in moderating, no problem.
>
> Best,
> -jay
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Blazar] Meeting time slots in Boston

2017-04-28 Thread Sylvain Bauza


On 21/04/2017 06:41, Masahito MUROI wrote:
> Hi all,
> 
> Thanks for choosing time slots!  Based on the table of Doodle, I'd like
> to pick following two slots for Blazar team meeting.
> 
> 1. 1pm-4pm on Monday for Blazar's internal features
> 2. 9am-10am or 11am on Thursday for discussions with PlacementAPI team
> 
> The first meeting will focus on Blazar's internal features, roadmap and
> etc.  1pm-2pm is also Lunch time. So it could start as lunch meeting.
> 
> In the second slot, we plan to discuss with PlacementAPI team. Summit
> would have breakout rooms or tables as usual.  We'll gather one of these
> and discuss concerns and/or usecases of collaboration with PlacementAPI.
> 

Being in the first meeting is difficult for me, but I can try to attend it
if you want me there :-) Just ping me if so.

For the second meeting, I'll be there.

-Sylvain

> 
> best regards,
> Masahito
> 
> 
> On 2017/04/18 13:21, Masahito MUROI wrote:
>> Hi Blazar folks,
>>
>> I created doodle[1] to decide our meeting time slots in Boston summit.
>> Please check slots you can join the meeting by Thursday.  I'll decide
>> the slots on this Friday.
>>
>> Additionally, I'd like to ask you to write down how many hours we have
>> the meeting in comments of the page.
>>
>> 1. http://doodle.com/poll/a7pccnhqsuk9tax7
>>
>> best regards,
>> Masahito
>>
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] What is the difference between default_schedule_zone and default_availability_zone?

2017-04-13 Thread Sylvain Bauza


On 12/04/2017 18:24, Matt Riedemann wrote:
> I'm hoping someone more learned can teach me. These two options sound
> very similar, and are thus very confusing. They are also both in the
> [DEFAULT] config option group, but defined in different places [1][2].
> 
> The help text for one says:
> 
> "This option determines the availability zone to be used when it is not
> specified in the VM creation request."
> 
> The help text for the other says:
> 
> "Availability zone to use when user doesn't specify one.
> This option is used by the scheduler to determine which availability
> zone to place a new VM instance into if the user did not specify one
> at the time of VM boot request."
> 
> It looks like one goes on the instance record itself, and the other goes
> into an aggregate metadata if the zone is specified for the aggregate.
> 
> So while they sound the exact same, they are not? Or are they in the
> end? See how this is terrible?
> 


So, the upstream documentation covers a bit of that:
https://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs

To be clear:
 - default_availability_zone is related to compute nodes. If a node is not
in an aggregate with AZ metadata, then the availability_zones API will
report the node in the default AZ named by this opt (the default is
"nova").
 - default_schedule_zone is related to instances. If a user doesn't
provide the --availability-zone flag when booting the instance, then the
compute API will set the instance.az field to this conf opt (the default
is None). That conf opt value is not what you get when you "nova show" an
instance, because the OS-EXT-AZ:availability_zone attribute rather gives
you the AZ of the node where the instance is (i.e. either the AZ metadata
of the aggregate the host belongs to, or the default_availability_zone
opt value).



To sum up, default_availability_zone is used for API-related concerns,
where we want to provide a consistent UX for people who want to view a
cloud by AZs, while default_schedule_zone is rather for internal use, to
decide whether we should force instances into a specific AZ by default or
leave them AZ-free (in which case move operations wouldn't be constrained
by AZs).
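
As a quick illustration, the two options sit side by side in nova.conf
(the values below are only examples):

    [DEFAULT]
    # AZ name reported for compute nodes that are not in any aggregate
    # with AZ metadata
    default_availability_zone = nova
    # AZ stored on new instances when the user doesn't pass
    # --availability-zone at boot time (unset means no constraint)
    default_schedule_zone = az1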


HTH,
-Sylvain


> [1]
> https://github.com/openstack/nova/blob/5a556a720f5a7394cab4c84fa6202976c6190b23/nova/conf/availability_zone.py#L34
> 
> [2]
> https://github.com/openstack/nova/blob/5a556a720f5a7394cab4c84fa6202976c6190b23/nova/conf/compute.py#L49
> 
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How can i check if an instance belongs to host-aggregate

2017-03-29 Thread Sylvain Bauza


On 29/03/2017 10:23, Pradeep Singh wrote:
> Hello,
> 
> I want to filter out some instances(to avoid some operation on them)
> which are scheduled on  host-aggregates.
> 
> How can i filter out these instances from all instances list in my cloud.
> 

Aggregates are host-based. So, if you want to know the list of instances
running on specific aggregates, you first need to get the list of hosts
in those aggregates and then loop over each host to get its instance list.

AFAIK, there is no helper method on the object side, nor in the DB API,
that does this directly, so it has to be done in Python.
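
For example, with python-novaclient it could look roughly like this (a
minimal sketch; the keystone session and aggregate names are placeholders
and the snippet is untested):

    from novaclient import client

    nova = client.Client('2.1', session=my_keystone_session)

    wanted = {'my-aggregate-1', 'my-aggregate-2'}
    hosts = set()
    for agg in nova.aggregates.list():
        if agg.name in wanted:
            hosts.update(agg.hosts)

    instances = []
    for host in hosts:
        # all_tenants requires admin credentials
        instances += nova.servers.list(
            search_opts={'host': host, 'all_tenants': 1})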

-S

> Thanks in advance!!
> 
> Thanks,
> Pradeep Singh
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] small wart in the resource class API

2017-03-16 Thread Sylvain Bauza


On 16/03/2017 14:12, Chris Dent wrote:
> 
> (This message is a question asking "should we fix this?" and "if so,
> I guess that needs spec, since it is a microversion change, but
> would an update to the existing spec be good enough?")
> 
> We have a small wart in the API for creating and updating resources
> classes [1] that only became clear while evaluating the API for
> resource traits [2]. The interface for creating a resource class is
> not particularly idempotent and as a result the code for doing so
> from nova-compute [3] is not as simple as it could be.
> 
> It's all in the name _get_of_create_resource_class. There is at
> least one but sometimes two HTTP requests: first a GET to
> /resource_classes/{class} then a POST with a body to
> /resource_classes.
> 
> If instead there was just a straight PUT to
> /resource_classes/{class} with no body that returned success either
> upon create or "yeah it's already there" then it would always be one
> request and the above code could be simplified. This is how we've
> ended up defining things for traits [2].
> 


We recently decided not to ship a specific client project for tricks like
that, and we preferred to have a better, well-documented REST API.

Given that consensus, I'm totally fine with using the PUT verb instead of
GET+POST and just verifying the HTTP return code.
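
To illustrate, the client-side code could then boil down to something like
this (a rough sketch assuming the idempotent PUT semantics described
above; the endpoint, headers and status codes are illustrative):

    import requests

    def ensure_resource_class(placement_url, headers, name):
        # One call: create the custom resource class if it's missing,
        # succeed quietly if it already exists.
        resp = requests.put(
            '%s/resource_classes/%s' % (placement_url, name),
            headers=headers)
        if resp.status_code not in (200, 201, 204):
            raise RuntimeError('Failed to ensure resource class %s: %s'
                               % (name, resp.status_code))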


> Making this change would also allow us to address the fact that
> right now the PUT to /resource_classes/{class} takes a body which is
> the _new_ name with which to replace the name of the resource class
> identified by {class}.  This is an operation I'm pretty sure we
> don't want to do (commonly) as it means that anywhere that custom
> resource class was used in an inventory it's now going to have this
> new name (the relationship at the HTTP and outer layers is by name,
> but at the database level by id, the PUT does a row update) but the
> outside world is not savvy to this change.
> 

Agreed as well.

-Sylvain

> Thoughts?
> 
> [1]
> http://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/custom-resource-classes.html#rest-api-impact
> 
> [2]
> http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/resource-provider-traits.html#rest-api-impact
> 
> [3]
> https://github.com/openstack/nova/blob/d02c0aa7ba0e37fb61d9fe2b683835f28f528623/nova/scheduler/client/report.py#L704
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-13 Thread Sylvain Bauza


On 13/03/2017 15:17, Jay Pipes wrote:
> On 03/13/2017 09:16 AM, Sylvain Bauza wrote:
>> Please don't.
>> Having a separate repository would mean that deployers would need to
>> implement a separate package for placement plus discussing about
>> how/when to use it.
> 
> Apparently, there already *are* separate packages for
> openstack-nova-api-placement...
> 

Good to know. That said, I'm not sure all deployers are packaging that
separately :-)

FWIW, I'm not against the split; I just think we should first have a
separate and clean client package for placement in an earlier cycle.

My thoughts are:
 - in Pike/Queens (TBD), make placementclient optional, falling back to
scheduler.report
 - in Queens/R, make placementclient mandatory
 - in R/S, make Placement a separate service.

That way, we could handle the necessary quirks in the client in case the
split goes badly.

-Sylvain


> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-13 Thread Sylvain Bauza


On 13/03/2017 14:59, Jay Pipes wrote:
> On 03/13/2017 08:41 AM, Chris Dent wrote:
>>
>> From the start we've been saying that it is probably right for the
>> placement service to have its own repository. This is aligned with
>> the long term goal of placement being useful to many services, not
>> just nova, and also helps to keep placement contained and
>> comprehensible and thus maintainable.
>>
>> I've been worried for some time that the longer we put this off, the
>> more complicated an extraction becomes. Rather than carry on
>> worrying about it, I took some time over the weekend to experiment
>> with a slapdash extraction to see if I could identify what would be
>> the sticking points. The results are here
>>
>> https://github.com/cdent/placement
>>
>> My methodology was to lay in the basics for being able to run the
>> functional (gabbi) tests and then using the failures to fix the
>> code. If you read the commit log (there's only 16 commits) in
>> reverse it tells a little story of what was required.
>>
>> All the gabbi tests are now passing (without them being changed)
>> except for four that verify the response strings from exceptions. I
>> didn't copy in exceptions, I created them anew to avoid copying
>> unnecessary nova-isms, and didn't bother (for now) with replicating
>> keyword handling.
>>
>> Unit tests and non-gabbi functional tests were not transferred over
>> (as that would have been something more than "slapdash").
>>
>> Some observations or things to think about:
>>
>> * Since there's only one database and all the db query code is in
>>   the objects, the database handling is simplified. olso_db setup
>>   can be used more directly.
>>
>> * The objects being oslo versioned objects is kind of overkill in
>>   this context but doesn't get too much in the way.
>>
>> * I collapsed the fields.ResourceClass and objects.ResourceClass
>>   into the same file so the latter was renamed. Doing this
>>   exploration made a lot of the ResourceClass handling look pretty
>>   complicated. Much of that complexity is because we had to deal
>>   with evolving through different functionality. If we built this
>>   functionality in a greenfield repo it could probably be more
>>   simple.
>>
>> * The FaultWrapper middleware is turned off in the WSGI stack
>>   because copying it over from nova would require dealing with a
>>   hierarchy of classes. A simplified version of it would probably
>>   need to be stuck back in (and apparently a gabbi test to exercise
>>   it, as there's not one now).
>>
>> * The number of requirements in the two requirements files is nicely
>>   small.
>>
>> * The scheduler report client in nova, and to a minor degree the
>>   filter scheduler, use some of the same exceptions and ovo.objects
>>   that placement uses, which presents a bit of blechiness with
>>   regards to code duplication. I suppose long term we could consider
>>   a placement-lib or something like that, except that the
>>   functionality provided by the same-named objects and exceptions
>>   are not entirely congruent. From the point of view of the external
>>   part of the placement API what matters are not objects, but JSON
>>   structures.
>>
>> * I've done nothing here with regard to how devstack would choose
>>   between the old and new placement code locations but that will be
>>   something to solve. It seems like it ought to be possible for two
>>   different sources of the placement-code to exist; just register
>>   one endpoint. Since we've declared that service discovery is the
>>   correctly and only way to find placement, this ought to be okay.
>>
>> I'm not sure how or if we want to proceed with this topic, but I
>> think this at least allows us to talk about it with less guessing.
>> My generally summary is "yeah, this is doable, without huge amounts
>> of work."
> 
> Chris, great work on this over the weekend. It gives us some valuable
> data points and information to consider about the split out of the
> placement API. Really appreciate the effort.
> 
> A few things:
> 
> 1) Definitely agree on the need to have the Nova-side stuff *not*
> reference ovo objects for resource providers. We want the Nova side to
> use JSON/dict representations within the resource tracker and scheduler.
> This work can be done right now and isn't dependent on anything AFAIK.
> 
> 2) The FaultWrapper stuff can also be handled relatively free of
> dependencies. In fact, there is a spec around error reporting using
> codes in addition to messages [1] that we could tack on the FaultWrapper
> cleanup items. Basically, make that spec into a "fix up error handling
> in placement API" general work item list...
> 
> 3) While the split of the placement API is not the highest priority
> placement item in Pike (we are focused on traits, ironic integration,
> shared pools and then nested providers, in that order), I do think it's
> worthwhile splitting the placement service out from Nova in Queens. I
> don't believe 

Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-13 Thread Sylvain Bauza


On 13/03/2017 14:21, Sean Dague wrote:
> On 03/13/2017 09:16 AM, Sylvain Bauza wrote:
>>
>>
>> On 13/03/2017 13:41, Chris Dent wrote:
>>>
>>> From the start we've been saying that it is probably right for the
>>> placement service to have its own repository. This is aligned with
>>> the long term goal of placement being useful to many services, not
>>> just nova, and also helps to keep placement contained and
>>> comprehensible and thus maintainable.
>>>
>>> I've been worried for some time that the longer we put this off, the
>>> more complicated an extraction becomes. Rather than carry on
>>> worrying about it, I took some time over the weekend to experiment
>>> with a slapdash extraction to see if I could identify what would be
>>> the sticking points. The results are here
>>>
>>> https://github.com/cdent/placement
>>>
>>> My methodology was to lay in the basics for being able to run the
>>> functional (gabbi) tests and then using the failures to fix the
>>> code. If you read the commit log (there's only 16 commits) in
>>> reverse it tells a little story of what was required.
>>>
>>> All the gabbi tests are now passing (without them being changed)
>>> except for four that verify the response strings from exceptions. I
>>> didn't copy in exceptions, I created them anew to avoid copying
>>> unnecessary nova-isms, and didn't bother (for now) with replicating
>>> keyword handling.
>>>
>>> Unit tests and non-gabbi functional tests were not transferred over
>>> (as that would have been something more than "slapdash").
>>>
>>> Some observations or things to think about:
>>>
>>> * Since there's only one database and all the db query code is in
>>>   the objects, the database handling is simplified. olso_db setup
>>>   can be used more directly.
>>>
>>> * The objects being oslo versioned objects is kind of overkill in
>>>   this context but doesn't get too much in the way.
>>>
>>> * I collapsed the fields.ResourceClass and objects.ResourceClass
>>>   into the same file so the latter was renamed. Doing this
>>>   exploration made a lot of the ResourceClass handling look pretty
>>>   complicated. Much of that complexity is because we had to deal
>>>   with evolving through different functionality. If we built this
>>>   functionality in a greenfield repo it could probably be more
>>>   simple.
>>>
>>> * The FaultWrapper middleware is turned off in the WSGI stack
>>>   because copying it over from nova would require dealing with a
>>>   hierarchy of classes. A simplified version of it would probably
>>>   need to be stuck back in (and apparently a gabbi test to exercise
>>>   it, as there's not one now).
>>>
>>> * The number of requirements in the two requirements files is nicely
>>>   small.
>>>
>>> * The scheduler report client in nova, and to a minor degree the
>>>   filter scheduler, use some of the same exceptions and ovo.objects
>>>   that placement uses, which presents a bit of blechiness with
>>>   regards to code duplication. I suppose long term we could consider
>>>   a placement-lib or something like that, except that the
>>>   functionality provided by the same-named objects and exceptions
>>>   are not entirely congruent. From the point of view of the external
>>>   part of the placement API what matters are not objects, but JSON
>>>   structures.
>>>
>>> * I've done nothing here with regard to how devstack would choose
>>>   between the old and new placement code locations but that will be
>>>   something to solve. It seems like it ought to be possible for two
>>>   different sources of the placement-code to exist; just register
>>>   one endpoint. Since we've declared that service discovery is the
>>>   correctly and only way to find placement, this ought to be okay.
>>>
>>> I'm not sure how or if we want to proceed with this topic, but I
>>> think this at least allows us to talk about it with less guessing.
>>> My generally summary is "yeah, this is doable, without huge amounts
>>> of work."
>>>
>>
>> Please don't.
>> Having a separate repository would mean that deployers would need to
>> implement a separate package for placement plus discussing about
>> how/when to use it.
>>
>> For the moment, I'd rather prefer to leave

Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-13 Thread Sylvain Bauza


On 13/03/2017 13:41, Chris Dent wrote:
> 
> From the start we've been saying that it is probably right for the
> placement service to have its own repository. This is aligned with
> the long term goal of placement being useful to many services, not
> just nova, and also helps to keep placement contained and
> comprehensible and thus maintainable.
> 
> I've been worried for some time that the longer we put this off, the
> more complicated an extraction becomes. Rather than carry on
> worrying about it, I took some time over the weekend to experiment
> with a slapdash extraction to see if I could identify what would be
> the sticking points. The results are here
> 
> https://github.com/cdent/placement
> 
> My methodology was to lay in the basics for being able to run the
> functional (gabbi) tests and then using the failures to fix the
> code. If you read the commit log (there's only 16 commits) in
> reverse it tells a little story of what was required.
> 
> All the gabbi tests are now passing (without them being changed)
> except for four that verify the response strings from exceptions. I
> didn't copy in exceptions, I created them anew to avoid copying
> unnecessary nova-isms, and didn't bother (for now) with replicating
> keyword handling.
> 
> Unit tests and non-gabbi functional tests were not transferred over
> (as that would have been something more than "slapdash").
> 
> Some observations or things to think about:
> 
> * Since there's only one database and all the db query code is in
>   the objects, the database handling is simplified. olso_db setup
>   can be used more directly.
> 
> * The objects being oslo versioned objects is kind of overkill in
>   this context but doesn't get too much in the way.
> 
> * I collapsed the fields.ResourceClass and objects.ResourceClass
>   into the same file so the latter was renamed. Doing this
>   exploration made a lot of the ResourceClass handling look pretty
>   complicated. Much of that complexity is because we had to deal
>   with evolving through different functionality. If we built this
>   functionality in a greenfield repo it could probably be more
>   simple.
> 
> * The FaultWrapper middleware is turned off in the WSGI stack
>   because copying it over from nova would require dealing with a
>   hierarchy of classes. A simplified version of it would probably
>   need to be stuck back in (and apparently a gabbi test to exercise
>   it, as there's not one now).
> 
> * The number of requirements in the two requirements files is nicely
>   small.
> 
> * The scheduler report client in nova, and to a minor degree the
>   filter scheduler, use some of the same exceptions and ovo.objects
>   that placement uses, which presents a bit of blechiness with
>   regards to code duplication. I suppose long term we could consider
>   a placement-lib or something like that, except that the
>   functionality provided by the same-named objects and exceptions
>   are not entirely congruent. From the point of view of the external
>   part of the placement API what matters are not objects, but JSON
>   structures.
> 
> * I've done nothing here with regard to how devstack would choose
>   between the old and new placement code locations but that will be
>   something to solve. It seems like it ought to be possible for two
>   different sources of the placement-code to exist; just register
>   one endpoint. Since we've declared that service discovery is the
>   correctly and only way to find placement, this ought to be okay.
> 
> I'm not sure how or if we want to proceed with this topic, but I
> think this at least allows us to talk about it with less guessing.
> My generally summary is "yeah, this is doable, without huge amounts
> of work."
> 

Please don't.
Having a separate repository would mean that deployers would need to
build a separate package for placement, plus discuss how and when to
use it.

For the moment, I'd rather leave operators consuming the placement API
through Nova first and then, after 3 or 4 cycles, possibly discuss with
them how to cut it out.

At the moment, I think placement already has a good priority within
Nova, so I don't see a problem with keeping it in Nova for now.

My .02,
-Sylvain

> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][docs] Broken nova instructions

2017-03-09 Thread Sylvain Bauza


On 09/03/2017 11:25, Alexandra Settle wrote:
> Hi everyone,
> 
>  
> 
> The installation guide for docs.o.o is currently broken and requires
> immediate attention from the doc and nova teams.
> 
>  
> 
> I have moved the relevant bug[0] to be CRITICAL at request. Brian Moss
> (bmoss) and Amy Marrich (spotz) have been working on a subsequent patch
> for the last 10 days without result; the current instruction set still
> do not enable the user to start up an instance. There is another Red Hat
> bug tracking the issue for RDO.[1]
> 
>  

Just to summarize the thoughts from IRC: could we please get a clear
understanding of the steps involved in a greenfield install, and of
which subsequent step hits the exception or problem?

> 
> Any and all reviews on the patch[2] would be appreciated. We need to be
> able to branch
> 
>  

Sure, will do.

-Sylvain
> 
> Thank you,
> 
>  
> 
> Alex
> 
>  
> 
> [0] https://bugs.launchpad.net/openstack-manuals/+bug/1663485
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1405098
> 
> [2] https://review.openstack.org/#/c/438328/
> 
>  
> 
>  
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Freeze dates for Pike

2017-03-03 Thread Sylvain Bauza


On 02/03/2017 23:46, Matt Riedemann wrote:
> I mentioned this in the nova meeting today [1] but wanted to post to the
> ML for feedback.
> 
> We didn't talk about spec or feature freeze dates at the PTG. The Pike
> release schedule is [2].
> 
> Spec freeze
> ---
> 
> In Newton and Ocata we had spec freeze on the first milestone.
> 
> I'm proposing that we do the same thing for Pike. The first milestone
> for Pike is April 13th which gives us about 6 weeks to go through the
> specs we're going to approve. A rough look at the open specs in Gerrit
> shows we have about 125 proposed and some of those are going to be
> re-approvals from previous releases. We already have 16 blueprints
> approved for Pike. Keep in mind that in Newton we had ~100 approved
> blueprints by the freeze and completed or partially completed 64.
> 

I agree with you: allowing more time for accepting specs means less
time for merging their implementations.

That still leaves 6 weeks to accept around 80 specs, which seems enough
to me.


> Feature freeze
> --
> 
> In Newton we had a non-priority feature freeze between n-1 and n-2. In
> Ocata we just had the feature freeze at o-3 for everything because of
> the short schedule.
> 
> We have fewer core reviewers so I personally don't want to cut off the
> majority of blueprints too early in the cycle so I'm proposing that we
> do like in Ocata and just follow the feature freeze on the p-3 milestone
> which is July 27th.
> 
> We will still have priority review items for the release and when push
> comes to shove those will get priority over other review items, but I
> don't think it's helpful to cut off non-priority blueprints before n-3.
> I thought there was a fair amount of non-priority blueprint code that
> landed in Ocata when we didn't cut it off early. Referring back to the
> Ocata blueprint burndown [3] most everything was completed between the
> 2nd milestone and feature freeze.
> 

Looks good to me too. Proposing and advertising one review sprint day
(or even two) around pike-2 could also help us, as it would give us a
kind of 'runway' between proposers and reviewers.

If so, after pike-2 we could see how many blueprints are left for the
last milestone, which would give us a better view of what we can
realistically deliver in Pike.

My .02€
-Sylvain


> -- 
> 
> Does anyone have an issue with this plan? If not, I'll update [4] with
> the nova-specific dates.
> 
> [1]
> http://eavesdrop.openstack.org/meetings/nova/2017/nova.2017-03-02-21.00.log.html#l-119
> 
> [2] https://wiki.openstack.org/wiki/Nova/Pike_Release_Schedule
> [3]
> http://lists.openstack.org/pipermail/openstack-dev/2017-February/111639.html
> 
> [4] https://releases.openstack.org/pike/schedule.html
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][swg] per-project "Business only" moderated mailing lists

2017-02-27 Thread Sylvain Bauza


On 27/02/2017 15:50, Matt Riedemann wrote:
> On 2/26/2017 11:25 PM, Clint Byrum wrote:
>>
>> You have taken the folder approach, and that is a bit less complicated
>> to set up than sup+offlineimap, but still does require that you know
>> how to filter by tag. It also means that you are experiencing one of
>> the problems with cross posting whenever anybody adds a tag, as in
>> that setup each message is duplicated into each folder, or you have a
>> 'first match' sieve and then tag order becomes significant. Either way,
>> you have to flip back and forth to read a thread. Or maybe somebody has
>> an answer? Nobody in the room at the SWG session had one.
>>
> 
> I don't have the problem you're describing here. I've got a gmail
> account but I use Thunderbird for my mail client since filtering and
> foldering the dev ML in gmail is a nightmare, at least since I was
> already used to Thunderbird for another IMAP account already.
> 
> So yeah I've got lots of folders, and filters, but have sorted my
> filters such that the projects I care about the most get priority. So if
> there is a thread with several project tags on it, like the one you did
> for the nova-compute API session at the PTG, that still all just goes
> into my nova folder since that's priority #1 in my sort list in
> Thunderbird.
> 
> Over the years I tried to keep up with new folders for new
> tags/projects, but with the big tent that got impossible, so now I
> basically filter into folders the projects I really care about being on
> top of, and then the rest just goes into my default "openstack-dev"
> folder. If I find that I'm constantly missing something with a given
> tag, then I start filtering that into a new folder that's prioritized
> higher.
> 

FWIW, I use my internal mail server to tag emails based on their
X-Topics header for the topics I want (e.g. tagging "nova" for an email
with X-Topics: nova, or "cross" for an email with X-Topics: release).

Then I add the same tags in Thunderbird (each with a different color)
and, magically, the list shows up in many colors! \o/


Honestly, I don't understand why we should silo all our conversations
because of X or Y. I was a newcomer once too, and the ML was already
difficult to follow. But then I set up filters and, magically, it
worked for me!

-Sylvain
(and please, *do not* Slack)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova]Placement api on https

2017-02-22 Thread Sylvain Bauza


On 22/02/2017 05:14, Gyorgy Szombathelyi wrote:
> Hi!
> 
> As the placement API is mandatory in Ocata, I found out that you cannot give 
> a custom CA cert or skip the validation in its clients. That would be bad, as 
> you can give a custom CA cert in the [keystone_authtoken] section, but not in 
> the [placement] one, so if you're using this feature, you simply cannot use 
> Nova. I made a simple patch, it would be nice if it could land in Ocata.
> 
> https://review.openstack.org/#/c/436475/
> 
> If it is accepted, I'll cherry-pick into stable/ocata.
> 

Could you please open a bug for that so I can triage it?

Thanks,
-Sylvain

> Br,
> György
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Call for upstream training liaison

2017-02-21 Thread Sylvain Bauza
I can help here, even though I'm not yet sure I'll be going to the Summit.

On 21 Feb 2017 14:48, "Matt Riedemann"  wrote:

> If you haven't seen this yet [1] there is going to be an upstream training
> meeting at the PTG on Wednesday 2/22 at 12ET to talk about upstream
> training and what liaisons from each project can do to help.
>
> This email is a call for anyone that's working on Nova and is interested
> in volunteering to be the team liaison for the upstream training which
> happens the weekend before the Pike summit in Boston.
>
> At a high level, it's my understanding that this involves helping some new
> contributors get comfortable with the various projects and to associate a
> person with each project so when they show up in IRC they know someone and
> can ask questions (similar to mentoring but I believe less involved on an
> ongoing basis).
>
> So if you're going to be at the Boston summit and this is something you're
> interested in, please either speak with me, Ildiko, or show up to the
> meeting to find out more.
>
> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-Febr
> uary/112270.html
>
> --
>
> Thanks,
>
> Matt Riedemann
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Will unshelving an offloaded instance respect the original AZ?

2017-02-20 Thread Sylvain Bauza


On 20/02/2017 09:41, Jay Pipes wrote:
> On 02/18/2017 01:46 PM, Matt Riedemann wrote:
>> I haven't fully dug into testing this, but I got wondering about this
>> question from reviewing a change [1] which would make the unshelve
>> operation start to check the volume AZ compared to the instance AZ when
>> the compute manager calls _prep_block_device.
>>
>> That change is attempting to remove the check_attach() method in
>> nova.volume.cinder.API since it's mostly redundant with state checks
>> that Cinder does when reserving the volume. The only other thing that
>> Nova does in there right now is compare the AZs.
>>
>> What I'm wondering is, with that change, will things break because of a
>> scenario like this:
>>
>> 1. Create volume in AZ 1.
>> 2. Create server in AZ 1.
>> 3. Attach volume to server (or boot server from volume in step 2).
>> 4. Shelve (offload) server.
>> 5. Unshelve server - nova-scheduler puts it into AZ 2.
>> 6. _prep_block_device compares instance AZ 2 to volume AZ 1 and unshelve
>> fails with InvalidVolume.
>>
>> If unshelving a server in AZ 1 can't move it outside of AZ 1, then we're
>> fine and the AZ check when unshelving is redundant but harmless.
>>
>> [1]
>> https://review.openstack.org/#/c/335358/38/nova/virt/block_device.py@249
> 
> When an instance is unshelved, the unshelve_instance() RPC API method is
> passed a RequestSpec object as the request_spec parameter:
> 
> https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L600
> 
> 
> This request spec object is passed to schedule_instances():
> 
> https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L660
> 
> 
> (you will note that the code directly above there "resets force_hosts"
> parameters, ostensibly to prevent any forced destination host from being
> passed to the scheduler)
> 
> The question is: does the above request spec contain availability zone
> information for the original instance? If it does, we're good. If it
> doesn't, we can get into the problem described above.
> 
> From what I can tell (and Sylvain might be the best person to answer
> this, thus his cc'ing), the availability zone is *always* stored in the
> request spec for an instance:
> 
> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L966
> 
> Which means that upon unshelving after a shelve_offload, we will always
> pass the scheduler the original AZ.
> 
> Sylvain, do you concur?
> 

tl;dr: Exactly this. Since Mitaka, it's not possible to unshelve to a
different AZ if you have the AZFilter enabled.

Longer version:

If the instance was booted with a specific AZ flag, then:

 #1 the instance.az field is set to something other than the config
option default
and #2 the attached RequestSpec gets its AZ field set

Both are persisted later in the conductor.


Now, say this instance is shelved then unshelved: we get the original
RequestSpec back at the API level
https://github.com/openstack/nova/blob/466769e588dc44d11987430b54ca1bd7188abffb/nova/compute/api.py#L3275-L3276

That's how the conductor method you pointed at above gets the Spec
passed as an argument.

Later, when the call is made to the scheduler, if the AZFilter is
enabled, it verifies that spec_obj.az field against the compute's AZ and
refuses the host if the AZ is different.

One side note though: if the instance was not created with an AZ, then
of course it can be unshelved on a compute outside the original AZ,
since the user never explicitly asked to stick to one.
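
To illustrate, here is a simplified sketch of the check (not the actual
filter code, which reads the AZ from the host aggregate metadata and the
default AZ from configuration):

def az_filter_passes(host_azs, requested_az, default_az='nova'):
    # Simplified view of what the AvailabilityZoneFilter verifies.
    if not requested_az:
        # No AZ was requested at boot time: any host is acceptable,
        # including on unshelve.
        return True
    if host_azs:
        return requested_az in host_azs
    # Hosts outside any AZ-tagged aggregate fall back to the default AZ.
    return requested_az == default_az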

HTH,
-Sylvain


> Best,
> -jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [chef] Making the Kitchen Great Again: A Retrospective on OpenStack & Chef

2017-02-16 Thread Sylvain Bauza


On 16/02/2017 18:42, Alex Schultz wrote:
> On Thu, Feb 16, 2017 at 9:12 AM, Ed Leafe  wrote:
>> On Feb 16, 2017, at 10:07 AM, Doug Hellmann  wrote:
>>
>>> When we signed off on the Big Tent changes we said competition
>>> between projects was desirable, and that deployers and contributors
>>> would make choices based on the work being done in those competing
>>> projects. Basically, the market would decide on the "optimal"
>>> solution. It's a hard message to hear, but that seems to be what
>>> is happening.
>>
>> This.
>>
>> We got much better at adding new things to OpenStack. We need to get better 
>> at letting go of old things.
>>
>> -- Ed Leafe
>>
>>
>>
> 
> I agree that the market will dictate what continues to survive, but if
> you're not careful you may be speeding up the decline as the end user
> (deployer/operator/cloud consumer) will switch completely to something
> else because it becomes to difficult to continue to consume via what
> used to be there and no longer is.  I thought the whole point was to
> not have vendor lock-in.  Honestly I think the focus is too much on
> the development and not enough on the consumption of the development
> output.  What are the point of all these features if no one can
> actually consume them.
> 

IMHO, the crux of the matter has already been discussed and stated:
it's about how to get collaboration between projects.

No one can boil the OpenStack ocean. It's wide, and you need to build a
boat.


That boat can be liaisons between deployment and service projects, or
mutual influence within those projects.

Putting the burden on one side doesn't solve the problem. I'd much
rather see communication at the design stage (for example during the
PTG).

-Sylvain


> Thanks,
> -Alex
> 
>>
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [chef] Making the Kitchen Great Again: A Retrospective on OpenStack & Chef

2017-02-16 Thread Sylvain Bauza


On 16/02/2017 10:17, Neil Jerram wrote:
> On Thu, Feb 16, 2017 at 5:26 AM Joshua Harlow  > wrote:
> 
> Radical idea, have each project (not libraries) contain a dockerfile
> that builds the project into a deployable unit (or multiple dockerfiles
> for projects with multiple components) and then it becomes the projects
> responsibility for ensuring that the right code is in that dockerfile to
> move from release to release (whether that be a piece of code that does
> a configuration migration).
> 
> 
> I've wondered about that approach, but worried about having the Docker
> engine as a new dependency for each OpenStack node.  Would that matter?
>  (Or are there other reasons why OpenStack nodes commonly already have
> Docker on them?)
> 

And one could claim that each project should also maintain its Ansible
playbooks. And one could claim that each project should also maintain
its Chef cookbooks. And one could claim that each project should also
maintain its Puppet manifests.

I certainly understand the problem stated here and how difficult it is
for a deployment tool team to cope with the requirements every project
creates each time it writes an upgrade impact.

For better or worse, as a service project developer, the only way I
have to signal a change is to write a release note. I'm not at all
versed in the quirks and specifics of any particular deployment tool,
and it's always hard to figure out whether what I write can break other
things.

What could the solution to that distributed-services problem be?
Understanding each other's problems is certainly part of it. More
communication between teams can certainly help too. Consistent
behaviour across heterogeneous deployment tools could also be a thing.

That's an iterative approach, and it takes time. Sure, that's
frustrating. But please keep in mind that we are all going in the same
direction.

-S

> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone] Do we really need two listening ports ?

2017-02-01 Thread Sylvain Bauza


On 01/02/2017 13:58, Thomas Goirand wrote:
> On 02/01/2017 10:54 AM, Attila Fazekas wrote:
>> Hi all,
>>
>> Typically we have two keystone service listening on two separate ports
>> 35357 and 5000.
>>
>> Historically one of the port had limited functionality, but today I do
>> not see why we want
>> to have two separate service/port from the same code base for similar
>> purposes.
>>
>> Effective we use double amount of memory than it is really required,
>> because both port is served by completely different worker instances,
>> typically from the same physical server.
>>
>> I wonder, would it be difficult to use only a single port or at least
>> the same pool of workers for all keystone(identity, auth..) purposes?
>>
>> Best Regards,
>> Attila
> 
> This has been discussed and agreed a long time ago, but nobody did the
> work. Please do get rid of the 2nd port. And when you're at it, also get
> rid of the admin and internal endpoint in the service catalog.
> 

Only 35357 is declared as a regular IANA service port:
http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=openstack

You can do whatever you want with the other port; it's just a
configuration option.

-Sylvain

> Cheers,
> 
> Thomas Goirand (zigo)
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Latest and greatest on trying to get n-sch to require placement

2017-01-26 Thread Sylvain Bauza


On 26/01/2017 05:42, Matt Riedemann wrote:
> This is my public hand off to Sylvain for the work done tonight.
> 
> Starting with the multinode grenade failure in the nova patch to
> integrate placement with the filter scheduler:
> 
> https://review.openstack.org/#/c/417961/
> 
> The test_schedule_to_all_nodes tempest test was failing in there because
> that test explicitly forces hosts using AZs to build two instances.
> Because we didn't have nova.conf on the Newton subnode in the multinode
> grenade job configured to talk to placement, there was no resource
> provider for that Newton subnode when we started running smoke tests
> after the upgrade to Ocata, so that test failed since the request to the
> subnode had a NoValidHost (because no resource provider was checking in
> from the Newton node).
> 
> Grenade is not topology aware so it doesn't know anything about the
> subnode. When the subnode is stacked, it does so via a post-stack hook
> script that devstack-gate writes into the grenade run, so after stacking
> the primary Newton node, it then uses Ansible to ssh into the subnode
> and stack Newton there too:
> 
> https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh#L629
> 
> 
> logs.openstack.org/61/417961/26/check/gate-grenade-dsvm-neutron-multinode-ubuntu-xenial/15545e4/logs/grenade.sh.txt.gz#_2017-01-26_00_26_59_296
> 
> 
> And placement was optional in Newton so, you know, problems.
> 
> Some options came to mind:
> 
> 1. Change the test to not be a smoke test which would exclude it from
> running during grenade. QA would barf on this.
> 
> 2. Hack some kind of pre-upgrade callback from d-g into grenade just for
> configuring placement on the compute subnode. This would probably
> require adding a script to devstack just so d-g has something to call so
> we could keep branch logic out of d-g, like what we did for the
> discover_hosts stuff for cells v2. This is more complicated than what I
> wanted to deal with tonight with limited time on my hands.
> 
> 3. Change the nova filter scheduler patch to fallback to get all compute
> nodes if there are no resource providers. We've already talked about
> this a few times already in other threads and I consider it a safety net
> we'd like to avoid if all else fails. If we did this, we could
> potentially restrict it to just the forced-host case...
> 
> 4. Setup the Newton subnode in the grenade run to configure placement,
> which I think we can do from d-g using the features yaml file. That's
> what I opted to go with and the patch is here:
> 
> https://review.openstack.org/#/c/425524/
> 
> I've made the nova patch dependent on that *and* the other grenade patch
> to install and configure placement on the primary node when upgrading
> from Newton to Ocata.
> 
> -- 
> 
> That's where we're at right now. If #4 fails, I think we are stuck with
> adding a workaround for #3 into Ocata and then remove that in Pike when
> we know/expect computes to be running placement (they would be in our
> grenade runs from ocata->pike at least).
> 

Circling back to the problem as time flies: since the patch Matt
proposed for option #4 is not fully working yet, I'm implementing option
#3 by making the HostManager.get_filtered_hosts() method resilient to an
empty host list coming back from the placement API, if and only if the
user asked for forced destinations.
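
Roughly, the intent looks like this (a sketch only, not the actual
patch; the helper name and parameters are made up for illustration):

def hosts_to_filter(all_host_states, placement_provider_uuids, force_hosts):
    # If placement returned no resource providers but the user forced a
    # destination, fall back to the full compute list so the forced host
    # can still be looked up; otherwise only keep the hosts placement
    # reported.
    if not placement_provider_uuids and force_hosts:
        return list(all_host_states)
    return [hs for hs in all_host_states
            if getattr(hs, 'uuid', None) in placement_provider_uuids]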

-Sylvain

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Latest and greatest on trying to get n-sch to require placement

2017-01-26 Thread Sylvain Bauza


On 26/01/2017 15:14, Ed Leafe wrote:
> On Jan 26, 2017, at 7:50 AM, Sylvain Bauza <sba...@redhat.com> wrote:
>>
>> That's where I think we have another problem, which is bigger than the
>> corner case you mentioned above : when upgrading from Newton to Ocata,
>> we said that all Newton computes have be upgraded to the latest point
>> release. Great. But we forgot to identify that it would also require to
>> *modify* their nova.conf so they would be able to call the placement API.
>>
>> That looks to me more than just a rolling upgrade mechanism. In theory,
>> a rolling upgrade process accepts that N-1 versioned computes can talk
>> to N versioned other services. That doesn't imply a necessary
>> configuration change (except the upgrade_levels flag) on the computes to
>> achieve that, right?
>>
>> http://docs.openstack.org/developer/nova/upgrade.html
> 
> Reading that page: "At this point, you must also ensure you update the 
> configuration, to stop using any deprecated features or options, and perform 
> any required work to transition to alternative features.”
> 
> So yes, "updating your configuration” is an expected action. I’m not sure why 
> this is so alarming.
> 

You're quoting that phrase out of context. To give more details, that
specific sentence relates to what you should do *after* your
maintenance window (i.e. upgrading your controller while your API is
down), and the introduction paragraph says that all the bullet items
apply to all the nova services except the hypervisors.

And I'm not alarmed. I'm just trying to identify the correct upgrade
path we should ask our operators to follow. If that means an extra step
beyond the regular upgrade process, then I think everyone should be
aware of it.
Take me, for example: I'm probably exhausted and narrow-eyed, so I
missed that implication. I apologize for it and want to clarify it.

-Sylvain

> 
> -- Ed Leafe
> 
> 
> 
> 
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Latest and greatest on trying to get n-sch to require placement

2017-01-26 Thread Sylvain Bauza


On 26/01/2017 05:42, Matt Riedemann wrote:
> This is my public hand off to Sylvain for the work done tonight.
> 

Thanks Matt for your help yesterday; it was awesome to be able to count
on you even while you're away on personal time.


> Starting with the multinode grenade failure in the nova patch to
> integrate placement with the filter scheduler:
> 
> https://review.openstack.org/#/c/417961/
> 
> The test_schedule_to_all_nodes tempest test was failing in there because
> that test explicitly forces hosts using AZs to build two instances.
> Because we didn't have nova.conf on the Newton subnode in the multinode
> grenade job configured to talk to placement, there was no resource
> provider for that Newton subnode when we started running smoke tests
> after the upgrade to Ocata, so that test failed since the request to the
> subnode had a NoValidHost (because no resource provider was checking in
> from the Newton node).
> 

That's where I think the current implementation is weird: if you force
the scheduler to return a destination (without even calling the
filters), by just verifying that the corresponding service is up, then
why do you need to get the full list of computes before that?

On the placement side, if you just *force* the scheduler to return a
destination, then why should we verify that the resources fit? FWIW, we
now have completely different semantics replacing the "force_hosts"
thing that I hate: it's called RequestSpec.requested_destination, and it
actually runs the filters, but only for that destination. There is no
straight bypass of the filters like force_hosts does.

> Grenade is not topology aware so it doesn't know anything about the
> subnode. When the subnode is stacked, it does so via a post-stack hook
> script that devstack-gate writes into the grenade run, so after stacking
> the primary Newton node, it then uses Ansible to ssh into the subnode
> and stack Newton there too:
> 
> https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh#L629
> 
> 
> logs.openstack.org/61/417961/26/check/gate-grenade-dsvm-neutron-multinode-ubuntu-xenial/15545e4/logs/grenade.sh.txt.gz#_2017-01-26_00_26_59_296
> 
> 
> And placement was optional in Newton so, you know, problems.
> 

That's where I think we have another problem, which is bigger than the
corner case you mentioned above: when upgrading from Newton to Ocata, we
said that all Newton computes have to be upgraded to the latest point
release. Great. But we forgot to identify that it would also require
*modifying* their nova.conf so they can call the placement API.

That looks to me like more than just a rolling upgrade. In theory, a
rolling upgrade process accepts that N-1 versioned computes can talk to
N versioned services. That shouldn't require a configuration change on
the computes (apart from the upgrade_levels flag) to achieve it, right?

http://docs.openstack.org/developer/nova/upgrade.html


> Some options came to mind:
> 
> 1. Change the test to not be a smoke test which would exclude it from
> running during grenade. QA would barf on this.
> 
> 2. Hack some kind of pre-upgrade callback from d-g into grenade just for
> configuring placement on the compute subnode. This would probably
> require adding a script to devstack just so d-g has something to call so
> we could keep branch logic out of d-g, like what we did for the
> discover_hosts stuff for cells v2. This is more complicated than what I
> wanted to deal with tonight with limited time on my hands.
> 
> 3. Change the nova filter scheduler patch to fallback to get all compute
> nodes if there are no resource providers. We've already talked about
> this a few times already in other threads and I consider it a safety net
> we'd like to avoid if all else fails. If we did this, we could
> potentially restrict it to just the forced-host case...
> 
> 4. Setup the Newton subnode in the grenade run to configure placement,
> which I think we can do from d-g using the features yaml file. That's
> what I opted to go with and the patch is here:
> 
> https://review.openstack.org/#/c/425524/
> 
> I've made the nova patch dependent on that *and* the other grenade patch
> to install and configure placement on the primary node when upgrading
> from Newton to Ocata.
> 
> -- 
> 
> That's where we're at right now. If #4 fails, I think we are stuck with
> adding a workaround for #3 into Ocata and then remove that in Pike when
> we know/expect computes to be running placement (they would be in our
> grenade runs from ocata->pike at least).
> 


Given the two problems I stated above, I'm now in favor of a #3
approach that would do the following:

 - modify the scheduler so that it's acceptable for placement to return
nothing if you force hosts

 - modify the scheduler so that, in the event of an empty list returned
by the placement API, it falls back to getting the list of all computes


That still leaves the problem where a few computes are not all 

Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-25 Thread Sylvain Bauza


On 25/01/2017 05:10, Matt Riedemann wrote:
> On 1/24/2017 2:57 PM, Matt Riedemann wrote:
>> On 1/24/2017 2:38 PM, Sylvain Bauza wrote:
>>>
>>> It's litterally 2 days before FeatureFreeze and we ask operators to
>>> change their cloud right now ? Looks difficult to me and like I said in
>>> multiple places by email, we have a ton of assertions saying it's
>>> acceptable to have not all the filters.
>>>
>>> -Sylvain
>>>
>>
>> I'm not sure why feature freeze in two days is going to make a huge
>> amount of difference here. Most large production clouds are probably
>> nowhere near trunk (I'm assuming most are on Mitaka or older at this
>> point just because of how deployments seem to tail the oldest supported
>> stable branch). Or are you mainly worried about deployment tooling
>> projects, like TripleO, needing to deal with this now?
>>
>> Anyone upgrading to Ocata is going to have to read the release notes and
>> assess the upgrade impacts regardless of when we make this change, be
>> that Ocata or Pike.
>>
>> Sylvain, are you suggesting that for Ocata if, for example, the
>> CoreFilter isn't in the list of enabled scheduler filters, we don't make
>> the request for VCPU when filtering resource providers, but we also log
>> a big fat warning in the n-sch logs saying we're going to switch over in
>> Pike and that cpu_allocation_ratio needs to be configured because the
>> CoreFilter is going to be deprecated in Ocata and removed in Pike?
>>
>> [1]
>> https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact
>>
>>
>>
> 
> To recap the discussion we had in IRC today, we're moving forward with
> the original plan of the *filter scheduler* always requesting VCPU,
> MEMORY_MB and DISK_GB* regardless of the enabled filters. The main
> reason being there isn't a clear path forward on straddling releases to
> deprecate or make decisions based on the enabled filters and provide a
> warning that makes sense.
> 
> For example, we can't deprecate the filters (at least yet) because the
> *caching scheduler* is still using them (it's not using placement yet).
> And if we logged a warning if you don't have the CoreFilter in
> CONF.filter_scheduler.enabled_filters, for example, but we don't want
> you to have it in that list, then what are you supposed to do? i.e. the
> goal is to not have the legacy primitive resource filters enabled for
> the filter scheduler in Pike, so you get into this weird situation of
> whether or not you have them enabled or not before Pike, and in what
> cases do you log a warning that makes sense. So we agreed at this point
> it's just simpler to say that if you don't enable these filters today,
> you're going to need to configure the appropriate allocation ratio
> configuration option prior to upgrading to Ocata. That will be in the
> upgrade section of the release notes and we can probably also work it
> into the placement devref as a deployment note. We can also work this
> into the nova-status upgrade check CLI.
> 
> *DISK_GB is special since we might have a flavor that's not specifying
> any disk or a resource provider with no DISK_GB allocations if the
> instances are all booted from volumes.
> 

Update on that agreement: I made the necessary modification in the
proposal [1] so that the filters are no longer verified. We now send a
request to the Placement API built by introspecting the flavor, and we
get back a list of potential destinations.

When I began that modification, I knew there was a functional test about
server groups that needed changes to match our agreement, so I made that
change in a separate patch [2] as a prerequisite for [1].

I then spotted a problem we didn't identify when discussing this: when
checking a destination, the legacy filters for CPU, RAM and disk don't
verify the maximum capacity of the host; they only multiply the total
size by the allocation ratio, so our proposal works for them. When using
the placement service, however, it fails because somewhere in the DB
call that returns the destinations, we also verify a specific field
named max_unit [3].
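
To make the discrepancy concrete, here is a simplified side-by-side
sketch of both checks (illustrative only, not the real code paths):

def legacy_filter_accepts(total, allocation_ratio, used, requested):
    # CoreFilter/RamFilter/DiskFilter style: usage is only capped at
    # total * allocation_ratio, with no per-request maximum.
    return used + requested <= total * allocation_ratio

def placement_accepts(total, reserved, allocation_ratio, max_unit,
                      used, requested):
    # The placement query additionally rejects any single request larger
    # than max_unit (by default the host capacity), hence the
    # NoValidHost seen in the functional test.
    if requested > max_unit:
        return False
    return used + requested <= (total - reserved) * allocation_ratio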

Consequently, the proposal we agreed on does not give feature parity
between Newton and Ocata. If you follow our instructions, you will still
get different results, from a placement perspective, between what was in
Newton and what will be in Ocata.

Technically speaking, the functional test is a canary: it tells you that
you now get NoValidHost where it previously worked.

After that, I'm stuck. We can discuss for a while whether all of that is
sane or not, but the fact is, there is a discrepancy.

Honestly, I don't

Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Sylvain Bauza


On 24/01/2017 22:22, Dan Smith wrote:
>> No. Have administrators set the allocation ratios for the resources they
>> do not care about exceeding capacity to a very high number.
>>
>> If someone previously removed a filter, that doesn't mean that the
>> resources were not consumed on a host. It merely means the admin was
>> willing to accept a high amount of oversubscription. That's what the
>> allocation_ratio is for.
>>
>> The flavor should continue to have a consumed disk/vcpu/ram amount,
>> because the VM *does actually consume those resources*. If the operator
>> doesn't care about oversubscribing one or more of those resources, they
>> should set the allocation ratios of those inventories to a high value.
>>
>> No more adding configuration options for this kind of thing (or in this
>> case, looking at an old configuration option and parsing it to see if a
>> certain filter is listed in the list of enabled filters).
>>
>> We have a proper system of modeling these data-driven decisions now, so
>> my opinion is we should use it and ask operators to use the placement
>> REST API for what it was intended.
> 
> I agree with the above. I think it's extremely counter-intuitive to set
> a bunch of over-subscription values only to have them ignored because a
> scheduler filter isn't configured.
> 
> If we ignore some of the resources on schedule, the compute nodes will
> start reporting values that will make the resources appear to be
> negative to anything looking at the data. Before a somewhat-recent
> change of mine, the oversubscribed computes would have *failed* to
> report negative resources at all, which was a problem for a reconfigure
> event. I think the scheduler purposefully forcing computes into the red
> is a mistake.
> 
> Further, new users that don't know our sins of the past will wonder why
> the nice system they see in front of them isn't doing the right thing.
> Existing users can reconfigure allocation ratio values before they
> upgrade. We can also add something to our upgrade status tool to warn them.
> 

It's literally 2 days before FeatureFreeze and we're asking operators to
change their cloud right now? That looks difficult to me and, as I said
in multiple places by email, we have a ton of assertions saying it's
acceptable not to have all the filters enabled.

-Sylvain

> --Dan
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 23/01/2017 15:18, Sylvain Bauza wrote:
> 
> 
> On 23/01/2017 15:11, Jay Pipes wrote:
>> On 01/22/2017 04:40 PM, Sylvain Bauza wrote:
>>> Hey folks,
>>>
>>> tl;dr: should we GET /resource_providers for only the related resources
>>> that correspond to enabled filters ?
>>
>> No. Have administrators set the allocation ratios for the resources they
>> do not care about exceeding capacity to a very high number.
>>
>> If someone previously removed a filter, that doesn't mean that the
>> resources were not consumed on a host. It merely means the admin was
>> willing to accept a high amount of oversubscription. That's what the
>> allocation_ratio is for.
>>
>> The flavor should continue to have a consumed disk/vcpu/ram amount,
>> because the VM *does actually consume those resources*. If the operator
>> doesn't care about oversubscribing one or more of those resources, they
>> should set the allocation ratios of those inventories to a high value.
>>
>> No more adding configuration options for this kind of thing (or in this
>> case, looking at an old configuration option and parsing it to see if a
>> certain filter is listed in the list of enabled filters).
>>
>> We have a proper system of modeling these data-driven decisions now, so
>> my opinion is we should use it and ask operators to use the placement
>> REST API for what it was intended.
>>
> 
> I know your point, but please consider mine.
> What if an operator disabled CoreFilter in Newton and wants to upgrade
> to Ocata ?
> All of that implementation being very close to the deadline makes me
> nervous and I really want the seamless path for operators now using the
> placement service.
> 
> Also, like I said in my bigger explanation, we should need to modify a
> shit ton of assertions in our tests that can say "meh, don't use all the
> filters, but just these ones". Pretty risky so close to a FF.
> 

Oh, I just discovered a related point: in Devstack, we don't set the
CoreFilter by default!
https://github.com/openstack-dev/devstack/blob/adcf0c50cd87c68abef7c3bb4785a07d3545be5d/lib/nova#L94

To be clear, that means the gate is not verifying VCPUs through the filter,
only through the compute claims. Heh.

Honestly, I think we really need to make the filters optional for Ocata then.

-Sylvain

> -Sylvain
> 
> 
>> Best,
>> -jay
>>
>>> Explanation below why even if I
>>> know we have a current consensus, maybe we should discuss again about it.
>>>
>>>
>>> I'm still trying to implement https://review.openstack.org/#/c/417961/
>>> but when trying to get the functional job being +1, I discovered that we
>>> have at least one functional test [1] asking for just the RAMFilter (and
>>> not for VCPUs or disks).
>>>
>>> Given the current PS is asking for *all* both CPU, RAM and disk, it's
>>> trampling the current test by getting a NoValidHost.
>>>
>>> Okay, I could just modify the test and make sure we have enough
>>> resources for the flavors but I actually now wonder if that's all good
>>> for our operators.
>>>
>>> I know we have a consensus saying that we should still ask for both CPU,
>>> RAM and disk at the same time, but I imagine our users coming back to us
>>> saying "eh, look, I'm no longer able to create instances even if I'm not
>>> using the CoreFilter" for example. It could be a bad day for them and
>>> honestly, I'm not sure just adding documentation or release notes would
>>> help them.
>>>
>>> What are you thinking if we say that for only this cycle, we still try
>>> to only ask for resources that are related to the enabled filters ?
>>> For example, say someone is disabling CoreFilter in the conf opt, then
>>> the scheduler shouldn't ask for VCPUs to the Placement API.
>>>
>>> FWIW, we have another consensus about not removing
>>> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
>>> using them (and not calling the Placement API).
>>>
>>> Thanks,
>>> -Sylvain
>>>
>>> [1]
>>> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
>>>
>>>

Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 23/01/2017 15:11, Jay Pipes wrote:
> On 01/22/2017 04:40 PM, Sylvain Bauza wrote:
>> Hey folks,
>>
>> tl;dr: should we GET /resource_providers for only the related resources
>> that correspond to enabled filters ?
> 
> No. Have administrators set the allocation ratios for the resources they
> do not care about exceeding capacity to a very high number.
> 
> If someone previously removed a filter, that doesn't mean that the
> resources were not consumed on a host. It merely means the admin was
> willing to accept a high amount of oversubscription. That's what the
> allocation_ratio is for.
> 
> The flavor should continue to have a consumed disk/vcpu/ram amount,
> because the VM *does actually consume those resources*. If the operator
> doesn't care about oversubscribing one or more of those resources, they
> should set the allocation ratios of those inventories to a high value.
> 
> No more adding configuration options for this kind of thing (or in this
> case, looking at an old configuration option and parsing it to see if a
> certain filter is listed in the list of enabled filters).
> 
> We have a proper system of modeling these data-driven decisions now, so
> my opinion is we should use it and ask operators to use the placement
> REST API for what it was intended.
> 

I understand your point, but please consider mine.
What if an operator disabled the CoreFilter in Newton and wants to upgrade
to Ocata?
All of that implementation landing very close to the deadline makes me
nervous, and I really want a seamless path for operators now adopting the
placement service.

Also, as I said in my longer explanation, we would need to modify a ton of
assertions in our tests that say "meh, don't use all the filters, just
these ones". That's pretty risky so close to feature freeze.

-Sylvain


> Best,
> -jay
> 
>> Explanation below why even if I
>> know we have a current consensus, maybe we should discuss again about it.
>>
>>
>> I'm still trying to implement https://review.openstack.org/#/c/417961/
>> but when trying to get the functional job being +1, I discovered that we
>> have at least one functional test [1] asking for just the RAMFilter (and
>> not for VCPUs or disks).
>>
>> Given the current PS is asking for *all* both CPU, RAM and disk, it's
>> trampling the current test by getting a NoValidHost.
>>
>> Okay, I could just modify the test and make sure we have enough
>> resources for the flavors but I actually now wonder if that's all good
>> for our operators.
>>
>> I know we have a consensus saying that we should still ask for both CPU,
>> RAM and disk at the same time, but I imagine our users coming back to us
>> saying "eh, look, I'm no longer able to create instances even if I'm not
>> using the CoreFilter" for example. It could be a bad day for them and
>> honestly, I'm not sure just adding documentation or release notes would
>> help them.
>>
>> What are you thinking if we say that for only this cycle, we still try
>> to only ask for resources that are related to the enabled filters ?
>> For example, say someone is disabling CoreFilter in the conf opt, then
>> the scheduler shouldn't ask for VCPUs to the Placement API.
>>
>> FWIW, we have another consensus about not removing
>> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
>> using them (and not calling the Placement API).
>>
>> Thanks,
>> -Sylvain
>>
>> [1]
>> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
>>
>>


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 22/01/2017 22:40, Sylvain Bauza wrote:
> Hey folks,
> 
> tl;dr: should we GET /resource_providers for only the related resources
> that correspond to enabled filters ? Explanation below why even if I
> know we have a current consensus, maybe we should discuss again about it.
> 
> 
> I'm still trying to implement https://review.openstack.org/#/c/417961/
> but when trying to get the functional job being +1, I discovered that we
> have at least one functional test [1] asking for just the RAMFilter (and
> not for VCPUs or disks).
> 
> Given the current PS is asking for *all* both CPU, RAM and disk, it's
> trampling the current test by getting a NoValidHost.
> 
> Okay, I could just modify the test and make sure we have enough
> resources for the flavors but I actually now wonder if that's all good
> for our operators.
> 
> I know we have a consensus saying that we should still ask for both CPU,
> RAM and disk at the same time, but I imagine our users coming back to us
> saying "eh, look, I'm no longer able to create instances even if I'm not
> using the CoreFilter" for example. It could be a bad day for them and
> honestly, I'm not sure just adding documentation or release notes would
> help them.
> 
> What are you thinking if we say that for only this cycle, we still try
> to only ask for resources that are related to the enabled filters ?
> For example, say someone is disabling CoreFilter in the conf opt, then
> the scheduler shouldn't ask for VCPUs to the Placement API.
> 
> FWIW, we have another consensus about not removing
> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
> using them (and not calling the Placement API).
> 

A quick follow-up:
I first thought of operators who already disable the DiskFilter
because they don't trust its calculations for shared disk.
We also have people who don't run the CoreFilter because they prefer
having only the compute claims do the math, and they don't care about
allocation ratios at all.


All those people would be trampled if we now begin to count resources
based on things they explicitly disabled.
That's why I updated my patch series and wrote a quick check of which
filters are enabled:

https://review.openstack.org/#/c/417961/16/nova/scheduler/host_manager.py@640

Ideally, I would refine that by modifying the BaseFilter structure to add
a method that returns the resource amount needed by the RequestSpec and
that also disables the filter so it always returns true (no need to
double-check the filter if the placement service already said this compute
is sane). That way, we could slowly but surely keep the existing interface
for optionally verifying resources (i.e. people would still use filters)
while the new logic is handled by the Placement engine. A rough sketch of
that idea is below.

Given the very short window, that can be done in Pike, but at least
operators wouldn't be impacted in the upgrade path.
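
A minimal sketch of that refinement, assuming illustrative class and method
names rather than the real nova.scheduler.filters API:

# Minimal sketch of the refinement described above; class and method names
# are illustrative, not the real nova.scheduler.filters API.

class BaseResourceFilter(object):
    """A filter that guards exactly one placement resource class."""

    resource_class = None  # e.g. 'VCPU', 'MEMORY_MB', 'DISK_GB'

    def requested_amount(self, spec):
        """Return how much of self.resource_class this RequestSpec needs."""
        raise NotImplementedError

    def host_passes(self, host_state, spec):
        # Placement already filtered the providers on this resource class,
        # so the per-host check is redundant: always pass.
        return True


class CoreFilter(BaseResourceFilter):
    resource_class = 'VCPU'

    def requested_amount(self, spec):
        return spec.flavor.vcpus


def placement_resources(enabled_filters, spec):
    """Build the resources query from whatever filters the operator enabled."""
    return {f.resource_class: f.requested_amount(spec)
            for f in enabled_filters
            if isinstance(f, BaseResourceFilter)}

The point being that operators keep expressing intent through the filter
list, while the actual accounting is done by Placement.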

-Sylvain

> Thanks,
> -Sylvain
> 
> [1]
> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
> 


[openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-22 Thread Sylvain Bauza
Hey folks,

tl;dr: should we GET /resource_providers for only the resources that
correspond to enabled filters? An explanation below of why, even though I
know we have a current consensus, maybe we should discuss it again.


I'm still trying to implement https://review.openstack.org/#/c/417961/
but while trying to get the functional job to +1, I discovered that we
have at least one functional test [1] enabling just the RAMFilter (and
not the VCPU or disk ones).

Given the current PS asks for *all* of CPU, RAM and disk, it tramples
that test by getting a NoValidHost.

Okay, I could just modify the test and make sure we have enough
resources for the flavors but I actually now wonder if that's all good
for our operators.

I know we have a consensus saying that we should still ask for both CPU,
RAM and disk at the same time, but I imagine our users coming back to us
saying "eh, look, I'm no longer able to create instances even if I'm not
using the CoreFilter" for example. It could be a bad day for them and
honestly, I'm not sure just adding documentation or release notes would
help them.

What do you think about saying that, for this cycle only, we still only
ask for the resources that are related to the enabled filters?
For example, if someone disables the CoreFilter in the conf opt, then the
scheduler shouldn't ask the Placement API for VCPUs.
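
To make the difference concrete, here is roughly what the two placement
requests would look like for a 2 vCPU / 4096 MB / 20 GB flavor (an
illustrative sketch; the numbers are just an example and the 'resources'
query parameter is the one added in microversion 1.4):

# Illustrative only: the two alternatives for an example flavor.

# Current patch set: ask placement for all three resource classes.
full_query = ('/resource_providers'
              '?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:20')

# "Only ask for enabled filters": with the CoreFilter removed from the
# conf opt, VCPU is simply left out of the request.
no_vcpu_query = ('/resource_providers'
                 '?resources=MEMORY_MB:4096,DISK_GB:20')

# Either way the request needs the right microversion header, e.g.:
headers = {'OpenStack-API-Version': 'placement 1.4'}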

FWIW, we have another consensus about not removing
CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
using them (and not calling the Placement API).

Thanks,
-Sylvain

[1]
https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Sylvain Bauza


On 19/01/2017 21:39, Matt Riedemann wrote:
> On Thu, Jan 19, 2017 at 2:29 PM, Alex Schultz 
> wrote:
>> 
>> What are these issues? My original message was to highlight one 
>> particular deployment type which is completely independent of
>> how things get packaged in the traditional sense of the word 
>> (rpms/deb/tar.gz).  Perhaps it's getting lost in terminology,
>> but packaging the software in one way and how it's run can be two
>> separate issues.  So what I'd like to know is how is that
>> impacted by whatever ordering is necessary, and if there's anyway
>> way not to explicitly have special cases that need to be handled
>> by the end user when applying updates.  It seems like we all want
>> similar things. I would like not to have to do anything different
>> from the install for upgrade. Why can't apply configs, restart
>> all services?  Or can I?  I seem to be getting mixed messages...
>> 
>> 
> 
> Sorry for being unclear on the issue. As Jay pointed out, if 
> nova-scheduler is upgraded before the placement service, the 
> nova-scheduler service will continue to start and take requests.
> The problem is if the filter scheduler code is requesting a
> microversion in the placement API which isn't available yet, in
> particular this 1.4 microversion, then scheduling requests will
> fail which to the end user means NoValidHost (the same as if we
> don't have any compute nodes yet, or available).
> 
> So as Jay also pointed out, if placement and n-sch are upgraded
> and restarted at the same time, the window for hitting this is
> minimal. If deployment tooling is written to make sure to restart
> the placement service *before* nova-scheduler, then there should be
> no window for issues.
> 


Thanks all for providing insights. I'm trying to find a consensus here,
and while I understand Alex's concerns about the upgrade, I think it's
okay for a deployer having a "controller" node (disclaimer: Nova doesn't
have this concept, rather a list of components that are not compute
nodes) to have a very quick downtime (I mean getting NoValidHosts if a
user asks for an instance while the "controller" is upgraded).
To be honest, Nova has never (yet) supported rolling upgrades for
services that are not computes. If you look at the upgrade devref, we
ask for a maintenance window [1]. During that maintenance window, we say
it's safer to upgrade "nova-conductor first and nova-api last" for
coherence reasons, but since that's during the maintenance window, we're
not supposed to have user requests coming in.

So, to circle back to the original problem, I think having the
nova-scheduler upgraded *before* placement is not a problem. If
deployers don't want to implement an upgrade scenario where placement
is upgraded before the scheduler, that's fine. No extra work is needed
for deployers. It's just that *if* you implement that path, the
scheduler could still get requests during the window.

-Sylvain

[1]
http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process


> --
> 
> Thanks,
> 
> Matt
> 


Re: [openstack-dev] [nova] Do not recheck changes until 422709 is merged

2017-01-20 Thread Sylvain Bauza


On 20/01/2017 04:16, Matt Riedemann wrote:
> On 1/19/2017 5:09 PM, Matt Riedemann wrote:
>> On 1/19/2017 10:56 AM, Matt Riedemann wrote:
>>> The py35 unit test job is broken for Nova until this patch is merged:
>>>
>>> https://review.openstack.org/#/c/422709/
>>>
>>> So please hold off on the rechecks until that happens.
>>>
>>
>> We're good to go again for rechecks.
>>
> 
> Just a heads up that if you still see py35 unit test failures with a
> TypeError like this [1] then you need to rebase you patch.
> 
> [1]
> http://logs.openstack.org/37/410737/5/check/gate-nova-python35-db/4fec66c/console.html#_2017-01-20_01_56_21_021702
> 
> 

Just to understand the context correctly: which merged change fixes that
and requires the rebase?




Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Sylvain Bauza


On 19/01/2017 17:00, Matt Riedemann wrote:
> On 1/19/2017 9:43 AM, Sylvain Bauza wrote:
>>
>>
>> On 19/01/2017 16:27, Matt Riedemann wrote:
>>> Sylvain and I were talking about how he's going to work placement
>>> microversion requests into his filter scheduler patch [1]. He needs to
>>> make requests to the placement API with microversion 1.4 [2] or later
>>> for resource provider filtering on specific resource classes like VCPU
>>> and MEMORY_MB.
>>>
>>> The question was what happens if microversion 1.4 isn't available in the
>>> placement API, i.e. the nova-scheduler is running Ocata code now but the
>>> placement service is running Newton still.
>>>
>>> Our rolling upgrades doc [3] says:
>>>
>>> "It is safest to start nova-conductor first and nova-api last."
>>>
>>> But since placement is bundled with n-api that would cause issues since
>>> n-sch now depends on the n-api code.
>>>
>>> If you package the placement service separately from the nova-api
>>> service then this is probably not an issue. You can still roll out n-api
>>> last and restart it last (for control services), and just make sure that
>>> placement is upgraded before nova-scheduler (we need to be clear about
>>> that in [3]).
>>>
>>> But do we have any other issues if they are not packaged separately? Is
>>> it possible to install the new code, but still only restart the
>>> placement service before nova-api? I believe it is, but want to ask this
>>> out loud.
>>>
>>> I think we're probably OK here but I wanted to ask this out loud and
>>> make sure everyone is aware and can think about this as we're a week
>>> from feature freeze. We also need to look into devstack/grenade because
>>> I'm fairly certain that we upgrade n-sch *before* placement in a grenade
>>> run which will make any issues here very obvious in [1].
>>>
>>> [1] https://review.openstack.org/#/c/417961/
>>> [2]
>>> http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
>>>
>>>
>>> [3]
>>> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
>>>
>>>
>>>
>>
>> I thought out loud in the nova channel at the following possibility :
>> since we always ask to upgrade n-cpus *AFTER* upgrading our other
>> services, we could imagine to allow the nova-scheduler gently accept to
>> have a placement service be Newton *UNLESS* you have Ocata computes.
>>
>> On other technical words, the scheduler getting a response from the
>> placement service is an hard requirement for Ocata. That said, if the
>> response code is a 400 with a message saying that the schema is
>> incorrect, it would be checking the max version of all the computes and
>> then :
>>  - either the max version is Newton and then call back the
>> ComputeNodeList.get_all() for getting the list of nodes
>>  - or, the max version is Ocata (at least one node is upgraded), and
>> then we would throw a NoValidHosts
>>
>> That way, the upgrade path would be :
>>  1/ upgrade your conductor
>>  2/ upgrade all your other services but n-cpus (we could upgrade and
>> restart n-sch before n-api, that would still work, or the contrary would
>> be fine too)
>>  3/ rolling upgrade your n-cpus
>>
>> I think we would keep then the existing upgrade path and we would still
>> have the placement service be mandatory for Ocata.
>>
>> Thoughts ?
>> -Sylvain
>>
>>
> 
> I don't like basing the n-sch decision on the service version of the
> computes, because the computes will keep trying to connect to the
> placement service until it's available, but not fail. That doesn't
> really mean that placement is new enough for the scheduler to use the
> 1.4 microversion.
> 
> So IMO we either charge forward as planned and make it clear in the docs
> that for Ocata, the placement service must be upgraded *before*
> nova-scheduler, or we punt and provide a fallback to just pulling all
> compute nodes from the database if we can't make the 1.4 request to
> placement. Given my original post here, I'd prefer t

Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Sylvain Bauza


On 19/01/2017 16:27, Matt Riedemann wrote:
> Sylvain and I were talking about how he's going to work placement
> microversion requests into his filter scheduler patch [1]. He needs to
> make requests to the placement API with microversion 1.4 [2] or later
> for resource provider filtering on specific resource classes like VCPU
> and MEMORY_MB.
> 
> The question was what happens if microversion 1.4 isn't available in the
> placement API, i.e. the nova-scheduler is running Ocata code now but the
> placement service is running Newton still.
> 
> Our rolling upgrades doc [3] says:
> 
> "It is safest to start nova-conductor first and nova-api last."
> 
> But since placement is bundled with n-api that would cause issues since
> n-sch now depends on the n-api code.
> 
> If you package the placement service separately from the nova-api
> service then this is probably not an issue. You can still roll out n-api
> last and restart it last (for control services), and just make sure that
> placement is upgraded before nova-scheduler (we need to be clear about
> that in [3]).
> 
> But do we have any other issues if they are not packaged separately? Is
> it possible to install the new code, but still only restart the
> placement service before nova-api? I believe it is, but want to ask this
> out loud.
> 
> I think we're probably OK here but I wanted to ask this out loud and
> make sure everyone is aware and can think about this as we're a week
> from feature freeze. We also need to look into devstack/grenade because
> I'm fairly certain that we upgrade n-sch *before* placement in a grenade
> run which will make any issues here very obvious in [1].
> 
> [1] https://review.openstack.org/#/c/417961/
> [2]
> http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
> 
> [3]
> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
> 
> 

I thought out loud in the nova channel about the following possibility:
since we always ask to upgrade n-cpus *AFTER* upgrading our other
services, we could imagine allowing the nova-scheduler to gently accept a
Newton placement service *UNLESS* you have Ocata computes.

In other technical words, the scheduler getting a response from the
placement service is a hard requirement for Ocata. That said, if the
response code is a 400 with a message saying that the schema is
incorrect, it would check the max service version of all the computes and
then:
 - either the max version is Newton, and then fall back to
ComputeNodeList.get_all() for getting the list of nodes,
 - or the max version is Ocata (at least one node is upgraded), and then
we would throw a NoValidHosts.
(A rough sketch of that logic follows below.)

That way, the upgrade path would be:
 1/ upgrade your conductor
 2/ upgrade all your other services except n-cpus (we could upgrade and
restart n-sch before n-api, that would still work, or the other way
around would be fine too)
 3/ rolling-upgrade your n-cpus

I think we would then keep the existing upgrade path and the placement
service would still be mandatory for Ocata.
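
A rough sketch of that fallback logic (illustrative only; the helper names
and the wrapper used to call placement are placeholders, not the actual
scheduler code):

# Illustrative sketch of the proposed behaviour, not the actual patch.
# 'placement_get' stands for whatever thin client the scheduler uses to
# call the placement endpoint.
from nova import exception
from nova import objects


def select_candidates(context, spec, placement_get, any_ocata_computes):
    qs = 'resources=VCPU:%d,MEMORY_MB:%d,DISK_GB:%d' % (
        spec.flavor.vcpus, spec.flavor.memory_mb,
        spec.flavor.root_gb + spec.flavor.ephemeral_gb)
    resp = placement_get('/resource_providers?' + qs)

    if resp.status_code != 400:
        # Placement understands the 1.4 'resources' query: normal Ocata path.
        return resp.json()['resource_providers']

    # Placement is still Newton and rejected the query.
    if not any_ocata_computes:
        # No compute has been upgraded yet: fall back to the old behaviour
        # and let the scheduler filters do all the work.
        return objects.ComputeNodeList.get_all(context)

    # At least one Ocata compute is around: refuse to guess.
    raise exception.NoValidHost(reason='placement service is too old')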

Thoughts ?
-Sylvain



Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Sylvain Bauza


On 11/01/2017 02:03, Matt Riedemann wrote:
> I'm trying to sort out failures in the placement job in stable/newton
> job where the tests aren't failing but it's something in the host
> cleanup step that blows up.
> 
> Looking here I see this:
> 
> http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/_zuul_ansible/ansible_log.txt.gz
> 
> 
> 2017-01-04 22:46:50,761 p=10771 u=zuul |  changed: [node] => {"changed":
> true, "checksum": "7f4d51086f4bc4de5ae6d83c00b0e458b8606aa2", "dest":
> "/tmp/05-cb20affd78a84851b47992ff129722af.sh", "gid": 3001, "group":
> "jenkins", "md5sum": "2de9baa70e4d28bbcca550a17959beab", "mode": "0555",
> "owner": "jenkins", "size": 647, "src":
> "/tmp/tmpz_guiR/.ansible/remote_tmp/ansible-tmp-1483570010.54-207083993908564/source",
> "state": "file", "uid": 3000}
> 2017-01-04 22:46:50,775 p=10771 u=zuul |  TASK [command generated from
> JJB] **
> 2017-01-04 23:44:42,880 p=10771 u=zuul |  fatal: [node]: FAILED! =>
> {"changed": true, "cmd":
> ["/tmp/05-cb20affd78a84851b47992ff129722af.sh"], "delta":
> "0:57:51.734808", "end": "2017-01-04 23:44:42.632473", "failed": true,
> "rc": 127, "start": "2017-01-04 22:46:50.897665", "stderr": "",
> "stdout": "", "stdout_lines": [], "warnings": []}
> 2017-01-04 23:44:42,887 p=10771 u=zuul |  NO MORE HOSTS LEFT
> *
> 2017-01-04 23:44:42,888 p=10771 u=zuul |  PLAY RECAP
> *
> 2017-01-04 23:44:42,888 p=10771 u=zuul |  node   :
> ok=13   changed=13   unreachable=0failed=1
> 
> I'm not sure what the 'NO MORE HOSTS LEFT' error means. Is there
> something wrong with the post/cleanup step for this job in newton? It's
> non-voting but we're backporting bug fixes for this code since it needs
> to work to upgrade to ocata.
> 


Is there a follow-up on the above problem?

On a separate change, I also have the placement job voting -1 because of
the ComputeFilter saying the service is disabled due to 'connection of
libvirt lost':

http://logs.openstack.org/20/415520/5/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/19fcab4/logs/screen-n-sch.txt.gz#_2017-01-11_04_33_35_995


-Sylvain



Re: [openstack-dev] [nova] [placement] Ocata upgrade procedure and problems when it's optional in Newton

2017-01-10 Thread Sylvain Bauza


On 10/01/2017 14:49, Sylvain Bauza wrote:
> Aloha folks,
> 
> Recently, I was discussing with TripleO folks. Disclaimer, I don't think
> it's only a TripleO related discussion but rather a larger one for all
> our deployers.
> 
> So, the question I was asked was about how to upgrade from Newton to
> Ocata for the Placement API when the deployer is not using yet the
> Placement API for Newton (because it was optional in Newton).
> 
> The quick answer was to say "easy, just upgrade the service and run the
> placement API *before* the scheduler upgrade". That's because we're
> working on a change for the scheduler calling the Placement API instead
> of getting all the compute nodes [1]
> 
> That said, I thought about something else : wait, the Newton compute
> nodes work with the Placement API, cool. Cool, but what if the Placement
> API is optional in Newton ? Then, the Newton computes are stopping to
> call the Placement API because of a nice decorator [2] (okay with me)
> 
> Then, imagine the problem for the upgrade : given we don't have
> deployers running the Placement API in Newton, they would need to
> *first* deploy the (Newton or Ocata) Placement service, then SIGHUP all
> the Newton compute nodes to have them reporting the resources (and
> creating the inventories), then wait for some minutes that all the
> inventories are reported, and then upgrade all the services (but the
> compute nodes of course) to Ocata, including the scheduler service.
> 
> The above looks a different upgrade policy, right?
>  - Either we say you need to run the Newton placement service *before*
> upgrading - and in that case, the Placement service is not optional for
> Newton, right?
>  - Or, we say you need to run the Ocata placement service and then
> restart the compute nodes *before* upgrading the services - and that's a
> very different situation than the current upgrade way.
> 
> For example, I know it's not a Nova stuff, but most of our deployers
> have what they say "controllers" vs. "compute" services, ie. all the
> Nova services but computes running on a single (or more) machine(s). In
> that case, the "controller" upgrade is monotonic and all the services
> are upgraded and restarted at the same stage. If so, that looks
> difficult for those deployers to just be asked to have a very different
> procedure.
> 
> Anyway, I think we need to carefully consider that, and probably find
> some solutions. For example, we could imagine (disclaimer #2, that's
> probably silly solutions, but that's the ones I'm thinking now) :
>  - a DB migration for creating the inventories and allocations before
> upgrading (ie. not asking the computes to register themselves to the
> placement API). That would be terrible because it's a data upgrade, I
> know...
>  - having the scheduler having a backwards compatible behaviour in [1],
> ie. trying to call the Placement API for getting the list of RPs or
> failback to calling all the ComputeNodes if that's not possible. But
> that would mean that the Placement API is still optional for Ocata :/
>  - merging the scheduler calling the Placement API [1] in a point
> release after we deliver Ocata (and still make the Placement API
> mandatory for Ocata) so that we would be sure that all computes are
> reporting their status to the Placement once we restart the scheduler in
> the point release.
> 

FWIW, another possible solution has been discussed upstream in the
#openstack-nova channel, proposed by Dan Smith: we could remove the
try-once behaviour of the decorator, backport that to Newton and do a
point release, which would allow the compute nodes to keep trying to
reconcile with the Placement API in a self-healing manner.

That would mean that deployers would have to upgrade to the latest
Newton point release before upgrading to Ocata, which is, I think, the
best supported model.

I'll propose a patch for that in my series as a bottom change for [1]; a
rough sketch of the idea is below.
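
A minimal sketch of what removing that try-once behaviour could look like,
assuming an illustrative decorator rather than the real code in [2]:

# Illustrative sketch only, not the real nova.scheduler.client.report code.
# The point is that the wrapper no longer latches a "placement is gone"
# flag after the first failure; it just warns and lets the next periodic
# update from the resource tracker retry, so computes self-heal once the
# placement service is finally deployed.
import functools
import logging

from keystoneauth1 import exceptions as ks_exc

LOG = logging.getLogger(__name__)


def retry_on_placement_down(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except (ks_exc.EndpointNotFound,
                ks_exc.MissingAuthPlugin,
                ks_exc.Unauthorized) as exc:
            LOG.warning('Placement API not available, will retry: %s', exc)
            return None
    return wrapper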

-Sylvain



> 
> Thoughts ?
> -Sylvain
> 
> 
> [1] https://review.openstack.org/#/c/417961/
> 
> [2]
> https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67
> 


[openstack-dev] [nova] [placement] Ocata upgrade procedure and problems when it's optional in Newton

2017-01-10 Thread Sylvain Bauza
Aloha folks,

Recently, I was discussing with TripleO folks. Disclaimer, I don't think
it's only a TripleO related discussion but rather a larger one for all
our deployers.

So, the question I was asked was how to upgrade from Newton to Ocata for
the Placement API when the deployer is not yet using the Placement API in
Newton (because it was optional in Newton).

The quick answer was to say "easy, just upgrade the service and run the
placement API *before* the scheduler upgrade". That's because we're
working on a change that makes the scheduler call the Placement API
instead of getting all the compute nodes [1].

That said, I thought about something else: wait, the Newton compute nodes
work with the Placement API, cool. Cool, but what if the Placement API is
optional in Newton? Then the Newton computes stop calling the Placement
API because of a nice decorator [2] (which is fine with me).

Then, imagine the problem for the upgrade: given we don't have deployers
running the Placement API in Newton, they would need to *first* deploy
the (Newton or Ocata) Placement service, then SIGHUP all the Newton
compute nodes to have them report their resources (and create the
inventories), then wait a few minutes until all the inventories are
reported, and only then upgrade all the services (except the compute
nodes, of course) to Ocata, including the scheduler service.

The above looks like a different upgrade policy, right?
 - Either we say you need to run the Newton placement service *before*
upgrading - and in that case, the Placement service is not optional for
Newton, right?
 - Or we say you need to run the Ocata placement service and then restart
the compute nodes *before* upgrading the services - and that's a very
different situation from the current upgrade procedure.

For example, I know it's not a Nova thing, but most of our deployers
have what they call "controller" vs. "compute" services, i.e. all the
Nova services except the computes running on a single machine (or a
few). In that case, the "controller" upgrade is monolithic and all the
services are upgraded and restarted at the same stage. If so, it seems
difficult to ask those deployers to follow a very different procedure.

Anyway, I think we need to consider that carefully and probably find
some solutions. For example, we could imagine (disclaimer #2: these are
probably silly solutions, but they're the ones I'm thinking of now):
 - a DB migration creating the inventories and allocations before
upgrading (i.e. not asking the computes to register themselves with the
placement API). That would be terrible because it's a data migration, I
know...
 - giving the scheduler a backwards-compatible behaviour in [1], i.e.
trying to call the Placement API to get the list of RPs, or falling back
to calling all the ComputeNodes if that's not possible. But that would
mean that the Placement API is still optional for Ocata :/
 - merging the scheduler change calling the Placement API [1] in a point
release after we deliver Ocata (and still making the Placement API
mandatory for Ocata) so that we would be sure that all computes are
reporting their status to Placement once we restart the scheduler in the
point release.


Thoughts ?
-Sylvain


[1] https://review.openstack.org/#/c/417961/

[2]
https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67



Re: [openstack-dev] [nova] [placement] Which service is using port 8778?

2016-12-20 Thread Sylvain Bauza


On 20/12/2016 14:30, Jay Pipes wrote:
> On 12/20/2016 08:03 AM, Sean Dague wrote:
>> On 12/20/2016 07:53 AM, Sylvain Bauza wrote:
>>> On 20/12/2016 10:26, Sylvain Bauza wrote:
>>>> On 20/12/2016 10:20, Chris Dent wrote:
>>>>> On Tue, 20 Dec 2016, Sylvain Bauza wrote:
>>>>>
>>>>>> Before moving forward and picking yet another port that could trample
>>>>>> another service, I'd rather prefer first that Senlin jobs would
>>>>>> temporarely disable the placement-* services so that the gate
>>>>>> would be
>>>>>> happy, while in the same time we try to identify a free port
>>>>>> number that
>>>>>> the placement API can safely bind.
>>>>>
>>>>> Another option here may be to not have the placement api bind to two
>>>>> ports. The current set up binds 8778 with the API at /, but what's
>>>>> registered in the service catalog is port 80 with the API at
>>>>> /placement.
>>>>>
>>>>> So perhaps only use the http://1.2.3.4/placement and disable the
>>>>> virtualhost that listens on 8778?
>>>>>
>>>>> I'd experiment with this myself but I'm going to be away from a
>>>>> compute all day. If people think it is a good idea but nobody has a
>>>>> chance to do it today I'll look into it tomorrow.
>>>>>
>>>>
>>>> Oh good catch. Since we register the service catalog with port 80, then
>>>> there is no reason to consume an application port.
>>>> Chris, don't worry, I'll play with that today.
>>>>
>>>
>>> So, after some investigation, I totally understand why we're using
>>> virtualhosts for running the WSGI apps corresponding to each service,
>>> since we want to keep the service catalog entries unchanged if the
>>> operator wants to move from eventlet to mod_wsgi.
>>>
>>> Given that devstack was deploying a service catalog entry pointing to
>>> HTTP port, should we just assume to drop the use of port 8778 ?
>>> I'm a bit afraid of any possible impact it could have for operators
>>> using the placement API with the virtualhost support which also provide
>>> a WSGI daemon mode compared to the embedded mode that is executing calls
>>> to :80/placement...
>>>
>>> Thoughts ? I mean, I can do the cut and drop that, but that will
>>> certainly have impact for other deployers that were reproducing the
>>> devstack install, like for TripleO :
>>> https://review.openstack.org/#/c/406300/13/manifests/wsgi/apache_placement.pp
>>>
>>
>> Yes, we should stop with the magic ports. Part of the reason of
>> switching over to apache was to alleviate all of that.
> 
> +100
> 
> -jay
> 

Change is up, comments welcome : https://review.openstack.org/#/c/413118/

-Sylvain



Re: [openstack-dev] [nova] [placement] Which service is using port 8778?

2016-12-20 Thread Sylvain Bauza


On 20/12/2016 10:26, Sylvain Bauza wrote:
> 
> 
> On 20/12/2016 10:20, Chris Dent wrote:
>> On Tue, 20 Dec 2016, Sylvain Bauza wrote:
>>
>>> Before moving forward and picking yet another port that could trample
>>> another service, I'd rather prefer first that Senlin jobs would
>>> temporarely disable the placement-* services so that the gate would be
>>> happy, while in the same time we try to identify a free port number that
>>> the placement API can safely bind.
>>
>> Another option here may be to not have the placement api bind to two
>> ports. The current set up binds 8778 with the API at /, but what's
>> registered in the service catalog is port 80 with the API at
>> /placement.
>>
>> So perhaps only use the http://1.2.3.4/placement and disable the
>> virtualhost that listens on 8778?
>>
>> I'd experiment with this myself but I'm going to be away from a
>> compute all day. If people think it is a good idea but nobody has a
>> chance to do it today I'll look into it tomorrow.
>>
> 
> Oh good catch. Since we register the service catalog with port 80, then
> there is no reason to consume an application port.
> Chris, don't worry, I'll play with that today.
> 

So, after some investigation, I totally understand why we're using
virtualhosts for running the WSGI apps corresponding to each service,
since we want to keep the service catalog entries unchanged if the
operator wants to move from eventlet to mod_wsgi.

Given that devstack was deploying a service catalog entry pointing to the
HTTP port, should we just drop the use of port 8778?
I'm a bit afraid of any possible impact it could have for operators using
the placement API through the virtualhost setup, which also provides a
WSGI daemon mode, compared to the embedded mode that serves calls on
:80/placement...

Thoughts? I mean, I can make the cut and drop it, but that will certainly
have an impact on other deployers that were reproducing the devstack
install, like TripleO:
https://review.openstack.org/#/c/406300/13/manifests/wsgi/apache_placement.pp
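
For anyone who wants to see what is meant by the two bindings here,
something like this shows that the catalog-registered :80/placement URL
and the dedicated :8778 virtualhost serve the same WSGI application (host
name and token are placeholders):

# Illustrative check only; host and token are placeholders.
import requests

for root in ('http://controller/placement', 'http://controller:8778'):
    resp = requests.get(root, headers={'X-Auth-Token': '<admin token>'})
    print(root, resp.status_code, resp.json().get('versions'))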

-Sylvain

> -Sylvain
> 
> 
>>
>>


Re: [openstack-dev] [nova] [placement] Which service is using port 8778?

2016-12-20 Thread Sylvain Bauza


On 20/12/2016 10:20, Chris Dent wrote:
> On Tue, 20 Dec 2016, Sylvain Bauza wrote:
> 
>> Before moving forward and picking yet another port that could trample
>> another service, I'd rather prefer first that Senlin jobs would
>> temporarely disable the placement-* services so that the gate would be
>> happy, while in the same time we try to identify a free port number that
>> the placement API can safely bind.
> 
> Another option here may be to not have the placement api bind to two
> ports. The current set up binds 8778 with the API at /, but what's
> registered in the service catalog is port 80 with the API at
> /placement.
> 
> So perhaps only use the http://1.2.3.4/placement and disable the
> virtualhost that listens on 8778?
> 
> I'd experiment with this myself but I'm going to be away from a
> compute all day. If people think it is a good idea but nobody has a
> chance to do it today I'll look into it tomorrow.
> 

Oh, good catch. Since we register the service catalog with port 80, there
is no reason to consume a dedicated application port.
Chris, don't worry, I'll play with that today.

-Sylvain

