Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-25 Thread Dan Smith
> Update on that agreement: I made the necessary modification in the
> proposal [1] so that the filters are no longer verified. We now send a
> request to the Placement API by introspecting the flavor, and we get a
> list of potential destinations.

Thanks!

> When I began doing that modification, I knew there was a functional test
> about server groups that needed modifications to match our agreement. I
> consequently made that change in a separate patch [2] as a
> prerequisite for [1].
> 
> I then spotted a problem that we didn't identify when discussing:
> when checking a destination, the legacy filters for CPU, RAM and disk
> don't verify the maximum capacity of the host, they only multiply the
> total size by the allocation ratio, so our proposal works for them.
> Now, when using the placement service, it fails because somewhere in the
> DB call needed for returning the destinations, we also verify a specific
> field named max_unit [3].
> 
> Consequently, the proposal we agreed on does not give feature parity between
> Newton and Ocata. Even if you follow our instructions, you will still get
> different results from a placement perspective between what was in Newton
> and what will be in Ocata.

To summarize some discussion on IRC:

The max_unit field limits the maximum size of any single allocation and
is not scaled by the allocation_ratio (for good reason). Right now,
computes report a max_unit equal to their total for CPU and RAM
resources. So the different behavior here is that placement will not
choose hosts where the instance would single-handedly overcommit the
entire host. Multiple instances still could, per the rules of the
allocation-ratio.
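To make that concrete, here is a simplified sketch of the per-resource-class
check that placement applies (not the actual code, just the shape of it):

def provider_has_capacity(inv, used, requested):
    # inv: one resource class's inventory as reported by the compute, e.g.
    # {'total': 16, 'reserved': 0, 'min_unit': 1, 'max_unit': 16,
    #  'allocation_ratio': 16.0}
    capacity = (inv['total'] - inv['reserved']) * inv['allocation_ratio']
    if used + requested > capacity:
        return False
    # max_unit caps any *single* allocation and is deliberately not scaled
    # by the allocation ratio, which is the behavior difference described
    # above.
    if requested < inv['min_unit'] or requested > inv['max_unit']:
        return False
    return True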

The consensus seems to be that this is entirely sane behavior that the
previous core and ram filters weren't considering. If there's a good
reason to allow computes to report that they're willing to take a
larger-than-100% single allocation, then we can make that change later,
but the justification seems lacking at the moment.

> Technically speaking, the functional test is a canary in the coal mine,
> telling you that you now get NoValidHost where it was working previously.

My opinion, which is shared by several other people, is that this test
is broken. It's trying to overcommit the host with a single instance,
and in fact, it's doing it unintentionally for some resources that just
aren't checked before the move to placement. Changing the test to
properly reflect the resources on the host should be the path forward
and Sylvain is working on that now.

The other concern that was raised was that since CoreFilter is not
necessarily enabled on all clouds, cpu_allocation_ratio is not being
honored on those systems today. Moving to placement with Ocata will
cause that value to be used, which may be incorrect for certain
overly-committed clouds which had previously ignored it. However, I
think we need not be too concerned as the defaults for these values are
16x overcommit for CPU and 1.5x overcommit for RAM. Those are probably
on the upper limit of sane for most environments, but also large enough
to not cause any sort of immediate panic while people realize (if they
didn't read the release notes) that they may want to tweak them.
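As a purely illustrative example (the numbers here are made up; pick whatever
oversubscription you actually relied on), that amounts to something like this
in nova.conf on the computes:

[DEFAULT]
cpu_allocation_ratio = 32.0
ram_allocation_ratio = 1.5
disk_allocation_ratio = 1.0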

--Dan



Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-25 Thread Sylvain Bauza


On 25/01/2017 05:10, Matt Riedemann wrote:
> On 1/24/2017 2:57 PM, Matt Riedemann wrote:
>> On 1/24/2017 2:38 PM, Sylvain Bauza wrote:
>>>
>>> It's literally 2 days before FeatureFreeze and we ask operators to
>>> change their cloud right now? Looks difficult to me and, like I said in
>>> multiple places by email, we have a ton of assertions saying it's
>>> acceptable to not have all the filters.
>>>
>>> -Sylvain
>>>
>>
>> I'm not sure why feature freeze in two days is going to make a huge
>> amount of difference here. Most large production clouds are probably
>> nowhere near trunk (I'm assuming most are on Mitaka or older at this
>> point just because of how deployments seem to tail the oldest supported
>> stable branch). Or are you mainly worried about deployment tooling
>> projects, like TripleO, needing to deal with this now?
>>
>> Anyone upgrading to Ocata is going to have to read the release notes and
>> assess the upgrade impacts regardless of when we make this change, be
>> that Ocata or Pike.
>>
>> Sylvain, are you suggesting that for Ocata if, for example, the
>> CoreFilter isn't in the list of enabled scheduler filters, we don't make
>> the request for VCPU when filtering resource providers, but we also log
>> a big fat warning in the n-sch logs saying we're going to switch over in
>> Pike and that cpu_allocation_ratio needs to be configured because the
>> CoreFilter is going to be deprecated in Ocata and removed in Pike?
>>
>> [1]
>> https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact
>>
>>
>>
> 
> To recap the discussion we had in IRC today, we're moving forward with
> the original plan of the *filter scheduler* always requesting VCPU,
> MEMORY_MB and DISK_GB* regardless of the enabled filters. The main
> reason being there isn't a clear path forward on straddling releases to
> deprecate or make decisions based on the enabled filters and provide a
> warning that makes sense.
> 
> For example, we can't deprecate the filters (at least yet) because the
> *caching scheduler* is still using them (it's not using placement yet).
> And if we logged a warning if you don't have the CoreFilter in
> CONF.filter_scheduler.enabled_filters, for example, but we don't want
> you to have it in that list, then what are you supposed to do? i.e. the
> goal is to not have the legacy primitive resource filters enabled for
> the filter scheduler in Pike, so you get into this weird situation of
> whether or not you have them enabled before Pike, and in what
> cases you log a warning that makes sense. So we agreed at this point
> it's just simpler to say that if you don't enable these filters today,
> you're going to need to configure the appropriate allocation ratio
> configuration option prior to upgrading to Ocata. That will be in the
> upgrade section of the release notes and we can probably also work it
> into the placement devref as a deployment note. We can also work this
> into the nova-status upgrade check CLI.
> 
> *DISK_GB is special since we might have a flavor that's not specifying
> any disk or a resource provider with no DISK_GB allocations if the
> instances are all booted from volumes.
> 

Update on that agreement: I made the necessary modification in the
proposal [1] so that the filters are no longer verified. We now send a
request to the Placement API by introspecting the flavor, and we get a
list of potential destinations.
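(For illustration only, and not necessarily the exact code in [1]: the flavor
introspection boils down to something like the sketch below, where the helper
name and the swap handling are approximations.)

import math

def resources_from_flavor(flavor):
    # Rough sketch of how the placement request can be built from a flavor;
    # the name and details are illustrative, not nova's actual helper.
    return {
        'VCPU': flavor.vcpus,
        'MEMORY_MB': flavor.memory_mb,
        # Disk is root + ephemeral + swap (swap is expressed in MB).
        'DISK_GB': (flavor.root_gb + flavor.ephemeral_gb
                    + int(math.ceil(flavor.swap / 1024.0))),
    }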

When I began doing that modification, I knew there was a functional test
about server groups that needed modifications to match our agreement. I
consequently made that change in a separate patch [2] as a
prerequisite for [1].

I then spotted a problem that we didn't identify when discussing:
when checking a destination, the legacy filters for CPU, RAM and disk
don't verify the maximum capacity of the host, they only multiply the
total size by the allocation ratio, so our proposal works for them.
Now, when using the placement service, it fails because somewhere in the
DB call needed for returning the destinations, we also verify a specific
field named max_unit [3].

Consequently, the proposal we agreed on does not give feature parity between
Newton and Ocata. Even if you follow our instructions, you will still get
different results from a placement perspective between what was in Newton
and what will be in Ocata.

Technically speaking, the functional test is a canary in the coal mine,
telling you that you now get NoValidHost where it was working previously.

After that, I'm stuck. We could discuss for a while whether all
of that is sane or not, but the fact is, there is a discrepancy.

Honestly, I don't know what to do, unless we consider that we're now so
close to the Feature Freeze that it's becoming an all-or-nothing situation.
The only silver bullet I still have would be to treat a placement
failure as non-blocking and fall back to calling the full list of nodes
for Ocata. I know that sucks, but I don't 

Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Matt Riedemann

On 1/24/2017 2:57 PM, Matt Riedemann wrote:

On 1/24/2017 2:38 PM, Sylvain Bauza wrote:


It's literally 2 days before FeatureFreeze and we ask operators to
change their cloud right now? Looks difficult to me and, like I said in
multiple places by email, we have a ton of assertions saying it's
acceptable to not have all the filters.

-Sylvain



I'm not sure why feature freeze in two days is going to make a huge
amount of difference here. Most large production clouds are probably
nowhere near trunk (I'm assuming most are on Mitaka or older at this
point just because of how deployments seem to tail the oldest supported
stable branch). Or are you mainly worried about deployment tooling
projects, like TripleO, needing to deal with this now?

Anyone upgrading to Ocata is going to have to read the release notes and
assess the upgrade impacts regardless of when we make this change, be
that Ocata or Pike.

Sylvain, are you suggesting that for Ocata if, for example, the
CoreFilter isn't in the list of enabled scheduler filters, we don't make
the request for VCPU when filtering resource providers, but we also log
a big fat warning in the n-sch logs saying we're going to switch over in
Pike and that cpu_allocation_ratio needs to be configured because the
CoreFilter is going to be deprecated in Ocata and removed in Pike?

[1]
https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact




To recap the discussion we had in IRC today, we're moving forward with 
the original plan of the *filter scheduler* always requesting VCPU, 
MEMORY_MB and DISK_GB* regardless of the enabled filters. The main 
reason being there isn't a clear path forward on straddling releases to 
deprecate or make decisions based on the enabled filters and provide a 
warning that makes sense.
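(Concretely, and just to illustrate the shape of the call: with the resources
query parameter added for Ocata (placement microversion 1.4), the filter
scheduler ends up issuing something like

  GET /resource_providers?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:40

with the amounts derived from the flavor; the values above are only an
example.)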


For example, we can't deprecate the filters (at least yet) because the 
*caching scheduler* is still using them (it's not using placement yet). 
And if we logged a warning if you don't have the CoreFilter in 
CONF.filter_scheduler.enabled_filters, for example, but we don't want 
you to have it in that list, then what are you supposed to do? i.e. the 
goal is to not have the legacy primitive resource filters enabled for 
the filter scheduler in Pike, so you get into this weird situation of 
whether or not you have them enabled before Pike, and in what 
cases you log a warning that makes sense. So we agreed at this point 
it's just simpler to say that if you don't enable these filters today, 
you're going to need to configure the appropriate allocation ratio 
configuration option prior to upgrading to Ocata. That will be in the 
upgrade section of the release notes and we can probably also work it 
into the placement devref as a deployment note. We can also work this 
into the nova-status upgrade check CLI.


*DISK_GB is special since we might have a flavor that's not specifying 
any disk or a resource provider with no DISK_GB allocations if the 
instances are all booted from volumes.


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Matt Riedemann

On 1/24/2017 2:38 PM, Sylvain Bauza wrote:


It's literally 2 days before FeatureFreeze and we ask operators to
change their cloud right now? Looks difficult to me and, like I said in
multiple places by email, we have a ton of assertions saying it's
acceptable to not have all the filters.

-Sylvain



I'm not sure why feature freeze in two days is going to make a huge 
amount of difference here. Most large production clouds are probably 
nowhere near trunk (I'm assuming most are on Mitaka or older at this 
point just because of how deployments seem to tail the oldest supported 
stable branch). Or are you mainly worried about deployment tooling 
projects, like TripleO, needing to deal with this now?


Anyone upgrading to Ocata is going to have to read the release notes and 
assess the upgrade impacts regardless of when we make this change, be 
that Ocata or Pike.


Sylvain, are you suggesting that for Ocata if, for example, the 
CoreFilter isn't in the list of enabled scheduler filters, we don't make 
the request for VCPU when filtering resource providers, but we also log 
a big fat warning in the n-sch logs saying we're going to switch over in 
Pike and that cpu_allocation_ratio needs to be configured because the 
CoreFilter is going to be deprecated in Ocata and removed in Pike?


[1] 
https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact


--

Thanks,

Matt Riedemann



Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Sylvain Bauza


On 24/01/2017 22:22, Dan Smith wrote:
>> No. Have administrators set the allocation ratios for the resources they
>> do not care about exceeding capacity to a very high number.
>>
>> If someone previously removed a filter, that doesn't mean that the
>> resources were not consumed on a host. It merely means the admin was
>> willing to accept a high amount of oversubscription. That's what the
>> allocation_ratio is for.
>>
>> The flavor should continue to have a consumed disk/vcpu/ram amount,
>> because the VM *does actually consume those resources*. If the operator
>> doesn't care about oversubscribing one or more of those resources, they
>> should set the allocation ratios of those inventories to a high value.
>>
>> No more adding configuration options for this kind of thing (or in this
>> case, looking at an old configuration option and parsing it to see if a
>> certain filter is listed in the list of enabled filters).
>>
>> We have a proper system of modeling these data-driven decisions now, so
>> my opinion is we should use it and ask operators to use the placement
>> REST API for what it was intended.
> 
> I agree with the above. I think it's extremely counter-intuitive to set
> a bunch of over-subscription values only to have them ignored because a
> scheduler filter isn't configured.
> 
> If we ignore some of the resources at schedule time, the compute nodes will
> start reporting values that will make the resources appear to be
> negative to anything looking at the data. Before a somewhat-recent
> change of mine, the oversubscribed computes would have *failed* to
> report negative resources at all, which was a problem for a reconfigure
> event. I think the scheduler purposefully forcing computes into the red
> is a mistake.
> 
> Further, new users that don't know our sins of the past will wonder why
> the nice system they see in front of them isn't doing the right thing.
> Existing users can reconfigure allocation ratio values before they
> upgrade. We can also add something to our upgrade status tool to warn them.
> 

It's literally 2 days before FeatureFreeze and we ask operators to
change their cloud right now? Looks difficult to me and, like I said in
multiple places by email, we have a ton of assertions saying it's
acceptable to not have all the filters.

-Sylvain

> --Dan
> 


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Dan Smith
> No. Have administrators set the allocation ratios for the resources they
> do not care about exceeding capacity to a very high number.
> 
> If someone previously removed a filter, that doesn't mean that the
> resources were not consumed on a host. It merely means the admin was
> willing to accept a high amount of oversubscription. That's what the
> allocation_ratio is for.
> 
> The flavor should continue to have a consumed disk/vcpu/ram amount,
> because the VM *does actually consume those resources*. If the operator
> doesn't care about oversubscribing one or more of those resources, they
> should set the allocation ratios of those inventories to a high value.
> 
> No more adding configuration options for this kind of thing (or in this
> case, looking at an old configuration option and parsing it to see if a
> certain filter is listed in the list of enabled filters).
> 
> We have a proper system of modeling these data-driven decisions now, so
> my opinion is we should use it and ask operators to use the placement
> REST API for what it was intended.

I agree with the above. I think it's extremely counter-intuitive to set
a bunch of over-subscription values only to have them ignored because a
scheduler filter isn't configured.

If we ignore some of the resources at schedule time, the compute nodes will
start reporting values that will make the resources appear to be
negative to anything looking at the data. Before a somewhat-recent
change of mine, the oversubscribed computes would have *failed* to
report negative resources at all, which was a problem for a reconfigure
event. I think the scheduler purposefully forcing computes into the red
is a mistake.
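(As a made-up example of what "into the red" means: a 64 GiB compute with
ram_allocation_ratio=1.0 that the scheduler forces to take a 96 GiB instance
would end up reporting free_ram_mb = 65536 - 98304 = -32768.)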

Further, new users that don't know our sins of the past will wonder why
the nice system they see in front of them isn't doing the right thing.
Existing users can reconfigure allocation ratio values before they
upgrade. We can also add something to our upgrade status tool to warn them.

--Dan



Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 23/01/2017 15:18, Sylvain Bauza wrote:
> 
> 
> On 23/01/2017 15:11, Jay Pipes wrote:
>> On 01/22/2017 04:40 PM, Sylvain Bauza wrote:
>>> Hey folks,
>>>
>>> tl;dr: should we GET /resource_providers for only the related resources
>>> that correspond to enabled filters ?
>>
>> No. Have administrators set the allocation ratios for the resources they
>> do not care about exceeding capacity to a very high number.
>>
>> If someone previously removed a filter, that doesn't mean that the
>> resources were not consumed on a host. It merely means the admin was
>> willing to accept a high amount of oversubscription. That's what the
>> allocation_ratio is for.
>>
>> The flavor should continue to have a consumed disk/vcpu/ram amount,
>> because the VM *does actually consume those resources*. If the operator
>> doesn't care about oversubscribing one or more of those resources, they
>> should set the allocation ratios of those inventories to a high value.
>>
>> No more adding configuration options for this kind of thing (or in this
>> case, looking at an old configuration option and parsing it to see if a
>> certain filter is listed in the list of enabled filters).
>>
>> We have a proper system of modeling these data-driven decisions now, so
>> my opinion is we should use it and ask operators to use the placement
>> REST API for what it was intended.
>>
> 
> I know your point, but please consider mine.
> What if an operator disabled CoreFilter in Newton and wants to upgrade
> to Ocata?
> All of that implementation landing very close to the deadline makes me
> nervous, and I really want a seamless path for operators who will now be
> using the placement service.
> 
> Also, like I said in my bigger explanation, we would need to modify a
> shit ton of assertions in our tests so that they say "meh, don't use all the
> filters, just these ones". Pretty risky so close to a FF.
> 

Oh, I just discovered a related point: in Devstack, we don't set the
CoreFilter by default!
https://github.com/openstack-dev/devstack/blob/adcf0c50cd87c68abef7c3bb4785a07d3545be5d/lib/nova#L94

TBC, that means that the gate is not verifying the VCPUs by the filter,
just by the compute claims. Heh.

Honestly, I think we really need to make the filters optional for Ocata then.

-Sylvain

> -Sylvain
> 
> 
>> Best,
>> -jay
>>
>>> Explanation below why even if I
>>> know we have a current consensus, maybe we should discuss again about it.
>>>
>>>
>>> I'm still trying to implement https://review.openstack.org/#/c/417961/
>>> but when trying to get the functional job being +1, I discovered that we
>>> have at least one functional test [1] asking for just the RAMFilter (and
>>> not for VCPUs or disks).
>>>
>>> Given the current PS is asking for all of CPU, RAM and disk, it's
>>> tripping the current test into a NoValidHost.
>>>
>>> Okay, I could just modify the test and make sure we have enough
>>> resources for the flavors but I actually now wonder if that's all good
>>> for our operators.
>>>
>>> I know we have a consensus saying that we should still ask for both CPU,
>>> RAM and disk at the same time, but I imagine our users coming back to us
>>> saying "eh, look, I'm no longer able to create instances even if I'm not
>>> using the CoreFilter" for example. It could be a bad day for them and
>>> honestly, I'm not sure just adding documentation or release notes would
>>> help them.
>>>
>>> What are you thinking if we say that for only this cycle, we still try
>>> to only ask for resources that are related to the enabled filters ?
>>> For example, say someone is disabling CoreFilter in the conf opt, then
>>> the scheduler shouldn't ask for VCPUs to the Placement API.
>>>
>>> FWIW, we have another consensus about not removing
>>> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
>>> using them (and not calling the Placement API).
>>>
>>> Thanks,
>>> -Sylvain
>>>
>>> [1]
>>> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
>>>
>>>


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 23/01/2017 15:11, Jay Pipes wrote:
> On 01/22/2017 04:40 PM, Sylvain Bauza wrote:
>> Hey folks,
>>
>> tl;dr: should we GET /resource_providers for only the related resources
>> that correspond to enabled filters ?
> 
> No. Have administrators set the allocation ratios for the resources they
> do not care about exceeding capacity to a very high number.
> 
> If someone previously removed a filter, that doesn't mean that the
> resources were not consumed on a host. It merely means the admin was
> willing to accept a high amount of oversubscription. That's what the
> allocation_ratio is for.
> 
> The flavor should continue to have a consumed disk/vcpu/ram amount,
> because the VM *does actually consume those resources*. If the operator
> doesn't care about oversubscribing one or more of those resources, they
> should set the allocation ratios of those inventories to a high value.
> 
> No more adding configuration options for this kind of thing (or in this
> case, looking at an old configuration option and parsing it to see if a
> certain filter is listed in the list of enabled filters).
> 
> We have a proper system of modeling these data-driven decisions now, so
> my opinion is we should use it and ask operators to use the placement
> REST API for what it was intended.
> 

I know your point, but please consider mine.
What if an operator disabled CoreFilter in Newton and wants to upgrade
to Ocata?
All of that implementation landing very close to the deadline makes me
nervous, and I really want a seamless path for operators who will now be
using the placement service.

Also, like I said in my bigger explanation, we would need to modify a
shit ton of assertions in our tests so that they say "meh, don't use all the
filters, just these ones". Pretty risky so close to a FF.

-Sylvain


> Best,
> -jay
> 
>> Explanation below why even if I
>> know we have a current consensus, maybe we should discuss again about it.
>>
>>
>> I'm still trying to implement https://review.openstack.org/#/c/417961/
>> but when trying to get the functional job being +1, I discovered that we
>> have at least one functional test [1] asking for just the RAMFilter (and
>> not for VCPUs or disks).
>>
>> Given the current PS is asking for all of CPU, RAM and disk, it's
>> tripping the current test into a NoValidHost.
>>
>> Okay, I could just modify the test and make sure we have enough
>> resources for the flavors but I actually now wonder if that's all good
>> for our operators.
>>
>> I know we have a consensus saying that we should still ask for both CPU,
>> RAM and disk at the same time, but I imagine our users coming back to us
>> saying "eh, look, I'm no longer able to create instances even if I'm not
>> using the CoreFilter" for example. It could be a bad day for them and
>> honestly, I'm not sure just adding documentation or release notes would
>> help them.
>>
>> What are you thinking if we say that for only this cycle, we still try
>> to only ask for resources that are related to the enabled filters ?
>> For example, say someone is disabling CoreFilter in the conf opt, then
>> the scheduler shouldn't ask for VCPUs to the Placement API.
>>
>> FWIW, we have another consensus about not removing
>> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
>> using them (and not calling the Placement API).
>>
>> Thanks,
>> -Sylvain
>>
>> [1]
>> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
>>
>>


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Sylvain Bauza


On 22/01/2017 22:40, Sylvain Bauza wrote:
> Hey folks,
> 
> tl;dr: should we GET /resource_providers for only the related resources
> that correspond to enabled filters ? Explanation below why even if I
> know we have a current consensus, maybe we should discuss again about it.
> 
> 
> I'm still trying to implement https://review.openstack.org/#/c/417961/
> but when trying to get the functional job being +1, I discovered that we
> have at least one functional test [1] asking for just the RAMFilter (and
> not for VCPUs or disks).
> 
> Given the current PS is asking for all of CPU, RAM and disk, it's
> tripping the current test into a NoValidHost.
> 
> Okay, I could just modify the test and make sure we have enough
> resources for the flavors but I actually now wonder if that's all good
> for our operators.
> 
> I know we have a consensus saying that we should still ask for both CPU,
> RAM and disk at the same time, but I imagine our users coming back to us
> saying "eh, look, I'm no longer able to create instances even if I'm not
> using the CoreFilter" for example. It could be a bad day for them and
> honestly, I'm not sure just adding documentation or release notes would
> help them.
> 
> What are you thinking if we say that for only this cycle, we still try
> to only ask for resources that are related to the enabled filters ?
> For example, say someone is disabling CoreFilter in the conf opt, then
> the scheduler shouldn't ask for VCPUs to the Placement API.
> 
> FWIW, we have another consensus about not removing
> CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
> using them (and not calling the Placement API).
> 

A quick follow-up:
I first thought of some operators already disabling the DiskFilter
because they don't trust its calculations for shared disk.
We also have people who don't run the CoreFilter because they prefer
having only the compute claims do the math, and they don't care about
allocation ratios at all.


All those people would be trampled if we now begin to count resources
based on things they explicitly disabled.
That's why I updated my patch series and wrote a quick verification of
which filter is running:

https://review.openstack.org/#/c/417961/16/nova/scheduler/host_manager.py@640

Ideally, I would refine that so that we modify the BaseFilter
structure by adding a method that returns the resource amounts
needed by the RequestSpec and that also disables the filter so it
always returns True (no need to double-check the filter if the
placement service has already told us this compute is sane). That way, we
could slowly but surely keep the existing interface for optionally
verifying resources (i.e. people would still use filters) but have the new
logic done by the Placement engine.
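Purely as a hypothetical sketch of that idea (none of these names exist
today; they are only meant to show the shape of it):

class BaseFilter(object):
    # Resource classes this filter used to be responsible for, e.g. ('VCPU',)
    # for CoreFilter; empty for filters unrelated to placement.
    RESOURCE_CLASSES = ()

    def resources_from_request_spec(self, spec_obj):
        """Amounts of our resource classes requested by the RequestSpec,
        to be folded into the scheduler's placement query."""
        return {}

    def host_passes(self, host_state, spec_obj):
        if self.RESOURCE_CLASSES:
            # Placement already verified these resource classes for the
            # returned destinations, so re-checking them here is redundant.
            return True
        # Filters unrelated to placement keep their existing logic.
        return self._check_host(host_state, spec_obj)

    def _check_host(self, host_state, spec_obj):
        raise NotImplementedError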

Given the very short window, that can be done in Pike, but at least
operators wouldn't be impacted in the upgrade path.

-Sylvain

> Thanks,
> -Sylvain
> 
> [1]
> https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
> 


Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-23 Thread Jay Pipes

On 01/22/2017 04:40 PM, Sylvain Bauza wrote:

Hey folks,

tl;dr: should we GET /resource_providers for only the related resources
that correspond to enabled filters ?


No. Have administrators set the allocation ratios for the resources they 
do not care about exceeding capacity to a very high number.


If someone previously removed a filter, that doesn't mean that the 
resources were not consumed on a host. It merely means the admin was 
willing to accept a high amount of oversubscription. That's what the 
allocation_ratio is for.


The flavor should continue to have a consumed disk/vcpu/ram amount, 
because the VM *does actually consume those resources*. If the operator 
doesn't care about oversubscribing one or more of those resources, they 
should set the allocation ratios of those inventories to a high value.


No more adding configuration options for this kind of thing (or in this 
case, looking at an old configuration option and parsing it to see if a 
certain filter is listed in the list of enabled filters).


We have a proper system of modeling these data-driven decisions now, so 
my opinion is we should use it and ask operators to use the placement 
REST API for what it was intended.
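(For context, and with made-up numbers for a 16-core host: the inventory a
compute reports to placement looks roughly like

  {'VCPU': {'total': 16, 'reserved': 0, 'min_unit': 1, 'max_unit': 16,
            'step_size': 1, 'allocation_ratio': 16.0}}

and allocation_ratio is the knob being referred to here, so "I don't care
about this resource" translates into a very large ratio rather than dropping
the resource from the request.)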


Best,
-jay

> Explanation below why even if I

know we have a current consensus, maybe we should discuss again about it.


I'm still trying to implement https://review.openstack.org/#/c/417961/
but when trying to get the functional job being +1, I discovered that we
have at least one functional test [1] asking for just the RAMFilter (and
not for VCPUs or disks).

Given the current PS is asking for all of CPU, RAM and disk, it's
tripping the current test into a NoValidHost.

Okay, I could just modify the test and make sure we have enough
resources for the flavors but I actually now wonder if that's all good
for our operators.

I know we have a consensus saying that we should still ask for both CPU,
RAM and disk at the same time, but I imagine our users coming back to us
saying "eh, look, I'm no longer able to create instances even if I'm not
using the CoreFilter" for example. It could be a bad day for them and
honestly, I'm not sure just adding documentation or release notes would
help them.

What are you thinking if we say that for only this cycle, we still try
to only ask for resources that are related to the enabled filters ?
For example, say someone is disabling CoreFilter in the conf opt, then
the scheduler shouldn't ask for VCPUs to the Placement API.

FWIW, we have another consensus about not removing
CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
using them (and not calling the Placement API).

Thanks,
-Sylvain

[1]
https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48



[openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-22 Thread Sylvain Bauza
Hey folks,

tl;dr: should we GET /resource_providers for only the related resources
that correspond to enabled filters? Explanation below of why, even though I
know we have a current consensus, maybe we should discuss it again.


I'm still trying to implement https://review.openstack.org/#/c/417961/
but when trying to get the functional job to go +1, I discovered that we
have at least one functional test [1] asking for just the RAMFilter (and
not for VCPUs or disks).

Given the current PS is asking for all of CPU, RAM and disk, it's
tripping the current test into a NoValidHost.

Okay, I could just modify the test and make sure we have enough
resources for the flavors but I actually now wonder if that's all good
for our operators.

I know we have a consensus saying that we should still ask for all of CPU,
RAM and disk at the same time, but I imagine our users coming back to us
saying "eh, look, I'm no longer able to create instances even if I'm not
using the CoreFilter" for example. It could be a bad day for them and
honestly, I'm not sure just adding documentation or release notes would
help them.

What would you think if we said that, for this cycle only, we still try
to only ask for the resources that are related to the enabled filters?
For example, say someone disables CoreFilter in the conf opt; then
the scheduler shouldn't ask the Placement API for VCPUs.
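(In placement API terms that would mean shrinking the query: with CoreFilter
disabled, the scheduler would ask something like
GET /resource_providers?resources=MEMORY_MB:2048,DISK_GB:20 instead of also
including VCPU; the amounts here are only an example.)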

FWIW, we have another consensus about not removing
CoreFilter/RAMFilter/MemoryFilter because the CachingScheduler is still
using them (and not calling the Placement API).

Thanks,
-Sylvain

[1]
https://github.com/openstack/nova/blob/de0eff47f2cfa271735bb754637f979659a2d91a/nova/tests/functional/test_server_group.py#L48
