Re: [Openstack-operators] [nova] Would an api option to create an instance without powering on be useful?

2018-11-30 Thread Mohammed Naser
On Fri, Nov 30, 2018 at 7:07 AM Matthew Booth  wrote:

> I have a request to do $SUBJECT in relation to a V2V workflow. The use
> case here is conversion of a VM/Physical which was previously powered
> off. We want to move its data, but we don't want to be powering on
> stuff which wasn't previously on.
>
> This would involve an api change, and a hopefully very small change in
> drivers to support it. Technically I don't see it as an issue.
>
> However, is it a change we'd be willing to accept? Is there any good
> reason not to do this? Are there any less esoteric workflows which
> might use this feature?
>

If you upload an image of said VM which you don't boot, you'd really be
accomplishing the same thing, no?

Unless you want to be in a state where the VM exists but is sitting in
SHUTOFF state.


> Matt
> --
> Matthew Booth
> Red Hat OpenStack Engineer, Compute DFG
>
> Phone: +442070094448 (UK)
>



-- 
Mohammed Naser — vexxhost
-
D. 514-316-8872
D. 800-910-1726 ext. 200
E. mna...@vexxhost.com
W. http://vexxhost.com

Re: [Openstack-operators] Nova hypervisor uuid

2018-11-27 Thread Matt Riedemann

On 11/27/2018 11:32 AM, Ignazio Cassano wrote:

Hi  All,
Does anyone know where the hypervisor UUID is retrieved from?
Sometimes, updating KVM nodes with yum update changes it, and in the nova
database two UUIDs end up assigned to the same node.

regards
Ignazio







To be clear, do you mean the compute_nodes.uuid column value in the 
cell database? That is also used for the GET /os-hypervisors response 
'id' value if using microversion >= 2.53. If so, that is generated 
randomly* when the compute_nodes table record is created:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L588

https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/objects/compute_node.py#L312

When you hit this problem, are you sure the hostname on the compute host 
is not changing? Because when nova-compute starts up, it should look for 
the existing compute node record by host name and node name, which for 
the libvirt driver should be the same. That lookup code is here:


https://github.com/openstack/nova/blob/8545ba2af7476e0884b5e7fb90965bef92d605bc/nova/compute/resource_tracker.py#L815

So the only way nova-compute should create a new compute_nodes table 
record for the same host is if the host/node name changes during the 
upgrade. Is the deleted value in the database the same (0) for both of 
those records?
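
For anyone wanting to verify, here is a rough sketch of a query against the 
cell database that would show hosts with more than one non-deleted 
compute_nodes record (assumes MySQL and direct DB access; the connection 
URL is a placeholder for your deployment):

    # Rough sketch, not an official tool: list hypervisor hostnames that
    # have duplicate non-deleted compute_nodes records in the cell DB.
    from sqlalchemy import create_engine, text

    # assumed/placeholder connection URL
    engine = create_engine('mysql+pymysql://nova:secret@db-host/nova')
    with engine.connect() as conn:
        rows = conn.execute(text(
            "SELECT hypervisor_hostname, COUNT(*) AS n, "
            "GROUP_CONCAT(uuid) AS uuids "
            "FROM compute_nodes WHERE deleted = 0 "
            "GROUP BY hypervisor_hostname HAVING n > 1"))
        for row in rows:
            print(row.hypervisor_hostname, row.uuids)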


* The exception to this is for the ironic driver which re-uses the 
ironic node uuid as of this change: https://review.openstack.org/#/c/571535/


--

Thanks,

Matt



[Openstack-operators] Nova hypervisor uuid

2018-11-27 Thread Ignazio Cassano
Hi  All,
Does anyone know where the hypervisor UUID is retrieved from?
Sometimes, updating KVM nodes with yum update changes it, and in the nova database
two UUIDs end up assigned to the same node.
regards
Ignazio


[Openstack-operators] [nova][placement] Placement requests and caching in the resource tracker

2018-11-02 Thread Eric Fried
All-

Based on a (long) discussion yesterday [1] I have put up a patch [2]
whereby you can set [compute]resource_provider_association_refresh to
zero and the resource tracker will never* refresh the report client's
provider cache. Philosophically, we're removing the "healing" aspect of
the resource tracker's periodic and trusting that placement won't
diverge from whatever's in our cache. (If it does, it's because the op
hit the CLI, in which case they should SIGHUP - see below.)

*except:
- When we initially create the compute node record and bootstrap its
resource provider.
- When the virt driver's update_provider_tree makes a change,
update_from_provider_tree reflects them in the cache as well as pushing
them back to placement.
- If update_from_provider_tree fails, the cache is cleared and gets
rebuilt on the next periodic.
- If you send SIGHUP to the compute process, the cache is cleared.

This should dramatically reduce the number of calls to placement from
the compute service. Like, to nearly zero, unless something is actually
changing.
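
For illustration only (this is not the actual resource tracker code), the
semantics being proposed boil down to something like this, where zero means
"trust the cache until a SIGHUP or a failed update":

    # Illustrative sketch of the proposed option semantics; names are
    # simplified and do not match nova's internals.
    import time

    class ProviderCache(object):
        def __init__(self, refresh_interval):
            # seconds between refreshes; 0 means never refresh
            self.refresh_interval = refresh_interval
            self.last_refresh = time.time()

        def needs_refresh(self):
            if self.refresh_interval <= 0:
                return False  # rely on SIGHUP/update failures to invalidate
            return time.time() - self.last_refresh > self.refresh_interval

        def invalidate(self):
            # e.g. from a SIGHUP handler or after update_from_provider_tree fails
            self.last_refresh = 0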

Can I get some initial feedback as to whether this is worth polishing up
into something real? (It will probably need a bp/spec if so.)

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
[2] https://review.openstack.org/#/c/614886/

==
Background
==
In the Queens release, our friends at CERN noticed a serious spike in
the number of requests to placement from compute nodes, even in a
stable-state cloud. Given that we were in the process of adding a ton of
infrastructure to support sharing and nested providers, this was not
unexpected. Roughly, what was previously:

 @periodic_task:
 GET /resource_providers/$compute_uuid
 GET /resource_providers/$compute_uuid/inventories

became more like:

 @periodic_task:
 # In Queens/Rocky, this would still just return the compute RP
 GET /resource_providers?in_tree=$compute_uuid
 # In Queens/Rocky, this would return nothing
 GET /resource_providers?member_of=...=MISC_SHARES...
 for each provider returned above:  # i.e. just one in Q/R
 GET /resource_providers/$compute_uuid/inventories
 GET /resource_providers/$compute_uuid/traits
 GET /resource_providers/$compute_uuid/aggregates

In a cloud the size of CERN's, the load wasn't acceptable. But at the
time, CERN worked around the problem by disabling refreshing entirely.
(The fact that this seems to have worked for them is an encouraging sign
for the proposed code change.)

We're not actually making use of most of that information, but it sets
the stage for things that we're working on in Stein and beyond, like
multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
etc., so removing/reducing the amount of information we look at isn't
really an option strategically.



[Openstack-operators] [nova] Is anyone running their own script to purge old instance_faults table entries?

2018-11-01 Thread Matt Riedemann
I came across this bug [1] in triage today and I thought this was fixed 
already [2] but either something regressed or there is more to do here.


I'm mostly just wondering, are operators already running any kind of 
script which purges old instance_faults table records before an instance 
is deleted and archived/purged? Because if so, that might be something 
we want to add as a nova-manage command.
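
For context, this is roughly the kind of out-of-band script I mean (a sketch
only, assuming direct access to the cell database; the 90-day cutoff and the
connection URL are just examples):

    # Sketch: delete old instance_faults rows directly from the cell DB.
    from sqlalchemy import create_engine, text

    # assumed/placeholder connection URL
    engine = create_engine('mysql+pymysql://nova:secret@db-host/nova')
    with engine.begin() as conn:
        result = conn.execute(text(
            "DELETE FROM instance_faults "
            "WHERE created_at < DATE_SUB(NOW(), INTERVAL 90 DAY)"))
        print('purged %d fault records' % result.rowcount)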


[1] https://bugs.launchpad.net/nova/+bug/1800755
[2] https://review.openstack.org/#/c/409943/

--

Thanks,

Matt



Re: [Openstack-operators] [nova] Removing the CachingScheduler

2018-10-24 Thread Matt Riedemann

On 10/18/2018 5:07 PM, Matt Riedemann wrote:

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1


This is your last chance to speak up if you are using the 
CachingScheduler and object to it being removed from nova in Stein. I 
have removed the -W pin from the review since a series of feature work 
is now stacked on top of it.


--

Thanks,

Matt



[Openstack-operators] [nova] Removing the CachingScheduler

2018-10-18 Thread Matt Riedemann

It's been deprecated since Pike, and the time has come to remove it [1].

mgagne has been the most vocal CachingScheduler operator I know and he 
has tested out the "nova-manage placement heal_allocations" CLI, added 
in Rocky, and said it will work for migrating his deployment from the 
CachingScheduler to the FilterScheduler + Placement.


If you are using the CachingScheduler and have a problem with its 
removal, now is the time to speak up or forever hold your peace.


[1] https://review.openstack.org/#/c/611723/1

--

Thanks,

Matt



[Openstack-operators] [nova] nova-xvpvncproxy CLI options

2018-10-01 Thread Stephen Finucane
tl;dr: Is anyone calling 'nova-novncproxy' or 'nova-serialproxy' with
CLI arguments instead of a configuration file?

I've been doing some untangling of the console proxy services that nova
provides and trying to clean up the documentation for same [1]. As part
of these fixes, I noted a couple of inconsistencies in how we manage
the CLI options for these services.

Firstly, the 'nova-novncproxy' and 'nova-serialproxy' services accept
CLI configuration options while the 'nova-xvpvncproxy' service does
not.

   $ nova-novncproxy --help
   usage: nova-novncproxy [-h] [--vnc-auth_schemes VNC_AUTH_SCHEMES]
                          [--vnc-novncproxy_host VNC_NOVNCPROXY_HOST]
                          [--vnc-novncproxy_port VNC_NOVNCPROXY_PORT]
                          [--vnc-vencrypt_ca_certs VNC_VENCRYPT_CA_CERTS]
                          [--vnc-vencrypt_client_cert VNC_VENCRYPT_CLIENT_CERT]
                          [--vnc-vencrypt_client_key VNC_VENCRYPT_CLIENT_KEY]
                          [--cert CERT] [--config-dir DIR] [--config-file PATH]
                          ...
                          [--remote_debug-port REMOTE_DEBUG_PORT]

   $ nova-xvpvncproxy --help
   usage: nova-xvpvncproxy [-h] [--remote_debug-host REMOTE_DEBUG_HOST]
                           [--remote_debug-port REMOTE_DEBUG_PORT]
                           ...
                           [--version] [--watch-log-file]

This means that you could, conceivably, run 'nova-novncproxy' without a
configuration file but the same would not be possible with the 'nova-
xvpvncproxy' service. Secondly, the 'nova-novncproxy' CLI options are
added to a 'vnc' group, meaning they appear with an unnecessary 'vnc-'
prefix (e.g. '--vnc-novncproxy_host'), and the 'nova-serialproxy' CLI
options are prefixed with 'serial-' for the same reason. Finally, none
of these options are documented anywhere.
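
For reference, the 'vnc-'/'serial-' prefixes fall out of how oslo.config
exposes group-registered CLI options. A minimal sketch (not nova's actual
code) that reproduces the behaviour:

    # Registering CLI options under a group exposes them as --<group>-<name>,
    # which is where names like --vnc-novncproxy_host come from.
    from oslo_config import cfg

    CONF = cfg.ConfigOpts()
    CONF.register_cli_opts([
        cfg.StrOpt('novncproxy_host', default='0.0.0.0'),
        cfg.PortOpt('novncproxy_port', default=6080),
    ], group='vnc')

    CONF(['--vnc-novncproxy_host', '127.0.0.1'])
    print(CONF.vnc.novncproxy_host)  # 127.0.0.1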

My initial plan [2] to resolve all of the above had been to add the CLI
options to the 'nova-xvpvncproxy' service and then go figure out how to
get oslo.config autodocumenting these for us in our man pages. However,
a quick search through GitHub, codesearch.o.o and Google turned up no
hits so I wonder if anyone is configuring these things by CLI? If not,
maybe we should just go and remove this code and insist on
configuration via config file?

Cheers,
Stephen

[1] https://review.openstack.org/606148
[2] https://review.openstack.org/606929




[Openstack-operators] [nova][publiccloud-wg] Proposal to shelve on stop/suspend

2018-09-14 Thread Matt Riedemann
tl;dr: I'm proposing a new parameter to the server stop (and suspend?) 
APIs to control if nova shelve offloads the server.


Long form: This came up during the public cloud WG session this week 
based on a couple of feature requests [1][2]. When a user stops/suspends 
a server, the hypervisor frees up resources on the host but nova 
continues to track those resources as being used on the host so the 
scheduler can't put more servers there. What operators would like to do 
is that when a user stops a server, nova actually shelve offloads the 
server from the host so they can schedule new servers on that host. On 
start/resume of the server, nova would find a new host for the server. 
This also came up in Vancouver where operators would like to free up 
limited expensive resources like GPUs when the server is stopped. This 
is also the behavior in AWS.


The problem with shelve is that it's great for operators but users just 
don't use it, maybe because they don't know what it is and stop works 
just fine. So how do you get users to opt into shelving their server?


I've proposed a high-level blueprint [3] where we'd add a new 
(microversioned) parameter to the stop API with three options:


* auto
* offload
* retain

Naming is obviously up for debate. The point is we would default to auto 
and if auto is used, the API checks a config option to determine the 
behavior - offload or retain. By default we would retain for backward 
compatibility. For users that don't care, they get auto and it's fine. 
For users that do care, they either (1) don't opt into the microversion 
or (2) specify the specific behavior they want. I don't think we need to 
expose what the cloud's configuration for auto is because again, if you 
don't care then it doesn't matter and if you do care, you can opt out of 
this.
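
For illustration, the decision could boil down to something like this (the
parameter and option names here are placeholders, since naming is still up
for debate):

    # Hypothetical sketch of the proposed behaviour: 'auto' defers to an
    # operator-set config default, which itself defaults to 'retain'.
    def should_shelve_offload(api_value='auto', conf_default='retain'):
        if api_value == 'auto':
            api_value = conf_default  # operator decides for "don't care" users
        return api_value == 'offload'

    # e.g. a cloud that sets the config default to 'offload':
    assert should_shelve_offload('auto', conf_default='offload') is True
    assert should_shelve_offload('retain', conf_default='offload') is False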


"How do we get users to use the new microversion?" I'm glad you asked.

Well, nova CLI defaults to using the latest available microversion 
negotiated between the client and the server, so by default, anyone 
using "nova stop" would get the 'auto' behavior (assuming the client and 
server are new enough to support it). Long-term, openstack client plans 
on doing the same version negotiation.


As for the server status changes, if the server is stopped and shelved, 
the status would be 'SHELVED_OFFLOADED' rather than 'SHUTDOWN'. I 
believe this is fine especially if a user is not being specific and 
doesn't care about the actual backend behavior. On start, the API would 
allow starting (unshelving) shelved offloaded (rather than just stopped) 
instances. Trying to hide shelved servers as stopped in the API would be 
overly complex IMO so I don't want to try and mask that.


It is possible that a user that stopped and shelved their server could 
hit a NoValidHost when starting (unshelving) the server, but that really 
shouldn't happen in a cloud that's configuring nova to shelve by default 
because if they are doing this, their SLA needs to reflect they have the 
capacity to unshelve the server. If you can't honor that SLA, don't 
shelve by default.


So, what are the general feelings on this before I go off and start 
writing up a spec?


[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681
[2] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791679
[3] https://blueprints.launchpad.net/nova/+spec/shelve-on-stop

--

Thanks,

Matt



Re: [Openstack-operators] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-06 Thread Doug Hellmann
Excerpts from Matt Riedemann's message of 2018-09-06 15:58:41 -0500:
> I wanted to recap some upgrade-specific stuff from today outside of the 
> other [1] technical extraction thread.
> 
> Chris has a change up for review [2] which prompted the discussion.
> 
> That change makes placement only work with placement.conf, not 
> nova.conf, but does get a passing tempest run in the devstack patch [3].
> 
> The main issue here is upgrades. If you think of this like deprecating 
> config options, the old config options continue to work for a release 
> and then are dropped after a full release (or 3 months across boundaries 
> for CDers) [4]. Given that, Chris's patch would break the standard 
> deprecation policy. Clearly one simple way outside of code to make that 
> work is just copy and rename nova.conf to placement.conf and voila. But 
> that depends on *all* deployment/config tooling to get that right out of 
> the gate.
> 
> The other obvious thing is the database. The placement repo code as-is 
> today still has the check for whether or not it should use the placement 
> database but falls back to using the nova_api database [5]. So 
> technically you could point the extracted placement at the same nova_api 
> database and it should work. However, at some point deployers will 
> clearly need to copy the placement-related tables out of the nova_api DB 
> to a new placement DB and make sure the 'migrate_version' table is 
> dropped so that placement DB schema versions can reset to 1.
> 
> With respect to grenade and making this work in our own upgrade CI 
> testing, we have I think two options (which might not be mutually 
> exclusive):
> 
> 1. Make placement support using nova.conf if placement.conf isn't found 
> for Stein with lots of big warnings that it's going away in T. Then 
> Rocky nova.conf with the nova_api database configuration just continues 
> to work for placement in Stein. I don't think we then have any grenade 
> changes to make, at least in Stein for upgrading *from* Rocky. Assuming 
> fresh devstack installs in Stein use placement.conf and a 
> placement-specific database, then upgrades from Stein to T should also 
> be OK with respect to grenade, but likely punts the cut-over issue for 
> all other deployment projects (because we don't CI with grenade doing 
> Rocky->Stein->T, or FFU in other words).

Making placement read from both files should be pretty straightforward,
right? It's possible to pass default_config_files and default_config_dirs
to oslo.config, and the functions that build the original defaults
are part of the public API (find_config_files and find_config_dirs
in oslo_config.cfg) so the placement service can call them twice
(with different "project" arguments) and merge the results before
initializing the ConfigOpts instance.
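
A rough sketch of what that might look like in the placement config setup
(using the public helpers mentioned above; the exact wiring is up to the
placement code):

    # Sketch only: look for both placement.conf and nova.conf (and their
    # config dirs) and hand the merged lists to oslo.config.
    from oslo_config import cfg

    files = (cfg.find_config_files(project='placement') +
             cfg.find_config_files(project='nova'))
    dirs = (cfg.find_config_dirs(project='placement') +
            cfg.find_config_dirs(project='nova'))

    CONF = cfg.ConfigOpts()
    CONF([], project='placement',
         default_config_files=files,
         default_config_dirs=dirs)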

Doug

> 
> 2. If placement doesn't support nova.conf in Stein, then grenade will 
> require an (exceptional) [6] from-rocky upgrade script which will (a) 
> write out placement.conf fresh and (b) run a DB migration script, likely 
> housed in the placement repo, to create the placement database and copy 
> the placement-specific tables out of the nova_api database. Any script 
> like this is likely needed regardless of what we do in grenade because 
> deployers will need to eventually do this once placement would drop 
> support for using nova.conf (if we went with option 1).
> 
> That's my attempt at a summary. It's going to be very important that 
> operators and deployment project contributors weigh in here if they have 
> strong preferences either way, and note that we can likely do both 
> options above - grenade could do the fresh cutover from rocky to stein 
> but we allow running with nova.conf and nova_api DB in placement in 
> stein with plans to drop that support in T.
> 
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-September/subject.html#134184
> [2] https://review.openstack.org/#/c/600157/
> [3] https://review.openstack.org/#/c/600162/
> [4] 
> https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html#requirements
> [5] 
> https://github.com/openstack/placement/blob/fb7c1909/placement/db_api.py#L27
> [6] https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade
> 



[Openstack-operators] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-06 Thread Matt Riedemann
I wanted to recap some upgrade-specific stuff from today outside of the 
other [1] technical extraction thread.


Chris has a change up for review [2] which prompted the discussion.

That change makes placement only work with placement.conf, not 
nova.conf, but does get a passing tempest run in the devstack patch [3].


The main issue here is upgrades. If you think of this like deprecating 
config options, the old config options continue to work for a release 
and then are dropped after a full release (or 3 months across boundaries 
for CDers) [4]. Given that, Chris's patch would break the standard 
deprecation policy. Clearly one simple way outside of code to make that 
work is just copy and rename nova.conf to placement.conf and voila. But 
that depends on *all* deployment/config tooling to get that right out of 
the gate.


The other obvious thing is the database. The placement repo code as-is 
today still has the check for whether or not it should use the placement 
database but falls back to using the nova_api database [5]. So 
technically you could point the extracted placement at the same nova_api 
database and it should work. However, at some point deployers will 
clearly need to copy the placement-related tables out of the nova_api DB 
to a new placement DB and make sure the 'migrate_version' table is 
dropped so that placement DB schema versions can reset to 1.
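
As a rough sketch only (the table list and database names below are
assumptions to double-check against your release), that copy could look
something like this:

    # Sketch: dump the placement-related tables from nova_api into a new
    # placement database; migrate_version is deliberately not copied so the
    # placement schema versions can reset to 1.
    import subprocess

    PLACEMENT_TABLES = [
        'resource_providers', 'inventories', 'allocations',
        'resource_provider_aggregates', 'resource_provider_traits',
        'traits', 'resource_classes', 'consumers', 'projects', 'users',
        'placement_aggregates',
    ]

    dump = subprocess.run(['mysqldump', 'nova_api'] + PLACEMENT_TABLES,
                          check=True, capture_output=True)
    subprocess.run(['mysql', 'placement'], input=dump.stdout, check=True)
    # make sure no stale migrate_version table lingers in the new DB
    subprocess.run(['mysql', 'placement', '-e',
                    'DROP TABLE IF EXISTS migrate_version;'], check=True)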


With respect to grenade and making this work in our own upgrade CI 
testing, we have I think two options (which might not be mutually 
exclusive):


1. Make placement support using nova.conf if placement.conf isn't found 
for Stein with lots of big warnings that it's going away in T. Then 
Rocky nova.conf with the nova_api database configuration just continues 
to work for placement in Stein. I don't think we then have any grenade 
changes to make, at least in Stein for upgrading *from* Rocky. Assuming 
fresh devstack installs in Stein use placement.conf and a 
placement-specific database, then upgrades from Stein to T should also 
be OK with respect to grenade, but likely punts the cut-over issue for 
all other deployment projects (because we don't CI with grenade doing 
Rocky->Stein->T, or FFU in other words).


2. If placement doesn't support nova.conf in Stein, then grenade will 
require an (exceptional) [6] from-rocky upgrade script which will (a) 
write out placement.conf fresh and (b) run a DB migration script, likely 
housed in the placement repo, to create the placement database and copy 
the placement-specific tables out of the nova_api database. Any script 
like this is likely needed regardless of what we do in grenade because 
deployers will need to eventually do this once placement would drop 
support for using nova.conf (if we went with option 1).


That's my attempt at a summary. It's going to be very important that 
operators and deployment project contributors weigh in here if they have 
strong preferences either way, and note that we can likely do both 
options above - grenade could do the fresh cutover from rocky to stein 
but we allow running with nova.conf and nova_api DB in placement in 
stein with plans to drop that support in T.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/subject.html#134184

[2] https://review.openstack.org/#/c/600157/
[3] https://review.openstack.org/#/c/600162/
[4] 
https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html#requirements
[5] 
https://github.com/openstack/placement/blob/fb7c1909/placement/db_api.py#L27

[6] https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade

--

Thanks,

Matt



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Matt Riedemann

On 8/29/2018 3:21 PM, Tim Bell wrote:

Sounds like a good topic for PTG/Forum?


Yeah it's already on the PTG agenda [1][2]. I started the thread because 
I wanted to get the ball rolling as early as possible, and with people 
that won't attend the PTG and/or the Forum, to weigh in on not only the 
known issues with cross-cell migration but also the things I'm not 
thinking about.


[1] https://etherpad.openstack.org/p/nova-ptg-stein
[2] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Tim Bell

Given the partial retirement scenario (i.e. only racks A-C retired due to 
cooling constraints, racks D-F still active with the same old hardware but still 
useful for years), adding new hardware to old cells would not be optimal. 
I'm ignoring the long list of other things to worry about, such as preserving IP 
addresses etc.

Sounds like a good topic for PTG/Forum?

Tim

-Original Message-
From: Jay Pipes 
Date: Wednesday, 29 August 2018 at 22:12
To: Dan Smith , Tim Bell 
Cc: "openstack-operators@lists.openstack.org" 

Subject: Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold 
migration

On 08/29/2018 04:04 PM, Dan Smith wrote:
>> - The VMs to be migrated are generally not expensive
>> configurations, just hardware lifecycles where boxes go out of
>> warranty or computer centre rack/cooling needs re-organising. For
>> CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a
>> ~30% pet share)
>> - We make a cell from identical hardware at a single location, this
>> greatly simplifies working out hardware issues, provisioning and
>> management
>> - Some cases can be handled with the 'please delete and
>> re-create'. Many other cases need much user support/downtime (and
>> require significant effort or risk delaying retirements to get
>> agreement)
> 
> Yep, this is the "organizational use case" of cells I refer to. I assume
> that if one aisle (cell) is being replaced, it makes sense to stand up
> the new one as its own cell, migrate the pets from one to the other and
> then decommission the old one. Being only an aisle away, it's reasonable
> to think that *this* situation might not suffer from the complexity of
> needing to worry about heavyweight migrate network and storage.

For this use case, why not just add the new hardware directly into the 
existing cell and migrate the workloads onto the new hardware, then 
disable the old hardware and retire it?

I mean, there might be a short period of time where the cell's DB and MQ 
would be congested due to lots of migration operations, but it seems a 
lot simpler to me than trying to do cross-cell migrations when cells 
have been designed pretty much from the beginning of cellsv2 to not talk 
to each other or allow any upcalls.

Thoughts?
-jay




Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Jay Pipes

On 08/29/2018 04:04 PM, Dan Smith wrote:

- The VMs to be migrated are generally not expensive
configurations, just hardware lifecycles where boxes go out of
warranty or computer centre rack/cooling needs re-organising. For
CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a
~30% pet share)
- We make a cell from identical hardware at a single location, this
greatly simplifies working out hardware issues, provisioning and
management
- Some cases can be handled with the 'please delete and
re-create'. Many other cases need much user support/downtime (and
require significant effort or risk delaying retirements to get
agreement)


Yep, this is the "organizational use case" of cells I refer to. I assume
that if one aisle (cell) is being replaced, it makes sense to stand up
the new one as its own cell, migrate the pets from one to the other and
then decommission the old one. Being only an aisle away, it's reasonable
to think that *this* situation might not suffer from the complexity of
needing to worry about heavyweight migrate network and storage.


For this use case, why not just add the new hardware directly into the 
existing cell and migrate the workloads onto the new hardware, then 
disable the old hardware and retire it?


I mean, there might be a short period of time where the cell's DB and MQ 
would be congested due to lots of migration operations, but it seems a 
lot simpler to me than trying to do cross-cell migrations when cells 
have been designed pretty much from the beginning of cellsv2 to not talk 
to each other or allow any upcalls.


Thoughts?
-jay



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Dan Smith
> - The VMs to be migrated are generally not expensive
> configurations, just hardware lifecycles where boxes go out of
> warranty or computer centre rack/cooling needs re-organising. For
> CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a
> ~30% pet share)
> - We make a cell from identical hardware at a single location, this
> greatly simplifies working out hardware issues, provisioning and
> management
> - Some cases can be handled with the 'please delete and
> re-create'. Many other cases need much user support/downtime (and
> require significant effort or risk delaying retirements to get
> agreement)

Yep, this is the "organizational use case" of cells I refer to. I assume
that if one aisle (cell) is being replaced, it makes sense to stand up
the new one as its own cell, migrate the pets from one to the other and
then decommission the old one. Being only an aisle away, it's reasonable
to think that *this* situation might not suffer from the complexity of
needing to worry about heavyweight migrate network and storage.

> From my understanding, these models were feasible with Cells V1.

I don't think cellsv1 supported any notion of moving things between
cells at all, unless you had some sort of external hack for doing
it. Being able to migrate between cells at all was always one of the
things we touted as a "future feature" for cellsv2.

Unless of course you mean migration in terms of
snapshot-to-glance-and-redeploy?

--Dan



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Tim Bell
I've not followed all the arguments here regarding internals but CERN's 
background usage of Cells v2 (and thoughts on impact of cross cell migration) 
is below. Some background at 
https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern.
 Some rough parameters with the team providing more concrete numbers if 
needed

- The VMs to be migrated are generally not expensive configurations, just 
hardware lifecycles where boxes go out of warranty or computer centre 
rack/cooling needs re-organising. For CERN, this is a 6-12 month frequency of 
~10,000 VMs per year (with a ~30% pet share)
- We make a cell from identical hardware at a single location, this greatly 
simplifies working out hardware issues, provisioning and management
- Some cases can be handled with the 'please delete and re-create'. Many other 
cases need much user support/downtime (and require significant effort or risk 
delaying retirements to get agreement)
- When a new hardware delivery is made, we would hope to define a new cell (as 
it is a different configuration)
- Depending on the facilities retirement plans, we would work out what needed 
to be moved to new resources
- There are many different scenarios for migration (either live or cold)
-- All instances in the old cell would be migrated to the new hardware which 
would have sufficient capacity
-- All instances in a single cell would be migrated to several different cells 
such as the new cells being smaller
-- Some instances would be migrated because those racks need to be retired but 
other servers in the cell would remain for a further year or two until 
retirement was mandatory

With many cells and multiple locations, spreading the hypervisors across the 
cells in anticipation of potential migrations is unattractive.

From my understanding, these models were feasible with Cells V1.

We can discuss further, at the PTG or Summit, on the operational flexibility 
which we have taken advantage of so far and alternative models.

Tim

-Original Message-
From: Dan Smith 
Date: Wednesday, 29 August 2018 at 18:47
To: Jay Pipes 
Cc: "openstack-operators@lists.openstack.org" 

Subject: Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold  
migration

> A release upgrade dance involves coordination of multiple moving
> parts. It's about as similar to this scenario as I can imagine. And
> there's a reason release upgrades are not done entirely within Nova;
> clearly an external upgrade tool or script is needed to orchestrate
> the many steps and components involved in the upgrade process.

I'm lost here, and assume we must be confusing terminology or something.

> The similar dance for cross-cell migration is the coordination that
> needs to happen between Nova, Neutron and Cinder. It's called
> orchestration for a reason and is not what Nova is good at (as we've
> repeatedly seen)

Most other operations in Nova meet this criteria. Boot requires
coordination between Nova, Cinder, and Neutron. As do migrate, start,
stop, evacuate. We might decide that (for now) the volume migration
thing is beyond the line we're willing to cross, and that's cool, but I
think it's an arbitrary limitation we shouldn't assume is
impossible. Moving instances around *is* what nova is (supposed to be)
good at.

> The thing that makes *this* particular scenario problematic is that
> cells aren't user-visible things. User-visible things could much more
> easily be orchestrated via external actors, as I still firmly believe
> this kind of thing should be done.

I'm having a hard time reconciling these:

1. Cells aren't user-visible, and shouldn't be (your words and mine).
2. Cross-cell migration should be done by an external service (your
   words).
3. External services work best when things are user-visible (your words).

You say the user-invisible-ness makes orchestrating this externally
difficult and I agree, but...is your argument here just that it
shouldn't be done at all?

>> As we discussed in YVR most recently, it also may become an important
>> thing for operators and users where expensive accelerators are committed
>> to instances with part-time usage patterns.
>
> I don't think that's a valid use case in respect to this scenario of
> cross-cell migration.

You're right, it has nothing to do with cross-cell migration at all. I
was pointing to *other* legitimate use cases for shelve.

> Also, I'd love to hear from anyone in the real world who has
> successfully migrated (live or otherwise) an instance that "owns"
> expensive hardware (accelerators, SR-IOV PFs, GPUs or otherwise).

Again, the accelerator case has nothing to do with migrating across
cells, but merely demonstrates another example of where shelve may be
the thing operators actually desire.

Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Jay Pipes

On 08/29/2018 02:26 PM, Chris Friesen wrote:

On 08/29/2018 10:02 AM, Jay Pipes wrote:


Also, I'd love to hear from anyone in the real world who has successfully
migrated (live or otherwise) an instance that "owns" expensive hardware
(accelerators, SR-IOV PFs, GPUs or otherwise).


I thought cold migration of instances with such devices was supported 
upstream?


That's not what I asked. :)

-jay



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Chris Friesen

On 08/29/2018 10:02 AM, Jay Pipes wrote:


Also, I'd love to hear from anyone in the real world who has successfully
migrated (live or otherwise) an instance that "owns" expensive hardware
(accelerators, SR-IOV PFs, GPUs or otherwise).


I thought cold migration of instances with such devices was supported upstream?

Chris



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Jay Pipes

On 08/29/2018 12:39 PM, Dan Smith wrote:

If we're going to discuss removing move operations from Nova, we should
do that in another thread. This one is about making existing operations
work :)


OK, understood. :)


The admin only "owns" the instance because we have no ability to
transfer ownership of the instance and a cell isn't a user-visible
thing. An external script that accomplishes this kind of orchestrated
move from one cell to another could easily update the ownership of
said instance in the DB.


So step 5 was "do surgery on the database"? :)


Yep. You'd be surprised how often that ends up being the case.

I'm currently sitting here looking at various integration tooling for 
doing just this kind of thing for our deployments of >150K baremetal 
compute nodes. The number of specific-to-an-environment variables that 
need to be considered and worked into the overall migration plan is 
breathtaking. And trying to do all of that inside of Nova just isn't 
feasible for the scale at which we run.


At least, that's my point of view. I won't drag this conversation out 
any further on tangents.


Best,
-jay



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Dan Smith
> A release upgrade dance involves coordination of multiple moving
> parts. It's about as similar to this scenario as I can imagine. And
> there's a reason release upgrades are not done entirely within Nova;
> clearly an external upgrade tool or script is needed to orchestrate
> the many steps and components involved in the upgrade process.

I'm lost here, and assume we must be confusing terminology or something.

> The similar dance for cross-cell migration is the coordination that
> needs to happen between Nova, Neutron and Cinder. It's called
> orchestration for a reason and is not what Nova is good at (as we've
> repeatedly seen)

Most other operations in Nova meet this criteria. Boot requires
coordination between Nova, Cinder, and Neutron. As do migrate, start,
stop, evacuate. We might decide that (for now) the volume migration
thing is beyond the line we're willing to cross, and that's cool, but I
think it's an arbitrary limitation we shouldn't assume is
impossible. Moving instances around *is* what nova is (supposed to be)
good at.

> The thing that makes *this* particular scenario problematic is that
> cells aren't user-visible things. User-visible things could much more
> easily be orchestrated via external actors, as I still firmly believe
> this kind of thing should be done.

I'm having a hard time reconciling these:

1. Cells aren't user-visible, and shouldn't be (your words and mine).
2. Cross-cell migration should be done by an external service (your
   words).
3. External services work best when things are user-visible (your words).

You say the user-invisible-ness makes orchestrating this externally
difficult and I agree, but...is your argument here just that it
shouldn't be done at all?

>> As we discussed in YVR most recently, it also may become an important
>> thing for operators and users where expensive accelerators are committed
>> to instances with part-time usage patterns.
>
> I don't think that's a valid use case in respect to this scenario of
> cross-cell migration.

You're right, it has nothing to do with cross-cell migration at all. I
was pointing to *other* legitimate use cases for shelve.

> Also, I'd love to hear from anyone in the real world who has
> successfully migrated (live or otherwise) an instance that "owns"
> expensive hardware (accelerators, SR-IOV PFs, GPUs or otherwise).

Again, the accelerator case has nothing to do with migrating across
cells, but merely demonstrates another example of where shelve may be
the thing operators actually desire. Maybe I shouldn't have confused the
discussion by bringing it up.

> The patterns that I have seen are one of the following:
>
> * Applications don't move. They are pets that stay on one or more VMs
> or baremetal nodes and they grow roots.
>
> * Applications are designed to *utilize* the expensive hardware. They
> don't "own" the hardware itself.
>
> In this latter case, the application is properly designed and stores
> its persistent data in a volume and doesn't keep state outside of the
> application volume. In these cases, the process of "migrating" an
> instance simply goes away. You just detach the application persistent
> volume, shut down the instance, start up a new one elsewhere (allowing
> the scheduler to select one that meets the resource constraints in the
> flavor/image), attach the volume again and off you go. No messing
> around with shelving, offloading, migrating, or any of that nonsense
> in Nova.

Jay, you know I sympathize with the fully-ephemeral application case,
right? Can we agree that pets are a thing and that migrations are not
going to be leaving Nova's scope any time soon? If so, I think we can
get back to the real discussion, and if not, I think we probably, er,
can't :)

> We should not pretend that what we're discussing here is anything
> other than hacking orchestration workarounds into Nova to handle
> poorly-designed applications that have grown roots on some hardware
> and think they "own" hardware resources in a Nova deployment.

I have no idea how we got to "own hardware resources" here. The point of
this discussion is to make our instance-moving operations work across
cells. We designed cellsv2 to be invisible and baked into the core of
Nova. We intended for it to not fall into the trap laid by cellsv1,
where the presence of multiple cells meant that a bunch of regular
operations don't work like they would otherwise.

If we're going to discuss removing move operations from Nova, we should
do that in another thread. This one is about making existing operations
work :)

> If that's the case, why are we discussing shelve at all? Just stop the
> instance, copy/migrate the volume data (if needed, again it completely
> depends on the deployment, network topology and block storage
> backend), to a new location (new cell, new AZ, new host agg, does it
> really matter?) and start a new instance, attaching the volume after
> the instance starts or supplying the volume in the boot/create
> 

Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Jay Pipes
I respect your opinion but respectfully disagree that this is something 
we need to spend our time on. Comments inline.


On 08/29/2018 10:47 AM, Dan Smith wrote:

* Cells can shard across flavors (and hardware type) so operators
would like to move users off the old flavors/hardware (old cell) to
new flavors in a new cell.


So cell migrations are kind of the new release upgrade dance. Got it.


No, cell migrations are about moving instances between cells for
whatever reason. If you have small cells for organization, then it's
about not building arbitrary barriers between aisles. If you use it for
hardware refresh, then it might be related to long term lifecycle. I'm
not sure what any of this has to do with release upgrades or dancing.


A release upgrade dance involves coordination of multiple moving parts. 
It's about as similar to this scenario as I can imagine. And there's a 
reason release upgrades are not done entirely within Nova; clearly an 
external upgrade tool or script is needed to orchestrate the many steps 
and components involved in the upgrade process.


The similar dance for cross-cell migration is the coordination that 
needs to happen between Nova, Neutron and Cinder. It's called 
orchestration for a reason and is not what Nova is good at (as we've 
repeatedly seen)


 The thing that makes *this* particular scenario problematic is that 
cells aren't user-visible things. User-visible things could much more 
easily be orchestrated via external actors, as I still firmly believe 
this kind of thing should be done.



shelve was and continues to be a hack in order for users to keep an
IPv4 address while not consuming compute resources for some amount of
time. [1]


As we discussed in YVR most recently, it also may become an important
thing for operators and users where expensive accelerators are committed
to instances with part-time usage patterns.


I don't think that's a valid use case in respect to this scenario of 
cross-cell migration. If the target cell compute doesn't have the same 
expensive accelerators on them, nobody would want or permit a move to 
that target cell anyway.


Also, I'd love to hear from anyone in the real world who has 
successfully migrated (live or otherwise) an instance that "owns" 
expensive hardware (accelerators, SR-IOV PFs, GPUs or otherwise).


The patterns that I have seen are one of the following:

* Applications don't move. They are pets that stay on one or more VMs or 
baremetal nodes and they grow roots.


* Applications are designed to *utilize* the expensive hardware. They 
don't "own" the hardware itself.


In this latter case, the application is properly designed and stores its 
persistent data in a volume and doesn't keep state outside of the 
application volume. In these cases, the process of "migrating" an 
instance simply goes away. You just detach the application persistent 
volume, shut down the instance, start up a new one elsewhere (allowing 
the scheduler to select one that meets the resource constraints in the 
flavor/image), attach the volume again and off you go. No messing around 
with shelving, offloading, migrating, or any of that nonsense in Nova.


We should not pretend that what we're discussing here is anything other 
than hacking orchestration workarounds into Nova to handle 
poorly-designed applications that have grown roots on some hardware and 
think they "own" hardware resources in a Nova deployment.



It has also come up more than once in the realm of "but I need to
detach my root volume" scenarios. I love to hate on shelve as well,
but recently a few more legit (than merely keeping an IPv4 address)
use-cases have come out for it, and I don't think Matt is wrong that
cross-cell migration *might* be easier as a shelve operation under
the covers.


Matt may indeed be right, but I'm certainly allowed to express my 
opinion that I think shelve is a monstrosity that should be avoided at 
all costs and building additional orchestration functionality into Nova 
on top of an already-shaky foundation (shelve) isn't something I think 
is a long-term maintainable solution.



If cross-cell cold migration is similarly just about the user being
able to keep their instance's IPv4 address while allowing an admin to
move an instance's storage to another physical location, then my firm
belief is that this kind of activity needs to be coordinated
*externally to Nova*.


I'm not sure how you could make that jump, but no, I don't think that's
the case. In any sort of large cloud that uses cells to solve problems
of scale, I think it's quite likely to expect that your IPv4 address
physically can't be honored in the target cell, and/or requires some
less-than-ideal temporary tunneling for bridging the gap.


If that's the case, why are we discussing shelve at all? Just stop the 
instance, copy/migrate the volume data (if needed, again it completely 
depends on the deployment, network topology and block storage backend), 
to a new location (new 

Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Dan Smith
>> * Cells can shard across flavors (and hardware type) so operators
>> would like to move users off the old flavors/hardware (old cell) to
>> new flavors in a new cell.
>
> So cell migrations are kind of the new release upgrade dance. Got it.

No, cell migrations are about moving instances between cells for
whatever reason. If you have small cells for organization, then it's
about not building arbitrary barriers between aisles. If you use it for
hardware refresh, then it might be related to long term lifecycle. I'm
not sure what any of this has to do with release upgrades or dancing.

> shelve was and continues to be a hack in order for users to keep an
> IPv4 address while not consuming compute resources for some amount of
> time. [1]

As we discussed in YVR most recently, it also may become an important
thing for operators and users where expensive accelerators are committed
to instances with part-time usage patterns. It has also come up more
than once in the realm of "but I need to detach my root volume"
scenarios. I love to hate on shelve as well, but recently a few more
legit (than merely keeping an IPv4 address) use-cases have come out for
it, and I don't think Matt is wrong that cross-cell migration *might* be
easier as a shelve operation under the covers.

> If cross-cell cold migration is similarly just about the user being
> able to keep their instance's IPv4 address while allowing an admin to
> move an instance's storage to another physical location, then my firm
> belief is that this kind of activity needs to be coordinated
> *externally to Nova*.

I'm not sure how you could make that jump, but no, I don't think that's
the case. In any sort of large cloud that uses cells to solve problems
of scale, I think it's quite likely to expect that your IPv4 address
physically can't be honored in the target cell, and/or requires some
less-than-ideal temporary tunneling for bridging the gap.

> Since we're not talking about live migration (thank all that is holy),

Oh it's coming. Don't think it's not.

> I believe the safest and most effective way to perform such a
> cross-cell "migration" would be the following basic steps:
>
> 0. ensure that each compute node is associated with at least one nova
> host aggregate that is *only* in a single cell
> 1. shut down the instance (optionally snapshotting required local disk
> changes if the user is unfortunately using their root disk for
> application data)
> 2. "save" the instance's IP address by manually creating a port in
> Neutron and assigning the IP address manually to that port. this of
> course will be deployment-dependent since you will need to hope the
> saved IP address for the migrating instance is in a subnet range that
> is available in the target cell
> 3. migrate the volume manually. this will be entirely deployment and
> backend-dependent as smcginnis alluded to in a response to this thread
> 4. have the admin boot the instance in a host aggregate that is known
> to be in the target cell, passing --network
> port_id=$SAVED_PORT_WITH_IP and --volume $MIGRATED_VOLUME_UUID
> arguments as needed. the admin would need to do this because users
> don't know about host aggregates and, frankly, the user shouldn't know
> about host aggregates, cells, or any of this.

What you just described here is largely shelve, ignoring the volume
migration part and the fact that such a manual process means the user
loses the instance's uuid and various other elements about it (such as
create time, action/event history, etc). Oh, and ignoring the fact that
the user no longer owns their instance (the admin does) :)

Especially given that migrating across a cell may mean "one aisle over,
same storage provider and network" to a lot of people, the above being a
completely manual process seems a little crazy to me.

--Dan



Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-29 Thread Jay Pipes
Sorry for delayed response. Was on PTO when this came out. Comments 
inline...


On 08/22/2018 09:23 PM, Matt Riedemann wrote:

Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The 
main issue in there right now is dealing with cross-cell cold migration 
in nova.


At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would 
like to move users off the old flavors/hardware (old cell) to new 
flavors in a new cell.


So cell migrations are kind of the new release upgrade dance. Got it.

* There is network isolation between compute hosts in different cells, 
so no ssh'ing the disk around like we do today. But the image service is 
global to all cells.


Based on this, for the initial support for cross-cell cold migration, I 
am proposing that we leverage something like shelve offload/unshelve 
masquerading as resize. We shelve offload from the source cell and 
unshelve in the target cell. This should work for both volume-backed and 
non-volume-backed servers (we use snapshots for shelved offloaded 
non-volume-backed servers).


shelve was and continues to be a hack in order for users to keep an IPv4 
address while not consuming compute resources for some amount of time. [1]


If cross-cell cold migration is similarly just about the user being able 
to keep their instance's IPv4 address while allowing an admin to move an 
instance's storage to another physical location, then my firm belief is 
that this kind of activity needs to be coordinated *externally to Nova*.


Each deployment is going to be different, and in all cases of cross-cell 
migration, the admins doing these move operations are going to need to 
understand various network, storage and failure domains that are 
particular to that deployment (and not something we have the ability to 
discover in any automated fashion).


Since we're not talking about live migration (thank all that is holy), I 
believe the safest and most effective way to perform such a cross-cell 
"migration" would be the following basic steps:


0. ensure that each compute node is associated with at least one nova 
host aggregate that is *only* in a single cell
1. shut down the instance (optionally snapshotting required local disk 
changes if the user is unfortunately using their root disk for 
application data)
2. "save" the instance's IP address by manually creating a port in 
Neutron and assigning the IP address manually to that port. this of 
course will be deployment-dependent since you will need to hope the 
saved IP address for the migrating instance is in a subnet range that is 
available in the target cell
3. migrate the volume manually. this will be entirely deployment and 
backend-dependent as smcginnis alluded to in a response to this thread
4. have the admin boot the instance in a host aggregate that is known to 
be in the target cell, passing --network port_id=$SAVED_PORT_WITH_IP and 
--volume $MIGRATED_VOLUME_UUID arguments as needed. the admin would need 
to do this because users don't know about host aggregates and, frankly, 
the user shouldn't know about host aggregates, cells, or any of this.
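
As a very rough, hedged sketch only (using openstacksdk; the cloud entry, IDs
and addresses below are placeholders, and the deployment-specific volume
migration from step 3 has to happen first), steps 2 and 4 might look
something like:

    # Illustrative only; not a supported workflow.
    import openstack

    NETWORK_ID = 'REPLACE_ME'
    SAVED_IPV4 = '10.0.0.5'
    IMAGE_ID = 'REPLACE_ME'
    FLAVOR_ID = 'REPLACE_ME'
    MIGRATED_VOLUME_UUID = 'REPLACE_ME'

    conn = openstack.connect(cloud='admin')  # assumed clouds.yaml entry

    # step 2: pin the old IP on a manually created port
    port = conn.network.create_port(
        network_id=NETWORK_ID,
        fixed_ips=[{'ip_address': SAVED_IPV4}])

    # step 4: boot in the target cell (host aggregate/AZ chosen out of band),
    # reusing the saved port, then attach the migrated volume
    server = conn.create_server(
        'migrated-instance', image=IMAGE_ID, flavor=FLAVOR_ID,
        nics=[{'port-id': port.id}], wait=True)
    conn.attach_volume(server, conn.get_volume(MIGRATED_VOLUME_UUID))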


Best,
-jay

[1] ok, shelve also lets a user keep their instance ID. I don't care 
much about that.




[Openstack-operators] [nova] Deprecating Core/Disk/RamFilter

2018-08-24 Thread Matt Riedemann
This is just an FYI that I have proposed that we deprecate the 
core/ram/disk filters [1]. We should have probably done this back in 
Pike when we removed them from the default enabled_filters list and also 
deprecated the CachingScheduler, which is the only in-tree scheduler 
driver that benefits from enabling these filters. With the 
heal_allocations CLI, added in Rocky, we can probably drop the 
CachingScheduler in Stein so the pieces are falling into place. As we 
saw in a recent bug [2], having these enabled in Stein now causes 
blatantly incorrect filtering on ironic nodes.
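For anyone checking whether they are affected: these filters only do
anything if they are explicitly listed in the scheduler filter list, i.e.
something like this in nova.conf (illustrative values, not a recommendation):

  [filter_scheduler]
  # CoreFilter/RamFilter/DiskFilter were dropped from the defaults in Pike;
  # with the FilterScheduler, placement already handles CPU/RAM/disk claims,
  # so these filters really only make sense with the CachingScheduler.
  enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,CoreFilter,RamFilter,DiskFilter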


Comments are welcome here, the review, or in IRC.

[1] https://review.openstack.org/#/c/596502/
[2] https://bugs.launchpad.net/tripleo/+bug/1787910

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-22 Thread Sam Morrison
I think in our case we’d only migrate between cells if we know the network and 
storage are accessible, and we would never do it if not.
Think of it as moving from old to new hardware at a cell level.

If storage and network aren’t available, ideally it would fail at the API request.

There are also Ceph-backed instances, so this is also something to take into 
account which nova would be responsible for.

I’ll be in Denver so we can discuss more there too.

Cheers,
Sam





> On 23 Aug 2018, at 11:23 am, Matt Riedemann  wrote:
> 
> Hi everyone,
> 
> I have started an etherpad for cells topics at the Stein PTG [1]. The main 
> issue in there right now is dealing with cross-cell cold migration in nova.
> 
> At a high level, I am going off these requirements:
> 
> * Cells can shard across flavors (and hardware type) so operators would like 
> to move users off the old flavors/hardware (old cell) to new flavors in a new 
> cell.
> 
> * There is network isolation between compute hosts in different cells, so no 
> ssh'ing the disk around like we do today. But the image service is global to 
> all cells.
> 
> Based on this, for the initial support for cross-cell cold migration, I am 
> proposing that we leverage something like shelve offload/unshelve 
> masquerading as resize. We shelve offload from the source cell and unshelve 
> in the target cell. This should work for both volume-backed and 
> non-volume-backed servers (we use snapshots for shelved offloaded 
> non-volume-backed servers).
> 
> There are, of course, some complications. The main ones that I need help with 
> right now are what happens with volumes and ports attached to the server. 
> Today we detach from the source and attach at the target, but that's assuming 
> the storage backend and network are available to both hosts involved in the 
> move of the server. Will that be the case across cells? I am assuming that 
> depends on the network topology (are routed networks being used?) and storage 
> backend (routed storage?). If the network and/or storage backend are not 
> available across cells, how do we migrate volumes and ports? Cinder has a 
> volume migrate API for admins but I do not know how nova would know the 
> proper affinity per-cell to migrate the volume to the proper host (cinder 
> does not have a routed storage concept like routed provider networks in 
> neutron, correct?). And as far as I know, there is no such thing as port 
> migration in Neutron.
> 
> Could Placement help with the volume/port migration stuff? Neutron routed 
> provider networks rely on placement aggregates to schedule the VM to a 
> compute host in the same network segment as the port used to create the VM, 
> however, if that segment does not span cells we are kind of stuck, correct?
> 
> To summarize the issues as I see them (today):
> 
> * How to deal with the targeted cell during scheduling? This is so we can 
> even get out of the source cell in nova.
> 
> * How does the API deal with the same instance being in two DBs at the same 
> time during the move?
> 
> * How to handle revert resize?
> 
> * How are volumes and ports handled?
> 
> I can get feedback from my company's operators based on what their deployment 
> will look like for this, but that does not mean it will work for others, so I 
> need as much feedback from operators, especially those running with multiple 
> cells today, as possible. Thanks in advance.
> 
> [1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
> 
> -- 
> 
> Thanks,
> 
> Matt


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

2018-08-22 Thread Matt Riedemann

Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The 
main issue in there right now is dealing with cross-cell cold migration 
in nova.


At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would 
like to move users off the old flavors/hardware (old cell) to new 
flavors in a new cell.


* There is network isolation between compute hosts in different cells, 
so no ssh'ing the disk around like we do today. But the image service is 
global to all cells.


Based on this, for the initial support for cross-cell cold migration, I 
am proposing that we leverage something like shelve offload/unshelve 
masquerading as resize. We shelve offload from the source cell and 
unshelve in the target cell. This should work for both volume-backed and 
non-volume-backed servers (we use snapshots for shelved offloaded 
non-volume-backed servers).


There are, of course, some complications. The main ones that I need help 
with right now are what happens with volumes and ports attached to the 
server. Today we detach from the source and attach at the target, but 
that's assuming the storage backend and network are available to both 
hosts involved in the move of the server. Will that be the case across 
cells? I am assuming that depends on the network topology (are routed 
networks being used?) and storage backend (routed storage?). If the 
network and/or storage backend are not available across cells, how do we 
migrate volumes and ports? Cinder has a volume migrate API for admins 
but I do not know how nova would know the proper affinity per-cell to 
migrate the volume to the proper host (cinder does not have a routed 
storage concept like routed provider networks in neutron, correct?). And 
as far as I know, there is no such thing as port migration in Neutron.


Could Placement help with the volume/port migration stuff? Neutron 
routed provider networks rely on placement aggregates to schedule the VM 
to a compute host in the same network segment as the port used to create 
the VM, however, if that segment does not span cells we are kind of 
stuck, correct?


To summarize the issues as I see them (today):

* How to deal with the targeted cell during scheduling? This is so we 
can even get out of the source cell in nova.


* How does the API deal with the same instance being in two DBs at the 
same time during the move?


* How to handle revert resize?

* How are volumes and ports handled?

I can get feedback from my company's operators based on what their 
deployment will look like for this, but that does not mean it will work 
for others, so I need as much feedback from operators, especially those 
running with multiple cells today, as possible. Thanks in advance.


[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)

2018-08-21 Thread Lee Yarwood
On 20-08-18 16:29:52, Matthew Booth wrote:
> For those who aren't familiar with it, nova's volume-update (also
> called swap volume by nova devs) is the nova part of the
> implementation of cinder's live migration (also called retype).
> Volume-update is essentially an internal cinder<->nova api, but as
> that's not a thing it's also unfortunately exposed to users. Some
> users have found it and are using it, but because it's essentially an
> internal cinder<->nova api it breaks pretty easily if you don't treat
> it like a special snowflake. It looks like we've finally found a way
> it's broken for non-cinder callers that we can't fix, even with a
> dirty hack.
> 
> volume-update <instance> <old volume> <new volume> essentially does a live
> copy of the data on <old> volume to <new> volume, then seamlessly swaps the
> attachment to <instance> from <old> to <new>. The guest OS on <instance>
> will not notice anything at all as the hypervisor swaps the storage
> backing an attached volume underneath it.
> 
> When called by cinder, as intended, cinder does some post-operation
> cleanup such that <old> is deleted and <new> inherits the same
> volume_id; that is <new> effectively becomes <old>. When called any
> other way, however, this cleanup doesn't happen, which breaks a bunch
> of assumptions. One of these is that a disk's serial number is the
> same as the attached volume_id. Disk serial number, in KVM at least,
> is immutable, so can't be updated during volume-update. This is fine
> if we were called via cinder, because the cinder cleanup means the
> volume_id stays the same. If called any other way, however, they no
> longer match, at least until a hard reboot when it will be reset to
> the new volume_id. It turns out this breaks live migration, but
> probably other things too. We can't think of a workaround.
> 
> I wondered why users would want to do this anyway. It turns out that
> sometimes cinder won't let you migrate a volume, but nova
> volume-update doesn't do those checks (as they're specific to cinder
> internals, none of nova's business, and duplicating them would be
> fragile, so we're not adding them!). Specifically we know that cinder
> won't let you migrate a volume with snapshots. There may be other
> reasons. If cinder won't let you migrate your volume, you can still
> move your data by using nova's volume-update, even though you'll end
> up with a new volume on the destination, and a slightly broken
> instance. Apparently the former is a trade-off worth making, but the
> latter has been reported as a bug.
> 
> I'd like to make it very clear that nova's volume-update, isn't
> expected to work correctly except when called by cinder. Specifically
> there was a proposal that we disable volume-update from non-cinder
> callers in some way, possibly by asserting volume state that can only
> be set by cinder. However, I'm also very aware that users are calling
> volume-update because it fills a need, and we don't want to trap data
> that wasn't previously trapped.
> 
> Firstly, is anybody aware of any other reasons to use nova's
> volume-update directly?
> 
> Secondly, is there any reason why we shouldn't just document that you
> have to delete snapshots before doing a volume migration? Hopefully
> some cinder folks or operators can chime in to let me know how to back
> them up or somehow make them independent before doing this, at which
> point the volume itself should be migratable?
> 
> If we can establish that there's an acceptable alternative to calling
> volume-update directly for all use-cases we're aware of, I'm going to
> propose heading off this class of bug by disabling it for non-cinder
> callers.

I'm definitely in favor of hiding this from users eventually but
wouldn't this require some form of deprecation cycle?

Warnings within the API documentation would also be useful and even
something we could backport to stable to highlight just how fragile this
API is ahead of any policy change.

Cheers,

-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)

2018-08-20 Thread Matthew Booth
For those who aren't familiar with it, nova's volume-update (also
called swap volume by nova devs) is the nova part of the
implementation of cinder's live migration (also called retype).
Volume-update is essentially an internal cinder<->nova api, but as
that's not a thing it's also unfortunately exposed to users. Some
users have found it and are using it, but because it's essentially an
internal cinder<->nova api it breaks pretty easily if you don't treat
it like a special snowflake. It looks like we've finally found a way
it's broken for non-cinder callers that we can't fix, even with a
dirty hack.

volume-update <instance> <old volume> <new volume> essentially does a live
copy of the data on <old> volume to <new> volume, then seamlessly swaps the
attachment to <instance> from <old> to <new>. The guest OS on <instance>
will not notice anything at all as the hypervisor swaps the storage
backing an attached volume underneath it.
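(For reference, the user-facing call being discussed is roughly:

  nova volume-update <server> <old-volume-id> <new-volume-id>

i.e. the novaclient wrapper around the swap-volume server action; the
argument naming above is approximate.)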

When called by cinder, as intended, cinder does some post-operation
cleanup such that <old> is deleted and <new> inherits the same
volume_id; that is <new> effectively becomes <old>. When called any
other way, however, this cleanup doesn't happen, which breaks a bunch
of assumptions. One of these is that a disk's serial number is the
same as the attached volume_id. Disk serial number, in KVM at least,
is immutable, so can't be updated during volume-update. This is fine
if we were called via cinder, because the cinder cleanup means the
volume_id stays the same. If called any other way, however, they no
longer match, at least until a hard reboot when it will be reset to
the new volume_id. It turns out this breaks live migration, but
probably other things too. We can't think of a workaround.

I wondered why users would want to do this anyway. It turns out that
sometimes cinder won't let you migrate a volume, but nova
volume-update doesn't do those checks (as they're specific to cinder
internals, none of nova's business, and duplicating them would be
fragile, so we're not adding them!). Specifically we know that cinder
won't let you migrate a volume with snapshots. There may be other
reasons. If cinder won't let you migrate your volume, you can still
move your data by using nova's volume-update, even though you'll end
up with a new volume on the destination, and a slightly broken
instance. Apparently the former is a trade-off worth making, but the
latter has been reported as a bug.

I'd like to make it very clear that nova's volume-update, isn't
expected to work correctly except when called by cinder. Specifically
there was a proposal that we disable volume-update from non-cinder
callers in some way, possibly by asserting volume state that can only
be set by cinder. However, I'm also very aware that users are calling
volume-update because it fills a need, and we don't want to trap data
that wasn't previously trapped.

Firstly, is anybody aware of any other reasons to use nova's
volume-update directly?

Secondly, is there any reason why we shouldn't just document that you
have to delete snapshots before doing a volume migration? Hopefully
some cinder folks or operators can chime in to let me know how to back
them up or somehow make them independent before doing this, at which
point the volume itself should be migratable?

If we can establish that there's an acceptable alternative to calling
volume-update directly for all use-cases we're aware of, I'm going to
propose heading off this class of bug by disabling it for non-cinder
callers.

Matt
-- 
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-18 Thread Matt Riedemann

On 8/11/2018 12:50 AM, Chris Apsey wrote:
This sounds promising and there seems to be a feasible way to do this, 
but it also sounds like a decent amount of effort and would be a new 
feature in a future release rather than a bugfix - am I correct in that 
assessment?


Yes I'd say it's a blueprint and not a bug fix - it's not something we'd 
backport to stable branches upstream, for example.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-10 Thread Chris Apsey
This sounds promising and there seems to be a feasible way to do this, but 
it also sounds like a decent amount of effort and would be a new feature in 
a future release rather than a bugfix - am I correct in that assessment?




On August 9, 2018 13:30:31 "Daniel P. Berrangé"  wrote:


On Thu, Aug 09, 2018 at 12:14:56PM -0500, Matt Riedemann wrote:

On 8/9/2018 6:03 AM, Chris Apsey wrote:

Exactly.  And I agree, it seems like hw_architecture should dictate
which emulator is chosen, but as you mentioned its currently not.  I'm
not sure if this is a bug and it's supposed to 'just work', or just
something that was never fully implemented (intentionally) and would be
more of a feature request/suggestion for a later version.  The docs are
kind of sparse in this area.

What are your thoughts?  I can open a bug if you think the scope is
reasonable.


I'm not sure if this is a bug or a feature, or if there are reasons why it's
never been done. I'm gonna have to rope in Kashyap and danpb since they'd
likely know more.

Dan/Kashyap: tl;dr why doesn't the nova libvirt driver, configured for qemu,
set the guest.arch based on the hw_architecture image property so that you
can run ppc guests on an x86 host?


Yes, it should do exactly that IMHO !

The main caveat is that a hell of a lot of code in the libvirt driver assumes
that guest arch == host arch, i.e. when building guest XML there's lots of code
that looks at caps.host.cpu.arch to decide how to configure the guest.
This all needs fixing to look at the guest.arch value instead, having
set that from hw_architecture prop.

Nova's libvirt driver is already reporting that it is capable of running
guests with multiple arches (the _get_instance_capabilities method in
nova/virt/libvirt/driver.py).

The only other thing is that you likely want to distinguish between
hosts that can do PPC64 via KVM vs those that can only do it via
emulation, so you don't get guests randomly placed on slow vs fast
hosts. Some kind of scheduler filter / weighting can do that based
on info already reported from the compute host I expect.


Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|





___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-09 Thread Chris Apsey
Exactly.  And I agree, it seems like hw_architecture should dictate 
which emulator is chosen, but as you mentioned its currently not.  I'm 
not sure if this is a bug and it's supposed to 'just work', or just 
something that was never fully implemented (intentionally) and would be 
more of a feature request/suggestion for a later version.  The docs are 
kind of sparse in this area.


What are your thoughts?  I can open a bug if you think the scope is 
reasonable.


---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-08-08 06:40 PM, Matt Riedemann wrote:

On 8/8/2018 2:42 PM, Chris Apsey wrote:
qemu-system-arm, qemu-system-ppc64, etc. in our environment are all 
x86 packages, but they perform system-mode emulation (via dynamic 
instruction translation) for those target environments.  So, you run 
qemu-system-ppc64 on an x86 host in order to get a ppc64-emulated VM. 
Our use case is specifically directed at reverse engineering binaries 
and fuzzing for vulnerabilities inside of those architectures for 
things that aren't built for x86, but there are others.


If you were to apt-get install qemu-system and then hit autocomplete, 
you'd get a list of architectures that qemu can emulate on x86 hardware 
- that's what we're trying to do incorporate.  We still want to run 
normal qemu-x86 with KVM virtualization extensions, but we ALSO want 
to run the other emulators without the KVM virtualization extensions 
in order to have more choice for target environments.


So to me, openstack would interpret this by checking to see if a 
target host supports the architecture specified in the image (it does 
this correctly), then it would choose the correct qemu-system-xx for 
spawning the instance based on the architecture flag of the image, 
which it currently does not (it always chooses qemu-system-x86_64).


Does that make sense?


OK yeah now I'm following you - running ppc guests on an x86 host
(virt_type=qemu rather than kvm right?).

I would have thought the hw_architecture image property was used for
this somehow to configure the arch in the guest xml properly, like
it's used in a few places [1][2][3].

See [4], I'd think we'd set the guest.arch but don't see that
happening. We do set the guest.os_type though [5].

[1]
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4649
[2]
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4927
[3]
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/blockinfo.py#L257
[4] https://libvirt.org/formatcaps.html#elementGuest
[5]
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L5196


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/8/2018 2:42 PM, Chris Apsey wrote:
qemu-system-arm, qemu-system-ppc64, etc. in our environment are all x86 
packages, but they perform system-mode emulation (via dynamic 
instruction translation) for those target environments.  So, you run 
qemu-system-ppc64 on an x86 host in order to get a ppc64-emulated VM. 
Our use case is specifically directed at reverse engineering binaries 
and fuzzing for vulnerabilities inside of those architectures for things 
that aren't built for x86, but there are others.


If you were to apt-get install qemu-system and then hit autocomplete, 
you'd get a list of architectures that qemu can emulate on x86 hardware - 
that's what we're trying to do incorporate.  We still want to run normal 
qemu-x86 with KVM virtualization extensions, but we ALSO want to run the 
other emulators without the KVM virtualization extensions in order to 
have more choice for target environments.


So to me, openstack would interpret this by checking to see if a target 
host supports the architecture specified in the image (it does this 
correctly), then it would choose the correct qemu-system-xx for spawning 
the instance based on the architecture flag of the image, which it 
currently does not (it always chooses qemu-system-x86_64).


Does that make sense?


OK yeah now I'm following you - running ppc guests on an x86 host 
(virt_type=qemu rather than kvm right?).


I would have thought the hw_architecture image property was used for 
this somehow to configure the arch in the guest xml properly, like it's 
used in a few places [1][2][3].


See [4], I'd think we'd set the guest.arch but don't see that happening. 
We do set the guest.os_type though [5].


[1] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4649
[2] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L4927
[3] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/blockinfo.py#L257

[4] https://libvirt.org/formatcaps.html#elementGuest
[5] 
https://github.com/openstack/nova/blob/c18b1c1bd646d7cefa3d3e4b25ce59460d1a6ebc/nova/virt/libvirt/driver.py#L5196
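For context, the piece of generated guest configuration in question is the
arch attribute on the domain XML's <os><type> element, plus the matching
emulator binary; a ppc64 guest on an x86 host would need something along
these lines (illustrative snippet, not what nova generates today):

  <os>
    <type arch='ppc64' machine='pseries'>hvm</type>
  </os>
  <devices>
    <emulator>/usr/bin/qemu-system-ppc64</emulator>
  </devices>

whereas currently the arch and emulator default to the host's (x86_64).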


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Chris Apsey

Matt,

qemu-system-arm, qemu-system-ppc64, etc. in our environment are all x86 
packages, but they perform system-mode emulation (via dynamic 
instruction translation) for those target environments.  So, you run 
qemu-system-ppc64 on an x86 host in order to get a ppc64-emulated VM.  
Our use case is specifically directed at reverse engineering binaries 
and fuzzing for vulnerabilities inside of those architectures for things 
that aren't built for x86, but there are others.


If you were to apt-get install qemu-system and then hit autocomplete, 
you'd get a list of architectures that qemu can emulate on x86 hardware - 
that's what we're trying to do incorporate.  We still want to run normal 
qemu-x86 with KVM virtualization extensions, but we ALSO want to run the 
other emulators without the KVM virtualization extensions in order to 
have more choice for target environments.


So to me, openstack would interpret this by checking to see if a target 
host supports the architecture specified in the image (it does this 
correctly), then it would choose the correct qemu-system-xx for spawning 
the instance based on the architecture flag of the image, which it 
currently does not (it always chooses qemu-system-x86_64).


Does that make sense?

Chris



---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-08-08 03:07 PM, Matt Riedemann wrote:

On 8/7/2018 8:54 AM, Chris Apsey wrote:
We don't actually have any non-x86 hardware at the moment - we're just 
looking to run certain workloads in qemu full emulation mode sans KVM 
extensions (we know there is a huge performance hit - it's just for a 
few very specific things).  The hosts I'm talking about are normal 
intel-based compute nodes with several different qemu packages 
installed (arm, ppc, mips, x86_64 w/ kvm extensions, etc.).


Is nova designed to work in this kind of scenario?  It seems like many 
pieces are there, but they're just not quite tied together quite 
right, or there is some config option I'm missing.


As far as I know, nova doesn't make anything arch-specific for QEMU.
Nova will execute some qemu commands like qemu-img but as far as the
virt driver, it goes through the libvirt-python API bindings which
wrap over libvirtd which interfaces with QEMU. I would expect that if
you're on an x86_64 arch host, that you can't have non-x86_64 packages
installed on there (or they are noarch packages). Like, I don't know
how your packaging works (are these rpms or debs, or other?) but how
do you have ppc packages installed on an x86 system?


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-08 Thread Matt Riedemann

On 8/7/2018 8:54 AM, Chris Apsey wrote:
We don't actually have any non-x86 hardware at the moment - we're just 
looking to run certain workloads in qemu full emulation mode sans KVM 
extensions (we know there is a huge performance hit - it's just for a 
few very specific things).  The hosts I'm talking about are normal 
intel-based compute nodes with several different qemu packages installed 
(arm, ppc, mips, x86_64 w/ kvm extensions, etc.).


Is nova designed to work in this kind of scenario?  It seems like many 
pieces are there, but they're just not quite tied together quite right, 
or there is some config option I'm missing.


As far as I know, nova doesn't make anything arch-specific for QEMU. 
Nova will execute some qemu commands like qemu-img but as far as the 
virt driver, it goes through the libvirt-python API bindings which wrap 
over libvirtd which interfaces with QEMU. I would expect that if you're 
on an x86_64 arch host, that you can't have non-x86_64 packages 
installed on there (or they are noarch packages). Like, I don't know how 
your packaging works (are these rpms or debs, or other?) but how do you 
have ppc packages installed on an x86 system?


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-07 Thread Chris Apsey

Hey Matt,

We don't actually have any non-x86 hardware at the moment - we're just 
looking to run certain workloads in qemu full emulation mode sans KVM 
extensions (we know there is a huge performance hit - it's just for a 
few very specific things).  The hosts I'm talking about are normal 
intel-based compute nodes with several different qemu packages installed 
(arm, ppc, mips, x86_64 w/ kvm extensions, etc.).


Is nova designed to work in this kind of scenario?  It seems like many 
pieces are there, but they're just not quite tied together quite right, 
or there is some config option I'm missing.


Thanks!

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-08-07 09:32 AM, Matt Riedemann wrote:

On 8/5/2018 1:43 PM, Chris Apsey wrote:
Trying to enable some alternate (non-x86) architectures on xenial + 
queens.  I can load up images and set the property correctly according 
to the supported values 
(https://docs.openstack.org/nova/queens/configuration/config.html) in 
image_properties_default_architecture.  From what I can tell, the 
scheduler works correctly and instances are only scheduled on nodes 
that have the correct qemu binary installed.  However, when the 
instance request lands on this node, it always starts it with 
qemu-system-x86_64 rather than qemu-system-arm, qemu-system-ppc, etc.  
If I manually set the correct binary, everything works as expected.


Am I missing something here, or is this a bug in nova-compute?


image_properties_default_architecture is only used in the scheduler
filter to pick a compute host, it doesn't do anything about the qemu
binary used in nova-compute. mnaser added the config option so maybe
he can share what he's done on his computes.

Do you have qemu-system-x86_64 on non-x86 systems? Seems like a
package/deploy issue since I'd expect x86 packages shouldn't install
on a ppc system and vice versa, and only one qemu package should
provide the binary.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-07 Thread Matt Riedemann

On 8/5/2018 1:43 PM, Chris Apsey wrote:
Trying to enable some alternate (non-x86) architectures on xenial + 
queens.  I can load up images and set the property correctly according 
to the supported values 
(https://docs.openstack.org/nova/queens/configuration/config.html) in 
image_properties_default_architecture.  From what I can tell, the 
scheduler works correctly and instances are only scheduled on nodes that 
have the correct qemu binary installed.  However, when the instance 
request lands on this node, it always starts it with qemu-system-x86_64 
rather than qemu-system-arm, qemu-system-ppc, etc.  If I manually set 
the correct binary, everything works as expected.


Am I missing something here, or is this a bug in nova-compute?


image_properties_default_architecture is only used in the scheduler 
filter to pick a compute host, it doesn't do anything about the qemu 
binary used in nova-compute. mnaser added the config option so maybe he 
can share what he's done on his computes.


Do you have qemu-system-x86_64 on non-x86 systems? Seems like a 
package/deploy issue since I'd expect x86 packages shouldn't install on 
a ppc system and vice versa, and only one qemu package should provide 
the binary.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] StarlingX diff analysis

2018-08-07 Thread Matt Riedemann

On 8/7/2018 1:10 AM, Flint WALRUS wrote:
I didn't have time to check StarlingX code quality; how did it seem to you 
while you were doing your analysis?


I didn't dig into the test diffs themselves, but it was my impression 
that from what I was poking around in the local git repo, there were 
several changes which didn't have any test coverage.


For the really big full stack changes (L3 CAT, CPU scaling and 
shared/pinned CPUs on same host), toward the end I just started glossing 
over a lot of that because it's so much code in so many places, so I 
can't really speak very well to how it was written or how well it is 
tested (maybe WindRiver had a more robust CI system running integration 
tests, I don't know).


There were also some things which would have been caught in code review 
upstream. For example, they ignore the "force" parameter for live 
migration so that live migration requests always go through the 
scheduler. However, the "force" parameter is only on newer 
microversions. Before that, if you specified a host at all it would 
bypass the scheduler, but the change didn't take that into account, so 
they still have gaps in some of the things they were trying to 
essentially disable in the API.


On the whole I think the quality is OK. It's not really possible to 
accurately judge that when looking at a single diff this large.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] StarlingX diff analysis

2018-08-07 Thread Flint WALRUS
Hi Matt, everyone,

I just read your analysis and would like to thank you for such work. I
really think there are numerous features included/used in this Nova rework
that would be highly beneficial for Nova and its users.

I hope people will fairly appreciate your work.

I didn't have time to check StarlingX code quality; how did it seem to you
while you were doing your analysis?

Thanks a lot for sharing this.
I'll have a closer look at it this afternoon as my company may be
interested in some features.

Kind regards,
G.
Le mar. 7 août 2018 à 00:03, Matt Riedemann  a écrit :

> In case you haven't heard, there was this StarlingX thing announced at
> the last summit. I have gone through the enormous nova diff in their
> repo and the results are in a spreadsheet [1]. Given the enormous
> spreadsheet (see a pattern?), I have further refined that into a set of
> high-level charts [2].
>
> I suspect there might be some negative reactions to even doing this type
> of analysis lest it might seem like promoting throwing a huge pile of
> code over the wall and expecting the OpenStack (or more specifically the
> nova) community to pick it up. That's not my intention at all, nor do I
> expect nova maintainers to be responsible for upstreaming any of this.
>
> This is all educational to figure out what the major differences and
> overlaps are and what could be constructively upstreamed from the
> starlingx staging repo since it's not all NFV and Edge dragons in here,
> there are some legitimate bug fixes and good ideas. I'm sharing it
> because I want to feel like my time spent on this in the last week
> wasn't all for nothing.
>
> [1]
>
> https://docs.google.com/spreadsheets/d/1ugp1FVWMsu4x3KgrmPf7HGX8Mh1n80v-KVzweSDZunU/edit?usp=sharing
> [2]
>
> https://docs.google.com/presentation/d/1P-__JnxCFUbSVlEoPX26Jz6VaOyNg-jZbBsmmKA2f0c/edit?usp=sharing
>
> --
>
> Thanks,
>
> Matt
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] StarlingX diff analysis

2018-08-06 Thread Matt Riedemann
In case you haven't heard, there was this StarlingX thing announced at 
the last summit. I have gone through the enormous nova diff in their 
repo and the results are in a spreadsheet [1]. Given the enormous 
spreadsheet (see a pattern?), I have further refined that into a set of 
high-level charts [2].


I suspect there might be some negative reactions to even doing this type 
of analysis lest it might seem like promoting throwing a huge pile of 
code over the wall and expecting the OpenStack (or more specifically the 
nova) community to pick it up. That's not my intention at all, nor do I 
expect nova maintainers to be responsible for upstreaming any of this.


This is all educational to figure out what the major differences and 
overlaps are and what could be constructively upstreamed from the 
starlingx staging repo since it's not all NFV and Edge dragons in here, 
there are some legitimate bug fixes and good ideas. I'm sharing it 
because I want to feel like my time spent on this in the last week 
wasn't all for nothing.


[1] 
https://docs.google.com/spreadsheets/d/1ugp1FVWMsu4x3KgrmPf7HGX8Mh1n80v-KVzweSDZunU/edit?usp=sharing
[2] 
https://docs.google.com/presentation/d/1P-__JnxCFUbSVlEoPX26Jz6VaOyNg-jZbBsmmKA2f0c/edit?usp=sharing


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?

2018-08-05 Thread Chris Apsey

All,

Trying to enable some alternate (non-x86) architectures on xenial + 
queens.  I can load up images and set the property correctly according 
to the supported values 
(https://docs.openstack.org/nova/queens/configuration/config.html) in 
image_properties_default_architecture.  From what I can tell, the 
scheduler works correctly and instances are only scheduled on nodes that 
have the correct qemu binary installed.  However, when the instance 
request lands on this node, it always starts it with qemu-system-x86_64 
rather than qemu-system-arm, qemu-system-ppc, etc.  If I manually set 
the correct binary, everything works as expected.
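(For anyone trying to reproduce this, the image property involved is set
with something like the following, the image name being a placeholder:

  openstack image set --property hw_architecture=ppc64 my-ppc64-image

which is what the scheduler appears to honor correctly.)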


Am I missing something here, or is this a bug in nova-compute?

Thanks in advance,

--
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Couple of CellsV2 questions

2018-07-24 Thread Jonathan Mills
Thanks, Matt.  Those are all good suggestions, and we will incorporate
your feedback into our plans.

On 07/23/2018 05:57 PM, Matt Riedemann wrote:
> I'll try to help a bit inline. Also cross-posting to openstack-dev and
> tagging with [nova] to highlight it.
> 
> On 7/23/2018 10:43 AM, Jonathan Mills wrote:
>> I am looking at implementing CellsV2 with multiple cells, and there's
>> a few things I'm seeking clarification on:
>>
>> 1) How does a superconductor know that it is a superconductor?  Is its
>> operation different in any fundamental way?  Is there any explicit
>> configuration or a setting in the database required? Or does it simply
>> not care one way or another?
> 
> It's a topology term, not really anything in config or the database that
> distinguishes the "super" conductor. I assume you've gone over the
> service layout in the docs:
> 
> https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout
> 
> 
> There are also some summit talks from Dan about the topology linked here:
> 
> https://docs.openstack.org/nova/latest/user/cells.html#cells-v2
> 
> The superconductor is the conductor service at the "top" of the tree
> which interacts with the API and scheduler (controller) services and
> routes operations to the cell. Then once in a cell, the operation should
> ideally be confined there. So, for example, reschedules during a build
> would be confined to the cell. The cell conductor doesn't go back "up"
> to the scheduler to get a new set of hosts for scheduling. This of
> course depends on which release you're using and your configuration, see
> the caveats section in the cellsv2-layout doc.
> 
>>
>> 2) When I ran the command "nova-manage cell_v2 create_cell
>> --name=cell1 --verbose", the entry created for cell1 in the api
>> database includes only one rabbitmq server, but I have three of them
>> as an HA cluster.  Does it only support talking to one rabbitmq server
>> in this configuration? Or can I just update the cell1 transport_url in
>> the database to point to all three? Is that a supported configuration?
> 
> First, don't update stuff directly in the database if you don't have to.
> :) What you set on the transport_url should be whatever oslo.messaging
> can handle:
> 
> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url
> 
> 
> There is at least one reported bug for this but I'm not sure I fully
> grok it or what its status is at this point:
> 
> https://bugs.launchpad.net/nova/+bug/1717915
> 
>>
>> 3) Is there anything wrong with having one cell share the amqp bus
>> with your control plane, while having additional cells use their own
>> amqp buses? Certainly I realize that the point of CellsV2 is to shard
>> the amqp bus for greater horizontal scalability.  But in my case, my
>> first cell is on the smaller side, and happens to be colocated with
>> the control plane hardware (whereas other cells will be in other parts
>> of the datacenter, or in other datacenters with high-speed links).  I
>> was thinking of just pointing that first cell back at the same
>> rabbitmq servers used by the control plane, but perhaps directing them
>> at their own rabbitmq vhost. Is that a terrible idea?
> 
> Would need to get input from operators and/or Dan Smith's opinion on
> this one, but I'd say it's no worse than having a flat single cell
> deployment. However, if you're going to do multi-cell long-term anyway,
> then it would be best to get in the mindset and discipline of not
> relying on shared MQ between the controller services and the cells. In
> other words, just do the right thing from the start rather than have to
> worry about maybe changing the deployment / configuration for that one
> cell down the road when it's harder.
> 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Couple of CellsV2 questions

2018-07-23 Thread Matt Riedemann
I'll try to help a bit inline. Also cross-posting to openstack-dev and 
tagging with [nova] to highlight it.


On 7/23/2018 10:43 AM, Jonathan Mills wrote:
I am looking at implementing CellsV2 with multiple cells, and there's a 
few things I'm seeking clarification on:


1) How does a superconductor know that it is a superconductor?  Is its 
operation different in any fundamental way?  Is there any explicit 
configuration or a setting in the database required? Or does it simply 
not care one way or another?


It's a topology term, not really anything in config or the database that 
distinguishes the "super" conductor. I assume you've gone over the 
service layout in the docs:


https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#service-layout

There are also some summit talks from Dan about the topology linked here:

https://docs.openstack.org/nova/latest/user/cells.html#cells-v2

The superconductor is the conductor service at the "top" of the tree 
which interacts with the API and scheduler (controller) services and 
routes operations to the cell. Then once in a cell, the operation should 
ideally be confined there. So, for example, reschedules during a build 
would be confined to the cell. The cell conductor doesn't go back "up" 
to the scheduler to get a new set of hosts for scheduling. This of 
course depends on which release you're using and your configuration, see 
the caveats section in the cellsv2-layout doc.




2) When I ran the command "nova-manage cell_v2 create_cell --name=cell1 
--verbose", the entry created for cell1 in the api database includes 
only one rabbitmq server, but I have three of them as an HA cluster.  
Does it only support talking to one rabbitmq server in this 
configuration? Or can I just update the cell1 transport_url in the 
database to point to all three? Is that a supported configuration?


First, don't update stuff directly in the database if you don't have to. 
:) What you set on the transport_url should be whatever oslo.messaging 
can handle:


https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.transport_url
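
For what it's worth, oslo.messaging does accept multiple hosts in a single
transport_url, comma-separated, along these lines (placeholder credentials
and hostnames):

  [DEFAULT]
  transport_url = rabbit://nova:secret@rabbit1:5672,nova:secret@rabbit2:5672,nova:secret@rabbit3:5672/nova

and the same string can be passed to "nova-manage cell_v2 create_cell" via
--transport-url rather than editing the cell mapping record afterwards.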

There is at least one reported bug for this but I'm not sure I fully 
grok it or what its status is at this point:


https://bugs.launchpad.net/nova/+bug/1717915



3) Is there anything wrong with having one cell share the amqp bus with 
your control plane, while having additional cells use their own amqp 
buses? Certainly I realize that the point of CellsV2 is to shard the 
amqp bus for greater horizontal scalability.  But in my case, my first 
cell is on the smaller side, and happens to be colocated with the 
control plane hardware (whereas other cells will be in other parts of 
the datacenter, or in other datacenters with high-speed links).  I was 
thinking of just pointing that first cell back at the same rabbitmq 
servers used by the control plane, but perhaps directing them at their 
own rabbitmq vhost. Is that a terrible idea?


Would need to get input from operators and/or Dan Smith's opinion on 
this one, but I'd say it's no worse than having a flat single cell 
deployment. However, if you're going to do multi-cell long-term anyway, 
then it would be best to get in the mindset and discipline of not 
relying on shared MQ between the controller services and the cells. In 
other words, just do the right thing from the start rather than have to 
worry about maybe changing the deployment / configuration for that one 
cell down the road when it's harder.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Cinder cross_az_attach=False changes/fixes

2018-07-15 Thread Matt Riedemann
Just an update on an old thread, but I've been working on the 
cross_az_attach=False issues again this past week and I think I have a 
couple of decent fixes.


On 5/31/2017 6:08 PM, Matt Riedemann wrote:

This is a request for any operators out there that configure nova to set:

[cinder]
cross_az_attach=False

To check out these two bug fixes:

1. https://review.openstack.org/#/c/366724/

This is a case where nova is creating the volume during boot from volume 
and providing an AZ to cinder during the volume create request. Today we 
just pass the instance.availability_zone which is None if the instance 
was created without an AZ set. It's unclear to me if that causes the 
volume creation to fail (someone in IRC was showing the volume going 
into ERROR state while Nova was waiting for it to be available), but I 
think it will cause the later attach to fail here [1] because the 
instance AZ (defaults to None) and volume AZ (defaults to nova) may not 
match. I'm still looking for more details on the actual failure in that 
one though.


The proposed fix in this case is pass the AZ associated with any host 
aggregate that the instance is in.


This was indirectly fixed by change 
https://review.openstack.org/#/c/446053/ in Pike where we now set the 
instance.availability_zone in conductor after we get a selected host 
from the scheduler (we get the AZ for the host and set that on the 
instance before sending the instance to compute to build it).


While investigating this on master, I found a new bug where we do an 
up-call to the API DB which fails in a split MQ setup, and I have a fix 
here:


https://review.openstack.org/#/c/582342/



2. https://review.openstack.org/#/c/469675/

This is similar, but rather than checking the AZ when we're on the 
compute and the instance has a host, we're in the API and doing a boot 
from volume where an existing volume is provided during server create. 
By default, the volume's AZ is going to be 'nova'. The code doing the 
check here is getting the AZ for the instance, and since the instance 
isn't on a host yet, it's not in any aggregate, so the only AZ we can 
get is from the server create request itself. If an AZ isn't provided 
during the server create request, then we're comparing 
instance.availability_zone (None) to volume['availability_zone'] 
("nova") and that results in a 400.


My proposed fix is in the case of BFV checks from the API, we default 
the AZ if one wasn't requested when comparing against the volume. By 
default this is going to compare "nova" for nova and "nova" for cinder, 
since CONF.default_availability_zone is "nova" by default in both projects.


I've refined this fix a bit to be more flexible:

https://review.openstack.org/#/c/469675/

So now if doing boot from volume and we're checking 
cross_az_attach=False in the API and the user didn't explicitly request 
an AZ for the instance, we do a few checks:


1. If [DEFAULT]/default_schedule_zone is not None (the default), we use 
that to compare against the volume AZ.


2. If the volume AZ is equal to the [DEFAULT]/default_availability_zone 
(nova by default in both nova and cinder), we're OK - no issues.


3. If the volume AZ is not equal to [DEFAULT]/default_availability_zone, 
it means either the volume was created with a specific AZ or cinder's 
default AZ is configured differently from nova's. In that case, I take 
the volume AZ and put it into the instance RequestSpec so that during 
scheduling, the nova scheduler picks a host in the same AZ as the volume 
- if that AZ isn't in nova, we fail to schedule (NoValidHost) (but that 
shouldn't really happen, why would one have cross_az_attach=False w/o 
mirrored AZ in both cinder and nova?).
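To spell out what "mirrored" means there, it's roughly that nova and cinder
agree on zone naming and defaults, e.g. (illustrative):

  # nova.conf
  [DEFAULT]
  default_availability_zone = nova
  default_schedule_zone = nova
  [cinder]
  cross_az_attach = False

  # cinder.conf
  [DEFAULT]
  storage_availability_zone = nova
  default_availability_zone = nova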




--

I'm requesting help from any operators that are setting 
cross_az_attach=False because I have to imagine your users have run into 
this and you're patching around it somehow, so I'd like input on how you 
or your users are dealing with this.


I'm also trying to recreate these in upstream CI [2] which I was already 
able to do with the 2nd bug.


This devstack patch has recreated both issues above and I'm adding the 
fixes to it as dependencies to show the problems are resolved.




Having said all of this, I really hate cross_az_attach as it's 
config-driven API behavior which is not interoperable across clouds. 
Long-term I'd really love to deprecate this option but we need a 
replacement first, and I'm hoping placement with compute/volume resource 
providers in a shared aggregate can maybe make that happen.


[1] 
https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368 


[2] https://review.openstack.org/#/c/467674/



--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Denver Stein ptg planning

2018-07-11 Thread melanie witt

Hello Devs and Ops,

I've created an etherpad where we can start collecting ideas for topics 
to cover at the Stein PTG. Please feel free to add your comments and 
topics with your IRC nick next to it to make it easier to discuss with you.


https://etherpad.openstack.org/p/nova-ptg-stein

Cheers,
-melanie

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread melanie witt

Hello Stackers,

Recently, we've received interest in increasing the maximum number of 
allowed volumes to attach to a single instance beyond 26. The limit of 26 
comes from a historical limitation in libvirt (if I remember correctly) 
which no longer exists at the libvirt level in the present day. So, 
we're looking at providing a way to attach more than 26 volumes to a 
single instance, and we want your feedback.


We'd like to hear from operators and users about their use cases for 
wanting to be able to attach a large number of volumes to a single 
instance. If you could share your use cases, it would help us greatly in 
moving forward with an approach for increasing the maximum.


Some ideas that have been discussed so far include:

A) Selecting a new, higher maximum that still yields reasonable 
performance on a single compute host (64 or 128, for example). Pros: 
helps prevent the potential for poor performance on a compute host from 
attaching too many volumes. Cons: doesn't let anyone opt-in to a higher 
maximum if their environment can handle it.


B) Creating a config option to let operators choose how many volumes 
allowed to attach to a single instance. Pros: lets operators opt-in to a 
maximum that works in their environment. Cons: it's not discoverable for 
those calling the API.


C) Create a configurable API limit for maximum number of volumes to 
attach to a single instance that is either a quota or similar to a 
quota. Pros: lets operators opt-in to a maximum that works in their 
environment. Cons: it's yet another quota?


Please chime in with your use cases and/or thoughts on the different 
approaches.


Thanks for your help,
-melanie

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-06-07 Thread Matt Riedemann

On 2/6/2018 6:44 PM, Matt Riedemann wrote:

On 2/6/2018 2:14 PM, Chris Apsey wrote:
but we would rather have intermittent build failures rather than 
compute nodes falling over in the future.


Note that once a compute has a successful build, the consecutive build 
failures counter is reset. So if your limit is the default (10) and you 
have 10 failures in a row, the compute service is auto-disabled. But if 
you have say 5 failures and then a pass, it's reset to 0 failures.


Obviously if you're doing a pack-first scheduling strategy rather than 
spreading instances across the deployment, a burst of failures could 
easily disable a compute, especially if that host is overloaded like you 
saw. I'm not sure if rescheduling is helping you or not - that would be 
useful information since we consider the need to reschedule off a failed 
compute host as a bad thing. At the Forum in Boston when this idea came 
up, it was specifically for the case that operators in the room didn't 
want a bad compute to become a "black hole" in their deployment causing 
lots of reschedules until they get that one fixed.


Just an update on this. There is a change merged in Rocky [1] which is 
also going through backports to Queens and Pike. If you've already 
disabled the "consecutive_build_service_disable_threshold" config option 
then it's a no-op. If you haven't, 
"consecutive_build_service_disable_threshold" is now used to count build 
failures but no longer auto-disables the compute service when the 
configured threshold is met (10 by default). The build failure count is 
then used by a new weigher (enabled by default) to sort hosts with build 
failures to the back of the list of candidate hosts for new builds. Once 
there is a successful build on a given host, the failure count is reset. 
The idea here is that hosts which are failing are given lower priority 
during scheduling.


[1] https://review.openstack.org/#/c/572195/
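For reference, the knobs involved (the values shown are, as I understand
it, the defaults):

  [compute]
  # counts consecutive build failures per compute host; as of [1] it no
  # longer auto-disables the nova-compute service
  consecutive_build_service_disable_threshold = 10

  [filter_scheduler]
  # multiplier for the new weigher that pushes hosts with recent build
  # failures to the back of the candidate list
  build_failure_weight_multiplier = 1000000.0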

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Need feedback on spec for handling down cells in the API

2018-06-07 Thread Matt Riedemann
We have a nova spec [1] which is at the point that it needs some API 
user (and operator) feedback on what nova API should be doing when 
listing servers and there are down cells (unable to reach the cell DB or 
it times out).


tl;dr: the spec proposes to return "shell" instances which have the 
server uuid and created_at fields set, and maybe some other fields we 
can set, but otherwise a bunch of fields in the server response would be 
set to UNKNOWN sentinel values. This would be unversioned, and therefore 
could wreak havoc on existing client side code that expects fields like 
'config_drive' and 'updated' to be of a certain format.
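
Just to illustrate the impact (this is not lifted from the spec; the exact 
field selection and sentinel formatting are part of what needs feedback), a 
"shell" entry in a GET /servers/detail response might look roughly like:

    {
        "id": "8e8e8b6e-27dc-4a44-b0c7-8f6b9c0d6c6f",
        "created": "2018-06-01T12:00:00Z",
        "status": "UNKNOWN",
        "tenant_id": "UNKNOWN",
        "config_drive": "UNKNOWN",
        ...
    }

i.e. exactly the kind of thing that could trip up clients expecting 
'config_drive' to be boolean-ish or 'updated' to be a timestamp.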


There are alternatives listed in the spec so please read this over and 
provide feedback since this is a pretty major UX change.


Oh, and no pressure, but today is the spec freeze deadline for Rocky.

[1] https://review.openstack.org/#/c/557369/

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Chris Friesen

On 06/04/2018 05:43 AM, Tobias Urdin wrote:

Hello,

I have received a question about a more specialized use case where we need to
isolate several hypervisors to a specific project. My first thinking was
using nova flavors for only that project and add extra specs properties to
use a specific host aggregate but this means I need to assign values to all
other flavors to not use those which seems weird.

How could I go about solving this the easies/best way or from the
history of the mailing lists, the most supported way since there is a
lot of changes to scheduler/placement part right now?


There was a "Strict isolation of group of hosts for images" spec that was 
proposed for a number of releases but never got accepted:


https://review.openstack.org/#/c/381912/

The idea was to have special metadata on a host aggregate and a new scheduler 
filter such that only instances with images having a property matching the 
metadata would be allowed to land on that host aggregate.


In the end the spec was abandoned (see the final comment in the review) because 
it was expected that a combination of other accepted features would enable the 
desired behaviour.


It might be worth checking out the links in the final comment.

Chris

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Tobias Urdin
Saw now in the docs that multiple aggregate_instance_extra_specs keys
should be a comma-separated list.
But other than that, would the below do what I'm looking for?

This has a very high maintenance overhead when you have a lot of hypervisors
and are steadily adding new ones, but I can't see any other way to fully
isolate it. It would be cool if the RFE you mentioned [1] could be researched
and, if it qualifies, implemented.

Best regards

[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523

On 06/04/2018 03:32 PM, Tobias Urdin wrote:
> Hello,
> Thanks for the reply Matt.
>
> The hard thing here is that I have to ensure it the other way around as
> well i.e other instances cannot be allowed landing on those "reserved"
> hypervisors.
> I assume I could do something like in [1] and also set key-value
> metadata on all flavors to select a host aggregate that is not the
> "reserved" hypervisors.
>
> openstack aggregate create fast-cpu --property fast-cpu=true --property
> other=true
> openstack aggregate create normal-cpu --property normal-cpu=true
> --property other=true
> openstack aggregate create dedicated --property dedicated=true
> openstack aggregate add host fast-cpu compute1
> openstack aggregate add host normal-cpu compute2
> openstack aggregate add host dedicated compute3
> openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property
> aggregate_instance_extra_specs:fast-cpu=true --property
> aggregate_instance_extra_specs:other=true fast-cpu.medium
> openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property
> aggregate_instance_extra_specs:normal-cpu=true --property
> aggregate_instance_extra_specs:other=true normal-cpu.medium
> openstack flavor create --vcpus 4 --ram 4096 --disk 50 --private
> --project  --property
> aggregate_instance_extra_specs:dedicated=true dedicated.medium
>
> It's seems very messy, would that be an supported approach?
> We are on Queens, doing it in a way that is not removed in the future
> would be optimal.
>
> Best regards
>
> [1] https://www.brad-x.com/2016/01/01/dedicate-compute-hosts-to-projects/
>
>
> On 06/04/2018 02:50 PM, Matt Riedemann wrote:
>> On 6/4/2018 6:43 AM, Tobias Urdin wrote:
>>> I have received a question about a more specialized use case where we
>>> need to isolate several hypervisors
>>>
>>> to a specific project. My first thinking was using nova flavors for only
>>> that project and add extra specs properties to use a specific host
>>> aggregate but this
>>>
>>> means I need to assign values to all other flavors to not use those
>>> which seems weird.
>>>
>>>
>>> How could I go about solving this the easies/best way or from the
>>> history of the mailing lists, the most supported way since there is a
>>> lot of changes
>>>
>>> to scheduler/placement part right now?
>> Depending on which release you're on, it sounds like you want to use this:
>>
>> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation
>>
>> In Rocky we have a replacement for that filter which does pre-filtering 
>> in Placement which should give you a performance gain when it comes time 
>> to do the host filtering:
>>
>> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement
>>
>> Note that even if you use AggregateMultiTenancyIsolation for the one 
>> project, other projects can still randomly land on the hosts in that 
>> aggregate unless you also assign those to their own aggregates.
>>
>> It sounds like you're might be looking for a dedicated hosts feature? 
>> There is an RFE from the public cloud work group for that:
>>
>> https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523
>>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Tobias Urdin
Hello,
Thanks for the reply Matt.

The hard thing here is that I have to ensure it the other way around as
well, i.e. other instances cannot be allowed to land on those "reserved"
hypervisors.
I assume I could do something like in [1] and also set key-value
metadata on all flavors to select a host aggregate that is not the
"reserved" hypervisors.

openstack aggregate create fast-cpu --property fast-cpu=true --property
other=true
openstack aggregate create normal-cpu --property normal-cpu=true
--property other=true
openstack aggregate create dedicated --property dedicated=true
openstack aggregate add host fast-cpu compute1
openstack aggregate add host normal-cpu compute2
openstack aggregate add host dedicated compute3
openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property
aggregate_instance_extra_specs:fast-cpu=true --property
aggregate_instance_extra_specs:other=true fast-cpu.medium
openstack flavor create --vcpus 4 --ram 4096 --disk 50 --property
aggregate_instance_extra_specs:normal-cpu=true --property
aggregate_instance_extra_specs:other=true normal-cpu.medium
openstack flavor create --vcpus 4 --ram 4096 --disk 50 --private
--project  --property
aggregate_instance_extra_specs:dedicated=true dedicated.medium

It seems very messy; would that be a supported approach?
We are on Queens, and doing it in a way that will not be removed in the
future would be optimal.

Best regards

[1] https://www.brad-x.com/2016/01/01/dedicate-compute-hosts-to-projects/


On 06/04/2018 02:50 PM, Matt Riedemann wrote:
> On 6/4/2018 6:43 AM, Tobias Urdin wrote:
>> I have received a question about a more specialized use case where we
>> need to isolate several hypervisors
>>
>> to a specific project. My first thinking was using nova flavors for only
>> that project and add extra specs properties to use a specific host
>> aggregate but this
>>
>> means I need to assign values to all other flavors to not use those
>> which seems weird.
>>
>>
>> How could I go about solving this the easies/best way or from the
>> history of the mailing lists, the most supported way since there is a
>> lot of changes
>>
>> to scheduler/placement part right now?
> Depending on which release you're on, it sounds like you want to use this:
>
> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation
>
> In Rocky we have a replacement for that filter which does pre-filtering 
> in Placement which should give you a performance gain when it comes time 
> to do the host filtering:
>
> https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement
>
> Note that even if you use AggregateMultiTenancyIsolation for the one 
> project, other projects can still randomly land on the hosts in that 
> aggregate unless you also assign those to their own aggregates.
>
> It sounds like you're might be looking for a dedicated hosts feature? 
> There is an RFE from the public cloud work group for that:
>
> https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523
>


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Matt Riedemann

On 6/4/2018 6:43 AM, Tobias Urdin wrote:

I have received a question about a more specialized use case where we
need to isolate several hypervisors

to a specific project. My first thinking was using nova flavors for only
that project and add extra specs properties to use a specific host
aggregate but this

means I need to assign values to all other flavors to not use those
which seems weird.


How could I go about solving this the easies/best way or from the
history of the mailing lists, the most supported way since there is a
lot of changes

to scheduler/placement part right now?


Depending on which release you're on, it sounds like you want to use this:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation

In Rocky we have a replacement for that filter which does pre-filtering 
in Placement which should give you a performance gain when it comes time 
to do the host filtering:


https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement

Note that even if you use AggregateMultiTenancyIsolation for the one 
project, other projects can still randomly land on the hosts in that 
aggregate unless you also assign those to their own aggregates.
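
As a rough sketch of the aggregate setup for the isolated project (the 
metadata key the AggregateMultiTenancyIsolation filter looks for is 
filter_tenant_id; substitute your own names and project UUID):

  openstack aggregate create dedicated
  openstack aggregate add host dedicated compute3
  openstack aggregate set --property filter_tenant_id=<project_uuid> dedicated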


It sounds like you might be looking for a dedicated hosts feature? 
There is an RFE from the public cloud work group for that:


https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1771523

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] isolate hypervisor to project

2018-06-04 Thread Tobias Urdin
Hello,

I have received a question about a more specialized use case where we
need to isolate several hypervisors to a specific project. My first
thinking was using nova flavors for only that project and adding extra
specs properties to use a specific host aggregate, but this means I need
to assign values to all other flavors to not use those, which seems weird.


How could I go about solving this the easiest/best way or, going by the
history of the mailing lists, the most supported way, since there are a
lot of changes to the scheduler/placement part right now?


Best regards


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] proposal to postpone nova-network core functionality removal to Stein

2018-05-31 Thread melanie witt

Hello Operators and Devs,

This cycle at the PTG, we had decided to start making some progress 
toward removing nova-network [1] (thanks to those who have helped!) and 
so far, we've landed some patches to extract common network utilities 
from nova-network core functionality into separate utility modules. And 
we've started proposing removal of nova-network REST APIs [2].


At the cells v2 sync with operators forum session at the summit [3], we 
learned that CERN is in the middle of migrating from nova-network to 
neutron and that holding off on removal of nova-network core 
functionality until Stein would help them out a lot to have a safety net 
as they continue progressing through the migration.


If we recall correctly, they did say that removal of the nova-network 
REST APIs would not impact their migration and Surya Seetharaman is 
double-checking about that and will get back to us. If so, we were 
thinking we can go ahead and work on nova-network REST API removals this 
cycle to make some progress while holding off on removing the core 
functionality of nova-network until Stein.


I wanted to send this to the ML to let everyone know what we were 
thinking about this and to receive any additional feedback folks might 
have about this plan.


Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3] 
https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators 
L30


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-29 Thread Matt Riedemann

On 5/28/2018 7:31 AM, Sylvain Bauza wrote:
That said, given I'm now working on using Nested Resource Providers for 
VGPU inventories, I wonder about a possible upgrade problem with VGPU 
allocations. Given that :
  - in Queens, VGPU inventories are for the root RP (ie. the compute 
node RP), but,
  - in Rocky, VGPU inventories will be for children RPs (ie. against a 
specific VGPU type), then


if we have VGPU allocations in Queens, when upgrading to Rocky, we 
should maybe recreate the allocations to a specific other inventory ?


For how the heal_allocations CLI works today, if the instance has any 
allocations in placement, it skips that instance. So this scenario 
wouldn't be a problem.




Hope you see the problem with upgrading by creating nested RPs ?


Yes, the CLI doesn't attempt to have any knowledge about nested resource 
providers, it just takes the flavor embedded in the instance and creates 
allocations against the compute node provider using the flavor. It has 
no explicit knowledge about granular request groups or more advanced 
features like that.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-28 Thread Sylvain Bauza
On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann 
wrote:

> I've written a nova-manage placement heal_allocations CLI [1] which was a
> TODO from the PTG in Dublin as a step toward getting existing
> CachingScheduler users to roll off that (which is deprecated).
>
> During the CERN cells v1 upgrade talk it was pointed out that CERN was
> able to go from placement-per-cell to centralized placement in Ocata
> because the nova-computes in each cell would automatically recreate the
> allocations in Placement in a periodic task, but that code is gone once
> you're upgraded to Pike or later.
>
> In various other talks during the summit this week, we've talked about
> things during upgrades where, for instance, if placement is down for some
> reason during an upgrade, a user deletes an instance and the allocation
> doesn't get cleaned up from placement so it's going to continue counting
> against resource usage on that compute node even though the server instance
> in nova is gone. So this CLI could be expanded to help clean up situations
> like that, e.g. provide it a specific server ID and the CLI can figure out
> if it needs to clean things up in placement.
>
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade/fix allocation issues. It already has
> several things listed in a code comment inline about things to build into
> this later.
>
> My question is, is this good enough for a first iteration or is there
> something severely missing before we can merge this, like the automatic
> marker tracking mentioned in the code (that will probably be a non-trivial
> amount of code to add). I could really use some operator feedback on this
> to just take a look at what it already is capable of and if it's not going
> to be useful in this iteration, let me know what's missing and I can add
> that in to the patch.
>
> [1] https://review.openstack.org/#/c/565886/
>
>

It does sound to me like a good way to help operators.

That said, given I'm now working on using Nested Resource Providers for
VGPU inventories, I wonder about a possible upgrade problem with VGPU
allocations. Given that :
 - in Queens, VGPU inventories are for the root RP (ie. the compute node
RP), but,
 - in Rocky, VGPU inventories will be for children RPs (ie. against a
specific VGPU type), then

if we have VGPU allocations in Queens, when upgrading to Rocky, we should
maybe recreate the allocations to a specific other inventory ?

Hope you see the problem with upgrading by creating nested RPs ?


> --
>
> Thanks,
>
> Matt
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI

2018-05-24 Thread Matt Riedemann
I've written a nova-manage placement heal_allocations CLI [1] which was 
a TODO from the PTG in Dublin as a step toward getting existing 
CachingScheduler users to roll off that (which is deprecated).
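
Basic usage is simply:

  nova-manage placement heal_allocations

and there is a --max-count option in the patch if you want to limit how 
many instances are processed in a single run, e.g.:

  nova-manage placement heal_allocations --max-count 100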


During the CERN cells v1 upgrade talk it was pointed out that CERN was 
able to go from placement-per-cell to centralized placement in Ocata 
because the nova-computes in each cell would automatically recreate the 
allocations in Placement in a periodic task, but that code is gone once 
you're upgraded to Pike or later.


In various other talks during the summit this week, we've talked about 
things during upgrades where, for instance, if placement is down for 
some reason during an upgrade, a user deletes an instance and the 
allocation doesn't get cleaned up from placement so it's going to 
continue counting against resource usage on that compute node even 
though the server instance in nova is gone. So this CLI could be 
expanded to help clean up situations like that, e.g. provide it a 
specific server ID and the CLI can figure out if it needs to clean 
things up in placement.


So there are plenty of things we can build into this, but the patch is 
already quite large. I expect we'll also be backporting this to stable 
branches to help operators upgrade/fix allocation issues. It already has 
several things listed in a code comment inline about things to build 
into this later.


My question is, is this good enough for a first iteration or is there 
something severely missing before we can merge this, like the automatic 
marker tracking mentioned in the code (that will probably be a 
non-trivial amount of code to add). I could really use some operator 
feedback on this to just take a look at what it already is capable of 
and if it's not going to be useful in this iteration, let me know what's 
missing and I can add that in to the patch.


[1] https://review.openstack.org/#/c/565886/

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] FYI on changes that might impact out of tree scheduler filters

2018-05-17 Thread Matt Riedemann
CERN has upgraded to Cells v2 and is doing performance testing of the 
scheduler and were reporting some things today which got us back to this 
bug [1]. So I've starting pushing some patches related to this but also 
related to an older blueprint I created [2]. In summary, we do quite a 
bit of DB work just to load up a list of instance objects per host that 
the in-tree filters don't even use.


The first change [3] is a simple optimization to avoid the default joins 
on the instance_info_caches and security_groups tables. If you have out 
of tree filters that, for whatever reason, rely on the 
HostState.instances objects to have info_cache or security_groups set, 
they'll continue to work, but will have to round-trip to the DB to 
lazy-load the fields, which is going to be a performance penalty on that 
filter. See the change for details.


The second change in the series [4] is more drastic in that we'll do 
away with pulling the full Instance object per host, which means only a 
select set of optional fields can be lazy-loaded [5], and the rest will 
result in an exception. The patch currently has a workaround config 
option to continue doing things the old way if you have out of tree 
filters that rely on this, but for good citizens with only in-tree 
filters, you will get a performance improvement during scheduling.


There are some other things we can do to optimize more of this flow, but 
this email is just about the ones that have patches up right now.


[1] https://bugs.launchpad.net/nova/+bug/1737465
[2] 
https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-info-on-a-diet

[3] https://review.openstack.org/#/c/569218/
[4] https://review.openstack.org/#/c/569247/
[5] 
https://github.com/openstack/nova/blob/de52fefa1fd52ccaac6807e5010c5f2a2dcbaab5/nova/objects/instance.py#L66


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] [placement] placement extraction session at forum

2018-05-09 Thread Chris Dent


I've started an etherpad related to the Vancouver Forum session on
extracting placement from nova. It's mostly just an outline for
now but is evolving:

https://etherpad.openstack.org/p/YVR-placement-extraction

If we can get some real information in there before the session we
are much more likely to have a productive session. Please feel free
to add any notes or questions you have there. Or on this thread if
you prefer.

The (potentially overly-optimistic) hope is that we can complete any
preparatory work before the end of Rocky and then do the extraction in
Stein. If we are willing to accept (please, let's) some form of
control plane downtime, data migration issues can be vastly eased.
Getting agreement on how that might work is one of the goals of the
session.

Your input is very much appreciated.

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][ironic] ironic_host_manager and baremetal scheduler options removal

2018-05-02 Thread Matt Riedemann
The baremetal scheduling options were deprecated in Pike [1] and the 
ironic_host_manager was deprecated in Queens [2] and is now being 
removed [3]. Deployments must use resource classes now for baremetal 
scheduling. [4]
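
For anyone who still needs to make the switch, the setup looks roughly like 
this (resource class and flavor names are just examples; see [4] for the 
full details):

  openstack baremetal node set <node_uuid> --resource-class baremetal.gold
  openstack flavor set bm.gold \
    --property resources:CUSTOM_BAREMETAL_GOLD=1 \
    --property resources:VCPU=0 \
    --property resources:MEMORY_MB=0 \
    --property resources:DISK_GB=0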


The large host subset size value is also no longer needed. [5]

I've gone through all of the references to "ironic_host_manager" that I 
could find in codesearch.o.o and updated projects accordingly [6].


Please reply ASAP to this thread and/or [3] if you have issues with this.

[1] https://review.openstack.org/#/c/493052/
[2] https://review.openstack.org/#/c/521648/
[3] https://review.openstack.org/#/c/565805/
[4] 
https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html#scheduling-based-on-resource-classes

[5] https://review.openstack.org/565736/
[6] 
https://review.openstack.org/#/q/topic:exact-filters+(status:open+OR+status:merged)


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Concern about trusted certificates API change

2018-04-18 Thread Matt Riedemann
There is a compute REST API change proposed [1] which will allow users 
to pass trusted certificate IDs to be used with validation of images 
when creating or rebuilding a server. The trusted cert IDs are based on 
certificates stored in some key manager, e.g. Barbican.


The full nova spec is here [2].

The main concern I have is that trusted certs will not be supported for 
volume-backed instances, and some clouds only support volume-backed 
instances. The way the patch is written is that if the user attempts to 
boot from volume with trusted certs, it will fail.


In thinking about a semi-discoverable/configurable solution, I'm 
thinking we should add a policy rule around trusted certs to indicate if 
they can be used or not. Beyond the boot from volume issue, the only 
virt driver that supports trusted cert image validation is the libvirt 
driver, so any cloud that's not using the libvirt driver simply cannot 
support this feature, regardless of boot from volume. We have added 
similar policy rules in the past for backend-dependent features like 
volume extend and volume multi-attach, so I don't think this is a new issue.
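
In other words, something along these lines in policy.json (the rule name 
here is purely illustrative since no such policy exists yet; a deployment 
that can't support the feature would set it to "!"):

  {
      "os_compute_api:servers:create:trusted_certs": "rule:admin_or_owner"
  }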


Alternatively we can block the change in nova until it supports boot 
from volume, but that would mean needing to add trusted cert image 
validation support into cinder along with API changes, effectively 
killing the chance of this getting done in nova in Rocky, and this 
blueprint has been around since at least Ocata so it would be good to 
make progress if possible.


[1] https://review.openstack.org/#/c/486204/
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/nova-validate-certificates.html


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Rocky forum topics brainstorming

2018-04-18 Thread melanie witt

On Fri, 13 Apr 2018 08:00:31 -0700, Melanie Witt wrote:

+openstack-operators (apologies that I forgot to add originally)

On Mon, 9 Apr 2018 10:09:12 -0700, Melanie Witt wrote:

Hey everyone,

Let's collect forum topic brainstorming ideas for the Forum sessions in
Vancouver in this etherpad [0]. Once we've brainstormed, we'll select
and submit our topic proposals for consideration at the end of this
week. The deadline for submissions is Sunday April 15.

Thanks,
-melanie

[0] https://etherpad.openstack.org/p/YVR-nova-brainstorming


Just a reminder that we're collecting forum topic ideas to propose for
Vancouver and input from operators is especially important. Please add
your topics and/or comments to the etherpad [0] and we'll submit
proposals before the Sunday deadline.


Here's a list of nova-related sessions that have been proposed:

* CellsV2 migration process sync with operators:
  http://forumtopics.openstack.org/cfp/details/125

* nova/neutron + ops cross-project session:
  http://forumtopics.openstack.org/cfp/details/124

* Planning to use Placement in Cinder:
  http://forumtopics.openstack.org/cfp/details/89

* Building the path to extracting Placement from Nova:
  http://forumtopics.openstack.org/cfp/details/88

* Multi-attach introduction and future direction:
  http://forumtopics.openstack.org/cfp/details/101

* Making NFV features easier to use:
  http://forumtopics.openstack.org/cfp/details/146

A list of all proposed forum topics can be seen here:

http://forumtopics.openstack.org

Cheers,
-melanie




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Default scheduler filters survey

2018-04-18 Thread Artom Lifshitz
Hi all,

A CI issue [1] caused by tempest thinking some filters are enabled
when they're really not, and a proposed patch [2] to add
(Same|Different)HostFilter to the default filters as a workaround, have
led to a discussion about what filters should be enabled by default in
nova.

The default filters should make sense for a majority of real world
deployments. Adding some filters to the defaults because CI needs them
is faulty logic, because the needs of CI are different to the needs of
operators/users, and the latter takes priority (though it's my
understanding that a good chunk of operators run tempest on their
clouds post-deployment as a way to validate that the cloud is working
properly, so maybe CI's and users' needs aren't that different after
all).

To that end, we'd like to know what filters operators are enabling in
their deployment. If you can, please reply to this email with your
[filter_scheduler]/enabled_filters (or
[DEFAULT]/scheduler_default_filters if you're using an older version)
option from nova.conf. Any other comments are welcome as well :)
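
For reference, the current defaults (from the sample config; double-check 
against your release) are:

  [filter_scheduler]
  enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter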

Cheers!

[1] https://bugs.launchpad.net/tempest/+bug/1628443
[2] https://review.openstack.org/#/c/561651/

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] Rocky forum topics brainstorming

2018-04-13 Thread melanie witt

+openstack-operators (apologies that I forgot to add originally)

On Mon, 9 Apr 2018 10:09:12 -0700, Melanie Witt wrote:

Hey everyone,

Let's collect forum topic brainstorming ideas for the Forum sessions in
Vancouver in this etherpad [0]. Once we've brainstormed, we'll select
and submit our topic proposals for consideration at the end of this
week. The deadline for submissions is Sunday April 15.

Thanks,
-melanie

[0] https://etherpad.openstack.org/p/YVR-nova-brainstorming


Just a reminder that we're collecting forum topic ideas to propose for 
Vancouver and input from operators is especially important. Please add 
your topics and/or comments to the etherpad [0] and we'll submit 
proposals before the Sunday deadline.


Thanks all,
-melanie




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [Nova][Deployers] Optional, platform specific, dependancies in requirements.txt

2018-04-11 Thread Michael Still
Hi,

https://review.openstack.org/#/c/523387 proposes adding a z/VM specific
dependency to nova's requirements.txt. When I objected, the counter argument
was that we already have examples of Windows specific dependencies (os-win)
and PowerVM specific dependencies in that file.

I think perhaps all three are a mistake and should be removed.

My recollection is that for drivers like ironic, which may not be deployed
by everyone, we have the dependency documented and then loaded at runtime
by the driver itself instead of adding it to requirements.txt. This is to
stop pip from auto-installing the dependency for anyone who wants to run
nova. I had assumed this was at the request of the deployer community.

So what do we do with z/VM? Do we clean this up? Or do we now allow
dependencies that are only useful to a very small number of deployments
into requirements.txt?

Michael
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Nova resources are out of sync in ocata version

2018-04-09 Thread Saverio Proto
It works for me in Newton.
Try it at your own risk :)

Cheers,

Saverio

2018-04-09 13:23 GMT+02:00 Anwar Durrani :
> No this is different one. should i try this one ? if it works ?
>
> On Mon, Apr 9, 2018 at 4:11 PM, Saverio Proto  wrote:
>>
>> Hello Anwar,
>>
>> are you talking about this script ?
>>
>> https://github.com/openstack/osops-tools-contrib/blob/master/nova/nova-libvirt-compare.py
>>
>> it does not work for you ?
>>
>> Saverio
>>
>> 2018-04-09 11:53 GMT+02:00 Anwar Durrani :
>> > Hi All,
>> >
>> > Nova resources are out of sync in ocata version, what values are showing
>> > on
>> > dashboard are mismatch of actual running instances, i do remember i had
>> > script for auto sync resources but this script is getting fail in this
>> > case,
>> > Kindly help here.
>> >
>> > --
>> > Thanks & regards,
>> > Anwar M. Durrani
>> > +91-9923205011
>> >
>> >
>> >
>> > ___
>> > OpenStack-operators mailing list
>> > OpenStack-operators@lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>> >
>
>
>
>
> --
> Thanks & regards,
> Anwar M. Durrani
> +91-9923205011
>
>

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Nova resources are out of sync in ocata version

2018-04-09 Thread Anwar Durrani
No, this is a different one. Should I try this one? Will it work?

On Mon, Apr 9, 2018 at 4:11 PM, Saverio Proto  wrote:

> Hello Anwar,
>
> are you talking about this script ?
> https://github.com/openstack/osops-tools-contrib/blob/
> master/nova/nova-libvirt-compare.py
>
> it does not work for you ?
>
> Saverio
>
> 2018-04-09 11:53 GMT+02:00 Anwar Durrani :
> > Hi All,
> >
> > Nova resources are out of sync in ocata version, what values are showing
> on
> > dashboard are mismatch of actual running instances, i do remember i had
> > script for auto sync resources but this script is getting fail in this
> case,
> > Kindly help here.
> >
> > --
> > Thanks & regards,
> > Anwar M. Durrani
> > +91-9923205011
> >
> >
> >
> > ___
> > OpenStack-operators mailing list
> > OpenStack-operators@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
>



-- 
Thanks & regards,
Anwar M. Durrani
+91-9923205011

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Nova resources are out of sync in ocata version

2018-04-09 Thread Anwar Durrani
Hi All,

Nova resources are out of sync in the Ocata version; the values showing on the
dashboard do not match the actual running instances. I remember I had a
script to auto-sync resources, but that script is failing in this
case. Kindly help here.

-- 
Thanks & regards,
Anwar M. Durrani
+91-9923205011

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-04-03 Thread Alex Schultz
On Tue, Apr 3, 2018 at 4:48 AM, Chris Dent  wrote:
> On Mon, 2 Apr 2018, Alex Schultz wrote:
>
>> So this is/was valid. A few years back there was some perf tests done
>> with various combinations of process/threads and for Keystone it was
>> determined that threads should be 1 while you should adjust the
>> process count (hence the bug). Now I guess the question is for every
>> service what is the optimal configuration but I'm not sure there's
>> anyone who's looking at this in the upstream for all the services.  In
>> the puppet modules for consistency we applied a similar concept for
>> all the services when they are deployed under apache.  It can be tuned
>> as needed for each service but I don't think we have any great
>> examples of perf numbers. It's really a YMMV thing. We ship a basic
>> default that isn't crazy, but it's probably not optimal either.
>
>
> Do you happen to recall if the trouble with keystone and threaded
> web servers had anything to do with eventlet? Support for the
> eventlet-based server was removed from keystone in Newton.
>

It was running under httpd I believe.

> I've been doing some experiments with placement using multiple uwsgi
> processes, each with multiple threads and it appears to be working
> very well. Ideally all the OpenStack HTTP-based services would be
> able to run effectively in that kind of setup. If they can't I'd
> like to help make it possible.
>
> In any case: processes 3, threads 1 for WSGIDaemonProcess for the
> placement service for a deployment of any real size errs on the
> side of too conservative and I hope we can make some adjustments
> there.
>

You'd say that until you realize that the deployment may also be
sharing every other service API running on the box.  Imagine keystone,
glance, nova, cinder, gnocchi, etc. all running on the same
machine. Then 3 isn't so conservative. They start adding up and
exhausting resources (cpu cores/memory) really quickly.  In a perfect
world, yes, each API service would get its own system with processes
== processor count but in most cases they end up getting split between
the number of services running on the box.  In puppet we did a sliding
scale and have several facts[0] that can be used if a person doesn't
want to switch to $::processorcount.  If you're rolling your own you
can tune it easier but when you have to come up with something that
might be collocated with a bunch of other services you have to hedge
your bets to make sure it works most of the time.

Thanks,
-Alex

[0] 
http://git.openstack.org/cgit/openstack/puppet-openstacklib/tree/lib/facter/os_workers.rb

>
> --
> Chris Dent   ٩◔̯◔۶   https://anticdent.org/
> freenode: cdent tw: @anticdent
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-04-03 Thread Jay Pipes

On 04/03/2018 06:48 AM, Chris Dent wrote:

On Mon, 2 Apr 2018, Alex Schultz wrote:


So this is/was valid. A few years back there was some perf tests done
with various combinations of process/threads and for Keystone it was
determined that threads should be 1 while you should adjust the
process count (hence the bug). Now I guess the question is for every
service what is the optimal configuration but I'm not sure there's
anyone who's looking at this in the upstream for all the services.  In
the puppet modules for consistency we applied a similar concept for
all the services when they are deployed under apache.  It can be tuned
as needed for each service but I don't think we have any great
examples of perf numbers. It's really a YMMV thing. We ship a basic
default that isn't crazy, but it's probably not optimal either.


Do you happen to recall if the trouble with keystone and threaded
web servers had anything to do with eventlet? Support for the
eventlet-based server was removed from keystone in Newton.


IIRC, it had something to do with the way the keystoneauth middleware 
interacted with memcache... not sure if this is still valid any more 
though. Probably worth re-checking the performance.


-jay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-04-03 Thread Chris Dent

On Mon, 2 Apr 2018, Alex Schultz wrote:


So this is/was valid. A few years back there was some perf tests done
with various combinations of process/threads and for Keystone it was
determined that threads should be 1 while you should adjust the
process count (hence the bug). Now I guess the question is for every
service what is the optimal configuration but I'm not sure there's
anyone who's looking at this in the upstream for all the services.  In
the puppet modules for consistency we applied a similar concept for
all the services when they are deployed under apache.  It can be tuned
as needed for each service but I don't think we have any great
examples of perf numbers. It's really a YMMV thing. We ship a basic
default that isn't crazy, but it's probably not optimal either.


Do you happen to recall if the trouble with keystone and threaded
web servers had anything to do with eventlet? Support for the
eventlet-based server was removed from keystone in Newton.

I've been doing some experiments with placement using multiple uwsgi
processes, each with multiple threads and it appears to be working
very well. Ideally all the OpenStack HTTP-based services would be
able to run effectively in that kind of setup. If they can't I'd
like to help make it possible.

In any case: processes 3, threads 1 for WSGIDaemonProcess for the
placement service for a deployment of any real size errs on the
side of too conservative and I hope we can make some adjustments
there.
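
As a purely illustrative example, on a host with 16 cores that is mostly 
dedicated to placement, something along these lines would be a more 
reasonable starting point than the packaged default:

  WSGIDaemonProcess nova-placement-api processes=4 threads=15 user=nova group=nova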

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-04-02 Thread Alex Schultz
On Fri, Mar 30, 2018 at 11:11 AM, iain MacDonnell
 wrote:
>
>
> On 03/29/2018 02:13 AM, Belmiro Moreira wrote:
>>
>> Some lessons so far...
>> - Scale keystone accordingly when enabling placement.
>
>
> Speaking of which; I suppose I have the same question for keystone
> (currently running under httpd also). I'm currently using threads=1, based
> on this (IIRC):
>
> https://bugs.launchpad.net/puppet-keystone/+bug/1602530
>
> but I'm not sure if that's valid?
>
> Between placement and ceilometer feeding gnocchi, keystone is kept very
> busy.
>
> Recommendations for processes/threads for keystone? And any other tuning
> hints... ?
>

So this is/was valid. A few years back there were some perf tests done
with various combinations of processes/threads, and for Keystone it was
determined that threads should be 1 while you should adjust the
process count (hence the bug). Now I guess the question is for every
service what is the optimal configuration but I'm not sure there's
anyone who's looking at this in the upstream for all the services.  In
the puppet modules for consistency we applied a similar concept for
all the services when they are deployed under apache.  It can be tuned
as needed for each service but I don't think we have any great
examples of perf numbers. It's really a YMMV thing. We ship a basic
default that isn't crazy, but it's probably not optimal either.

Thanks,
-Alex

> Thanks!
>
> ~iain
>
>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] about use spice console

2018-03-29 Thread 李杰
The error info is :
CRITICAL nova [None req-a84d278b-43db-4c94-864b-7a9733aa772c None None] 
Unhandled error: IOError: [Errno 13] Permission denied: '/etc/nova/policy.json'
ERROR nova Traceback (most recent call last):
ERROR nova   File "/usr/bin/nova-compute", line 10, in 
ERROR nova sys.exit(main())
ERROR nova   File "/opt/stack/nova/nova/cmd/compute.py", line 57, in main
ERROR nova topic=compute_rpcapi.RPC_TOPIC)
ERROR nova   File "/opt/stack/nova/nova/service.py", line 240, in create
ERROR nova periodic_interval_max=periodic_interval_max)
ERROR nova   File "/opt/stack/nova/nova/service.py", line 116, in __init__
ERROR nova self.manager = manager_class(host=self.host, *args, **kwargs)
ERROR nova   File "/opt/stack/nova/nova/compute/manager.py", line 509, in 
__init__
ERROR nova self.compute_api = compute.API()
ERROR nova   File "/opt/stack/nova/nova/compute/__init__.py", line 39, in API
ERROR nova return importutils.import_object(class_name, *args, **kwargs)
ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", 
line 44, in import_object
ERROR nova return import_class(import_str)(*args, **kwargs)
ERROR nova   File "/opt/stack/nova/nova/compute/api.py", line 254, in __init__
ERROR nova self.compute_rpcapi = compute_rpcapi.ComputeAPI()
ERROR nova   File "/opt/stack/nova/nova/compute/rpcapi.py", line 354, in 
__init__
ERROR nova self.router = rpc.ClientRouter(default_client)
ERROR nova   File "/opt/stack/nova/nova/rpc.py", line 414, in __init__
ERROR nova 
self.run_periodic_tasks(nova.context.RequestContext(overwrite=False))
ERROR nova   File "/opt/stack/nova/nova/context.py", line 146, in __init__
ERROR nova self.is_admin = policy.check_is_admin(self)
ERROR nova   File "/opt/stack/nova/nova/policy.py", line 177, in check_is_admin
ERROR nova init()
ERROR nova   File "/opt/stack/nova/nova/policy.py", line 75, in init
ERROR nova _ENFORCER.load_rules()
ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_policy/policy.py", 
line 537, in load_rules
ERROR nova overwrite=self.overwrite)
ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_policy/policy.py", 
line 675, in _load_policy_file
ERROR nova self._file_cache, path, force_reload=force_reload)
ERROR nova   File 
"/usr/lib/python2.7/site-packages/oslo_policy/_cache_handler.py", line 41, in 
read_cached_file
ERROR nova with open(filename) as fap:
ERROR nova IOError: [Errno 13] Permission denied: '/etc/nova/policy.json'

 
 
------ Original ------
From:  "李杰"<li...@unitedstack.com>;
Date:  Thu, Mar 29, 2018 05:24 PM
To:  "openstack-operators"<openstack-operators@lists.openstack.org>; 

Subject:  [Openstack-operators] [nova] about use spice console

 
Hi all,
   Now I want to use the SPICE console in place of noVNC for instances. The 
openstack documentation is a bit sparse on what configuration parameters to 
enable for SPICE console access. My result is that the nova-compute service and 
nova-consoleauth service fail, and the log tells me "IOError: [Errno 13] 
Permission denied: /etc/nova/policy.json". Can you help me achieve this? Thank 
you very much.
   The ENV is a Pike or Queens release devstack.
   These are the steps:
   1.on controller:
  yum install -y spice-server spice-protocol 
openstack-nova-spicehtml5proxy spice-html5
  change the nova.conf
  [default]
  vnc_enabled=false
  [spice]
  html5proxy_host=controller_ip
  html5proxy_port=6082
  keymap=en-us
  stop the novnc service 
  start the spicehtml5proxy.service
 systemctl start openstack-nova-spicehtml5proxy.service
2.on conmpute:
  yum install -y spice-server spice-protocol spice-html5
  change the nova-cpu.conf
  [default]
  vnc_enabled=false
  [spice]
  agent_enabled = True
  enabled = True
  html5proxy_base_url = http://controller_ip:6082/spice_auto.html
  html5proxy_host = 0.0.0.0
  html5proxy_port = 6082
  keymap = en-us
  server_listen = 127.0.0.1
  server_proxyclient_address = 127.0.0.1
restart the compute service





Best Regards
Rambo
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Chris Dent

On Thu, 29 Mar 2018, iain MacDonnell wrote:


If I'm reading

http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html

right, it seems that the MPM is not pertinent when using WSGIDaemonProcess.


It doesn't impact the number wsgi processes that will exist or how
they are configured, but it does control the flexibility with which
apache itself will scale to accept initial connections. That's not a
problem you're yet seeing at your scale, but is an issue when the
number of compute nodes gets much bigger.

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread iain MacDonnell



On 03/29/2018 04:24 AM, Chris Dent wrote:

On Thu, 29 Mar 2018, Belmiro Moreira wrote:

[lots of great advice snipped]


- Change apache mpm default from prefork to event/worker.
- Increase the WSGI number of processes/threads considering where 
placement

is running.


If I'm reading

http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html

right, it seems that the MPM is not pertinent when using WSGIDaemonProcess.



Another option is to switch to nginx and uwsgi. In situations where
the web server is essentially operating as a proxy to another
process which is being the WSGI server, nginx has a history of being
very effective.


Evaluating adoption of uwsgi is on my to-do list ... not least because 
it'd enable restarting of services individually...


~iain


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Matt Riedemann

On 3/29/2018 12:05 PM, Chris Dent wrote:
Other suggestions? I'm looking at things like turning off 
scheduler_tracks_instance_changes, since affinity scheduling is not 
needed (at least so-far), but not sure that that will help with 
placement load (seems like it might, though?)


This won't impact the placement service itself.


It seemed like it might be causing the compute nodes to make calls to 
update allocations, so I was thinking it might reduce the load a bit, 
but I didn't confirm that. This was "clutching at straws" - hopefully 
I won't need to now.


There's duplication of instance state going to both placement and
the nova-scheduler. The number of calls from nova-compute to
placement reduces a bit as you updgrade to newer releases. It's
still more than we'd prefer.


As Chris said, scheduler_tracks_instance_changes doesn't have anything 
to do with Placement, and it will add more RPC load to your system 
because all computes are RPC casting to the scheduler for every instance 
create/delete/move operation along with a periodic that runs, by 
default, every minute on each compute service to sync things up.


The primary need for scheduler_tracks_instance_changes is the 
(anti-)affinity filters in the scheduler (and maybe if you're using the 
CachingScheduler). If you don't enable the (anti-)affinity filters (they 
are enabled by default), then you can disable 
scheduler_tracks_instance_changes.
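
For completeness, that's this bit of nova.conf (the option was moved from 
[DEFAULT]/scheduler_tracks_instance_changes to the [filter_scheduler] group 
in later releases, so use whichever form matches your release):

  [filter_scheduler]
  track_instance_changes = False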


Note that you can still disable scheduler_tracks_instance_changes and 
run the affinity filters, but the scheduler will likely make poor 
decisions in a busy cloud which can result in reschedules, which are 
also expensive.


Long-term, we hope to remove the need for 
scheduler_tracks_instance_changes at all because we should have all of 
the information we need about the instances in the Placement service, 
which is generally considered global to the deployment. However, we 
don't yet have a way to model affinity/distance in Placement, and that's 
what's holding us back from removing scheduler_tracks_instance_changes 
and the existing affinity filters.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Chris Dent

On Thu, 29 Mar 2018, iain MacDonnell wrote:


placement python stack and kicks out the 401. So this mostly
indicates that socket accept is taking forever.


Well, this test connects and gets a 400 immediately:

echo | nc -v apihost 8778

so I don't think it's at at the socket level, but, I assume, the actual WSGI 
app, once the socket connection is established. I did try to choose a test 
that tickles the app, but doesn't "get too deep", as you say.


Sorry, I was being terribly non-specific. I meant generically
somewhere along the way from either the TCP socket that accepts
the initial HTTP connection to 8778 or the unix domain socket
that sits between apache2 and the wsgi daemon process. As you've
discerned, the TCP socket and apache2 are fine.

Good question. I could have sworn it was in the installation guide, but I 
can't find it now. It must have come from RDO, i.e.:


https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova-placement-api.conf


Ooph. I'll see if I can find someone to talk to about that.

Right, that was my basic assessment too so now I'm trying to figure out 
how it should be tuned, but had not been able to find any guidelines, so 
thought of asking here. You've confirmed that I'm on the right track (or at 
least "a" right track).


The mod wsgi docs have a fair bit of stuff about tuning in them, but
it is mixed in amongst various things, but
http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html
might be a good starting point.

Other suggestions? I'm looking at things like turning off 
scheduler_tracks_instance_changes, since affinity scheduling is not needed 
(at least so-far), but not sure that that will help with placement load 
(seems like it might, though?)


This won't impact the placement service itself.


It seemed like it might be causing the compute nodes to make calls to update 
allocations, so I was thinking it might reduce the load a bit, but I didn't 
confirm that. This was "clutching at straws" - hopefully I won't need to now.


There's duplication of instance state going to both placement and
the nova-scheduler. The number of calls from nova-compute to
placement reduces a bit as you updgrade to newer releases. It's
still more than we'd prefer.


--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread iain MacDonnell



On 03/29/2018 01:19 AM, Chris Dent wrote:

On Wed, 28 Mar 2018, iain MacDonnell wrote:

Looking for recommendations on tuning of nova-placement-api. I have a 
few moderately-sized deployments (~200 nodes, ~4k instances), 
currently on Ocata, and instance creation is getting very slow as they 
fill up.


This should be well within the capabilities of an appropriately
installed placement service, so I reckon something is weird about
your installation. More within.


$ time curl http://apihost:8778/

{"error": {"message": "The request you have made requires 
authentication.", "code": 401, "title": "Unauthorized"}}

real    0m20.656s
user    0m0.003s
sys    0m0.001s


This is good choice for trying to determine what's up because it
avoids any interaction with the database and most of the stack of
code: the web server answers, runs a very small percentage of the
placement python stack and kicks out the 401. So this mostly
indicates that socket accept is taking forever.


Well, this test connects and gets a 400 immediately:

echo | nc -v apihost 8778

so I don't think it's at the socket level, but, I assume, in the actual 
WSGI app, once the socket connection is established. I did try to choose 
a test that tickles the app, but doesn't "get too deep", as you say.



nova-placement-api is running under mod_wsgi with the "standard"(?) 
config, i.e.:


Do you recall where this configuration comes from? The settings for
WSGIDaemonProcess are not very good, and if there is some packaging
or documentation that is setting things up this way it would be good to
find it and fix it.


Good question. I could have sworn it was in the installation guide, but 
I can't find it now. It must have come from RDO, i.e.:


https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova-placement-api.conf



Depending on what else is on the host running placement I'd boost
processes to number of cores divided by 2, 3 or 4 and boost threads to
around 25. Or you can leave 'threads' off and it will default to 15
(at least in recent versions of mod wsgi).

With the settings as below you're basically saying that you want to
handle 3 connections at a time, which isn't great, since each of
your compute-nodes wants to talk to placement multiple times a
minute (even when nothing is happening).


Right, that was my basic assessment too so now I'm trying to figure 
out how it should be tuned, but had not been able to find any 
guidelines, so thought of asking here. You've confirmed that I'm on the 
right track (or at least "a" right track).
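
For reference, a tuned variant of that stanza along the lines suggested above 
might look something like this (a sketch only; the process count assumes a host 
with around 16 cores mostly dedicated to placement, so adjust to your own 
hardware, and tune threads up or down as profiling suggests):

 WSGIProcessGroup nova-placement-api
 WSGIApplicationGroup %{GLOBAL}
 WSGIPassAuthorization On
 WSGIDaemonProcess nova-placement-api processes=8 threads=25 user=nova group=nova
 WSGIScriptAlias / /usr/bin/nova-placement-api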




Tweaking the number of processes versus the number of threads
depends on whether it appears that the processes are cpu or I/O
bound. More threads helps when things are I/O bound.


Interesting. Will keep that in mind. Thanks!


...
 WSGIProcessGroup nova-placement-api
 WSGIApplicationGroup %{GLOBAL}
 WSGIPassAuthorization On
 WSGIDaemonProcess nova-placement-api processes=3 threads=1 user=nova 
group=nova

 WSGIScriptAlias / /usr/bin/nova-placement-api
...


[snip]

Other suggestions? I'm looking at things like turning off 
scheduler_tracks_instance_changes, since affinity scheduling is not 
needed (at least so-far), but not sure that that will help with 
placement load (seems like it might, though?)


This won't impact the placement service itself.


It seemed like it might be causing the compute nodes to make calls to 
update allocations, so I was thinking it might reduce the load a bit, 
but I didn't confirm that. This was "clutching at straws" - hopefully I 
won't need to now.




A while back I did some experiments with trying to overload
placement by using the fake virt driver in devstack and wrote it up
at  https://anticdent.org/placement-scale-fun.html


The gist was that with a properly tuned placement service it was
other parts of the system that suffered first.


Interesting. Thanks for sharing that!

~iain


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Chris Dent

On Thu, 29 Mar 2018, Belmiro Moreira wrote:

[lots of great advice snipped]


- Change apache mpm default from prefork to event/worker.
- Increase the WSGI number of processes/threads considering where placement
is running.


Another option is to switch to nginx and uwsgi. In situations where
the web server is essentially operating as a proxy to another
process which is being the WSGI server, nginx has a history of being
very effective.
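
A minimal sketch of that arrangement, assuming the placement WSGI script lives
at /usr/bin/nova-placement-api as in the config discussed earlier (paths,
socket location and process/thread counts here are illustrative only):

    # e.g. /etc/nova/placement-uwsgi.ini
    [uwsgi]
    master = true
    wsgi-file = /usr/bin/nova-placement-api
    processes = 8
    threads = 16
    socket = /var/run/uwsgi/placement-api.socket

    # inside the nginx server block for placement
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/var/run/uwsgi/placement-api.socket;
    }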

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] about use spice console

2018-03-29 Thread 李杰
Hi,all
   Now I want to use the SPICE console to replace noVNC for instances. But the 
openstack documentation is a bit sparse on what configuration parameters to 
enable for SPICE console access. My result is that the nova-compute service and 
nova-consoleauth service failed, and the log tells me "IOError: [Errno 13] 
Permission denied: /etc/nova/policy.json". So can you help me achieve this? Thank 
you very much.
   ENV is Pike or Queens release devstack.
   This is the step:
   1.on controller:
  yum install -y spice-server spice-protocol 
openstack-nova-spicehtml5proxy spice-html5
  change the nova.conf
  [default]
  vnc_enabled=false
  [spice]
  html5proxy_host=controller_ip
  html5proxy_port=6082
  keymap=en-us
  stop the novnc service 
  start the spicehtml5proxy.service
 systemctl start openstack-nova-spicehtml5proxy.service
2.on conmpute:
  yum install -y spice-server spice-protocol spice-html5
  change the nova-cpu.conf
  [default]
  vnc_enabled=false
  [spice]
  agent_enabled = True
  enabled = True
  html5proxy_base_url = http://controller_ip:6082/spice_auto.html
  html5proxy_host = 0.0.0.0
  html5proxy_port = 6082
  keymap = en-us
  server_listen = 127.0.0.1
  server_proxyclient_address = 127.0.0.1
restart the compute service





Best Regards
Rambo
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
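
A note on the IOError quoted above: "[Errno 13] Permission denied" on
/etc/nova/policy.json usually just means the file is not readable by the user
the nova services run as. A quick check and a possible fix, assuming the
services run as the nova user (typical for RDO packaging; adjust to your setup):

    ls -l /etc/nova/policy.json
    chown root:nova /etc/nova/policy.json
    chmod 640 /etc/nova/policy.json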


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Belmiro Moreira
Hi,
with Ocata upgrade we decided to run local placements (one service per
cellV1) because we were nervous about possible scalability issues but
especially the increase of the scheduling time. Fortunately, this is now being
addressed with the placement-req-filter work.

We started slowly to aggregate our local placements into the central one
(required for cellsV2).
Currently we have >7000 compute nodes (>40k requests per minute) into this
central placement. Still ~2000 compute nodes to go.

Some lessons so far...
- Scale keystone accordingly when enabling placement.
- Don't forget to configure memcache for keystone_authtoken.
- Change apache mpm default from prefork to event/worker.
- Increase the WSGI number of processes/threads considering where placement
is running.
- Have enough placement nodes considering your number of requests.
- Monitor the request time. This impacts VM scheduling. Also, depending on how
it's configured, the LB can also start removing placement nodes.
- DB could be a bottleneck.
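
To expand on the keystone_authtoken item above, the memcache piece is a
one-line setting in the API services' config; a minimal sketch (hostnames are
illustrative):

    [keystone_authtoken]
    ...
    memcached_servers = controller1:11211,controller2:11211,controller3:11211

Switching the Apache MPM from prefork to event is an OS-level toggle, e.g. on
Debian/Ubuntu roughly "a2dismod mpm_prefork && a2enmod mpm_event" followed by
an Apache restart.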

We are still learning how to have a stable placement at scale.
It would be great if others can share their experiences.


Belmiro
CERN

On Thu, Mar 29, 2018 at 10:19 AM, Chris Dent  wrote:

> On Wed, 28 Mar 2018, iain MacDonnell wrote:
>
> Looking for recommendations on tuning of nova-placement-api. I have a few
>> moderately-sized deployments (~200 nodes, ~4k instances), currently on
>> Ocata, and instance creation is getting very slow as they fill up.
>>
>
> This should be well within the capabilities of an appropriately
> installed placement service, so I reckon something is weird about
> your installation. More within.
>
> $ time curl http://apihost:8778/
>> {"error": {"message": "The request you have made requires
>> authentication.", "code": 401, "title": "Unauthorized"}}
>> real    0m20.656s
>> user    0m0.003s
>> sys     0m0.001s
>>
>
> This is a good choice for trying to determine what's up because it
> avoids any interaction with the database and most of the stack of
> code: the web server answers, runs a very small percentage of the
> placement python stack and kicks out the 401. So this mostly
> indicates that socket accept is taking forever.
>
> nova-placement-api is running under mod_wsgi with the "standard"(?)
>> config, i.e.:
>>
>
> Do you recall where this configuration comes from? The settings for
> WSGIDaemonProcess are not very good and if there is some packaging
> or documentation that is setting things this way it would be good to find
> it and fix it.
>
> Depending on what else is on the host running placement I'd boost
> processes to number of cores divided by 2, 3 or 4 and boost threads to
> around 25. Or you can leave 'threads' off and it will default to 15
> (at least in recent versions of mod wsgi).
>
> With the settings as below you're basically saying that you want to
> handle 3 connections at a time, which isn't great, since each of
> your compute-nodes wants to talk to placement multiple times a
> minute (even when nothing is happening).
>
> Tweaking the number of processes versus the number of threads
> depends on whether it appears that the processes are cpu or I/O
> bound. More threads helps when things are I/O bound.
>
> ...
>>  WSGIProcessGroup nova-placement-api
>>  WSGIApplicationGroup %{GLOBAL}
>>  WSGIPassAuthorization On
>>  WSGIDaemonProcess nova-placement-api processes=3 threads=1 user=nova
>> group=nova
>>  WSGIScriptAlias / /usr/bin/nova-placement-api
>> ...
>>
>
> [snip]
>
> Other suggestions? I'm looking at things like turning off
>> scheduler_tracks_instance_changes, since affinity scheduling is not
>> needed (at least so-far), but not sure that that will help with placement
>> load (seems like it might, though?)
>>
>
> This won't impact the placement service itself.
>
> A while back I did some experiments with trying to overload
> placement by using the fake virt driver in devstack and wrote it up
> at https://anticdent.org/placement-scale-fun.html
>
> The gist was that with a properly tuned placement service it was
> other parts of the system that suffered first.
>
> --
> Chris Dent   ٩◔̯◔۶   https://anticdent.org/
> freenode: cdent tw: @anticdent
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova-placement-api tuning

2018-03-29 Thread Chris Dent

On Wed, 28 Mar 2018, iain MacDonnell wrote:

Looking for recommendations on tuning of nova-placement-api. I have a few 
moderately-sized deployments (~200 nodes, ~4k instances), currently on Ocata, 
and instance creation is getting very slow as they fill up.


This should be well within the capabilities of an appropriately
installed placement service, so I reckon something is weird about
your installation. More within.


$ time curl http://apihost:8778/
{"error": {"message": "The request you have made requires authentication.", 
"code": 401, "title": "Unauthorized"}}

real    0m20.656s
user    0m0.003s
sys     0m0.001s


This is a good choice for trying to determine what's up because it
avoids any interaction with the database and most of the stack of
code: the web server answers, runs a very small percentage of the
placement python stack and kicks out the 401. So this mostly
indicates that socket accept is taking forever.

nova-placement-api is running under mod_wsgi with the "standard"(?) config, 
i.e.:


Do you recall where this configuration comes from? The settings for
WSGIDaemonProcess are not very good and if there is some packaging
or documentation that is setting things this way it would be good to find
it and fix it.

Depending on what else is on the host running placement I'd boost
processes to number of cores divided by 2, 3 or 4 and boost threads to
around 25. Or you can leave 'threads' off and it will default to 15
(at least in recent versions of mod wsgi).

With the settings as below you're basically saying that you want to
handle 3 connections at a time, which isn't great, since each of
your compute-nodes wants to talk to placement multiple times a
minute (even when nothing is happening).

Tweaking the number of processes versus the number of threads
depends on whether it appears that the processes are cpu or I/O
bound. More threads helps when things are I/O bound.


...
 WSGIProcessGroup nova-placement-api
 WSGIApplicationGroup %{GLOBAL}
 WSGIPassAuthorization On
 WSGIDaemonProcess nova-placement-api processes=3 threads=1 user=nova 
group=nova

 WSGIScriptAlias / /usr/bin/nova-placement-api
...


[snip]

Other suggestions? I'm looking at things like turning off 
scheduler_tracks_instance_changes, since affinity scheduling is not needed 
(at least so-far), but not sure that that will help with placement load 
(seems like it might, though?)


This won't impact the placement service itself.

A while back I did some experiments with trying to overload
placement by using the fake virt driver in devstack and wrote it up
at https://anticdent.org/placement-scale-fun.html

The gist was that with a properly tuned placement service it was
other parts of the system that suffered first.

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Hard fail if you try to rename an AZ with instances in it?

2018-03-27 Thread Matt Riedemann
Sylvain has had a spec up for awhile [1] about solving an old issue 
where admins can rename an AZ (via host aggregate metadata changes) 
while it has instances in it, which likely results in at least user 
confusion, but probably other issues later if you try to move those 
instances, e.g. the request spec probably points at the original AZ name 
and if that's gone (renamed) the scheduler probably pukes (would need to 
test this).


Anyway, I'm wondering if anyone relies on this behavior, or if they 
consider it a bug that the API allows admins to do this? I tend to 
consider this a bug in the API, and should just be fixed without a 
microversion. In other words, you shouldn't have to opt out of broken 
behavior using microversions.


[1] https://review.openstack.org/#/c/446446/

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] nova 17.0.1 released (queens)

2018-03-07 Thread David Medberry
Thanks for the headsup Matt.

On Wed, Mar 7, 2018 at 4:57 PM, Matt Riedemann  wrote:

> I just wanted to give a heads up to anyone thinking about upgrading to
> queens that nova has released a 17.0.1 patch release [1].
>
> There are some pretty important fixes in there that came up after the
> queens GA so if you haven't upgraded yet, I recommend going straight to
> that one instead of 17.0.0.
>
> [1] https://review.openstack.org/#/c/550620/
>
> --
>
> Thanks,
>
> Matt
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] nova 17.0.1 released (queens)

2018-03-07 Thread Matt Riedemann
I just wanted to give a heads up to anyone thinking about upgrading to 
queens that nova has released a 17.0.1 patch release [1].


There are some pretty important fixes in there that came up after the 
queens GA so if you haven't upgraded yet, I recommend going straight to 
that one instead of 17.0.0.


[1] https://review.openstack.org/#/c/550620/

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] [nova-lxd] Query regarding LXC instantiation using nova

2018-02-20 Thread James Page
Hi Amit

(re-titled thread with scoped topics)

As Matt has already referenced, [0] is a good starting place for using the
nova-lxd driver.

On Tue, 20 Feb 2018 at 11:13 Amit Kumar  wrote:

> Hello,
>
> I have a running OpenStack Ocata setup on which I am able to launch VMs.
> But I want to move to LXC instantiation instead of VMs. So, for this, I
> installed nova-compute-lxd on my compute node (Ubuntu 16.04).
> */etc/nova/nova-compute.conf* on my compute nodes was changed to contain
> the following values for *compute_driver* and* virt_type*.
>
> *[DEFAULT]*
> *compute_driver = lxd.LXDDriver*
>

You only need the above part for nova-lxd (the below snippet is for the
libvirt/lxc driver)


> *[libvirt]*
> *virt_type = lxc*
>
> After this, I restarted the nova-compute service and launched an instance,
> launch failed after some time (4-5 mins remain in spawning state) and gives
> the following error:
> [Error: No valid host was found. There are not enough hosts available.]. 
> Detailed
> nova-compute logs are attached with this e-mail.
>

Looking at your logs, it would appear a VIF plugging timeout occurred; was
your cloud functional with Libvirt/KVM before you made the switch to using
nova-lxd?  The neutron log files would be a good place to look so see what
went wrong.

Regards

James

[0] https://linuxcontainers.org/lxd/getting-started-openstack/
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] Regression bug for boot from volume with IsolatedHostsFilter

2018-02-11 Thread Matt Riedemann

I triaged this bug a couple of weeks ago:

https://bugs.launchpad.net/nova/+bug/1746483

It looks like it's been regressed since Mitaka when that filter started 
using the RequestSpec object rather than legacy filter_properties dict.


Looking a bit deeper though, it looks like this filter never worked for 
volume-backed instances. That's because this code, called from the 
compute API, never takes the image_id out of the volume's 
"volume_image_metadata":


https://github.com/openstack/nova/blob/fa6c0f9cb14f1b4ce4d9b1dbacb1743173089986/nova/utils.py#L1032

So before the regression that breaks the filter, the filter just never 
got the image.id to validate and accepted whatever host for that 
instance since it didn't know the image to tell if it was isolated or not.


I've got a functional recreate test for the bug and I think it's a 
pretty easy fix, but a question comes up about backports, which is - do 
we do two fixes for this bug, one to backport to stable which is just 
handling the missing RequestSpec.image.id attribute in the filter so the 
filter doesn't explode? Then we do another fix which actually pulls the 
image_id off the volume_image_metadata and put that properly into the 
RequestSpec so the filter actually _works_ with volume-backed instances? 
That would technically be a change in behavior for the filter, albeit 
likely the correct thing to do all along but we just never did it, and 
apparently no one ever noticed or cared (it's not a default enabled 
filter after all).


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-02-06 Thread Matt Riedemann

On 2/6/2018 2:14 PM, Chris Apsey wrote:
but we would rather have intermittent build failures than compute 
nodes falling over in the future.


Note that once a compute has a successful build, the consecutive build 
failures counter is reset. So if your limit is the default (10) and you 
have 10 failures in a row, the compute service is auto-disabled. But if 
you have say 5 failures and then a pass, it's reset to 0 failures.


Obviously if you're doing a pack-first scheduling strategy rather than 
spreading instances across the deployment, a burst of failures could 
easily disable a compute, especially if that host is overloaded like you 
saw. I'm not sure if rescheduling is helping you or not - that would be 
useful information since we consider the need to reschedule off a failed 
compute host as a bad thing. At the Forum in Boston when this idea came 
up, it was specifically for the case that operators in the room didn't 
want a bad compute to become a "black hole" in their deployment causing 
lots of reschedules until they get that one fixed.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-02-06 Thread Chris Apsey

All,

This was the core issue - setting 
consecutive_build_service_disable_threshold = 0 in nova.conf (on 
controllers and compute nodes) solved this.  It was being triggered by 
neutron dropping requests (and/or responses) for vif-plugging due to cpu 
usage on the neutron endpoints being pegged at 100% for too long.  We 
increased our rpc_response_timeout value and this issue appears to be 
resolved for the time being.  We can probably safely remove the 
consecutive_build_service_disable_threshold option at this point, but we 
would rather have intermittent build failures than compute nodes 
falling over in the future.
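
For anyone else chasing the same thing, the relevant settings sketched out
(section names as in the Pike config reference, so double-check against your
release; values here are examples only, tune the timeout to your environment):

    # nova.conf on controllers and compute nodes
    [compute]
    # 0 disables the auto-disable behaviour entirely
    consecutive_build_service_disable_threshold = 0

    # nova.conf and neutron.conf
    [DEFAULT]
    # default is 60 seconds; raised because vif-plug replies were timing out under load
    rpc_response_timeout = 120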


Slightly related, we are noticing that neutron endpoints are using 
noticeably more CPU time recently than in the past w/ a similar workload 
(we run linuxbridge w/ vxlan).  We believe this is tied to our 
application of KPTI for meltdown mitigation across the various hosts in 
our cluster (the timeline matches).  Has anyone else experienced similar 
impacts or can suggest anything to try to lessen the impact?


---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-01-31 04:47 PM, Chris Apsey wrote:

That looks promising.  I'll report back to confirm the solution.

Thanks!

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-01-31 04:40 PM, Matt Riedemann wrote:

On 1/31/2018 3:16 PM, Chris Apsey wrote:

All,

Running in to a strange issue I haven't seen before.

Randomly, the nova-compute services on compute nodes are disabling 
themselves (as if someone ran openstack compute service set --disable 
hostX nova-compute.  When this happens, the node continues to report 
itself as 'up' - the service is just disabled.  As a result, if 
enough of these occur, we get scheduling errors due to lack of 
available resources (which makes sense).  Re-enabling them works just 
fine and they continue on as if nothing happened.  I looked through 
the logs and I can find the API calls where we re-enable the services 
(PUT /v2.1/os-services/enable), but I do not see any API calls where 
the services are getting disabled initially.


Is anyone aware of any cases where compute nodes will automatically 
disable their nova-compute service on their own, or has anyone seen 
this before and might know a root cause?  We have plenty of spare 
vcpus and RAM on each node - like less than 25% utilization (both in 
absolute terms and in terms of applied ratios).


We're seeing follow-on errors regarding rmq messages getting lost and 
vif-plug failures, but we think those are a symptom, not a cause.


Currently running pike on Xenial.

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



This is actually a feature added in Pike:

https://review.openstack.org/#/c/463597/

This came up in discussion with operators at the Forum in Boston.

The vif-plug failures are likely the reason those computes are getting 
disabled.


There is a config option "consecutive_build_service_disable_threshold"
which you can set to disable the auto-disable behavior as some have
experienced issues with it:

https://bugs.launchpad.net/nova/+bug/1742102


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-01-31 Thread Chris Apsey

That looks promising.  I'll report back to confirm the solution.

Thanks!

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-01-31 04:40 PM, Matt Riedemann wrote:

On 1/31/2018 3:16 PM, Chris Apsey wrote:

All,

Running in to a strange issue I haven't seen before.

Randomly, the nova-compute services on compute nodes are disabling 
themselves (as if someone ran openstack compute service set --disable 
hostX nova-compute.  When this happens, the node continues to report 
itself as 'up' - the service is just disabled.  As a result, if enough 
of these occur, we get scheduling errors due to lack of available 
resources (which makes sense).  Re-enabling them works just fine and 
they continue on as if nothing happened.  I looked through the logs 
and I can find the API calls where we re-enable the services (PUT 
/v2.1/os-services/enable), but I do not see any API calls where the 
services are getting disabled initially.


Is anyone aware of any cases where compute nodes will automatically 
disable their nova-compute service on their own, or has anyone seen 
this before and might know a root cause?  We have plenty of spare 
vcpus and RAM on each node - like less than 25% utilization (both in 
absolute terms and in terms of applied ratios).


We're seeing follow-on errors regarding rmq messages getting lost and 
vif-plug failures, but we think those are a symptom, not a cause.


Currently running pike on Xenial.

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



This is actually a feature added in Pike:

https://review.openstack.org/#/c/463597/

This came up in discussion with operators at the Forum in Boston.

The vif-plug failures are likely the reason those computes are getting 
disabled.


There is a config option "consecutive_build_service_disable_threshold"
which you can set to disable the auto-disable behavior as some have
experienced issues with it:

https://bugs.launchpad.net/nova/+bug/1742102


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-01-31 Thread Eric Fried
There's [1], but I would have expected you to see error logs like [2] if
that's what you're hitting.

[1]
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L627-L645
[2]
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1714-L1716

efried

On 01/31/2018 03:16 PM, Chris Apsey wrote:
> All,
> 
> Running in to a strange issue I haven't seen before.
> 
> Randomly, the nova-compute services on compute nodes are disabling
> themselves (as if someone ran openstack compute service set --disable
> hostX nova-compute.  When this happens, the node continues to report
> itself as 'up' - the service is just disabled.  As a result, if enough
> of these occur, we get scheduling errors due to lack of available
> resources (which makes sense).  Re-enabling them works just fine and
> they continue on as if nothing happened.  I looked through the logs and
> I can find the API calls where we re-enable the services (PUT
> /v2.1/os-services/enable), but I do not see any API calls where the
> services are getting disabled initially.
> 
> Is anyone aware of any cases where compute nodes will automatically
> disable their nova-compute service on their own, or has anyone seen this
> before and might know a root cause?  We have plenty of spare vcpus and
> RAM on each node - like less than 25% utilization (both in absolute
> terms and in terms of applied ratios).
> 
> We're seeing follow-on errors regarding rmq messages getting lost and
> vif-plug failures, but we think those are a symptom, not a cause.
> 
> Currently running pike on Xenial.
> 
> ---
> v/r
> 
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-01-31 Thread Matt Riedemann

On 1/31/2018 3:16 PM, Chris Apsey wrote:

All,

Running in to a strange issue I haven't seen before.

Randomly, the nova-compute services on compute nodes are disabling 
themselves (as if someone ran openstack compute service set --disable 
hostX nova-compute.  When this happens, the node continues to report 
itself as 'up' - the service is just disabled.  As a result, if enough 
of these occur, we get scheduling errors due to lack of available 
resources (which makes sense).  Re-enabling them works just fine and 
they continue on as if nothing happened.  I looked through the logs and 
I can find the API calls where we re-enable the services (PUT 
/v2.1/os-services/enable), but I do not see any API calls where the 
services are getting disabled initially.


Is anyone aware of any cases where compute nodes will automatically 
disable their nova-compute service on their own, or has anyone seen this 
before and might know a root cause?  We have plenty of spare vcpus and 
RAM on each node - like less than 25% utilization (both in absolute 
terms and in terms of applied ratios).


We're seeing follow-on errors regarding rmq messages getting lost and 
vif-plug failures, but we think those are a symptom, not a cause.


Currently running pike on Xenial.

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



This is actually a feature added in Pike:

https://review.openstack.org/#/c/463597/

This came up in discussion with operators at the Forum in Boston.

The vif-plug failures are likely the reason those computes are getting 
disabled.


There is a config option "consecutive_build_service_disable_threshold" 
which you can set to disable the auto-disable behavior as some have 
experienced issues with it:


https://bugs.launchpad.net/nova/+bug/1742102

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] nova-compute automatically disabling itself?

2018-01-31 Thread Chris Apsey

All,

Running in to a strange issue I haven't seen before.

Randomly, the nova-compute services on compute nodes are disabling 
themselves (as if someone ran openstack compute service set --disable 
hostX nova-compute.  When this happens, the node continues to report 
itself as 'up' - the service is just disabled.  As a result, if enough 
of these occur, we get scheduling errors due to lack of available 
resources (which makes sense).  Re-enabling them works just fine and 
they continue on as if nothing happened.  I looked through the logs and 
I can find the API calls where we re-enable the services (PUT 
/v2.1/os-services/enable), but I do not see any API calls where the 
services are getting disabled initially.


Is anyone aware of any cases where compute nodes will automatically 
disable their nova-compute service on their own, or has anyone seen this 
before and might know a root cause?  We have plenty of spare vcpus and 
RAM on each node - like less than 25% utilization (both in absolute 
terms and in terms of applied ratios).


We're seeing follow-on errors regarding rmq messages getting lost and 
vif-plug failures, but we think those are a symptom, not a cause.


Currently running pike on Xenial.

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][neutron] Extend instance IP filter for floating IP

2018-01-24 Thread Hongbin Lu
Hi all,

Nova currently allows us to filter instances by fixed IP address(es). This 
feature is known to be useful in an operational scenario that cloud 
administrators detect abnormal traffic in an IP address and want to trace down 
to the instance that this IP address belongs to. This feature works well except 
for a limitation that it only supports fixed IP address(es). In real 
operational scenarios, cloud administrators might find that the abused IP 
address is a floating IP and want to do the filtering in the same way as fixed 
IP.

Right now, unfortunately, the experience is diverged between these two classes 
of IP address. Cloud administrators need to deploy the logic to (i) detect the 
class of IP address (fixed or floating), (ii) use nova's IP filter if the 
address is a fixed IP address, (iii) do manual filtering if the address is a 
floating IP address. I wonder if the nova team is willing to accept an enhancement 
that makes the IP filter support both. Ideally, cloud administrators can 
simply pass the abused IP address to nova and nova will handle the 
heterogeneity.
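
As things stand, the manual workflow in (i)-(iii) looks roughly like the
following (a sketch with illustrative addresses; option names are as found in
recent python-openstackclient releases):

    # fixed IP: nova's existing filter already handles it
    openstack server list --all-projects --ip 10.0.0.5

    # floating IP: resolve it by hand through Neutron
    openstack floating ip list --floating-ip-address 203.0.113.17
    openstack port show <port-id-from-above> -c device_id
    # device_id is the instance uuid
    openstack server show <device-id-from-above>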

In terms of implementation, I expect the change to be small. After this patch [1], 
Nova will query Neutron to compile a list of ports' device_ids (device_id is 
equal to the uuid of the instance to which the port binds) and use the 
device_ids to query the instances. If Neutron returns an empty list, Nova can 
give a second try to query Neutron for floating IPs. There is a RFE [2] and POC 
[3] for proposing to add a device_id attribute to the floating IP API resource. 
Nova can leverage this attribute to compile a list of instances uuids and use 
it as filter on listing the instances.

If this feature is implemented, will it benefit the general community? Finally, 
I also wonder how others are tackling a similar problem. Appreciate your 
feedback.

[1] https://review.openstack.org/#/c/525505/
[2] https://bugs.launchpad.net/neutron/+bug/1723026
[3] https://review.openstack.org/#/c/534882/

Best regards,
Hongbin
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata

2018-01-16 Thread melanie witt

Hello Stackers,

This is a heads up to any of you using the AggregateCoreFilter, 
AggregateRamFilter, and/or AggregateDiskFilter in the filter scheduler. 
These filters have effectively allowed operators to set overcommit 
ratios per aggregate rather than per compute node in <= Newton.


Beginning in Ocata, there is a behavior change where aggregate-based 
overcommit ratios will no longer be honored during scheduling. Instead, 
overcommit values must be set on a per compute node basis in nova.conf.


Details: as of Ocata, instead of considering all compute nodes at the 
start of scheduler filtering, an optimization has been added to query 
resource capacity from placement and prune the compute node list with 
the result *before* any filters are applied. Placement tracks resource 
capacity and usage and does *not* track aggregate metadata [1]. Because 
of this, placement cannot consider aggregate-based overcommit and will 
exclude compute nodes that do not have capacity based on per compute 
node overcommit.


How to prepare: if you have been relying on per aggregate overcommit, 
during your upgrade to Ocata, you must change to using per compute node 
overcommit ratios in order for your scheduling behavior to stay 
consistent. Otherwise, you may notice increased NoValidHost scheduling 
failures as the aggregate-based overcommit is no longer being 
considered. You can safely remove the AggregateCoreFilter, 
AggregateRamFilter, and AggregateDiskFilter from your enabled_filters 
and you do not need to replace them with any other core/ram/disk 
filters. The placement query takes care of the core/ram/disk filtering 
instead, so CoreFilter, RamFilter, and DiskFilter are redundant.
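
Concretely, that means moving the ratios from aggregate metadata into each 
compute node's nova.conf; a minimal sketch (the values shown are only examples):

    # nova.conf on each compute node
    [DEFAULT]
    cpu_allocation_ratio = 16.0
    ram_allocation_ratio = 1.5
    disk_allocation_ratio = 1.0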


Thanks,
-melanie

[1] Placement is a clean slate for resource management, and prior to 
placement there were conflicts between the different methods for 
setting overcommit ratios that were never addressed, such as, "which 
value to take if a compute node has overcommit set AND the aggregate has 
it set? Which takes precedence?" And, "if a compute node is in more than 
one aggregate, which overcommit value should be taken?" So, the 
ambiguities were not something that was desirable to bring forward into 
placement.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [nova][cinder] nova support for volume multiattach

2018-01-10 Thread Matt Riedemann

Hi everyone,

I wanted to point out that the nova API patch for volume multiattach 
support is available for review:


https://review.openstack.org/#/c/271047/

It's actually a series of changes, but that is the last one that enables 
the feature in nova. It relies on the 2.59 compute API microversion to 
be able to create a server from a multiattach volume or to attach a 
multiattach volume to a server.


We do not allow attaching a multiattach volume to a shelved offloaded 
server, to be consistent with the 2.49 microversion for tagged attach.


When creating a server from a multiattach volume, the compute API will 
check to see that all nova-compute services in the deployment have been 
upgraded to the service version that supports the multiattach code in 
the libvirt driver.


Similarly, when attaching a multiattach volume to an existing server 
instance, the compute API will check that the compute hosting the 
instance is new enough to support mulitattach volumes (has been 
upgraded) and it's using a virt driver that supports the capability 
(currently only the libvirt driver).


There are more details in the release note but I wanted to point out 
those restrictions.


There is also a set of tempest integration tests here:

https://review.openstack.org/#/c/266605/

Those will be tested in the nova-multiattach CI job:

https://review.openstack.org/#/c/532689/

Due to restrictions with libvirt, multiattach support is only available 
if qemu<2.10 or libvirt>=3.10. The test environment takes this into 
account for upstream testing.


Nova will rely on Cinder microversion >=3.44, which was added in Queens, 
for safe detach of a multiattach volume.


There is a design spec for Cinder which describes how volume multiattach 
will be supported in Cinder and how operators will be able to configure 
volume types and Cinder policy rules for mulitattach support:


https://specs.openstack.org/openstack/cinder-specs/specs/queens/enable-multiattach.html
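
For operators who want to experiment once Queens is out, the Cinder side is
expected to hinge on a multiattach-capable volume type, roughly along these
lines (a sketch based on the spec above; names and syntax may still shift
before release):

    cinder type-create multiattach
    cinder type-key multiattach set multiattach="<is> True"
    cinder create --volume-type multiattach --name shared-vol 10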

Several people from various companies have been pushing this hard in the 
Queens release and we're two weeks away from feature freeze. I'm on 
vacation next week also, but I have a feeling that this will get done 
finally in Queens.


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

2018-01-10 Thread Matt Riedemann

On 1/10/2018 1:49 PM, Alec Hothan (ahothan) wrote:
The main problem is that the nova API does not return sufficient detail 
on the reason for a NoValidHostFound and perhaps that should be fixed at 
that level. Extending the API to return a reason field which is a json 
dict that is returned by the various filters  (with more meaningful 
filter-specific info) will help tremendously (no more need to go through 
the log to find out why).


There are security implications to doing this, which is why the ultimate 
reason behind the NoValidHost hasn't been exposed to the end user. It 
could leak details about the size, topology and configuration of the 
cloud and open it up to attacks.


A better alternative would be something like an audit log (or fault) 
that only the user with the admin role could see, like when they are 
investigating a support ticket.


There might be other cases where we should do a better job of validation 
in the API before casting off to the scheduler. If we can detect common 
reasons for a scheduling (or build) failure up front in the API, we can 
return that information immediately back to the user who can act upon 
it. That, in turn, should also improve our API documentation (assuming 
it's a common failure or something that's just not clear usage-wise in 
the docs).


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

2018-01-10 Thread Alec Hothan (ahothan)

Matt,

Thanks for clarifying the logs, the older release I was using did not have much 
information in the scheduler log. I’ll double check on the Newton release to 
see how they look like.
As you mention, a simple pass/fail result may not always explain why it fails 
but it is definitely good to know which filter failed.
I still think that a VM failure to launch should be related to 1 ERROR log 
rather than 1 INFO log. In the example you provide, it is fine to have race 
conditions that result in a rejection and an ERROR log.

The main problem is that the nova API does not return sufficient detail on the 
reason for a NoValidHostFound and perhaps that should be fixed at that level. 
Extending the API to return a reason field which is a json dict that is 
returned by the various filters  (with more meaningful filter-specific info) 
will help tremendously (no more need to go through the log to find out why).

Regards,

  Alec



From: Matt Riedemann <mriede...@gmail.com>
Date: Wednesday, January 10, 2018 at 11:33 AM
To: "openstack-operators@lists.openstack.org" 
<openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] [openstack-operators][nova] Verbosity of 
nova scheduler

On 1/10/2018 1:15 AM, Alec Hothan (ahothan) wrote:
+1 on the “no valid host found”, this one should be at the very top of
the to-be-fixed list.
Very difficult to troubleshoot filters in lab testing (let alone in
production) as there can be many of them. This will get worse with more
NFV optimization filters; with so many combinations it gets really
complex to debug when a VM cannot be launched with NFV optimized
flavors. With the scheduler filtering engine, there should be a
systematic way to log the reason for not finding a valid host - at the
very least the error should display which filter failed as an ERROR (and
not as DEBUG).
We really need to avoid deploying with DEBUG log level but unfortunately
this is the only way to troubleshoot. Too many debug-level logs are for
development debug (meaning pretty much useless in any circumstances –
developer forgot to remove before commit of the feature), many errors
that should be logged as ERROR but have been logged as DEBUG only.

The scheduler logs do print which filter returned 0 hosts for a given
request at INFO level.

For example, I was debugging this NoValidHost failure in a CI run:

http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/job-output.txt.gz#_2018-01-05_17_39_54_336999

And tracing the request ID to the scheduler logs, and filtering on just
INFO level logging to simulate what you'd have for the default in
production, I found the filter that kicked it out here:

http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/controller/logs/screen-n-sch.txt.gz?level=INFO#_Jan_05_17_00_30_564582

And there is a summary like this:

Jan 05 17:00:30.565996 ubuntu-xenial-infracloud-chocolate-0001705073
nova-scheduler[8932]: INFO nova.filters [None
req-737984ae-3ae8-4506-a5f9-6655a4ebf206
tempest-ServersAdminTestJSON-787960229
tempest-ServersAdminTestJSON-787960229] Filtering removed all hosts for
the request with instance ID '8ae8dc23-8f3b-4f0f-8775-2dcc2a5fc75b'.
Filter results: ['RetryFilter: (start: 1, end: 1)',
'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1,
end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)',
'ImagePropertiesFilter: (start: 1, end: 1)',
'ServerGroupAntiAffinityFilter: (start: 1, end: 1)',
'ServerGroupAffinityFilter: (start: 1, end: 1)', 'SameHostFilter:
(start: 1, end: 0)']

"end: 0" means that's the filter that rejected the request. Without
digging into the actual code, the descriptions for the filters is here:

https://docs.openstack.org/nova/latest/user/filter-scheduler.html

Now just why this request failed requires a bit of understanding of what
my environment looks like (this CI run is using the CachingScheduler),
so there isn't a simple "ERROR: SameHostFilter rejected request because
you're using the CachingScheduler which is racy by design and you
created the instances in separate requests". You'd have a ton of false
negative ERRORs in the logs because of valid reasons for rejecting a
request based on the current state of the system, which is going to make
debugging real issues that much harder.

--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

2018-01-10 Thread Matt Riedemann

On 1/10/2018 1:15 AM, Alec Hothan (ahothan) wrote:
+1 on the “no valid host found”, this one should be at the very top of 
the to-be-fixed list.


Very difficult to troubleshoot filters in lab testing (let alone in 
production) as there can be many of them. This will get worst with more 
NFV optimization filters with so many combinations it gets really 
complex to debug when a VM cannot be launched with NFV optimized 
flavors. With the scheduler filtering engine, there should be a 
systematic way to log the reason for not finding a valid host - at the 
very least the error should display which filter failed as an ERROR (and 
not as DEBUG).


We really need to avoid deploying with DEBUG log level but unfortunately 
this is the only way to troubleshoot. Too many debug-level logs are for 
development debug (meaning pretty much useless in any circumstances – 
developer forgot to remove before commit of the feature), many errors 
that should be logged as ERROR but have been logged as DEBUG only.




The scheduler logs do print which filter returned 0 hosts for a given 
request at INFO level.


For example, I was debugging this NoValidHost failure in a CI run:

http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/job-output.txt.gz#_2018-01-05_17_39_54_336999

And tracing the request ID to the scheduler logs, and filtering on just 
INFO level logging to simulate what you'd have for the default in 
production, I found the filter that kicked it out here:


http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/controller/logs/screen-n-sch.txt.gz?level=INFO#_Jan_05_17_00_30_564582

And there is a summary like this:

Jan 05 17:00:30.565996 ubuntu-xenial-infracloud-chocolate-0001705073 
nova-scheduler[8932]: INFO nova.filters [None 
req-737984ae-3ae8-4506-a5f9-6655a4ebf206 
tempest-ServersAdminTestJSON-787960229 
tempest-ServersAdminTestJSON-787960229] Filtering removed all hosts for 
the request with instance ID '8ae8dc23-8f3b-4f0f-8775-2dcc2a5fc75b'. 
Filter results: ['RetryFilter: (start: 1, end: 1)', 
'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, 
end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 
'ImagePropertiesFilter: (start: 1, end: 1)', 
'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 
'ServerGroupAffinityFilter: (start: 1, end: 1)', 'SameHostFilter: 
(start: 1, end: 0)']


"end: 0" means that's the filter that rejected the request. Without 
digging into the actual code, the descriptions for the filters is here:


https://docs.openstack.org/nova/latest/user/filter-scheduler.html

Now just why this request failed requires a bit of understanding of what 
my environment looks like (this CI run is using the CachingScheduler), 
so there isn't a simple "ERROR: SameHostFilter rejected request because 
you're using the CachingScheduler which is racy by design and you 
created the instances in separate requests". You'd have a ton of false 
negative ERRORs in the logs because of valid reasons for rejecting a 
request based on the current state of the system, which is going to make 
debugging real issues that much harder.
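
(For what it's worth, reproducing that kind of trace on a real deployment is
usually just a grep on the request id at INFO level and above, e.g.

   grep 'req-737984ae-3ae8-4506-a5f9-6655a4ebf206' nova-scheduler.log | grep -vw DEBUG

with the log path adjusted to wherever your scheduler writes its logs.)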


--

Thanks,

Matt

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

2018-01-09 Thread Alec Hothan (ahothan)
Great to see some interest on trying to improve the log!

+1 on the “no valid host found”, this one should be at the very top of the 
to-be-fixed list.
Very difficult to troubleshoot filters in lab testing (let alone in production) 
as there can be many of them. This will get worst with more NFV optimization 
filters with so many combinations it gets really complex to debug when a VM 
cannot be launched with NFV optimized flavors. With the scheduler filtering 
engine, there should be a systematic way to log the reason for not finding a 
valid host - at the very least the error should display which filter failed as 
an ERROR (and not as DEBUG).

We really need to avoid deploying with DEBUG log level but unfortunately this 
is the only way to troubleshoot. Too many debug-level logs are for development 
debug (meaning pretty much useless in any circumstances – developer forgot to 
remove before commit of the feature), many errors that should be logged as 
ERROR but have been logged as DEBUG only.

Thanks

Alec


From: Mikhail Medvedev <mihail...@gmail.com>
Date: Tuesday, January 9, 2018 at 9:16 AM
To: Piotr Bielak <piotr.bie...@corp.ovh.com>
Cc: "openstack-operators@lists.openstack.org" 
<openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] [openstack-operators][nova] Verbosity of 
nova scheduler



On Tue, Jan 9, 2018 at 8:18 AM, Piotr Bielak 
<piotr.bie...@corp.ovh.com> wrote:
> Hi! I'm conducting some research about the nova scheduler logs
> verbosity. Did you ever encounter any situation, where you didn't feel
> satisfied with the amount or quality of the logs (at any log level).
> It is known that the nova-scheduler produces hardly any logs at INFO
> level. What are your experiences with the nova-scheduler in production
> environments? What would you like to see in the logs, what isn't printed
> at the moment (maybe some expected log format)?

I am supporting a couple of OpenStack dev clouds and I found that in order to solve 
most operational problems faster I need DEBUG enabled everywhere. In the case of the 
scheduler, "No valid host was found" was intractable without the debug messages 
(as of Mitaka). I need to know what filter ate all the hosts and what values it 
used for calculations as a minimum.

>
> Thanks for any help and advice,
> Piotr Bielak
>

---
Mikhail Medvedev (mmedvede)
IBM
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

2018-01-09 Thread Jeremy Stanley
On 2018-01-09 12:38:05 -0600 (-0600), Matt Riedemann wrote:
[...]
> Also, there is a noticeable impact to performance when running the
> scheduler with debug logging enabled which is why it's not
> recommended to run with debug enabled in production.

Further, OpenStack considers security risks exposed by DEBUG level
logging to be only hardening opportunities, and as such these often
linger unfixed or don't get backported to earlier releases (in other
words, we consider running in production with DEBUG level logging to
be risky from an information security standpoint).
-- 
Jeremy Stanley


signature.asc
Description: PGP signature
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

