Re: [openstack-dev] [nova] about filter the flavor

2018-11-20 Thread Matt Riedemann

On 11/19/2018 9:32 PM, Rambo wrote:
       I have an idea. Currently we can't filter flavors by a special 
property. Can we achieve that? If we did, we could filter flavors by the 
property's key and value. What do you think of the idea? Can you tell me 
more about this? Thank you very much.


To be clear, you want to filter flavors by extra spec key and/or value? 
So something like:


GET /flavors?key=hw%3Acpu_policy

would return all flavors with an extra spec with key "hw:cpu_policy".

And:

GET /flavors?key=hw%3Acpu_policy&value=dedicated

would return all flavors with extra spec "hw:cpu_policy" with value 
"dedicated".


The query parameter semantics are probably what gets messiest about 
this. Because I could see wanting to couple the key and value together, 
but I'm not sure how you do that, because I don't think you can do this:


GET /flavors?spec=hw%3Acpu_policy=dedicated

Maybe you'd do:

GET /flavors?hw%3Acpu_policy=dedicated

The problem with that is we wouldn't be able to perform any kind of 
request schema validation of it, especially since flavor extra specs are 
not standardized.
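
Until something like this exists server-side, the filtering has to 
happen client-side over GET /flavors/detail results. A minimal sketch of 
the proposed key/value semantics (the flavor dict shape is simplified; 
this is not nova code):

```python
# Sketch of the proposed extra-spec filtering applied to already-fetched
# flavor data. The dict shape is a simplification of GET /flavors/detail
# plus os-extra_specs, not the actual nova data model.
def filter_flavors(flavors, key, value=None):
    """Return flavors whose extra specs contain `key` (and `value` if given)."""
    result = []
    for flavor in flavors:
        specs = flavor.get("extra_specs", {})
        if key in specs and (value is None or specs[key] == value):
            result.append(flavor)
    return result

flavors = [
    {"name": "m1.pinned", "extra_specs": {"hw:cpu_policy": "dedicated"}},
    {"name": "m1.shared", "extra_specs": {"hw:cpu_policy": "shared"}},
    {"name": "m1.plain", "extra_specs": {}},
]

# key-only filter: both flavors carrying hw:cpu_policy match
print([f["name"] for f in filter_flavors(flavors, "hw:cpu_policy")])
# key+value filter: only the dedicated flavor matches
print([f["name"] for f in filter_flavors(flavors, "hw:cpu_policy", "dedicated")])
```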


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Stein forum session notes

2018-11-19 Thread Matt Riedemann

On 11/19/2018 3:17 AM, melanie witt wrote:
- Not directly related to the session, but CERN (hallway track) and 
NeCTAR (dev ML) have both given feedback and asked that the 
policy-driven idea for handling quota for down cells be avoided. Revived 
the "propose counting quota in placement" spec to see if there's any way 
forward here


Should this be abandoned then?

https://review.openstack.org/#/c/614783/

Since there is no microversion impact to that change, it could be added 
separately as a bug fix for the down cell case if other operators want 
that functionality. But maybe we don't know what other operators want 
since no one else is at multi-cell cells v2 yet.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Can we deprecate the server backup API please?

2018-11-19 Thread Matt Riedemann

On 11/18/2018 6:51 AM, Alex Xu wrote:
Makes sense to me, and then we needn't fix this strange behaviour 
either: https://review.openstack.org/#/c/409644/


The same discussion was had in the spec for that change:

https://review.openstack.org/#/c/511825/

Ultimately it amounted to a big "meh, let's just not fix the bug but 
also no one really cares about deprecating the API either".


The only thing deprecating the API would do is signal that it probably 
shouldn't be used. We would still support it on older microversions. If 
all anyone cares about is signalling not to use the API then deprecation 
is probably fine, but I personally don't feel too strongly about it 
either way.


--

Thanks,

Matt



Re: [openstack-dev] [nova][neutron] boot server with more than one subnet selection question

2018-11-13 Thread Matt Riedemann

On 11/13/2018 4:45 AM, Chen CH Ji wrote:

Got it, this is what I am looking for .. thank you


Regarding what you can do with server create, I believe it's:

1. don't specify anything for networking, you get a port on the network 
available to you; if there are multiple networks, it's a failure and the 
user has to specify one.


2. specify a network, nova creates a port on that network

3. specify a port, nova uses that port and doesn't create anything in 
neutron


4. specify a network and fixed IP, nova creates a port on that network 
using that fixed IP.


It sounds like you want #3 or #4.
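
As a sketch, the four cases map to request bodies like these 
(abbreviated to the networking part of the POST /servers payload; names 
and UUIDs are placeholders):

```python
# Abbreviated POST /servers bodies for the four networking cases above.
# "NET_UUID"/"PORT_UUID" are placeholders; flavor/image fields are omitted.

# 1. No networking specified: nova picks the one network available to the
#    project, or fails if there is more than one.
case_1 = {"server": {"name": "vm1"}}

# 2. Network specified: nova creates a port on that network.
case_2 = {"server": {"name": "vm2", "networks": [{"uuid": "NET_UUID"}]}}

# 3. Port specified: nova uses the pre-created port and creates nothing
#    in neutron.
case_3 = {"server": {"name": "vm3", "networks": [{"port": "PORT_UUID"}]}}

# 4. Network plus fixed IP: nova creates a port on that network with the
#    requested IP.
case_4 = {"server": {"name": "vm4",
                     "networks": [{"uuid": "NET_UUID",
                                   "fixed_ip": "10.0.0.5"}]}}
```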

--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-22 Update

2018-11-10 Thread Matt Riedemann
No major updates this week, but there is some decent progress in more 
projects getting their framework patch merged [1]. Thanks again to Rajat 
and Akhil for their persistent effort. There are more open reviews 
available for adding the framework to projects [2]. Some projects, like 
cloudkitty [3], are going beyond the initial placeholder framework check 
and adding a real upgrade check which is nice to see.


[1] https://review.openstack.org/#/q/topic:upgrade-checkers+status:merged
[2] https://review.openstack.org/#/q/topic:upgrade-checkers+status:open
[3] https://review.openstack.org/#/c/613076/

--

Thanks,

Matt



Re: [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-11-06 Thread Matt Riedemann
After hacking on the PoC for a while [1] I have finally pushed up a spec 
[2]. Behold it in all its dark glory!


[1] https://review.openstack.org/#/c/603930/
[2] https://review.openstack.org/#/c/616037/

On 8/22/2018 8:23 PM, Matt Riedemann wrote:

Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The 
main issue in there right now is dealing with cross-cell cold migration 
in nova.


At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would 
like to move users off the old flavors/hardware (old cell) to new 
flavors in a new cell.


* There is network isolation between compute hosts in different cells, 
so no ssh'ing the disk around like we do today. But the image service is 
global to all cells.


Based on this, for the initial support for cross-cell cold migration, I 
am proposing that we leverage something like shelve offload/unshelve 
masquerading as resize. We shelve offload from the source cell and 
unshelve in the target cell. This should work for both volume-backed and 
non-volume-backed servers (we use snapshots for shelved offloaded 
non-volume-backed servers).
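
The proposed flow can be sketched as ordered steps (every helper named 
here is a hypothetical placeholder summarizing the proposal, not nova 
internals):

```python
def cross_cell_cold_migrate(instance, source_cell, target_cell):
    """Sketch of shelve-offload/unshelve masquerading as a resize across
    cells. All helper names in the steps are hypothetical placeholders."""
    steps = [
        # 1. Shelve offload in the source cell; for a non-volume-backed
        #    server this snapshots the disk to the global image service.
        "shelve_offload(%s, %s)" % (source_cell, instance),
        # 2. Pick a target host; scheduling into a specific cell is one of
        #    the open questions.
        "select_host(%s, %s)" % (target_cell, instance),
        # 3. Unshelve in the target cell from the snapshot or volume.
        "unshelve(%s, %s)" % (target_cell, instance),
        # 4. Surface the whole operation to the user as a resize awaiting
        #    confirm or revert.
        "set_status(%s, VERIFY_RESIZE)" % instance,
    ]
    return steps

for step in cross_cell_cold_migrate("inst-1", "cell1", "cell2"):
    print(step)
```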


There are, of course, some complications. The main ones that I need help 
with right now are what happens with volumes and ports attached to the 
server. Today we detach from the source and attach at the target, but 
that's assuming the storage backend and network are available to both 
hosts involved in the move of the server. Will that be the case across 
cells? I am assuming that depends on the network topology (are routed 
networks being used?) and storage backend (routed storage?). If the 
network and/or storage backend are not available across cells, how do we 
migrate volumes and ports? Cinder has a volume migrate API for admins 
but I do not know how nova would know the proper affinity per-cell to 
migrate the volume to the proper host (cinder does not have a routed 
storage concept like routed provider networks in neutron, correct?). And 
as far as I know, there is no such thing as port migration in Neutron.


Could Placement help with the volume/port migration stuff? Neutron 
routed provider networks rely on placement aggregates to schedule the VM 
to a compute host in the same network segment as the port used to create 
the VM, however, if that segment does not span cells we are kind of 
stuck, correct?


To summarize the issues as I see them (today):

* How to deal with the targeted cell during scheduling? This is so we 
can even get out of the source cell in nova.


* How does the API deal with the same instance being in two DBs at the 
same time during the move?


* How to handle revert resize?

* How are volumes and ports handled?

I can get feedback from my company's operators based on what their 
deployment will look like for this, but that does not mean it will work 
for others, so I need as much feedback from operators, especially those 
running with multiple cells today, as possible. Thanks in advance.


[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells




--

Thanks,

Matt



Re: [openstack-dev] [Openstack-operators] [Openstack-sigs] Dropping lazy translation support

2018-11-06 Thread Matt Riedemann

On 11/6/2018 5:24 PM, Rochelle Grober wrote:

Maybe the fastest way to get info would be to turn it off and see where the 
code barfs in a long run (to catch as many projects as possible)?


There is zero integration testing for lazy translation, so "turning it 
off and seeing what breaks" wouldn't result in anything breaking.


--

Thanks,

Matt



Re: [openstack-dev] [goals][upgrade-checkers] FYI on "TypeError: Message objects do not support addition." errors

2018-11-06 Thread Matt Riedemann

On 11/5/2018 10:43 AM, Matt Riedemann wrote:
If you are seeing this error when implementing and running the upgrade 
check command in your project:


Traceback (most recent call last):
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", 
line 184, in main

     return conf.command.action_fn()
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", 
line 134, in check

     print(t)
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 237, in __str__

     return self.__unicode__()
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 243, in __unicode__

     return self.get_string()
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 995, in get_string

     lines.append(self._stringify_header(options))
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 1066, in _stringify_header
     bits.append(" " * lpad + self._justify(fieldname, width, 
self._align[field]) + " " * rpad)
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 187, in _justify

     return text + excess * " "
   File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_i18n/_message.py", 
line 230, in __add__

     raise TypeError(msg)
TypeError: Message objects do not support addition.

It is due to calling oslo_i18n.enable_lazy() somewhere in the command 
import path. That should be removed from the project since lazy 
translation is not supported in openstack and as an effort was abandoned 
several years ago. It is probably still called in a lot of "big 
tent/stackforge" projects because of initially copying it from the more 
core projects. Anyway, just remove it.
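
For illustration of the failure mode only (this is a toy class, not the 
oslo.i18n implementation): lazy mode turns translatable strings into 
Message objects, and prettytable's header padding does plain string 
addition on them:

```python
class LazyMessage(str):
    """Toy stand-in for a lazy translation Message: translation is
    deferred, so concatenation with plain strings is disallowed to avoid
    losing the translatable object. Illustration only, not oslo code."""
    def __add__(self, other):
        raise TypeError("Message objects do not support addition.")

msg = LazyMessage("Upgrade check")
try:
    # This is effectively what prettytable does when justifying a header.
    padded = msg + "    "
except TypeError as exc:
    print(exc)  # -> Message objects do not support addition.
```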


I'm talking with the oslo team about deprecating that interface so 
projects don't mistakenly use it and expect great things to happen.


If anyone is still running into this, require oslo.upgradecheck>=0.1.1 
to pick up this workaround:


https://review.openstack.org/#/c/615610/

--

Thanks,

Matt



Re: [openstack-dev] [Openstack-sigs] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann

On 11/5/2018 1:36 PM, Doug Hellmann wrote:

I think the lazy stuff was all about the API responses. The log
translations worked a completely different way.


Yeah maybe. And if so, I came across this in one of the blueprints:

https://etherpad.openstack.org/p/disable-lazy-translation

Which says that because of a critical bug, the lazy translation was 
disabled in Havana to be fixed in Icehouse but I don't think that ever 
happened before IBM developers dropped it upstream, which is further 
justification for nuking this code from the various projects.


--

Thanks,

Matt



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-05 Thread Matt Riedemann

On 11/5/2018 1:17 PM, Matt Riedemann wrote:
I'm thinking of a case like: resize an instance but, rather than 
confirm/revert it, the user deletes the instance. That would clean up the 
allocations from the target node but potentially not from the source node.


Well this case is at least not an issue:

https://review.openstack.org/#/c/615644/

It took me a bit to sort out how that worked but it does and I've added 
a test to confirm it.


--

Thanks,

Matt



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-05 Thread Matt Riedemann

On 11/5/2018 12:28 PM, Mohammed Naser wrote:

Have you dug into any of the operations around these instances to
determine what might have gone wrong? For example, was a live migration
performed recently on these instances and if so, did it fail? How about
evacuations (rebuild from a down host).

To be honest, I have not, however, I suspect a lot of those happen from the
fact that it is possible that the service which makes the claim is not the
same one that deletes it

I'm not sure if this is something that's possible but say the compute2 makes
a claim for migrating to compute1 but something fails there, the revert happens
in compute1 but compute1 is already borked so it doesn't work

This isn't necessarily the exact case that's happening but it's a summary
of what I believe happens.



The computes don't create the resource allocations in placement though, 
the scheduler does, unless this deployment still has at least one 
compute that is 

The compute service should only be removing allocations for things like 
server delete, failed move operation (cleanup the allocations created by 
the scheduler), or a successful move operation (cleanup the allocations 
for the source node held by the migration record).


I wonder if you have migration records (from the cell DB migrations 
table) holding allocations in placement for some reason, even though the 
migration is complete. I know you have an audit script to look for 
allocations that are not held by instances, assuming those instances 
have been deleted and the allocations were leaked, but they could have 
also been held by the migration record and maybe leaked that way? 
Although if you delete the instance, the related migrations records are 
also removed (but maybe not their allocations?). I'm thinking of a case 
like: resize an instance but, rather than confirm/revert it, the user 
deletes the instance. That would clean up the allocations from the target 
node but potentially not from the source node.
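
The audit idea mentioned above can be sketched as set arithmetic: every 
consumer holding allocations should be either an instance or an 
in-progress migration, and anything else is a leak candidate (sketch on 
hypothetical pre-fetched data, not the actual audit script):

```python
def find_leaked_consumers(allocation_consumers, instance_uuids, migration_uuids):
    """Consumers holding placement allocations that correspond to neither
    an instance nor an in-progress migration are leak candidates. Pure
    set logic on pre-fetched data; a real audit script would page through
    the placement API and the cell databases to build these inputs."""
    return set(allocation_consumers) - set(instance_uuids) - set(migration_uuids)

# Hypothetical data: "ghost-9" holds an allocation but matches nothing.
allocations = ["inst-1", "inst-2", "mig-7", "ghost-9"]
instances = ["inst-1", "inst-2"]
migrations = ["mig-7"]
print(sorted(find_leaked_consumers(allocations, instances, migrations)))
```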


--

Thanks,

Matt



[openstack-dev] Dropping lazy translation support

2018-11-05 Thread Matt Riedemann
This is a follow up to a dev ML email [1] where I noticed that some 
implementations of the upgrade-checkers goal were failing because some 
projects still use the oslo_i18n.enable_lazy() hook for lazy log message 
translation (and maybe API responses?).


The very old blueprints related to this can be found here [2][3][4].

If memory serves me correctly from my time working at IBM on this, this 
was needed to:


1. Generate logs translated in other languages.

2. Return REST API responses if the "Accept-Language" header was used 
and a suitable translation existed for that language.


#1 has been a dead horse since at least the Ocata summit, when we 
agreed to no longer translate logs since no one used them.


#2 is probably something no one knows about. I can't find end-user 
documentation about it anywhere. It's not tested and therefore I have no 
idea if it actually works anymore.
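
To make #2 concrete, a client would have requested a translated fault 
message roughly like this (the endpoint URL is a placeholder; as noted 
above, whether any translation still happens is unverified):

```python
import urllib.request

# Build (but do not send) a compute API request asking for German error
# messages via the Accept-Language header. Placeholder endpoint URL.
req = urllib.request.Request(
    "http://controller:8774/v2.1/servers/invalid-uuid",
    headers={"Accept-Language": "de"},
)
# If lazy translation worked end-to-end, the 404 fault message in the
# response body would come back translated.
print(req.get_header("Accept-language"))
```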


I would like to (1) deprecate the oslo_i18n.enable_lazy() function so 
new projects don't use it and (2) start removing the enable_lazy() usage 
from existing projects like keystone, glance and cinder.


Are there any users, deployments or vendor distributions that still rely 
on this feature? If so, please speak up now.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-November/136285.html

[2] https://blueprints.launchpad.net/oslo-incubator/+spec/i18n-messages
[3] https://blueprints.launchpad.net/nova/+spec/i18n-messages
[4] https://blueprints.launchpad.net/nova/+spec/user-locale-api

--

Thanks,

Matt



Re: [openstack-dev] [nova] Announcing new Focal Point for s390x libvirt/kvm Nova

2018-11-05 Thread Matt Riedemann

On 11/2/2018 3:47 AM, Andreas Scheuring wrote:

Dear Nova Community,
I want to announce the new focal point for Nova s390x libvirt/kvm.

Please welcome "Cathy Zhang” to the Nova team. She and her team will be 
responsible for maintaining the s390x libvirt/kvm Thirdparty CI  [1] and any s390x 
specific code in nova and os-brick.
I personally took a new opportunity a few months ago but kept 
maintaining the CI as well as possible. With new manpower we can hopefully 
contribute more to the community again.

You can reach her via
* email: bjzhj...@linux.vnet.ibm.com
* IRC: Cathyz

Cathy, I wish you and your team all the best for this exciting role! I also 
want to say thank you for the last years. It was a great time, I learned a lot 
from you all, will miss it!

Cheers,

Andreas (irc: scheuran)


[1] https://wiki.openstack.org/wiki/ThirdPartySystems/IBM_zKVM_CI


Welcome Cathy.

Andreas - thanks for the update and good luck on the new position.

--

Thanks,

Matt



Re: [openstack-dev] [nova] about live-resize the instance

2018-11-05 Thread Matt Riedemann

On 11/4/2018 10:17 PM, Chen CH Ji wrote:
Yes, this has been discussed for a long time, and if I remember 
correctly the Stein PTG also had some discussion on it (maybe in the 
Public Cloud WG?). Claudiu has been pushing this for several cycles and 
he actually had some code at [1] but no additional progress there...
[1] 
https://review.openstack.org/#/q/status:abandoned+topic:bp/instance-live-resize


It's a question of priorities. It's a complicated change and low 
priority, in my opinion. We've said several times before that we'd do 
it, but there are a lot of other higher priority efforts taking the 
attention of the core team. Getting agreement on the spec is the first 
step and then the runways process should be used to deal with actual 
code reviews, but I think the spec review has stalled (I know I am 
guilty of not looking at the latest updates to the spec).


--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] FYI on "TypeError: Message objects do not support addition." errors

2018-11-05 Thread Matt Riedemann
If you are seeing this error when implementing and running the upgrade 
check command in your project:


Traceback (most recent call last):
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", 
line 184, in main

return conf.command.action_fn()
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", 
line 134, in check

print(t)
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 237, in __str__

return self.__unicode__()
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 243, in __unicode__

return self.get_string()
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 995, in get_string

lines.append(self._stringify_header(options))
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 1066, in _stringify_header
bits.append(" " * lpad + self._justify(fieldname, width, 
self._align[field]) + " " * rpad)
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", 
line 187, in _justify

return text + excess * " "
  File 
"/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_i18n/_message.py", 
line 230, in __add__

raise TypeError(msg)
TypeError: Message objects do not support addition.

It is due to calling oslo_i18n.enable_lazy() somewhere in the command 
import path. That should be removed from the project since lazy 
translation is not supported in openstack and as an effort was abandoned 
several years ago. It is probably still called in a lot of "big 
tent/stackforge" projects because of initially copying it from the more 
core projects. Anyway, just remove it.


I'm talking with the oslo team about deprecating that interface so 
projects don't mistakenly use it and expect great things to happen.


--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-23 Update

2018-11-05 Thread Matt Riedemann
There is not much news this week. There are several open changes which 
add the base command framework to projects [1]. Those need reviews from 
the related core teams. gmann and I have been trying to go through them 
first to make sure they are ready for core review.


There is one neutron change to note [2] which adds an extension point 
for neutron stadium projects (and ML2 plugins?) to hook in their own 
upgrade checks. Given the neutron architecture, this makes sense. My 
only worry is about making sure the interface is clearly defined, but I 
suspect this isn't the first time the neutron team has had to deal with 
something like this.


[1] https://review.openstack.org/#/q/topic:upgrade-checkers+status:open
[2] https://review.openstack.org/#/c/615196/

--

Thanks,

Matt



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-05 Thread Matt Riedemann

On 11/5/2018 5:52 AM, Chris Dent wrote:

* We need to have further discussion and investigation on
   allocations getting out of sync. Volunteers?


This is something I've already spent a lot of time on with the 
heal_allocations CLI, and have already started asking mnaser questions 
about this elsewhere in the thread.


--

Thanks,

Matt



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-05 Thread Matt Riedemann

On 11/4/2018 4:22 AM, Mohammed Naser wrote:

Just for information's sake, a clean-state cloud which had no reported issues
over maybe a period of 2-3 months already has 4 allocations which are
incorrect and 12 allocations pointing to the wrong resource provider, so I
think this comes down to committing to either "self-healing" to fix those
issues or not.


Is this running Rocky or an older release?

Have you dug into any of the operations around these instances to 
determine what might have gone wrong? For example, was a live migration 
performed recently on these instances and if so, did it fail? How about 
evacuations (rebuild from a down host).


By "4 allocations which are incorrect" I assume that means they are 
pointing at the correct compute node resource provider but the values 
for allocated VCPU, MEMORY_MB and DISK_GB is wrong? If so, how do the 
allocations align with old/new flavors used to resize the instance? Did 
the resize fail?


Are there mixed compute versions at all, i.e. are you moving instances 
around during a rolling upgrade?


--

Thanks,

Matt



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-02 Thread Matt Riedemann

On 11/2/2018 2:22 PM, Eric Fried wrote:

Based on a (long) discussion yesterday [1] I have put up a patch [2]
whereby you can set [compute]resource_provider_association_refresh to
zero and the resource tracker will never* refresh the report client's
provider cache. Philosophically, we're removing the "healing" aspect of
the resource tracker's periodic and trusting that placement won't
diverge from whatever's in our cache. (If it does, it's because the op
hit the CLI, in which case they should SIGHUP - see below.)

*except:
- When we initially create the compute node record and bootstrap its
resource provider.
- When the virt driver's update_provider_tree makes a change,
update_from_provider_tree reflects them in the cache as well as pushing
them back to placement.
- If update_from_provider_tree fails, the cache is cleared and gets
rebuilt on the next periodic.
- If you send SIGHUP to the compute process, the cache is cleared.

This should dramatically reduce the number of calls to placement from
the compute service. Like, to nearly zero, unless something is actually
changing.

Can I get some initial feedback as to whether this is worth polishing up
into something real? (It will probably need a bp/spec if so.)

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
[2] https://review.openstack.org/#/c/614886/

==
Background
==
In the Queens release, our friends at CERN noticed a serious spike in
the number of requests to placement from compute nodes, even in a
stable-state cloud. Given that we were in the process of adding a ton of
infrastructure to support sharing and nested providers, this was not
unexpected. Roughly, what was previously:

  @periodic_task:
  GET /resource_providers/$compute_uuid
  GET /resource_providers/$compute_uuid/inventories

became more like:

  @periodic_task:
  # In Queens/Rocky, this would still just return the compute RP
  GET /resource_providers?in_tree=$compute_uuid
  # In Queens/Rocky, this would return nothing
  GET /resource_providers?member_of=...&required=MISC_SHARES...
  for each provider returned above:  # i.e. just one in Q/R
  GET /resource_providers/$compute_uuid/inventories
  GET /resource_providers/$compute_uuid/traits
  GET /resource_providers/$compute_uuid/aggregates

In a cloud the size of CERN's, the load wasn't acceptable. But at the
time, CERN worked around the problem by disabling refreshing entirely.
(The fact that this seems to have worked for them is an encouraging sign
for the proposed code change.)

We're not actually making use of most of that information, but it sets
the stage for things that we're working on in Stein and beyond, like
multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
etc., so removing/reducing the amount of information we look at isn't
really an option strategically.


A few random points from the long discussion that should probably 
re-posed here for wider thought:


* There was probably a lot of discussion about why we needed to do this 
caching and stuff in the compute in the first place. What has changed 
that we no longer need to aggressively refresh the cache on every 
periodic? I thought initially it came up because people really wanted 
the compute to be fully self-healing to any external changes, including 
hot plugging resources like disk on the host to automatically reflect 
those changes in inventory. Similarly, external user/service 
interactions with the placement API which would then be automatically 
picked up by the next periodic run - is that no longer a desire, and/or 
how was the decision made previously that simply requiring a SIGHUP in 
that case wasn't sufficient/desirable.


* I believe I made the point yesterday that we should probably not 
refresh by default, and let operators opt-in to that behavior if they 
really need it, i.e. they are frequently making changes to the 
environment, potentially by some external service (I could think of 
vCenter doing this to reflect changes from vCenter back into 
nova/placement), but I don't think that should be the assumed behavior 
by most and our defaults should reflect the "normal" use case.
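
A deployment that rarely changes providers out-of-band could then opt 
out of refreshing entirely, along the lines of (option name from the 
patch under review; the exact semantics may change):

```ini
[compute]
# 0 would disable periodic refresh of the provider cache entirely per the
# proposed patch; a positive value is the refresh interval in seconds.
resource_provider_association_refresh = 0
```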


* I think I've noted a few times now that we don't actually use the 
provider aggregates information (yet) in the compute service. Nova host 
aggregate membership is mirrored to placement since Rocky [1] but that 
happens in the API, not the compute. The only thing I can think of 
that relied on resource provider aggregate information in the compute is 
the shared storage providers concept, but that's not supported (yet) 
[2]. So do we need to keep retrieving aggregate information when nothing 
in compute uses it yet?


* Similarly, why do we need to get traits on each periodic? The only 
in-tree virt driver I'm aware of that *reports* traits is the libvirt 
driver for CPU features [3]. Otherwise I think the idea behind getting 
the 

Re: [openstack-dev] StoryBoard Forum Session: Remaining Blockers

2018-11-02 Thread Matt Riedemann

On 11/1/2018 7:22 PM, Kendall Nelson wrote:
We've made a lot of progress in StoryBoard-land over the last couple of 
releases cleaning up bugs, fixing UI annoyances, and adding features 
that people have requested. All along we've also continued to migrate 
projects as they've become unblocked. While there are still a few 
blockers on our to-do list, we want to make sure our list is complete[1].


We have a session at the upcoming forum to collect any remaining 
blockers that you may have encountered while messing around with the dev 
storyboard[2] site or using the real storyboard interacting with 
projects that have already migrated. If you encountered any issues that  
are blocking your project from migrating, please come share them with 
with us[3]. Hope to see you there!


-Kendall (diablo_rojo) & the StoryBoard team

[1] https://storyboard.openstack.org/#!/worklist/493 



I'm not sure why/how but you seem to have an encoded URL for this [1] 
link, which when I was using it redirected me to my own dashboard in 
storyboard. The real link, 
https://storyboard.openstack.org/#!/worklist/493, does work though. Just 
FYI for anyone else having the same problem.



[2] https://storyboard-dev.openstack.org/
[3] 
https://www.openstack.org/summit/berlin-2018/summit-schedule/events/22839/storyboard-migration-the-remaining-blockers 




--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Is anyone running their own script to purge old instance_faults table entries?

2018-11-01 Thread Matt Riedemann
I came across this bug [1] in triage today and I thought this was fixed 
already [2] but either something regressed or there is more to do here.


I'm mostly just wondering, are operators already running any kind of 
script which purges old instance_faults table records before an instance 
is deleted and archived/purged? Because if so, that might be something 
we want to add as a nova-manage command.


[1] https://bugs.launchpad.net/nova/+bug/1800755
[2] https://review.openstack.org/#/c/409943/
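
For anyone wondering what such a purge could look like before it lands as a 
nova-manage command, here is a rough sketch against an in-memory SQLite 
stand-in for the cell database (table layout heavily simplified; the 30-day 
retention window is an arbitrary choice):

```python
import sqlite3
from datetime import datetime, timedelta

# Stand-in for the nova cell database: the real instance_faults table has
# more columns, but a purge only needs created_at.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE instance_faults (
    id INTEGER PRIMARY KEY,
    instance_uuid TEXT,
    created_at TEXT,
    message TEXT)""")

now = datetime(2018, 11, 1)
faults = [
    ("uuid-1", (now - timedelta(days=400)).isoformat(), "old fault"),
    ("uuid-1", (now - timedelta(days=2)).isoformat(), "recent fault"),
    ("uuid-2", (now - timedelta(days=90)).isoformat(), "stale fault"),
]
conn.executemany(
    "INSERT INTO instance_faults (instance_uuid, created_at, message) "
    "VALUES (?, ?, ?)", faults)

# Purge fault records older than the (arbitrary) 30-day retention window;
# ISO-8601 timestamp strings compare correctly as text.
cutoff = (now - timedelta(days=30)).isoformat()
deleted = conn.execute(
    "DELETE FROM instance_faults WHERE created_at < ?", (cutoff,)).rowcount
conn.commit()
print(deleted)  # 2 old records purged; the recent fault is kept
```

A real script would of course also need to handle batching and archived 
(soft-deleted) rows.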

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Zuul Queue backlogs and resource usage

2018-10-30 Thread Matt Riedemann

On 10/30/2018 11:03 AM, Clark Boylan wrote:

If you find any of this interesting and would like to help feel free to reach 
out to myself or the infra team.


I find this interesting and thanks for providing the update to the 
mailing list. That's mostly what I wanted to say.


FWIW I've still got https://review.openstack.org/#/c/606981/ and the 
related changes to drop the nova-multiattach job and enable the 
multiattach volume tests in the integrated gate, but am hung up on some 
test failures in the multi-node tempest job as a result of that (the 
nova-multiattach job is single-node). There must be something weird that 
tickles those tests in a multi-node configuration and I just haven't dug 
into it yet, but maybe one of our intrepid contributors can lend a hand 
and debug it.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] FYI: change in semantics for virt driver update_provider_tree()

2018-10-29 Thread Matt Riedemann
This is a notice to any out of tree virt driver implementors of the 
ComputeDriver.update_provider_tree() interface that they now need to set 
the allocation_ratio and reserved amounts for VCPU, MEMORY_MB and 
DISK_GB inventory from the update_provider_tree() method assuming [1] 
merges. The patch below that one in the series shows how it's 
implemented for the libvirt driver.


[1] https://review.openstack.org/#/c/613991/

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Volunteer needed to write reshaper FFU hook

2018-10-29 Thread Matt Riedemann
Given the outstanding results of my recruiting job last week [1] I have 
been tasked with recruiting one of our glorious and most talented 
contributors to work on the fast-forward-upgrade script changes needed 
for the reshape-provider-tree blueprint.


The work item is nicely detailed in the spec [2]. A few things to keep 
in mind:


1. There are currently no virt drivers which run the reshape routine. 
However, patches are up for review for libvirt [3] and xen [4]. There 
are also functional tests which exercise the ResourceTracker code with a 
faked out virt driver interface to test reshaping [5].


2. The FFU entry point will mimic the reshape routine that will happen 
on nova-compute service startup in the ResourceTracker [6].


3. The FFU script will need to run per-compute service rather than 
globally (or per cell) since it actually needs to call the virt driver's 
update_provider_tree() interface which might need to inspect the 
hardware (like for GPUs).


Given there is already a model to follow from the ResourceTracker this 
should not be too hard, the work will likely mostly be writing tests.


What do you get if you volunteer? The usual: fame, fortune, the respect 
of your peers, etc.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-October/136075.html
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/reshape-provider-tree.html#offline-upgrade-script

[3] https://review.openstack.org/#/c/599208/
[4] https://review.openstack.org/#/c/521041/
[5] 
https://github.com/openstack/nova/blob/a0eacbf7f/nova/tests/functional/test_servers.py#L1839
[6] 
https://github.com/openstack/nova/blob/a0eacbf7f/nova/compute/resource_tracker.py#L917-L940


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [goals][upgrade-checkers] Week R-24 Update

2018-10-26 Thread Matt Riedemann
There isn't much news this week, except that some of the base framework 
changes being proposed to projects are getting merged, which is nice to see.


https://storyboard.openstack.org/#!/story/2003657

https://review.openstack.org/#/q/topic:upgrade-checkers+status:merged

And there are a lot of patches that are ready for review:

https://review.openstack.org/#/q/topic:upgrade-checkers+status:open

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread Matt Riedemann

On 10/25/2018 2:55 PM, Chris Friesen wrote:
2) The main benefit (as I see it) of the quota class API is to allow 
dynamic adjustment of the default quotas without restarting services.


I could be making this up, but I want to say back at the Pike PTG people 
were also complaining that not having an API to change this, and only being 
able to do it via config, was not good. But if the keystone limits API solves that 
then it's a non-issue.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-25 Thread Matt Riedemann

Hello OSA/TripleO people,

A plan/checklist was put in place at the Stein PTG for extracting 
placement from nova [1]. The first item in that list is done in grenade 
[2], which is the devstack-based upgrade project in the integrated gate. 
That should serve as a template for the necessary upgrade steps in 
deployment projects. The related devstack change for extracted placement 
on the master branch (Stein) is [3]. Note that change has some dependencies.


The second point in the plan from the PTG was getting extracted 
placement upgrade tooling support in a deployment project, notably 
TripleO (and/or OpenStackAnsible).


Given the grenade change is done and passing tests, TripleO/OSA should 
be able to start coding up and testing an upgrade step when going from 
Rocky to Stein. My question is who can we name as an owner in either 
project to start this work? Because we really need to be starting this 
as soon as possible to flush out any issues before they are too late to 
correct in Stein.


So if we have volunteers or better yet potential patches that I'm just 
not aware of, please speak up here so we know who to contact about 
status updates and if there are any questions with the upgrade.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html

[2] https://review.openstack.org/#/c/604454/
[3] https://review.openstack.org/#/c/600162/

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Matt Riedemann

On 10/24/2018 6:55 PM, Sam Morrison wrote:

I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.


There was also the "read from searchlight" idea [1] but that died in Boston.

[1] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-24 Thread Matt Riedemann

On 10/24/2018 10:10 AM, Jay Pipes wrote:
I'd like to propose deprecating this API and getting rid of this 
functionality since it conflicts with the new Keystone /limits endpoint, 
is highly coupled with RAX's turnstile middleware and I can't seem to 
find anyone who has ever used it. Deprecating this API and functionality 
would make the transition to a saner quota management system much easier 
and straightforward.


I was trying to do this before it was cool:

https://review.openstack.org/#/c/411035/

I think it was the Pike PTG in ATL where people said, "meh, let's just 
wait for unified limits from keystone and let this rot on the vine".


I'd be happy to restore and update that spec.

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-23 Thread Matt Riedemann

On 10/23/2018 1:41 PM, Sean McGinnis wrote:

Yeah, but part of the reason for placeholders was consistency across all of
the services. I guess if there are never going to be upgrade checks in
adjutant then I could see skipping it, but otherwise I would prefer to at
least get the framework in place.


+1

Even if there is nothing to check at this point, I think having the facility
there is a benefit for projects and scripts that are going to be consuming
these checks. Having nothing to check, but having the status check there, is
going to be better than everything needing to keep a list of which projects to
run the checks on and which not.



Sure, that works for me as well. I'm not against adding placeholder/noop 
checks knowing that nothing is immediately obvious to replace those in 
Stein, but could be done later when the opportunity arises. If it's 
debatable on a per-project basis, then I'd defer to the core team for 
the project.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-23 Thread Matt Riedemann

On 10/23/2018 8:09 AM, Ben Nemec wrote:
Can't we just add a noop command like we are for the services that don't 
currently need upgrade checks?


We could, but I was also hoping that for most projects we will actually 
be able to replace the noop / placeholder check with *something* useful 
in Stein.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-22 Thread Matt Riedemann

On 10/22/2018 4:35 PM, Adrian Turjak wrote:

The one other open question I have is about the Adjutant change [2]. I
know Adjutant is very new and I'm not sure what upgrades look like for
that project, so I don't really know how valuable adding the upgrade
check framework is to that project. Is it like Horizon where it's
mostly stateless and fed off plugins? Because we don't have an upgrade
check CLI for Horizon for that reason.

[1]
https://review.openstack.org/#/q/topic:upgrade-checkers+(status:open+OR+status:merged)
[2]https://review.openstack.org/#/c/611812/


Adjutant's codebase is also going to be a bit unstable for the next few
cycles while we refactor some internals (we're not marking it 1.0 yet).
Once the current set of ugly refactors planned for late Stein are done I
may look at building some upgrade checking, once we also work out what
our upgrade checking should look like. Probably mostly checking config
changes, database migration states, and plugin compatibility.

Adjutant already has a concept of startup checks at least, which while
not anywhere near as extensive as they should be, mostly amount to
making sure your config file looks 'mostly' sane regarding plugins
before starting up the service, and we do intend to expand on that, plus
we can reuse a large chunk of that for upgrade checking.


OK it seems there is not really any point in trying to satisfy the 
upgrade checkers goal for Adjutant in Stein then. Should we just abandon 
the change?


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann

On 10/22/2018 11:59 AM, Matt Riedemann wrote:
Thanks for this. Have you debugged to the point of knowing where the 
initial DB query is starting from?


Looking at history, my guess is this is the change which introduced it 
for all requests:


https://review.openstack.org/#/c/276861/


From that change, as far as I can tell, we only needed to pre-join on 
metadata because of setting the "launch_metadata" variable:


https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py@145

I don't see anything directly using system_metadata, although that one 
is sometimes tricky and could be lazy-loaded elsewhere.


I do know that starting in ocata we use system_metadata for dynamic 
vendor metadata:


https://github.com/openstack/nova/blob/stable/ocata/nova/api/metadata/vendordata_dynamic.py#L85

Added in change: https://review.openstack.org/#/c/417780/

But if you don't provide vendor data then that should not be a problem.

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann

On 10/22/2018 11:25 AM, Sergio A. de Carvalho Jr. wrote:

Hi,

While troubleshooting a production issue we identified that the Nova 
metadata API is fetching a lot more raw data from the database than 
seems necessary. The problem appears to be caused by the SQL query used 
to fetch instance data that joins the "instance" table with, among 
others, two metadata tables: "instance_metadata" and 
"instance_system_metadata". Below is a simplified version of this query 
(I've added the full query at the end of this message for reference):


SELECT ...
   FROM (SELECT ...
           FROM `instances`
          WHERE `instances` . `deleted` = ?
            AND `instances` . `uuid` = ?
          LIMIT ?) AS `anon_1`
   LEFT OUTER JOIN `instance_system_metadata` AS 
`instance_system_metadata_1`
     ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . 
`instance_uuid`
   LEFT OUTER JOIN (`security_group_instance_association` AS 
`security_group_instance_association_1`

                    INNER JOIN `security_groups` AS `security_groups_1`
                    ON `security_groups_1` . `id` = 
`security_group_instance_association_1` . `security_group_id`
                    AND `security_group_instance_association_1` . 
`deleted` = ?

                    AND `security_groups_1` . `deleted` = ? )
     ON `security_group_instance_association_1` . `instance_uuid` = 
`anon_1` . `instances_uuid`

    AND `anon_1` . `instances_deleted` = ?
   LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
     ON `security_group_rules_1` . `parent_group_id` = 
`security_groups_1` . `id`

    AND `security_group_rules_1` . `deleted` = ?
   LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
     ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`

   LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
     ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
     ON `instance_metadata_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`

    AND `instance_metadata_1` . `deleted` = ?

The instance table has a 1-to-many relationship to both 
"instance_metadata" and "instance_system_metadata" tables, so the query 
is effectively producing a cross join of both metadata tables.
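
The multiplication is easy to reproduce outside of nova. A minimal sketch 
with an in-memory SQLite database (tables reduced to just the joined 
columns, data modeled on the instance shown below): joining the two 
1-to-many metadata tables through the shared instance_uuid yields 
2 x 5 = 10 result rows for a single instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instance_metadata (instance_uuid TEXT, key TEXT, value TEXT);
CREATE TABLE instance_system_metadata (instance_uuid TEXT, key TEXT, value TEXT);
""")
uuid = "a6cf4a6a-effe-4438-9b7f-d61b23117b9b"
conn.executemany("INSERT INTO instance_metadata VALUES (?, ?, ?)",
                 [(uuid, "property1", "value1"), (uuid, "property2", "value")])
conn.executemany("INSERT INTO instance_system_metadata VALUES (?, ?, ?)",
                 [(uuid, "image_disk_format", "qcow2"),
                  (uuid, "image_min_ram", "0"),
                  (uuid, "image_min_disk", "20"),
                  (uuid, "image_base_image_ref", "39cd564f"),
                  (uuid, "image_container_format", "bare")])

# Joining two 1-to-many tables in one query multiplies the row counts:
# 2 metadata rows x 5 system_metadata rows = 10 result rows.
rows = conn.execute("""
    SELECT m.key, s.key
      FROM instance_metadata m
      LEFT OUTER JOIN instance_system_metadata s
        ON s.instance_uuid = m.instance_uuid
""").fetchall()
print(len(rows))  # 10
```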


To illustrate the impact of this query, I have an instance that has 2 
records in "instance_metadata" and 5 records in "instance_system_metadata":


 > select instance_uuid,`key`,value from instance_metadata where 
instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';

+--------------------------------------+-----------+--------+
| instance_uuid                        | key       | value  |
+--------------------------------------+-----------+--------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
+--------------------------------------+-----------+--------+
2 rows in set (0.61 sec)

 > select `key`,value from instance_system_metadata where instance_uuid = 
'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';

+------------------------+--------------------------------------+
| key                    | value                                |
+------------------------+--------------------------------------+
| image_disk_format      | qcow2                                |
| image_min_ram          | 0                                    |
| image_min_disk         | 20                                   |
| image_base_image_ref   | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare                                 |
+------------------------+--------------------------------------+
5 rows in set (0.00 sec)

For this particular instance, the query used by the metadata API will 
fetch 10 records from the database:


+--------------------------------------+-------------------------+---------------------------+--------------------------------+----------------------------------+
| anon_1_instances_uuid                | instance_metadata_1_key | instance_metadata_1_value | instance_system_metadata_1_key | instance_system_metadata_1_value |
+--------------------------------------+-------------------------+---------------------------+--------------------------------+----------------------------------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1               | value1                    | image_disk_format              | qcow2                            |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2               | value                     | image_disk_format              | qcow2                            |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1               | value1                    | image_min_ram                  | 0                                |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2               | value                     | image_min_ram                  | 0                                |

[openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-19 Thread Matt Riedemann
The big news this week is we have a couple of volunteer developers from 
NEC (Akhil Jain and Rajat Dhasmana) who are pushing the base framework 
changes across a lot of the projects [1]. I'm trying to review as many 
of these as I can. The request now is for the core teams on these 
projects to review them as well so we can keep moving, and then start 
thinking about non-placeholder specific checks for each project.


The one other open question I have is about the Adjutant change [2]. I 
know Adjutant is very new and I'm not sure what upgrades look like for 
that project, so I don't really know how valuable adding the upgrade 
check framework is to that project. Is it like Horizon where it's mostly 
stateless and fed off plugins? Because we don't have an upgrade check 
CLI for Horizon for that reason.


[1] 
https://review.openstack.org/#/q/topic:upgrade-checkers+(status:open+OR+status:merged)

[2] https://review.openstack.org/#/c/611812/

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [goals][upgrade-checkers] Week R-26 Update

2018-10-19 Thread Matt Riedemann
Top posting just to try and summarize my thought that for the goal in 
Stein, I think we should focus on getting the base framework in place 
for each service project, along with any non-config (including policy) 
specific upgrade checks that make sense for each project.


As Ben mentioned, there are existing tools for validating config (I know 
BlueBox used to use the fatal_deprecations config in their CI/CD 
pipeline to know when they needed to change their deploy scripts because 
deploying new code from pre-prod would fail). Once we get the basics 
covered we can work, as a community, to figure out how best to integrate 
config validation into upgrade checks, because I don't really think we 
want to have upgrade checks that dump warnings for all deprecated 
options in addition to what is already provided by oslo.config/log. I 
have a feeling that would get so noisy that no one would ever pay 
attention to it. I'm mostly interested in the scenario that config is 
removed from code but still being set in the config file which could 
fail an upgrade on service restart (if an alias was removed for 
example), but I also tend to think those types of issues are case-by-case.
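
A check along those lines needs very little machinery. A hedged sketch (the 
removed-option list below is invented for illustration; a real check would 
maintain it per project and release, or derive it from release notes):

```python
import configparser

# Hypothetical list of (section, option) pairs that were removed from the
# code; a real upgrade check would maintain this per project and release.
REMOVED_OPTIONS = {("DEFAULT", "verbose"), ("glance", "api_servers_old")}

def find_removed_options(config_text):
    """Return removed options that are still set in a config file."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    found = []
    for section in ["DEFAULT"] + parser.sections():
        for option in parser[section]:
            if (section, option) in REMOVED_OPTIONS:
                found.append((section, option))
    return found

sample = """
[DEFAULT]
verbose = true
debug = false

[glance]
api_servers = http://glance:9292
"""
print(find_removed_options(sample))  # [('DEFAULT', 'verbose')]
```

Whether a hit should be a warning or a hard failure is exactly the 
severity question discussed above.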


On 10/15/2018 3:29 PM, Ben Nemec wrote:



On 10/15/18 3:27 AM, Jean-Philippe Evrard wrote:

On Fri, 2018-10-12 at 17:05 -0500, Matt Riedemann wrote:

The big update this week is version 0.1.0 of oslo.upgradecheck was
released. The documentation along with usage examples can be found
here
[1]. A big thanks to Ben Nemec for getting that done since a few
projects were waiting for it.

In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3]
about
the validity of upgrade checks looking for deleted configuration
options. The main scenario I'm thinking about here is FFU where
someone
is going from Mitaka to Pike. Let's say a config option was
deprecated
in Newton and then removed in Ocata. As the operator is rolling
through
from Mitaka to Pike, they might have missed the deprecation signal
in
Newton and removal in Ocata. Does that mean we should have upgrade
checks that look at the configuration for deleted options, or
options
where the deprecated alias is removed? My thought is that if things
will
not work once they get to the target release and restart the service
code, which would definitely impact the upgrade, then checking for
those
scenarios is probably OK. If on the other hand the removed options
were
just tied to functionality that was removed and are otherwise not
causing any harm then I don't think we need a check for that. It was
noted that oslo.config has a new validation tool [4] so that would
take
care of some of this same work if run during upgrades. So I think
whether or not an upgrade check should be looking for config option
removal ultimately depends on the severity of what happens if the
manual
intervention to handle that removed option is not performed. That's
pretty broad, but these upgrade checks aren't really set in stone
for
what is applied to them. I'd like to get input from others on this,
especially operators and if they would find these types of checks
useful.

[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3]
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17 


[4]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html 





Hey,

Nice topic, thanks Matt!

TL;DR: I would rather fail explicitly for all removals, warning on all
deprecations. My concern is, by being more surgical, we'd have to
decide what's "not causing any harm" (and I think deployers/users are
best to determine what's not causing them any harm).
Also, it's probably more work to classify based on "severity".
The quick win here (for upgrade-checks) is not about being smart, but
being an exhaustive, standardized across projects, and _always used_
source of truth for upgrades, which is complemented by release notes.

Long answer:

At some point in the past, I was working full time on upgrades using
OpenStack-Ansible.

Our process was the following:
1) Read all the project's releases notes to find upgrade documentation
2) With said release notes, Adapt our deploy tools to handle the
upgrade, or/and write ourselves extra documentation+release notes for
our deployers.
3) Try the upgrade manually, fail because some release note was missing
x or y. Find root cause and retry from step 2 until success.

Here is where I see upgrade checkers improving things:
1) No need for deployment projects to parse all release notes for
configuration changes, as tooling to upgrade check would be directly
outputting things that need to change for scenario x or y that is
included in the deployment project. No need to iterate either.

2) Test real deployer use cases. The deployers using openstack-ansible
have ultimate flexibil

Re: [openstack-dev] [goals][upgrade-checkers] Call for Volunteers to work on upgrade-checkers stein goal

2018-10-17 Thread Matt Riedemann

On 10/16/2018 9:43 AM, Ghanshyam Mann wrote:

I was discussing with mriedem [1] the idea of building a volunteer team which 
can work with him on the upgrade-checkers goal [2]. There is a lot of work needed 
for this goal [3]: projects with no upgrade impact yet need the CLI 
framework with a placeholder check only, and projects with upgrade impact need 
actual upgrade check implementations.

The idea is to build a volunteer team who can work with the goal champion to finish 
the work early. This will help share some of the work from the goal champion as well 
as from the project side.

 - This email is a request for volunteers (upstream developers from any 
project) who can work closely with mriedem on the upgrade-checkers goal.
 - Currently two developers have volunteered:
1. Akhil Jain (IRC: akhil_jain, email: akhil.j...@india.nec.com)
2. Rajat Dhasmana (IRC: whoami-rajat, email: rajatdhasm...@gmail.com)
 - Anyone who would like to help with this work, feel free to reply to this email 
or ping mriedem on IRC.
 - As a next step, mriedem will plan the work distribution among the volunteers.


Thanks Ghanshyam.

As can be seen from the cyborg [1] and congress [2] changes posted by 
Rajat and Akhil, the initial framework changes are pretty trivial. The 
harder part is working with core teams / PTLs to determine which real 
upgrade checks should be added based on the release notes. But having 
the framework done as a baseline across all service projects is a great 
start.


[1] https://review.openstack.org/#/c/611368/
[2] https://review.openstack.org/#/c/66/

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [horizon][nova][cinder][keystone][glance][neutron][swift] Horizon feature gaps

2018-10-17 Thread Matt Riedemann

On 10/17/2018 9:24 AM, Ivan Kolodyazhny wrote:


As you may know, unfortunately, Horizon doesn't support all features 
provided by APIs. That's why we created feature gaps list [1].


I'd got a lot of great conversations with projects teams during the PTG 
and we tried to figure out what should be done prioritize these tasks. 
It's really helpful for Horizon to get feedback from other teams to 
understand what features should be implemented next.


While I'm filling launchpad with new bugs and blueprints for [1], it 
would be good to review this list again and find some volunteers to 
decrease feature gaps.


[1] https://etherpad.openstack.org/p/horizon-feature-gap

Thanks everybody for any of your contributions to Horizon.


+openstack-sigs
+openstack-operators

I've left some notes for nova. This looks very similar to the compute 
API OSC gap analysis I did [1]. Unfortunately it's hard to prioritize 
what to really work on without some user/operator feedback - maybe we 
can get the user work group involved in trying to help prioritize what 
people really want that is missing from horizon, at least for compute?


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] devstack, grenade, database management

2018-10-16 Thread Matt Riedemann

On 10/16/2018 5:48 AM, Chris Dent wrote:

* We need to address database creation scripts and database migrations.

   There's a general consensus that we should use alembic, and start
   things from a collapsed state. That is, we don't need to represent
   already existing migrations in the new repo, just the present-day
   structure of the tables.

   Right now the devstack code relies on a stubbed out command line
   tool at https://review.openstack.org/#/c/600161/ to create tables
   with a metadata.create_all(). This is a useful thing to have but
   doesn't follow the "db_sync" pattern set elsewhere, so I haven't
   followed through on making it pretty but can do so if people think
   it is useful. Whether we do that or not, we'll still need some
   kind of "db_sync" command. Do people want me to make a cleaned up
   "create" command?

   Ed has expressed some interest in exploring setting up alembic and
   the associated tools but that can easily be a more than one person
   job. Is anyone else interested?

It would be great to get all this stuff working sooner than later.
Without it we can't do two important tasks:

* Integration tests with the extracted placement [1].
* Hacking on extracted placement in/with devstack.


Another thing that came up today in IRC [1], which is maybe not as 
obvious from this email, is what happens with the one online data 
migration we have for placement (create_incomplete_consumers). If we 
drop that online data migration from the placement repo, then ideally 
we'd have something to check that it's done before people upgrade to Stein 
and the extracted placement repo. There are some options there: 
placement-manage db sync could fail if there are missing consumers, or we 
could simply have a placement-status upgrade check for it.
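
For the create_incomplete_consumers case, the check itself is essentially one 
query. A rough sketch using an in-memory SQLite database as a stand-in (the 
real placement schema has many more columns; only the linkage that matters 
here is kept):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allocations (consumer_id TEXT);
CREATE TABLE consumers (uuid TEXT);
""")
# Two consumers hold allocations, but only c1 has a consumers record.
conn.executemany("INSERT INTO allocations VALUES (?)",
                 [("c1",), ("c1",), ("c2",)])
conn.execute("INSERT INTO consumers VALUES ('c1')")

# An allocation whose consumer has no consumers record is "incomplete";
# the online data migration is finished only when this count is zero.
missing = conn.execute("""
    SELECT COUNT(DISTINCT a.consumer_id)
      FROM allocations a
      LEFT OUTER JOIN consumers c ON c.uuid = a.consumer_id
     WHERE c.uuid IS NULL
""").fetchone()[0]
print(missing)  # 1 -> a check would fail (or warn) and block the upgrade
```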




Another issue that needs some attention, but is not quite as urgent
is the desire to support other databases during the upgrade,
captured in this change

https://review.openstack.org/#/c/604028/


I have a grenade patch to test the postgresql-migrate-db.sh script now. [2]

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-10-16.log.html#t2018-10-16T19:37:25

[2] https://review.openstack.org/#/c/611020/

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Forum Schedule - Seeking Community Review

2018-10-15 Thread Matt Riedemann

On 10/15/2018 3:01 PM, Jimmy McArthur wrote:
The Forum schedule is now up 
(https://www.openstack.org/summit/berlin-2018/summit-schedule/#track=262). 
If you see a glaring content conflict within the Forum itself, please 
let me know.


Not a conflict, but it looks like there is a duplicate for Lee talking 
about encrypted volumes:


https://www.openstack.org/summit/berlin-2018/summit-schedule/global-search?t=yarwood

Unless he just loves it so much he needs to talk about it twice.

--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-26 Update

2018-10-12 Thread Matt Riedemann
The big update this week is version 0.1.0 of oslo.upgradecheck was 
released. The documentation along with usage examples can be found here 
[1]. A big thanks to Ben Nemec for getting that done since a few 
projects were waiting for it.


In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about 
the validity of upgrade checks looking for deleted configuration 
options. The main scenario I'm thinking about here is FFU where someone 
is going from Mitaka to Pike. Let's say a config option was deprecated 
in Newton and then removed in Ocata. As the operator is rolling through 
from Mitaka to Pike, they might have missed the deprecation signal in 
Newton and removal in Ocata. Does that mean we should have upgrade 
checks that look at the configuration for deleted options, or options 
where the deprecated alias is removed? My thought is that if things will 
not work once they get to the target release and restart the service 
code, which would definitely impact the upgrade, then checking for those 
scenarios is probably OK. If on the other hand the removed options were 
just tied to functionality that was removed and are otherwise not 
causing any harm then I don't think we need a check for that. It was 
noted that oslo.config has a new validation tool [4] so that would take 
care of some of this same work if run during upgrades. So I think 
whether or not an upgrade check should be looking for config option 
removal ultimately depends on the severity of what happens if the manual 
intervention to handle that removed option is not performed. That's 
pretty broad, but these upgrade checks aren't really set in stone for 
what is applied to them. I'd like to get input from others on this, 
especially operators and if they would find these types of checks useful.


[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3] 
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17
[4] 
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html
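To illustrate the kind of check under debate, here is a stdlib-only sketch that scans a config file for removed options. The option names and advice strings are made up, and a real implementation would build on oslo.upgradecheck and oslo.config rather than configparser — this only shows the shape of the logic.

```python
import configparser

# Hypothetical registry of options removed in the target release, mapped
# to operator-facing advice. Severity would depend on whether the removed
# option breaks the service or was only tied to removed functionality.
REMOVED_OPTIONS = {
    ("DEFAULT", "use_legacy_scheduler"): "removed in Ocata; delete it",
    ("api", "auth_strategy"): "deprecated alias removed; use [api]/auth",
}

def check_removed_options(config_text):
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    failures = []
    for (section, option), advice in REMOVED_OPTIONS.items():
        # has_option() raises for unknown non-DEFAULT sections, so guard.
        known = section == "DEFAULT" or parser.has_section(section)
        if known and parser.has_option(section, option):
            failures.append("[%s]/%s: %s" % (section, option, advice))
    return failures

sample = """
[DEFAULT]
use_legacy_scheduler = true

[api]
auth = keystone
"""
results = check_removed_options(sample)
print(results)
```

A check like this would report a warning or failure result per hit, which is where the severity judgment from the paragraph above comes in.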


--

Thanks,

Matt



Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/10/2018 7:46 AM, Jay Pipes wrote:

2) in the old microversions change the blind allocation copy to gather
every resource from a nested source RPs too and try to allocate that
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However it will succeed if the server does not need nested
allocation neither on the source nor on the destination host (a.k.a the
legacy case). Or if the server has nested allocation on the source host
but does not need nested allocation on the destination host (for
example the dest host does not have nested RP tree yet).


I disagree on this. I'd rather just do a simple check for >1 provider in 
the allocations on the source and if True, fail hard.


The reverse (going from a non-nested source to a nested destination) 
will hard fail anyway on the destination because the POST /allocations 
won't work due to capacity exceeded (or failure to have any inventory at 
all for certain resource classes on the destination's root compute node).


I agree with Jay here. If we know the source has allocations on >1 
provider, just fail fast, why even walk the tree and try to claim those 
against the destination - the nested providers aren't going to be the 
same UUIDs on the destination, *and* trying to squash all of the source 
nested allocations into the single destination root provider and hope it 
works is super hacky and I don't think we should attempt that. Just fail 
if being forced and nested allocations exist on the source.
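The fail-fast check being argued for here is tiny. A sketch, with a made-up exception name and a simplified allocations structure (a dict keyed by resource provider UUID, as placement returns it):

```python
# Refuse a forced live migration/evacuation when the source consumer has
# allocations against more than one resource provider (nested providers).

class ForcedMoveWithNestedAllocations(Exception):
    pass

def validate_forced_move(allocations):
    # allocations: {rp_uuid: {"resources": {...}}, ...}
    if len(allocations) > 1:
        raise ForcedMoveWithNestedAllocations(
            "Source consumer has allocations against %d providers; "
            "forcing a move is not supported for nested allocations."
            % len(allocations))

flat = {"rp-root": {"resources": {"VCPU": 2}}}
nested = {
    "rp-root": {"resources": {"VCPU": 2}},
    "rp-child-vf": {"resources": {"SRIOV_NET_VF": 1}},
}
validate_forced_move(flat)  # legacy single-provider case: allowed
try:
    validate_forced_move(nested)
    failed = False
except ForcedMoveWithNestedAllocations:
    failed = True
print(failed)  # True -> nested source allocations fail fast
```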


--

Thanks,

Matt



Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann

On 10/9/2018 10:08 AM, Balázs Gibizer wrote:

Question for you as well: if we remove (or change) the force flag in a
new microversion then how should the old microversions behave when
nested allocations would be required?


Fail fast if we can detect we have nested. We don't support forcing 
those types of servers.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Rocky RC time regression analysis

2018-10-09 Thread Matt Riedemann

On 10/5/2018 6:59 PM, melanie witt wrote:
5) when live migration fails due to a internal error rollback is not 
handled correctly https://bugs.launchpad.net/nova/+bug/1788014


- Bug was reported on 2018-08-20
- The change that caused the regression landed on 2018-07-26, FF day 
https://review.openstack.org/434870

- Unrelated to a blueprint, the regression was part of a bug fix
- Was found because sean-k-mooney was doing live migrations and found 
that when a LM failed because of a QEMU internal error, the VM remained 
ACTIVE but the VM no longer had network connectivity.

- Question: why wasn't this caught earlier?
- Answer: We would need a live migration job scenario that intentionally 
initiates and fails a live migration, then verify network connectivity 
after the rollback occurs.

- Question: can we add something like that?


Not in Tempest, no, but we could run something in the 
nova-live-migration job since that executes via its own script. We could 
hack something in like what we have proposed for testing evacuate:


https://review.openstack.org/#/c/602174/

The trick is figuring out how to introduce a fault in the destination 
host without taking down the service, because if the compute service is 
down we won't schedule to it.




6) nova-manage db online_data_migrations hangs on instances with no host 
set https://bugs.launchpad.net/nova/+bug/1788115


- Bug was reported on 2018-08-21
- The patch that introduced the bug landed on 2018-05-30 
https://review.openstack.org/567878

- Unrelated to a blueprint, the regression was part of a bug fix
- Question: why wasn't this caught earlier?
- Answer: To hit the bug, you had to have had instances with no host set 
(that failed to schedule) in your database during an upgrade. This does 
not happen during the grenade job.
- Question: could we add anything to the grenade job that would leave 
some instances with no host set to cover cases like this?


Probably - I'd think creating a server on the old side with some 
parameters that we know won't schedule would do it, maybe requesting an 
AZ that doesn't exist, or some other kind of scheduler hint that we know 
won't work so we get a NoValidHost. However, online_data_migrations in 
grenade probably don't run on the cell0 database, so I'm not sure we 
would have caught that case.
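For context, the hang comes from the batching loop pattern behind online_data_migrations: if a migration keeps reporting rows found but never completes any (e.g. instances with no host set that it cannot process), a naive loop never terminates. A sketch of that loop with a no-progress guard — names and numbers are illustrative, not nova's actual code:

```python
def run_migration(migration, batch_size=50, max_idle_batches=1):
    """Run an online data migration in batches until nothing is left,
    bailing out if a batch finds rows but migrates none (the hang)."""
    total_found = total_done = 0
    idle = 0
    while True:
        found, done = migration(batch_size)
        total_found += found
        total_done += done
        if found == 0:
            break  # nothing left to process
        if done == 0:
            idle += 1
            if idle >= max_idle_batches:
                # Rows remain but none could be migrated; stop and
                # report instead of looping forever.
                break
        else:
            idle = 0
    return total_found, total_done

def stuck_migration(count):
    # Always reports 10 unprocessable rows, never migrates any.
    return (10, 0)

found, done = run_migration(stuck_migration)
print(found, done)  # 10 0 -- terminates instead of hanging
```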


--

Thanks,

Matt



Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-09 Thread Matt Riedemann

On 10/9/2018 11:08 AM, Miguel Lavalle wrote:
Since it has been more than a week since this nomination was posted and 
we have received only positive feedback, can we move ahead and add 
Bernard Cafarelli to Neutron Stable core team?


Done:

https://review.openstack.org/#/admin/groups/539,members

--

Thanks,

Matt



Re: [openstack-dev] [cinder] [nova] Do we need a "force" parameter in cinder "re-image" API?

2018-10-09 Thread Matt Riedemann

On 10/9/2018 8:04 AM, Erlon Cruz wrote:
If you are planning to re-image a bootable volume then yes 
you should use a force parameter. I missed the discussion about this 
at the PTG. What are the main use cases? This seems to me something that 
could be leveraged with the current revert-to-snapshot API, which would 
be even better. The flow would be:


1 - create a volume from image
2 - create a snapshot
3 - do whatever you want
4 - revert the snapshot

Would that help in your use cases?


As the spec mentions, this is for enabling re-imaging the root volume on 
a server when nova rebuilds the server. That is not allowed today 
because the compute service can't re-image the root volume. We don't 
want to jump through a bunch of gross alternative hoops to create a new 
root volume with the new image and swap them out (the reasons why are in 
the spec, and have been discussed previously in the ML). So nova is 
asking cinder to provide an API to change the image in a volume which 
the nova rebuild operation will use to re-image the root volume on a 
volume-backed server. I don't know if revert-to-snapshot solves that use 
case, but it doesn't sound like it. With the nova rebuild API, the user 
provides an image reference and that is used to re-image the root disk 
on the server. So it might not be a snapshot, it could be something new.


--

Thanks,

Matt



Re: [openstack-dev] [stable][octavia] Backport patch adding new configuration options

2018-10-08 Thread Matt Riedemann

On 10/8/2018 11:05 AM, Carlos Goncalves wrote:
The Octavia team merged a patch in master [1] that fixed an issue where 
load balancers could be deleted whenever queue_event_streamer driver is 
enabled and RabbitMQ goes down [2].


As this is a critical bug, we would like to backport it as far back as 
possible. The question is whether these backports comply with the stable 
policy, because the patch adds two new configuration options and deprecates 
one. The patch was prepared so that the deprecated option takes precedence 
over the other two if set.


Reading the review guidelines [3], I only see "Incompatible config file 
changes" as relevant, but the patch doesn't seem to go against that. We 
had a patch that added a new config option backported to Queens that 
raised some concern, so we'd like to be on the safe side this time ;-)


We'd appreciate guidance to whether such backports are acceptable or not.



Well, a few things:

* I would have introduced the new config options as part of the bug fix 
but *not* deprecated the existing option in the same change but rather 
as a follow up. Then the new options, which do nothing by default (?), 
could be backported and the deprecation would remain on master.


* The release note mentions the new options as a feature, but that's not 
really correct is it? They are for fixing a bug, not new feature 
functionality as much.


In general, as long as the new options don't introduce new behavior by 
default for existing configuration (as you said, the existing option 
takes precedence if set), and don't require configuration then it should 
be OK to backport those new options. But the sticky parts here are (1) 
deprecating an option on stable (we shouldn't do that) and (2) the 
release note mentioning a feature.


What I'd probably do is (1) change the 'feature' release note to a 
'fixes' release note on master and then (2) backport the change but (a) 
drop the deprecation and (b) fix the release note in the backport to not 
call out a feature (since it's not a feature I don't think?) - and just 
make it clear with a note in the backport commit message why the 
backport is different from the original change.


--

Thanks,

Matt



Re: [openstack-dev] [goals][upgrade-checkers] Oslo library status

2018-10-08 Thread Matt Riedemann

On 10/7/2018 4:10 AM, Slawomir Kaplonski wrote:

I started working on the „neutron-status upgrade check” tool with a noop operation 
for now. The patch is in [1].
I used this new oslo_upgradecheck library in version 0.0.1.dev15, which 
is available on pypi.org, but I see that in master there are already some changes 
(like shortened names of the base classes).
So my question here is: should I just wait a bit more for a kind of „stable” 
version of this lib and then push the neutron patch for review (do you have any ETA 
for that?), or should we not rely on this oslo library in this release and 
implement it all on our own, like it is done currently in nova?

[1] https://review.openstack.org/#/c/608444/


I would wait. I think there are just a couple of changes we need to get 
into the library (one of which changes the interface) and then we can do 
a release. Sean McGinnis is waiting on the release for Cinder as well.


--

Thanks,

Matt



Re: [openstack-dev] [nova][stable] Preparing for ocata-em (extended maintenance)

2018-10-05 Thread Matt Riedemann

The ocata-em tag request is up for review:

https://review.openstack.org/#/c/608296/

On 9/28/2018 11:21 AM, Matt Riedemann wrote:
Per the other thread on this [1] I've created an etherpad [2] to track 
what needs to happen to get nova's stable/ocata branch ready for 
Extended Maintenance [3] which means we need to flush our existing Ocata 
backports that we want in the final Ocata release before tagging the 
branch as ocata-em, after which point we won't do releases from that 
branch anymore.


The etherpad lists each open ocata backport along with any of its 
related backports on newer branches like pike/queens/etc. Since we need 
the backports to go in order, we need to review and merge the changes on 
the newer branches first. With the state of the gate lately, we really 
can't sit on our hands here because it will probably take up to a week 
just to merge all of the changes for each branch.


Once the Ocata backports are flushed through, we'll cut the final 
release and tag the branch as being in extended maintenance.


Do we want to coordinate a review day next week for the 
nova-stable-maint core team, like Tuesday, or just trust that you all 
know who you are and will help out as necessary in getting these reviews 
done? Non-stable cores are also welcome to help review here to make sure 
we're not missing something, which is also a good way to get noticed as 
caring about stable branches and eventually get you on the stable maint 
core team.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/thread.html#134810 


[2] https://etherpad.openstack.org/p/nova-ocata-em
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance



--

Thanks,

Matt



Re: [openstack-dev] [goals][python3][heat][stable] how should we proceed with ocata branch

2018-10-03 Thread Matt Riedemann

On 10/3/2018 11:21 AM, Zane Bitter wrote:

That patch is the only thing blocking the cleanup patch in
project-config, so I would like to get a definitive answer about what to
do. Should we close the branch, or does someone want to try to fix
things up?


I think we agreed on closing the branch, and Rico was looking into the 
procedure for how to actually do that.



Doug

[1] https://review.openstack.org/#/c/597272/


I'm assuming heat-agents is a service, not a library, since it doesn't 
show up in upper-constraints.


It's a guest agent, so neither :)

Based on that, does heat itself plan on putting its stable/ocata 
branch into extended maintenance mode and if 


Wearing my Red Hat hat, I would be happy to EOL it. But wearing my 
upstream hat, I'm happy to keep maintaining it, and I was not proposing 
that we EOL heat's stable/ocata as well.


so, does that mean EOLing the heat-agents stable/ocata branch could 
cause problems for the heat stable/ocata branch? In other words, will 
it be reasonable to run CI for stable/ocata heat changes against a 
heat-agents ocata-eol tag?


I don't think that's a problem. The guest agents rarely change, and I 
don't think there's ever been a patch backported by 4 releases.


OK, cool, sounds like killing the heat-agent ocata branch is the thing 
to do then.


--

Thanks,

Matt



Re: [openstack-dev] [cinder] Proposing Gorka Eguileor to Stable Core ...

2018-10-03 Thread Matt Riedemann

On 10/3/2018 9:45 AM, Jay S. Bryant wrote:

Team,

We had discussed the possibility of adding Gorka to the stable core team 
during the PTG.  He does review a number of our backport patches and is 
active in that area.


If there are no objections in the next week I will add him to the list.

Thanks!

Jay (jungleboyj)


+1 from me in the stable-maint-core peanut gallery.

--

Thanks,

Matt



[openstack-dev] [nova] We should fail to boot a server if PF passthrough is requested and we don't honor it, right?

2018-10-03 Thread Matt Riedemann
I came across [1] today while triaging a bug [2]. Unless I'm mistaken, 
the user has requested SR-IOV PF passthrough for their server and for 
whatever reason we can't find the PCI device for the PF passthrough port 
so we don't reflect the actual device MAC address on the port. Is that 
worth stopping the server create? Or is logging an ERROR enough here?


The reason being we get an IndexError here [3]. Ultimately if we found a 
PCI device but it's not whitelisted, we'll raise an exception anyway 
when building the port binding profile [4].


So is it reasonable to just raise PciDeviceNotFound whenever we can't 
find a PCI device on a compute host given a pci_request_id? In other 
words, it seems something failed earlier during scheduling and/or the 
PCI device resource claim if we get this far and things are still messed up.


[1] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1426

[2] https://bugs.launchpad.net/nova/+bug/1795064
[3] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1404
[4] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1393
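A sketch of what raising PciDeviceNotFound instead of hitting the IndexError could look like — the device dicts and the exception are simplified stand-ins for nova's internals, not its actual code:

```python
# Fail fast with a meaningful error when no claimed PCI device matches
# the request id, rather than letting an IndexError escape (or only
# logging an ERROR and leaving the port without the device MAC).

class PciDeviceNotFound(Exception):
    pass

def get_device_for_request(pci_devices, pci_request_id):
    matches = [d for d in pci_devices
               if d.get("request_id") == pci_request_id]
    if not matches:
        # Previously: indexing an empty list -> IndexError. Surfacing
        # the failure here stops the server create with a clear reason.
        raise PciDeviceNotFound(
            "No claimed PCI device found for request %s" % pci_request_id)
    return matches[0]

devices = [{"request_id": "req-1", "address": "0000:81:00.0"}]
dev = get_device_for_request(devices, "req-1")
print(dev["address"])  # 0000:81:00.0
try:
    get_device_for_request(devices, "req-2")
    missing_raised = False
except PciDeviceNotFound:
    missing_raised = True
print(missing_raised)  # True
```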


--

Thanks,

Matt



Re: [openstack-dev] [goals][python3][heat][stable] how should we proceed with ocata branch

2018-10-03 Thread Matt Riedemann

On 10/3/2018 7:58 AM, Doug Hellmann wrote:

There is one more patch to import the zuul configuration for the
heat-agents repository's stable/ocata branch. That branch is apparently
broken, and Zane suggested on the review [1] that we abandon the patch
and close the branch.

That patch is the only thing blocking the cleanup patch in
project-config, so I would like to get a definitive answer about what to
do. Should we close the branch, or does someone want to try to fix
things up?

Doug

[1] https://review.openstack.org/#/c/597272/


I'm assuming heat-agents is a service, not a library, since it doesn't 
show up in upper-constraints. Based on that, does heat itself plan on 
putting its stable/ocata branch into extended maintenance mode and if 
so, does that mean EOLing the heat-agents stable/ocata branch could 
cause problems for the heat stable/ocata branch? In other words, will it 
be reasonable to run CI for stable/ocata heat changes against a 
heat-agents ocata-eol tag?


--

Thanks,

Matt



Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-02 Thread Matt Riedemann

On 10/2/2018 10:41 AM, Miguel Lavalle wrote:

Hi Stable Team,

I want to nominate Bernard Cafarrelli as a stable core reviewer for 
Neutron and related projects. Bernard has been increasing the number of 
stable reviews he is doing for the project [1]. Besides that, he is a 
stable maintainer downstream for his employer (Red Hat), so he can bring 
that valuable experience to the Neutron stable team.


Thanks and regards

Miguel

[1] 
https://review.openstack.org/#/q/(project:openstack/neutron+OR+openstack/networking-sfc+OR+project:openstack/networking-ovn)++branch:%255Estable/.*+reviewedby:%22Bernard+Cafarelli+%253Cbcafarel%2540redhat.com%253E%22 



+1 from me.

--

Thanks,

Matt



Re: [openstack-dev] [nova][cinder][qa] Should we enable multiattach in tempest-full?

2018-10-01 Thread Matt Riedemann

On 10/1/2018 8:37 AM, Ghanshyam Mann wrote:

+1 on adding multiattach to an integrated job. It is always good to cover more 
features in the integrated gate instead of separate jobs. These tests do not take 
much time, so it should be OK to add them to tempest-full [1]. We should mark only 
really slow tests as 'slow'; otherwise it should be fine to run them in tempest-full.

I thought adding tempest-slow to cinder had merged, but it has not [2].

[1] http://logs.openstack.org/80/606880/2/check/nova-multiattach/7f8681e/job-output.txt.gz#_2018-10-01_10_12_55_482653
[2] https://review.openstack.org/#/c/591354/2


Actually it will be enabled in both tempest-full and tempest-slow, 
because there is also a multiattach test marked as 'slow': 
TestMultiAttachVolumeSwap.


I'll push patches today.

--

Thanks,

Matt



Re: [openstack-dev] Placement extraction update

2018-09-30 Thread Matt Riedemann

On 9/30/2018 11:02 AM, Matt Riedemann wrote:
Maybe that's some conditional branch logic we can hack into 
devstack-gate [7] like we do for neutron? [8]


I'm hoping this works:

https://review.openstack.org/#/c/606853/

--

Thanks,

Matt



[openstack-dev] Placement extraction update

2018-09-30 Thread Matt Riedemann
I finally got a passing neutron-grenade run in change [1]. That's the 
grenade change which populates the placement DB in Stein from the 
placement-related table contents of the nova_api DB from Rocky. It also 
writes out the placement.conf file for Stein before starting the Stein 
services.


As a result, I'm +2 on Dan's mysql-migrate-db.sh script [2].

The grenade change is also dependent on three other changes for neutron 
[3], ironic [4] and heat [5] grenade jobs to require the 
openstack/placement project when zuul/devstack-gate clones its required 
projects before running grenade.sh.


Those are just the related project grenade jobs that are hit as part of 
the grenade patch. There could be others I'm missing, which means 
projects might need to update their grenade job definitions after the 
grenade change merges. It looks like that could be quite a few projects 
[6]. If the infra/QA teams have a better idea of how to require 
openstack/placement in stein+ only, I'm all ears. Maybe that's some 
conditional branch logic we can hack into devstack-gate [7] like we do 
for neutron? [8]


[1] https://review.openstack.org/#/c/604454/
[2] https://review.openstack.org/#/c/603234/
[3] https://review.openstack.org/#/c/604458/
[4] https://review.openstack.org/#/c/606850/
[5] https://review.openstack.org/#/c/606851/
[6] 
http://codesearch.openstack.org/?q=export%20PROJECTS%3D%22openstack-dev%5C%2Fgrenade%20%5C%24PROJECTS%22=nope==
[7] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L138
[8] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L195


--

Thanks,

Matt



Re: [openstack-dev] [nova][python-novaclient] A Test issue in python-novaclient.

2018-09-30 Thread Matt Riedemann

On 9/29/2018 10:01 PM, Tao Li wrote:
I found this test was added about ten days ago in this patch 
https://review.openstack.org/#/c/599276/.


I checked it and don't know why it failed. I think my commit shouldn't 
have caused this issue, so do you have any suggestions for me?




Yes it must be an intermittent race bug introduced by that change for 
the 2.66 microversion. Since it deals with filtering based on time, we 
might not have a time window that is big enough (we expect to get a 
result of changes < $before but are getting <= $before).


http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%7C%20%20%20%20%20testtools.matchers._impl.MismatchError%3A%20%5B'create'%5D%20!%3D%20%5B'create'%2C%20'stop'%5D%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d

Please report a bug against python-novaclient.

The 2.66 test is based on a similar changes_since test, so we should see 
why they are behaving differently.
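To make the suspected boundary race concrete, a small illustration with fabricated timestamps — a changes-before filter using <= includes an action whose timestamp equals the boundary, while < excludes it; if the 'stop' action lands in the same second the test computes its boundary, the two behaviors diverge:

```python
import datetime

# The boundary the test passes as changes-before, and two instance
# actions; the 'stop' action timestamp equals the boundary exactly.
boundary = datetime.datetime(2018, 9, 30, 12, 0, 0)
actions = [
    ("create", datetime.datetime(2018, 9, 30, 11, 59, 59)),
    ("stop", datetime.datetime(2018, 9, 30, 12, 0, 0)),  # == boundary
]

strictly_before = [name for name, t in actions if t < boundary]
inclusive = [name for name, t in actions if t <= boundary]
print(strictly_before)  # ['create']
print(inclusive)        # ['create', 'stop']
```

This mirrors the mismatch in the logstash signature above: ['create'] != ['create', 'stop'].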


--

Thanks,

Matt



[openstack-dev] [nova][cinder][qa] Should we enable multiattach in tempest-full?

2018-09-29 Thread Matt Riedemann
Nova, cinder and tempest run the nova-multiattach job in their check and 
gate queues. The job was added in Queens and was a specific job because 
we had to change the ubuntu cloud archive we used in Queens to get 
multiattach working. Since Rocky, devstack defaults to a version of the 
UCA that works for multiattach, so there isn't really anything 
preventing us from running the tempest multiattach tests in the 
integrated gate. The job tries to be as minimal as possible by only 
running tempest.api.compute.* tests, but it still means spinning up a 
new node and devstack for testing.


Given the state of the gate recently, I'm thinking it would be good if 
we dropped the nova-multiattach job in Stein and just enable the 
multiattach tests in one of the other integrated gate jobs. I initially 
was just going to enable it in the nova-next job, but we don't run that 
on cinder or tempest changes. I'm not sure if tempest-full is a good 
place for this though since that job already runs a lot of tests and has 
been timing out a lot lately [1][2].


The tempest-slow job is another option, but cinder doesn't currently run 
that job (it probably should since it runs volume-related tests, 
including the only tempest tests that use encrypted volumes).


Are there other ideas/options for enabling multiattach in another job 
that nova/cinder/tempest already use so we can drop the now mostly 
redundant nova-multiattach job?


[1] http://status.openstack.org/elastic-recheck/#1686542
[2] http://status.openstack.org/elastic-recheck/#1783405

--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-28 Update

2018-09-28 Thread Matt Riedemann
There isn't really anything to report this week. There are no new 
changes up for review that I'm aware of. If your team has posted changes 
for your project, please update the related task in the story [1].


I'm also waiting for some feedback from glance-minded people about [2].

[1] https://storyboard.openstack.org/#!/story/2003657
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/135025.html


--

Thanks,

Matt



Re: [openstack-dev] [all] Zuul job backlog

2018-09-28 Thread Matt Riedemann

On 9/28/2018 3:12 PM, Clark Boylan wrote:

I was asked to write a followup to this as the long Zuul queues have persisted 
through this week, largely because the situation from last week hasn't changed 
much. We were down the upgraded cloud region while we worked around a network 
configuration bug; then, once that was addressed, we ran into neutron port 
assignment and deletion issues. We think these are both fixed and we are 
running in this region again as of today.

Other good news is our classification rate is up significantly. We can use that 
information to go through the top identified gate bugs:

Network Connectivity issues to test nodes [2]. This is the current top of the 
list, but I think its impact is relatively small. What is happening here is 
jobs fail to connect to their test nodes early in the pre-run playbook and then 
fail. Zuul will rerun these jobs for us because they failed in the pre-run 
step. Prior to zuulv3 we had nodepool run a ready script before marking test 
nodes as ready, this script would've caught and filtered out these broken 
network nodes early. We now notice them late during the pre-run of a job.

Pip fails to find distribution for package [3]. Earlier in the week we had the 
in-region mirror fail in two different regions for unrelated errors. These mirrors
were fixed and the only other hits for this bug come from Ara which tried to 
install the 'black' package on python3.5 but this package requires python>=3.6.

yum, no more mirrors to try [4]. At first glance this appears to be an 
infrastructure issue because the mirror isn't serving content to yum. On 
further investigation it turned out to be a DNS resolution issue caused by the 
installation of designate in the tripleo jobs. Tripleo is aware of this issue 
and working to correct it.

Stackviz failing on py3 [5]. This is a real bug in stackviz caused by subunit 
data being binary not utf8 encoded strings. I've written a fix for this problem 
at https://review.openstack.org/606184, but in doing so found that this was a 
known issue back in March and there was already a proposed fix, 
https://review.openstack.org/#/c/555388/3. It would be helpful if the QA
team could care for this project and get a fix in. Otherwise, we should 
consider disabling stackviz on our tempest jobs (though the output from 
stackviz is often useful).
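
To illustrate the py3 issue described above: subunit streams hand back bytes, so any stackviz code that assumes str needs to decode first. The actual fixes are in the linked reviews; the following is only a rough sketch of the general shape, and the function name here is made up:

```python
def to_text(payload):
    """Return subunit payload content as text (illustrative sketch only).

    Under Python 3, subunit streams yield bytes. Code that treats the
    payload as a string must decode it; since subunit content is not
    guaranteed to be valid UTF-8, undecodable bytes are replaced rather
    than raising UnicodeDecodeError.
    """
    if isinstance(payload, bytes):
        return payload.decode('utf-8', errors='replace')
    return payload


print(to_text(b'test worker output'))  # bytes are decoded to str
print(to_text('already decoded'))      # str passes through unchanged
```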

There are other bugs being tracked by e-r. Some are bugs in the openstack 
software and I'm sure some are also bugs in the infrastructure. I have not yet 
had the time to work through the others though. It would be helpful if project 
teams could prioritize the debugging and fixing of these issues though.

[2] http://status.openstack.org/elastic-recheck/gate.html#1793370
[3] http://status.openstack.org/elastic-recheck/gate.html#1449136
[4] http://status.openstack.org/elastic-recheck/gate.html#1708704
[5] http://status.openstack.org/elastic-recheck/gate.html#1758054


Thanks for the update Clark.

Another thing this week is that the logstash indexing is behind by at least 
half a day. That's because workers were hitting OOM errors due to giant 
screen log files that aren't formatted properly so that we only index 
INFO+ level logs; the workers were instead trying to index the entire 
files, some of which are 33MB *compressed*. So indexing of those 
identified problematic screen logs has been disabled:


https://review.openstack.org/#/c/606197/

I've reported bugs against each related project.
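
To make the indexing problem concrete: the indexer is supposed to keep only INFO-and-above lines, which a well-formatted oslo.log line makes easy to detect. A rough sketch of that kind of level filter follows (the log format and level list here are assumptions for illustration, not the actual logstash configuration):

```python
import re

# Lines carrying one of these level tokens get indexed; everything else
# (DEBUG noise, raw console output, etc.) is skipped. A screen log without
# proper level tokens matches nothing here -- which is why the real indexer
# fell through to trying to index the whole file.
LEVEL_RE = re.compile(r'\b(INFO|WARNING|ERROR|CRITICAL)\b')

def indexable(lines):
    """Return only the lines worth indexing (INFO and above)."""
    return [line for line in lines if LEVEL_RE.search(line)]

sample = [
    '2018-09-28 12:00:00.000 123 DEBUG nova.compute [-] noisy detail',
    '2018-09-28 12:00:01.000 123 INFO nova.compute [-] instance started',
    'raw screen output with no level token at all',
]
print(indexable(sample))  # only the INFO line survives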

--

Thanks,

Matt



Re: [openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?

2018-09-28 Thread Matt Riedemann

On 9/21/2018 9:08 AM, Előd Illés wrote:

Hi,

Here is an etherpad with the teams that have stable:follow-policy tag on 
their repos:


https://etherpad.openstack.org/p/ocata-final-release-before-em

On the links you can find reports about the open and unreleased changes, 
that could be a useful input for the before-EM/final release.
Please have a look at the report (and review the open patches if there 
are) so that a release can be made if necessary.


Thanks,

Előd


I've added nova's ocata-em tracking etherpad to the list.

https://etherpad.openstack.org/p/nova-ocata-em

--

Thanks,

Matt



[openstack-dev] [nova][stable] Preparing for ocata-em (extended maintenance)

2018-09-28 Thread Matt Riedemann
Per the other thread on this [1] I've created an etherpad [2] to track 
what needs to happen to get nova's stable/ocata branch ready for 
Extended Maintenance [3] which means we need to flush our existing Ocata 
backports that we want in the final Ocata release before tagging the 
branch as ocata-em, after which point we won't do releases from that 
branch anymore.


The etherpad lists each open ocata backport along with any of its 
related backports on newer branches like pike/queens/etc. Since we need 
the backports to go in order, we need to review and merge the changes on 
the newer branches first. With the state of the gate lately, we really 
can't sit on our hands here because it will probably take up to a week 
just to merge all of the changes for each branch.


Once the Ocata backports are flushed through, we'll cut the final 
release and tag the branch as being in extended maintenance.


Do we want to coordinate a review day next week for the 
nova-stable-maint core team, like Tuesday, or just trust that you all 
know who you are and will help out as necessary in getting these reviews 
done? Non-stable cores are also welcome to help review here to make sure 
we're not missing something, which is also a good way to get noticed as 
caring about stable branches and eventually get you on the stable maint 
core team.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/thread.html#134810

[2] https://etherpad.openstack.org/p/nova-ocata-em
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance


--

Thanks,

Matt



Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-27 Thread Matt Riedemann

On 9/27/2018 3:02 PM, Jay Pipes wrote:
A great example of this would be the proposed "deploy template" from 
[2]. This is nothing more than abusing the placement traits API in order 
to allow passthrough of instance configuration data from the nova flavor 
extra spec directly into the nodes.instance_info field in the Ironic 
database. It's a hack that is abusing the entire concept of the 
placement traits concept, IMHO.


We should have a way *in Nova* of allowing instance configuration 
key/value information to be passed through to the virt driver's spawn() 
method, much the same way we provide for user_data that gets exposed 
after boot to the guest instance via configdrive or the metadata service 
API. What this deploy template thing is, is just a hack to get around the 
fact that nova doesn't have a basic way of passing through some collated 
instance configuration key/value information, which is a darn shame and 
I'm really kind of annoyed with myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do 
would be to have some kind of template/profile/config/whatever stored 
off in glare where schema could be registered on that thing, and then 
you pass a handle (ID reference) to that to nova when creating the 
(baremetal) server, nova pulls it down from glare and hands it off to 
the virt driver. It's just that no one is doing that work.


--

Thanks,

Matt



Re: [openstack-dev] [Openstack-sigs] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-27 Thread Matt Riedemann

On 9/27/2018 2:33 PM, Fox, Kevin M wrote:

If the project plugins were maintained by the OSC project still, maybe there 
would be incentive for the various other projects to join the OSC project, 
scaling things up?


Sure, I don't really care about governance. But I also don't really care 
about all of the non-compute API things in OSC either.


--

Thanks,

Matt



Re: [openstack-dev] [Openstack-sigs] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-27 Thread Matt Riedemann

On 9/27/2018 10:13 AM, Dean Troyer wrote:

On Thu, Sep 27, 2018 at 9:10 AM, Doug Hellmann wrote:

Monty Taylor writes:

Main difference is making sure these new deconstructed plugin teams
understand the client support lifecycle - which is that we don't drop
support for old versions of services in OSC (or SDK). It's a shift from
the support lifecycle and POV of python-*client, but it's important and
we just need to all be on the same page.

That sounds like a reason to keep the governance of the libraries under
the client tool project.

Hmmm... I think that may address a big chunk of my reservations about
being able to maintain consistency and user experience in a fully
split-OSC world.

dt


My biggest worry with splitting everything out into plugins with new 
core teams, even with python-openstackclient-core as a superset, is that 
those core teams will all start approving things that don't fit with the 
overall guidelines of how OSC commands should be written. I've had to go 
to the "Dean well" several times when reviewing osc-placement commands.


But the python-openstackclient-core team probably isn't going to scale 
to fit the need of all of these gaps that need closing from the various 
teams, either. So how does that get fixed? I've told Dean and Steve 
before that if they want me to review / ack something compute-specific 
in OSC that they can call on me, like a liaison. Maybe that's all we 
need to start? Because I've definitely disagreed with compute CLI 
changes in OSC that have a +2 from the core team because of a lack of 
understanding from both the contributor and the reviewers about what the 
compute API actually does, or how a microversion behaves. Or maybe we 
just do some kind of subteam thing where OSC core doesn't look at a 
change until the subteam has +1ed it. We have a similar concept in nova 
with virt driver subteams.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Stein PTG summary

2018-09-27 Thread Matt Riedemann

On 9/27/2018 5:23 AM, Sylvain Bauza wrote:



On Thu, Sep 27, 2018 at 2:46 AM Matt Riedemann wrote:


On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
 > So, during this day, we also discussed about NUMA affinity and we said
 > that we could possibly use nested resource providers for NUMA cells in
 > Stein, but given we don't have yet a specific Placement API query, NUMA
 > affinity should still be using the NUMATopologyFilter.
 > That said, when looking about how to use this filter for vGPUs, it looks
 > to me that I'd need to provide a new version for the NUMACell object and
 > modify the virt.hardware module. Are we also accepting this (given it's
 > a temporary question), or should we need to wait for the Placement API
 > support?
 >
 > Folks, what are your thoughts?

I'm pretty sure we've said several times already that modeling NUMA in
Placement is not something for which we're holding up the extraction.


It's not an extraction question. Just about knowing whether the Nova 
folks would accept us to modify some o.vo object and module just for a 
temporary time until Placement API has some new query parameter.
Whether Placement is extracted or not isn't really the problem, it's 
more about the time it will take for this query parameter ("numbered 
request groups to be in the same subtree") to be implemented in the 
Placement API.
The real problem we have with vGPUs is that if we don't have NUMA 
affinity, the performance would be around 10% less for vGPUs (if the 
pGPU isn't on the same NUMA cell as the pCPU). Not sure large 
operators would accept that :(


-Sylvain


I don't know how close we are to having whatever we need for modeling 
NUMA in the placement API, but I'll go out on a limb and assume we're 
not close. Given that, if we have to do something within nova for NUMA 
affinity for vGPUs for the NUMATopologyFilter, then I'd be OK with that 
since it's short term like you said (although our "short term" 
workarounds tend to last for many releases). Anyone that cares about 
NUMA today already has to enable the scheduler filter anyway.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Stein PTG summary

2018-09-26 Thread Matt Riedemann

On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
So, during this day, we also discussed about NUMA affinity and we said 
that we could possibly use nested resource providers for NUMA cells in 
Stein, but given we don't have yet a specific Placement API query, NUMA 
affinity should still be using the NUMATopologyFilter.
That said, when looking about how to use this filter for vGPUs, it looks 
to me that I'd need to provide a new version for the NUMACell object and 
modify the virt.hardware module. Are we also accepting this (given it's 
a temporary question), or should we need to wait for the Placement API 
support?


Folks, what are your thoughts?


I'm pretty sure we've said several times already that modeling NUMA in 
Placement is not something for which we're holding up the extraction.


--

Thanks,

Matt



Re: [openstack-dev] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-26 Thread Matt Riedemann

On 9/26/2018 3:01 PM, Doug Hellmann wrote:

Monty Taylor writes:


On 09/26/2018 01:55 PM, Tim Bell wrote:

Doug,

Thanks for raising this. I'd like to highlight the goal "Finish moving legacy 
python-*client CLIs to python-openstackclient" from the etherpad and propose this 
for a T/U series goal.


I would personally like to thank the person that put that goal in the 
etherpad...they must have had amazing foresight and unparalleled modesty.




To give it some context and the motivation:

At CERN, we have more than 3000 users of the OpenStack cloud. We write an 
extensive end user facing documentation which explains how to use the OpenStack 
along with CERN specific features (such as workflows for requesting 
projects/quotas/etc.).

One regular problem we come across is that the end user experience is 
inconsistent. In some cases, we find projects which are not covered by the 
unified OpenStack client (e.g. Manila). In other cases, there are subsets of 
the function which require the native project client.

I would strongly support a goal which targets

- All new projects should have the end user facing functionality fully exposed 
via the unified client
- Existing projects should aim to close the gap within 'N' cycles (N to be 
defined)
- Many administrator actions would also benefit from integration (reader roles 
are end users too so list and show need to be covered too)
- Users should be able to use a single openrc for all interactions with the 
cloud (e.g. not switch between password for some CLIs and Kerberos for OSC)

The end user perception of a solution will be greatly enhanced by a single 
command line tool with consistent syntax and authentication framework.

It may be a multi-release goal but it would really benefit the cloud consumers 
and I feel that goals should include this audience also.

++

It's also worth noting that we're REALLY close to a 1.0 of openstacksdk
(all the patches are in flight, we just need to land them) - and once
we've got that we'll be in a position to start shifting
python-openstackclient to using openstacksdk instead of python-*client.

This will have the additional benefit that, once we've migrated CLIs to
python-openstackclient as per this goal, and once we've migrated
openstackclient itself to openstacksdk, the number of different
libraries one needs to install to interact with openstack will be
_dramatically_  lower.

Would it be useful to have the SDK work in OSC as a prerequisite to the
goal work? I would hate to have folks have to write a bunch of things
twice.

Do we have any sort of list of which projects aren't currently being
handled by OSC? If we could get some help building such a list, that
would help us understand the scope of the work.


I started documenting the compute API gaps in OSC last release [1]. It's 
a big gap and needs a lot of work, even for existing CLIs (the cold/live 
migration CLIs in OSC are a mess, and you can't even boot from volume 
where nova creates the volume for you). That's also why I put something 
into the etherpad about the OSC core team even being able to handle an 
onslaught of changes for a goal like this.




As far as admin features, I think we've been hesitant to add those to
OSC in the past, but I can see the value. I wonder if having them in a
separate library makes sense? Or is it better to have commands in the
tool that regular users can't access, and just report the permission
error when they try to run the command?


I thought the same, and we talked about this at the Austin summit, but 
OSC is inconsistent about this (you can live migrate a server but you 
can't evacuate it - there is no CLI for evacuation). It also came up at 
the Stein PTG with Dean in the nova room giving us some direction. [2] I 
believe the summary of that discussion was:


a) to deal with the core team sprawl, we could move the compute stuff 
out of python-openstackclient and into an osc-compute plugin (like the 
osc-placement plugin for the placement service); then we could create a 
new core team which would have python-openstackclient-core as a superset


b) Dean suggested that we close the compute API gaps in the SDK first, 
but that could take a long time as well...but it sounded like we could 
use the SDK for things that existed in the SDK and use novaclient for 
things that didn't yet exist in the SDK


This might be a candidate for one of these multi-release goals that the 
TC started talking about at the Stein PTG. I could see something like 
this being a goal for Stein:


"Each project owns its own osc- plugin for OSC CLIs"

That deals with the core team and sprawl issue, especially with stevemar 
being gone and dtroyer being distracted by shiny x-men bird related 
things. That also seems relatively manageable for all projects to do in 
a single release. Having a single-release goal of "close all gaps across 
all service types" is going to be extremely tough for any older projects 
that had CLIs before OSC was created 

Re: [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-25 Thread Matt Riedemann

On 9/25/2018 8:36 AM, John Garbutt wrote:

Another thing is about existing flavors configured for these
capabilities-scoped specs. Are you saying during the deprecation we'd
continue to use those even if the filter is disabled? In the review I
had suggested that we add a pre-upgrade check which inspects the
flavors
and if any of these are found, we report a warning meaning those
flavors
need to be updated to use traits rather than capabilities. Would
that be
reasonable?


I like the idea of a warning, but there are features that have not yet 
moved to traits:

https://specs.openstack.org/openstack/ironic-specs/specs/juno-implemented/uefi-boot-for-ironic.html

There is a more general plan that will help, but its not quite ready yet:
https://review.openstack.org/#/c/504952/

As such, I think we can't yet pull the plug on flavors including 
capabilities and passing them to Ironic, but (after a cycle of 
deprecation) I think we can now stop pushing capabilities from Ironic 
into Nova and using them for placement.


Forgive my ignorance, but if traits are not on par with capabilities, 
why are we deprecating the capabilities filter?


--

Thanks,

Matt



Re: [openstack-dev] [penstack-dev]Discussion about the future of OpenStack in China

2018-09-24 Thread Matt Riedemann

On 9/24/2018 12:12 PM, Jay Pipes wrote:
There were a couple points that I did manage to decipher, though. One 
thing that both articles seemed to say was that OpenStack doesn't meet 
public (AWS-ish) cloud use cases and OpenStack doesn't compare favorably 
to VMware either.


Yeah I picked up on that also - trying to be all things to all people 
means we do less well at any single thing. No surprises there.




Is there a large contingent of Chinese OpenStack users that expect 
OpenStack to be a free (as in beer) version of VMware technology?


What are the 3 most important features that Chinese OpenStack users 
would like to see included in OpenStack projects?


Yeah I picked up on a few things as well. The article was talking about 
gaps in upstream services:


a) they did a bunch of work on trove for their dbaas solution, but did 
they contribute any of that work?


b) they mentioned a lack of DRS and HA support, but didn't mention the 
Watcher or Masakari projects - maybe they didn't know they exist?


--

Thanks,

Matt



Re: [openstack-dev] [glance][upgrade-checkers] Question about glance rocky upgrade release note

2018-09-24 Thread Matt Riedemann

On 9/24/2018 2:06 PM, Matt Riedemann wrote:
Are there more specific docs about how to configure the 'image import' 
feature so that I can be sure I'm careful? In other words, are there 
specific things a "glance-status upgrade check" check could look at and 
say, "your image import configuration is broken, here are details on how 
you should do this"?


I guess this answers the question about docs:

https://docs.openstack.org/glance/latest/admin/interoperable-image-import.html

Would a basic upgrade check be such that if glance-api.conf contains 
enable_image_import=False, you're going to have issues since that option 
is removed in Rocky?
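
As a standalone sketch of what such a check could look for — deliberately written in plain Python rather than guessing at the oslo.upgradecheck API; the option and section names come from the release note:

```python
import configparser

def check_image_import_option(conf_text):
    """Warn if a glance-api.conf snippet still sets enable_image_import.

    The option was removed in Rocky and is silently ignored, so its
    presence suggests the operator's image import configuration needs
    review before upgrading.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    if cfg.has_option('DEFAULT', 'enable_image_import'):
        return ('WARNING',
                'enable_image_import was removed in Rocky and is silently '
                'ignored; see the interoperable-image-import docs.')
    return ('SUCCESS', '')

print(check_image_import_option('[DEFAULT]\nenable_image_import = False\n'))
```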


--

Thanks,

Matt



[openstack-dev] [glance][upgrade-checkers] Question about glance rocky upgrade release note

2018-09-24 Thread Matt Riedemann
Looking at the upgrade-checkers goal [1] for glance and the Rocky 
upgrade release notes [2], one upgrade note says:


"As Image Import will be always enabled, care needs to be taken that it 
is configured properly from this release forward. The 
‘enable_image_import’ option is silently ignored."


Are there more specific docs about how to configure the 'image import' 
feature so that I can be sure I'm careful? In other words, are there 
specific things a "glance-status upgrade check" check could look at and 
say, "your image import configuration is broken, here are details on how 
you should do this"?


I'm willing to help write the upgrade check for glance, but need more 
details on that release note.


[1] https://storyboard.openstack.org/#!/story/2003657
[2] https://docs.openstack.org/releasenotes/glance/rocky.html#upgrade-notes

--

Thanks,

Matt



Re: [openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-22 Thread Matt Riedemann

On 9/21/2018 4:19 PM, Ben Nemec wrote:
* The only two projects that I'm aware of with patches up at this 
point are monasca [2] and designate [3]. The monasca one is tricky 
because as I've found going through release notes for some projects, 
they don't really have any major upgrade impacts so writing checks is 
not obvious. I don't have a great solution here. What monasca has done 
is add the framework with a noop check. If others are in the same 
situation, I'd like to hear your thoughts on what you think makes 
sense here. The alternative is these projects opt out of the goal for 
Stein and just add the check code later when it makes sense (but 
people might forget or not care to do that later if it's not a goal).


My inclination is for the command to exist with a noop check, the main 
reason being that if we create it for everyone this cycle then the 
deployment tools can implement calls to the status commands all at once. 
If we wait until checks are needed then someone has to not only 
implement it in the service but also remember to go update all of the 
deployment tools. Implementing a noop check should be pretty trivial 
with the library so it isn't a huge imposition.


Yeah, I agree, and I've left comments on the patch to give some ideas on 
how to write the noop check with a description that explains it's an 
initial check but doesn't really do anything. The alternative would be 
to dump the table header for the results but then not have any rows, 
which could be more confusing.
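
For anyone in the same spot as monasca, the shape of a noop check is roughly the following — a plain-Python sketch only, since the real implementations use the oslo.upgradecheck library rather than this hand-rolled structure:

```python
def check_placeholder():
    """Initial placeholder check; always passes.

    Exists so the `<project>-status upgrade check` command is present and
    deployment tooling can call it uniformly from day one. Real checks
    replace or join this one once a release actually has upgrade impacts
    to verify.
    """
    return ('SUCCESS', 'no upgrade checks needed for this release')

# The command iterates the registered checks and reports each result.
CHECKS = (('placeholder', check_placeholder),)

def run_upgrade_checks():
    return [(name, *func()) for name, func in CHECKS]

print(run_upgrade_checks())
```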


--

Thanks,

Matt



Re: [openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-21 Thread Matt Riedemann

On 9/21/2018 3:53 PM, Matt Riedemann wrote:
* The reference docs I wrote for writing upgrade checks is published now 
[4]. As I've been answering some questions in storyboard and IRC, it's 
obvious that I need to add some FAQs into those docs because I've taken 
some of this for granted on how it works in nova, so I'll push a docs 
change for some of that as well and link it back into the story.


https://review.openstack.org/#/c/604486/ for anyone that thinks I missed 
something.


--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-21 Thread Matt Riedemann

Updates for this week:

* As bnemec noted in the last update [1], he's making some progress with 
the oslo.upgradecheck library. He's retrofitting the nova-status upgrade 
check code to use the library and has a patch up for designate to use it.


* The only two projects that I'm aware of with patches up at this point 
are monasca [2] and designate [3]. The monasca one is tricky because as 
I've found going through release notes for some projects, they don't 
really have any major upgrade impacts so writing checks is not obvious. 
I don't have a great solution here. What monasca has done is add the 
framework with a noop check. If others are in the same situation, I'd 
like to hear your thoughts on what you think makes sense here. The 
alternative is these projects opt out of the goal for Stein and just add 
the check code later when it makes sense (but people might forget or not 
care to do that later if it's not a goal).


* The reference docs I wrote for writing upgrade checks is published now 
[4]. As I've been answering some questions in storyboard and IRC, it's 
obvious that I need to add some FAQs into those docs because I've taken 
some of this for granted on how it works in nova, so I'll push a docs 
change for some of that as well and link it back into the story.


As always, feel free to reach out to me with any questions or issues you 
might have with completing this goal (or just getting started).


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134972.html

[2] https://review.openstack.org/#/c/603465/
[3] https://review.openstack.org/#/c/604430/
[4] https://docs.openstack.org/nova/latest/reference/upgrade-checks.html

--

Thanks,

Matt



Re: [openstack-dev] OpenStack Project Navigator

2018-09-21 Thread Matt Riedemann

On 9/21/2018 1:11 PM, Michael Johnson wrote:

Thank you Jimmy for making this available for updates.

I was unable to find the code backing the project tags section of the
Project Navigator pages.
Our page is missing some upgrade tags and is showing duplicate "Stable
branch policy" tags.

https://www.openstack.org/software/releases/rocky/components/octavia

Is there a different repository for the tags code?


Those are down in the project details section of the page, look to the 
right and there is a 'tag details' column. The tags are descriptive and 
link to the details on each tag.


--

Thanks,

Matt



Re: [openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?

2018-09-20 Thread Matt Riedemann

On 9/20/2018 12:08 PM, Előd Illés wrote:

Hi Matt,

About 1.: I think it is a good idea to cut a final release (especially 
as some vendor/operator would be glad even if there would be some 
release in Extended Maintenance, too, what most probably won't 
happen...) -- saying that without knowing how much of a burden would it 
be for projects to do this final release...
After that it sounds reasonably to tag the branches EM (as it is written 
in the mentioned resolution).


Do you have any plan about how to coordinate the 'final releases' and do 
the EM-tagging?


Thanks for raising these questions!

Cheers,

Előd


For anyone following along and that cares about this (hopefully PTLs), 
Előd, Doug, Sean and I formulated a plan in IRC today [1].


[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-stable/%23openstack-stable.2018-09-20.log.html#t2018-09-20T17:10:56


--

Thanks,

Matt



Re: [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/20/2018 10:23 AM, Jimmy McArthur wrote:
This is basically the CFP equivalent: 
https://www.openstack.org/summit/berlin-2018/vote-for-speakers  Voting 
isn't necessary, of course, but it should allow you to see submissions 
as they roll in.


Does this work for your purposes?


Yup, that should do it, thanks!

--

Thanks,

Matt



Re: [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-20 Thread Matt Riedemann

On 9/20/2018 4:16 AM, John Garbutt wrote:
Following on from the PTG discussions, I wanted to bring everyone's 
attention to Nova's plans to deprecate ComputeCapabilitiesFilter, 
including most of the integration with Ironic Capabilities.


To be specific, this is my proposal in code form:
https://review.openstack.org/#/c/603102/

Once the code we propose to deprecate is removed we will stop using 
capabilities pushed up from Ironic for 'scheduling', but we would still 
pass capabilities requests in the flavor down to Ironic (until we get 
some standard traits and/or deploy templates sorted for things like UEFI).


Functionally, we believe all use cases can be replaced by using the 
simpler placement traits (this is more efficient than post placement 
filtering done using capabilities):

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html

Please note the recent addition of forbidden traits that helps improve 
the usefulness of the above approach:

https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html

For example, a flavor request for GPUs >= 2 could be replaced by a 
custom trait that reports whether a given Ironic node has 
CUSTOM_MORE_THAN_2_GPUS. That is a bad example (longer term we don't 
want to use traits for this, but that is a discussion for another day) 
but it is the example that keeps being raised in discussions on this topic.
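For illustration, nova encodes trait requirements in flavor extra specs 
with a "trait:" prefix. A minimal sketch (not the actual nova code) of 
turning those specs into the required/forbidden trait sets a placement 
allocation candidates query would use; the trait names are examples only:

```python
# Sketch (not actual nova code): translate flavor extra specs of the
# form "trait:<TRAIT_NAME>=required|forbidden" into the trait sets a
# placement GET /allocation_candidates request would carry.

def traits_from_extra_specs(extra_specs):
    required, forbidden = set(), set()
    for key, value in extra_specs.items():
        if not key.startswith('trait:'):
            continue  # ignore non-trait-scoped specs like hw:cpu_policy
        trait = key[len('trait:'):]
        if value == 'required':
            required.add(trait)
        elif value == 'forbidden':
            forbidden.add(trait)
    return required, forbidden


specs = {
    'trait:CUSTOM_MORE_THAN_2_GPUS': 'required',
    'trait:COMPUTE_VOLUME_MULTI_ATTACH': 'forbidden',
    'hw:cpu_policy': 'dedicated',
}
req, forb = traits_from_extra_specs(specs)
# placement encodes forbidden traits with a leading "!" in required=
query = 'required=' + ','.join(sorted(req) + ['!' + t for t in sorted(forb)])
```

The forbidden-traits syntax ("!TRAIT" in the required= parameter) is the 
one described in the placement-forbidden-traits spec linked above.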


The main reason for reaching out in this email is to ask if anyone has 
needs that the ResourceClass and Traits scheme does not currently 
address, or can think of a problem with a transition to the newer approach.


I left a few comments in the change, but I'm assuming as part of the 
deprecation we'd remove the filter from the default enabled_filters list 
so new installs don't automatically get warnings during scheduling?


Another thing is about existing flavors configured for these 
capabilities-scoped specs. Are you saying during the deprecation we'd 
continue to use those even if the filter is disabled? In the review I 
had suggested that we add a pre-upgrade check which inspects the flavors 
and if any of these are found, we report a warning meaning those flavors 
need to be updated to use traits rather than capabilities. Would that be 
reasonable?
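The pre-upgrade check suggested above could look roughly like this 
sketch (a hypothetical helper, not nova code; the flavor names and 
extra specs are invented for illustration):

```python
# Sketch of the suggested pre-upgrade check (hypothetical helper, not
# nova code): flag flavors still using capabilities-scoped extra specs
# so operators can convert them to traits before the filter is removed.

def find_capabilities_flavors(flavors):
    """Return {flavor_name: [offending keys]} for flavors to fix."""
    offenders = {}
    for name, extra_specs in flavors.items():
        keys = [k for k in extra_specs
                if k == 'capabilities' or k.startswith('capabilities:')]
        if keys:
            offenders[name] = sorted(keys)
    return offenders


flavors = {
    'bm.gpu': {'capabilities:boot_mode': 'uefi'},
    'bm.small': {'trait:CUSTOM_UEFI': 'required'},
}
warnings = find_capabilities_flavors(flavors)
```

A real check would read the flavors from the API database and report a 
warning result rather than return a dict, but the scan itself is this simple.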


--

Thanks,

Matt



Re: [openstack-dev] Forum Topic Submission Period

2018-09-20 Thread Matt Riedemann

On 9/17/2018 11:13 AM, Jimmy McArthur wrote:
The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Another question. In the before times, when we just had that simple form 
to submit forum sessions and then the TC/UC/Foundation reviewed the list 
and picked the sessions, it was very simple to see what other sessions 
were proposed and say, "oh good someone is covering this already, I 
don't need to worry about it". With the move to the CFP forms like the 
summit sessions, that is no longer available, as far as I know. There 
have been at least a few cases this week where someone has said, "this 
might be a good topic, but keystone is probably already covering it, or 
$FOO SIG is probably already covering it", but without herding the cats 
to ask and find out who is all doing what, it's hard to know.


Is there some way we can get back to having a public view of what has 
been proposed for the forum so we can avoid overlap, or at worst avoid 
people not proposing something because they assume someone else is going to cover it?


--

Thanks,

Matt



Re: [openstack-dev] [all] Zuul job backlog

2018-09-19 Thread Matt Riedemann

On 9/19/2018 2:45 PM, Matt Riedemann wrote:

Another one we need to make a decision on is:

https://bugs.launchpad.net/tempest/+bug/1783405

Which I'm suggesting we need to mark more slow tests with the actual 
"slow" tag in Tempest so they move to only be run in the tempest-slow 
job. gmann and I talked about this last week over IRC but I forgot to 
update the bug report with details. I think rather than increase the 
timeout of the tempest-full job we should be marking more slow tests as 
slow. Increasing timeouts gives some short-term relief but eventually we 
just have to look at these issues again, and a tempest run shouldn't 
take over 2 hours (remember when it used to take ~45 minutes?).


https://review.openstack.org/#/c/603900/

--

Thanks,

Matt



Re: [openstack-dev] [all] Zuul job backlog

2018-09-19 Thread Matt Riedemann

On 9/19/2018 2:11 PM, Clark Boylan wrote:

Unfortunately, right now our classification rate is very poor (only 15%), which 
makes it difficult to know what exactly is causing these failures. Mriedem and 
I have quickly scanned the unclassified list, and it appears there is a db 
migration testing issue causing these tests to timeout across several projects. 
Mriedem is working to get this classified and tracked which should help, but we 
will also need to fix the bug. On top of that it appears that Glance has flaky 
functional tests (both python2 and python3) which are causing resets and should 
be looked into.

If you'd like to help, let mriedem or myself know and we'll gladly work with 
you to get elasticsearch queries added to elastic-recheck. We are likely less 
help when it comes to fixing functional tests in Glance, but I'm happy to point 
people in the right direction for that as much as I can. If you can take a few 
minutes to do this before/after you issue a recheck it does help quite a bit.


Things have gotten bad enough that I've started proposing changes to 
skip particularly high failure rate tests that are not otherwise getting 
attention to help triage and fix the bugs. For example:


https://review.openstack.org/#/c/602649/

https://review.openstack.org/#/c/602656/

Generally this is a last resort since it means we're losing test 
coverage, but when we hit a critical mass of random failures it becomes 
extremely difficult to merge code.


Another one we need to make a decision on is:

https://bugs.launchpad.net/tempest/+bug/1783405

Which I'm suggesting we need to mark more slow tests with the actual 
"slow" tag in Tempest so they move to only be run in the tempest-slow 
job. gmann and I talked about this last week over IRC but I forgot to 
update the bug report with details. I think rather than increase the 
timeout of the tempest-full job we should be marking more slow tests as 
slow. Increasing timeouts gives some short-term relief but eventually we 
just have to look at these issues again, and a tempest run shouldn't 
take over 2 hours (remember when it used to take ~45 minutes?).
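Tempest tags tests with attributes like "slow" via a decorator and jobs 
select on those attributes. A stubbed sketch of that mechanism (this is 
not the real tempest.lib.decorators implementation, just an illustration 
of how attribute-based selection splits the fast and slow runs):

```python
# Stubbed sketch of attribute-based test selection, mimicking how the
# "slow" tag splits tests between tempest-full and tempest-slow.
# Not the real tempest.lib.decorators code.

def attr(type=None):
    def decorator(func):
        func.test_attrs = {type} if type else set()
        return func
    return decorator


@attr(type='slow')
def test_volume_boot_pattern():
    pass


def test_server_basic_ops():
    pass


def select(tests, include_slow):
    slow = [t for t in tests if 'slow' in getattr(t, 'test_attrs', set())]
    return slow if include_slow else [t for t in tests if t not in slow]


tests = [test_volume_boot_pattern, test_server_basic_ops]
fast_run = select(tests, include_slow=False)   # tempest-full
slow_run = select(tests, include_slow=True)    # tempest-slow
```

So marking a test slow is a one-line decorator change; the job definitions 
do the rest.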


--

Thanks,

Matt



Re: [openstack-dev] Nominating Tetsuro Nakamura for placement-core

2018-09-19 Thread Matt Riedemann

On 9/19/2018 10:25 AM, Chris Dent wrote:



I'd like to nominate Tetsuro Nakamura for membership in the
placement-core team. Throughout placement's development Tetsuro has
provided quality reviews; done the hard work of creating rigorous
functional tests, making them fail, and fixing them; and implemented
some of the complex functionality required at the persistence layer.
He's aware of and respects the overarching goals of placement and has
demonstrated pragmatism when balancing those goals against the
requirements of nova, blazar and other projects.

Please follow up with a +1/-1 to express your preference. No need to
be an existing placement core, everyone with an interest is welcome.


Soft +1 from me given I mostly have defer to those that work more 
closely with Tetsuro. I agree he's a solid contributor, works hard, 
finds issues, fixes them before being asked, etc. That's awesome. 
Reminds me a lot of gibi when we nominated him.


--

Thanks,

Matt



Re: [openstack-dev] [nova] Super fun unshelve image_ref bugs

2018-09-19 Thread Matt Riedemann

On 12/1/2017 2:47 PM, Matt Riedemann wrote:
Andrew Laski also mentioned in IRC that we didn't replace the original 
instance.image_ref with the shelved image id because the shelve 
operation should be transparent to the end user, they have the same 
image (not really), same volumes, same IPs, etc once they unshelve. And 
he mentioned that if you rebuild, for example, you'd then rebuild to the 
original image instead of the shelved snapshot image.


I'm not sure how much I agree with that rebuild argument. I understand 
it, but I'm not sure I agree with it. I think it's much easier to just 
track things for what they are, which means saying if you create a guest 
from a given image id, then track that in the instances table, don't lie 
about it being something else.


Dredging this back up since it will affect cross-cell resize which will 
rely on shelve/unshelve.


I had a thought recently (and noted in 
https://bugs.launchpad.net/nova/+bug/1732428) that the RequestSpec 
points at the original image used to create the server, or last rebuild 
it (if the server was rebuilt with a new image). What if we used that 
during rebuilds rather than the instance.image_ref?


Then unshelve could leave the instance.image_ref pointing at the shelve 
snapshot image (since that's what is actually backing the server at the 
time of unshelve and should fix the resize qcow2 bug linked above) but 
rebuild could still rebuild from the original (or last rebuild) image 
rather than the shelve snapshot image?


The only hiccup I'm aware of is we then still need to *not* delete the 
snapshot image on unshelve that the instance is pointing at, which means 
shelve snapshot images could pile up over time, especially with 
cross-cell resize. Is that a problem? If so, could we have a periodic 
that cleans up the old snapshot images based on some configured value?
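That periodic could be as simple as this sketch (hypothetical, not nova 
code; the retention option and image fields are invented for illustration):

```python
# Sketch (hypothetical, not nova code) of the periodic cleanup floated
# above: delete shelve snapshot images older than a configured age,
# but only ones no instance still points at.

import datetime

SNAPSHOT_RETENTION = datetime.timedelta(days=14)  # assumed config option


def expired_snapshots(images, now):
    """Yield image ids whose shelve snapshot has outlived retention."""
    for image in images:
        if image.get('in_use'):
            continue  # an unshelved instance still points at this snapshot
        if now - image['created_at'] > SNAPSHOT_RETENTION:
            yield image['id']


now = datetime.datetime(2018, 9, 19)
images = [
    {'id': 'old', 'created_at': datetime.datetime(2018, 8, 1), 'in_use': False},
    {'id': 'live', 'created_at': datetime.datetime(2018, 8, 1), 'in_use': True},
    {'id': 'new', 'created_at': datetime.datetime(2018, 9, 18), 'in_use': False},
]
to_delete = list(expired_snapshots(images, now))
```

The hard part is the "in_use" determination, i.e. knowing which snapshots 
are still backing a server, not the age filter itself.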


--

Thanks,

Matt



Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

2018-09-19 Thread Matt Riedemann

On 9/18/2018 12:26 PM, Matt Riedemann wrote:

On 9/17/2018 9:41 PM, Ghanshyam Mann wrote:
   On Tue, 18 Sep 2018 09:33:30 +0900 Alex Xu wrote:
  > That only means after 599276 we only have the servers API and 
os-instance-actions API stopped accepting undefined query parameters.
  > What I'm thinking about is checking all the APIs, adding 
json-query-param checking with additionalProperties=True where an API 
doesn't have it yet, and using another microversion to set 
additionalProperties to False, so the whole Nova API becomes consistent.


I too vote for doing it for all the other APIs together. Restricting 
unknown query or request params is very useful for API consistency. 
Item #1 in this etherpad: https://etherpad.openstack.org/p/nova-api-cleanup


If you would like, I can propose a quick spec for that; if there is a 
positive response to doing it all together then we skip doing it in 
599276, otherwise we do it for GET /servers in 599276.


-gmann


I don't care too much about changing all of the other 
additionalProperties=False in a single microversion given we're already 
kind of inconsistent with this in a few APIs. Consistency is ideal, but 
I thought we'd be lumping in other cleanups from the etherpad into the 
same microversion/spec which will likely slow it down during spec 
review. For example, I'd really like to get rid of the weird server 
response field prefixes like "OS-EXT-SRV-ATTR:". Would we put those into 
the same mass cleanup microversion / spec or split them into individual 
microversions? I'd prefer not to see an explosion of microversions for 
cleaning up oddities in the API, but I could see how doing them all in a 
single microversion could be complicated.


Just an update on https://review.openstack.org/#/c/599276/ - the change 
is approved. We left additionalProperties=True in the GET 
/servers(/detail) APIs for consistency with 2.5 and 2.26, and for 
expediency in just getting the otherwise pretty simple change approved.


--

Thanks,

Matt



Re: [openstack-dev] [python3] tempest and grenade conversion to python 3.6

2018-09-18 Thread Matt Riedemann

On 9/18/2018 9:52 PM, Matt Riedemann wrote:

On 9/18/2018 12:28 PM, Doug Hellmann wrote:

What's probably missing is a version of the grenade job that allows us
to control that USE_PYTHON3 variable before and after the upgrade.

I see a few different grenade jobs (neutron-grenade,
neutron-grenade-multinode,
legacy-grenade-dsvm-neutron-multinode-live-migration, possibly others).
Which ones are "current" and would make a good candidate as a base for a
new job?


Grenade just runs devstack on the old side (e.g. stable/rocky) using the 
devstack stackrc file (which could have USE_PYTHON3 in it), runs tempest 
'smoke' tests to create some resources, saves off some information about 
those resources in a "database" (just an ini file), then runs devstack 
on the new side (e.g. master) using the new side stackrc file and 
verifies those saved off resources made it through the upgrade. It's all 
bash so there isn't anything python-specific about grenade.


I saw, but didn't comment on, the other thread about if it would be 
possible to create a grenade-2to3 job. I'd think that is pretty doable 
based on the USE_PYTHON3 variable. We'd just have that False on the old 
side, and True on the new side, and devstack will do its thing. Right 
now the USE_PYTHON3 variable is global in devstack-gate [1] (which is 
the thing that orchestrates the grenade run for the legacy jobs), but 
I'm sure we could hack that to be specific to the base (old) and target 
(new) release for the grenade run.


[1] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L434 





To answer Doug's original question, neutron-grenade-multinode is 
probably best to model for a new job if you want to test rolling 
upgrades, because that job has two compute nodes and leaves one on the 
'old' side so it would upgrade the controller services and one compute 
to Stein and leave the other compute at Rocky. So if you start with 
python2 on the old side and upgrade to python3 for everything except one 
compute, you'll have a pretty good idea of whether or not that rolling 
upgrade works through our various services and libraries, like the 
oslo.messaging stuff noted in the other thread.


--

Thanks,

Matt



Re: [openstack-dev] [python3] tempest and grenade conversion to python 3.6

2018-09-18 Thread Matt Riedemann

On 9/18/2018 12:28 PM, Doug Hellmann wrote:

What's probably missing is a version of the grenade job that allows us
to control that USE_PYTHON3 variable before and after the upgrade.

I see a few different grenade jobs (neutron-grenade,
neutron-grenade-multinode,
legacy-grenade-dsvm-neutron-multinode-live-migration, possibly others).
Which ones are "current" and would make a good candidate as a base for a
new job?


Grenade just runs devstack on the old side (e.g. stable/rocky) using the 
devstack stackrc file (which could have USE_PYTHON3 in it), runs tempest 
'smoke' tests to create some resources, saves off some information about 
those resources in a "database" (just an ini file), then runs devstack 
on the new side (e.g. master) using the new side stackrc file and 
verifies those saved off resources made it through the upgrade. It's all 
bash so there isn't anything python-specific about grenade.


I saw, but didn't comment on, the other thread about if it would be 
possible to create a grenade-2to3 job. I'd think that is pretty doable 
based on the USE_PYTHON3 variable. We'd just have that False on the old 
side, and True on the new side, and devstack will do its thing. Right 
now the USE_PYTHON3 variable is global in devstack-gate [1] (which is 
the thing that orchestrates the grenade run for the legacy jobs), but 
I'm sure we could hack that to be specific to the base (old) and target 
(new) release for the grenade run.


[1] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L434


--

Thanks,

Matt



Re: [openstack-dev] Forum Topic Submission Period

2018-09-18 Thread Matt Riedemann

On 9/17/2018 11:13 AM, Jimmy McArthur wrote:

Hello Everyone!

The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Cheers,
Jimmy


Just a process question. I submitted a presentation for the normal 
marketing blitz part of the summit which wasn't accepted (I'm still 
dealing with this emotionally, btw...) but when I look at the CFP link 
for Forum topics, my submission shows up there as "Received". Does that 
mean my non-Forum-at-all submission is now automatically a candidate for 
the Forum? That would not be my intended audience (only suits and 
big wigs, please).


--

Thanks,

Matt



[openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?

2018-09-18 Thread Matt Riedemann
The release page says Ocata is planned to go into extended maintenance 
mode on Aug 27 [1]. There really isn't much to this except it means we 
don't do releases for Ocata anymore [2]. There is a caveat that project 
teams that do not wish to maintain stable/ocata after this point can 
immediately end of life the branch for their project [3]. We can still 
run CI using tags, e.g. if keystone goes ocata-eol, devstack on 
stable/ocata can still continue to install from stable/ocata for nova 
and the ocata-eol tag for keystone. Having said that, if there is no 
undue burden on the project team keeping the lights on for stable/ocata, 
I would recommend not tagging the stable/ocata branch end of life at 
this point.


So, questions that need answering are:

1. Should we cut a final release for projects with stable/ocata branches 
before going into extended maintenance mode? I tend to think "yes" to 
flush the queue of backports. In fact, [3] doesn't mention it, but the 
resolution said we'd tag the branch [4] to indicate it has entered the 
EM phase.


2. Are there any projects that would want to skip EM and go directly to 
EOL (yes this feels like a Monopoly question)?


[1] https://releases.openstack.org/
[2] 
https://docs.openstack.org/project-team-guide/stable-branches.html#maintenance-phases
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance
[4] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#end-of-life


--

Thanks,

Matt



Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

2018-09-18 Thread Matt Riedemann

On 9/17/2018 9:41 PM, Ghanshyam Mann wrote:

   On Tue, 18 Sep 2018 09:33:30 +0900 Alex Xu wrote:
  > That only means after 599276 we only have the servers API and 
os-instance-actions API stopped accepting undefined query parameters.
  > What I'm thinking about is checking all the APIs, adding json-query-param 
checking with additionalProperties=True where an API doesn't have it yet, and using 
another microversion to set additionalProperties to False, so the whole Nova API 
becomes consistent.

I too vote for doing it for all the other APIs together. Restricting unknown 
query or request params is very useful for API consistency. Item #1 in this 
etherpad: https://etherpad.openstack.org/p/nova-api-cleanup

If you would like, I can propose a quick spec for that; if there is a positive 
response to doing it all together then we skip doing it in 599276, otherwise we 
do it for GET /servers in 599276.

-gmann


I don't care too much about changing all of the other 
additionalProperties=False in a single microversion given we're already 
kind of inconsistent with this in a few APIs. Consistency is ideal, but 
I thought we'd be lumping in other cleanups from the etherpad into the 
same microversion/spec which will likely slow it down during spec 
review. For example, I'd really like to get rid of the weird server 
response field prefixes like "OS-EXT-SRV-ATTR:". Would we put those into 
the same mass cleanup microversion / spec or split them into individual 
microversions? I'd prefer not to see an explosion of microversions for 
cleaning up oddities in the API, but I could see how doing them all in a 
single microversion could be complicated.


--

Thanks,

Matt



Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

2018-09-17 Thread Matt Riedemann

On 9/17/2018 3:06 PM, Jay Pipes wrote:
My vote would be just change additionalProperties to False in the 599276 
patch and be done with it.


Well, it would be on a microversion boundary so the user would be opting 
into this stricter validation, but that's the point of microversions. So 
my custom API extension that handles GET /servers?bestpet=cats will 
continue to work as long as I'm using microversion < 2.66.


--

Thanks,

Matt



[openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

2018-09-17 Thread Matt Riedemann
This is a question from a change [1] which adds a new changes-before 
filter to the servers, os-instance-actions and os-migrations APIs.


For context, the os-instance-actions API stopped accepting undefined 
query parameters in 2.58 when we added paging support.


The os-migrations API stopped allowing undefined query parameters in 
2.59 when we added paging support.


The open question on the review is if we should change GET /servers and 
GET /servers/detail to stop allowing undefined query parameters starting 
with microversion 2.66 [2]. Apparently when we added support for 2.5 and 
2.26 for listing servers we didn't think about this. It means that a 
user can specify a query parameter, documented in the API reference, but 
with an older microversion and it will be silently ignored. That is 
backward compatible but confusing from an end user perspective since it 
would appear to them that the filter is not being applied, when it fact 
it would be if they used the correct microversion.


So do we want to start enforcing query parameters when listing servers 
to our defined list with microversion 2.66 or just continue to silently 
ignore them if used incorrectly?


Note that starting in Rocky, the Neutron API will start rejecting 
unknown query parameters [3] if the filter-validation extension is 
enabled (since Neutron doesn't use microversions). So there is some 
precedent in OpenStack for starting to enforce query parameters.


[1] https://review.openstack.org/#/c/599276/
[2] 
https://review.openstack.org/#/c/599276/23/nova/api/openstack/compute/schemas/servers.py

[3] https://docs.openstack.org/releasenotes/neutron/rocky.html#upgrade-notes
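The difference between the two behaviors can be sketched without the 
jsonschema library; this mimics additionalProperties in the JSON-Schema 
sense, gated on the requested microversion (the parameter names here are 
illustrative, not nova's full schema):

```python
# Sketch of microversion-gated query parameter validation, mimicking
# JSON-Schema additionalProperties. Parameter names are illustrative,
# not nova's actual GET /servers schema.

KNOWN_PARAMS = {'limit', 'marker', 'changes-since', 'changes-before'}
STRICT_MICROVERSION = (2, 66)


def validate_query(params, microversion):
    unknown = set(params) - KNOWN_PARAMS
    if not unknown:
        return params
    if microversion >= STRICT_MICROVERSION:
        # additionalProperties=False: reject undefined parameters
        raise ValueError('Invalid query parameters: %s' % sorted(unknown))
    # additionalProperties=True: silently ignore undefined parameters
    return {k: v for k, v in params.items() if k in KNOWN_PARAMS}


# Older microversion: the custom extension param is silently dropped.
old = validate_query({'limit': '10', 'bestpet': 'cats'}, (2, 65))

# New microversion: the same request is rejected up front.
try:
    validate_query({'limit': '10', 'bestpet': 'cats'}, (2, 66))
    rejected = False
except ValueError:
    rejected = True
```

This is why the change is backward compatible: anyone relying on 
?bestpet=cats keeps the lenient behavior on microversions below the boundary.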

--

Thanks,

Matt



Re: [openstack-dev] [election][tc]Question for candidates about global reachout

2018-09-16 Thread Matt Riedemann

On 9/15/2018 9:50 PM, Fred Li wrote:
As a non-native English speaker, it is nice to have some TC or BoD 
members stay in the local social media, like a WeChat group in China. But it 
is also very difficult for non-native Chinese speakers to find 
useful information in a ton of Chinese chats.

My thoughts (even though I am not a TC candidate) on this are:
1. it is kind of you to stay in the local group.
2. if we know that you are in, we will use English when we want you to notice.
3. since there is a local OpenStack operations manager, hopefully he/she can 
identify important information and help to translate, or remind others to 
translate.


My one cent.


Is there a generic openstack group on wechat? Does one have to be 
invited to it? Is there a specific openstack/nova group on wechat? I'm 
on wechat anyway so I don't mind being in those groups if someone wants 
to reach out.


--

Thanks,

Matt



Re: [openstack-dev] [election][tc]Question for candidates about global reachout

2018-09-15 Thread Matt Riedemann

On 9/14/2018 1:52 PM, Zhipeng Huang wrote:

This is a joint question from mnaser and me :)

For the candidates who are running for TC seats, please reply to this 
email to indicate if you are open to using certain social media apps in 
certain regions (like WeChat in China, Line in Japan, etc.), in order to 
reach out to the OpenStack developers in that region and help them 
connect to the upstream community, as well as answering questions or 
other activities that will help. (sorry for the long sentence ... )


Rico and I already signed up for WeChat communication for sure :)


Having had some experience with WeChat, I can't imagine I'd be very 
useful in a nova channel in WeChat since the majority of people in that 
group wouldn't be speaking English so I wouldn't be of much help, unless 
someone directly asked me a question in English. I realize the double 
standard here with expecting non-native English speakers to show up in 
the #openstack-nova freenode IRC channel to ask questions. It's 
definitely a hard problem when people simply can't speak the same 
language and I don't have a great solution. Probably the best common 
solution we have is having more people across time zones and language 
barriers engaging in more discussion in the mailing list (and Gerrit 
reviews of course). So maybe that means if you're in WeChat and someone 
is blocked or has a bigger question for a specific project team, 
encourage them to send an email to the dev ML - but that requires 
ambassadors to be in WeChat channels to make that suggestion. I think of 
this like working with product teams within your own company. Lots of 
those people aren't active upstream contributors and to avoid being the 
middleman (and thus bottleneck) for all communication between upstream 
and downstream teams, I've encouraged the downstream folk to send an 
email upstream to start a discussion.


--

Thanks,

Matt



[openstack-dev] [goals][upgrade-checkers] Week R-30 update

2018-09-15 Thread Matt Riedemann

Just a couple of updates this week.

* I have assigned PTLs (for projects that have PTLs [1]) to their 
respective tasks in StoryBoard [2]. If someone else on your team is 
planning on working on the pre-upgrade check goal then please just 
reassign ownership of the task.


* I have started going through some project release notes looking for 
upgrade impacts and leaving notes in the task assigned per project. 
There were some questions at the PTG about what some projects could add 
for pre-upgrade checks so check your task to see if I've left any 
thoughts. I have not gone through all projects yet.


* Ben Nemec has extracted the common upgrade check CLI framework into a 
library [3] (thanks Ben!) and is working on getting that imported into 
Gerrit. It would be great if projects that start working on the goal can 
try using that library and provide feedback.


[1] https://governance.openstack.org/election/results/stein/ptl.html
[2] https://storyboard.openstack.org/#!/story/2003657
[3] https://github.com/cybertron/oslo.upgradecheck
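The common pattern the library extracts looks roughly like this 
self-contained sketch (the names here are illustrative, not 
oslo.upgradecheck's actual API):

```python
# Self-contained sketch of the pre-upgrade check pattern being extracted
# into oslo.upgradecheck: run a set of checks and let the worst result
# drive the command's exit code. Names are illustrative, not the
# library's actual API.

SUCCESS, WARNING, FAILURE = 0, 1, 2


def check_placement_api():
    # A real check would probe the placement endpoint's version.
    return (SUCCESS, 'Placement API is reachable')


def check_flavor_capabilities():
    return (WARNING, 'Flavors still use capabilities-scoped extra specs')


def run_checks(checks):
    """Run all checks; the worst result code becomes the exit code."""
    results = [check() for check in checks]
    exit_code = max(code for code, _ in results)
    return exit_code, results


exit_code, results = run_checks([check_placement_api,
                                 check_flavor_capabilities])
```

The point of the shared library is that each project only writes the 
check functions; the result table, exit codes, and CLI wiring come for free.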

--

Thanks,

Matt



[openstack-dev] [nova][publiccloud-wg] Proposal to shelve on stop/suspend

2018-09-14 Thread Matt Riedemann
tl;dr: I'm proposing a new parameter to the server stop (and suspend?) 
APIs to control if nova shelve offloads the server.


Long form: This came up during the public cloud WG session this week 
based on a couple of feature requests [1][2]. When a user stops/suspends 
a server, the hypervisor frees up resources on the host but nova 
continues to track those resources as being used on the host so the 
scheduler can't put more servers there. What operators would like to do 
is that when a user stops a server, nova actually shelve offloads the 
server from the host so they can schedule new servers on that host. On 
start/resume of the server, nova would find a new host for the server. 
This also came up in Vancouver where operators would like to free up 
limited expensive resources like GPUs when the server is stopped. This 
is also the behavior in AWS.


The problem with shelve is that it's great for operators but users just 
don't use it, maybe because they don't know what it is and stop works 
just fine. So how do you get users to opt into shelving their server?


I've proposed a high-level blueprint [3] where we'd add a new 
(microversioned) parameter to the stop API with three options:


* auto
* offload
* retain

Naming is obviously up for debate. The point is we would default to auto 
and if auto is used, the API checks a config option to determine the 
behavior - offload or retain. By default we would retain for backward 
compatibility. For users that don't care, they get auto and it's fine. 
For users that do care, they either (1) don't opt into the microversion 
or (2) specify the specific behavior they want. I don't think we need to 
expose what the cloud's configuration for auto is because again, if you 
don't care then it doesn't matter and if you do care, you can opt out of 
this.
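Concretely, the resolution described above can be sketched like this. The 
parameter and config option names are hypothetical, purely to illustrate the 
proposed auto/offload/retain semantics, not code from an actual spec:

```python
VALID_BEHAVIORS = ('auto', 'offload', 'retain')


def resolve_stop_behavior(requested='auto', conf_shelve_on_stop=False):
    """Map the user-requested stop behavior to what the API actually does.

    'auto' defers to operator configuration; the configuration default
    is 'retain' for backward compatibility, so users who don't care get
    today's behavior unless the operator opts the cloud in.
    """
    if requested not in VALID_BEHAVIORS:
        raise ValueError('Unknown stop behavior: %s' % requested)
    if requested == 'auto':
        return 'offload' if conf_shelve_on_stop else 'retain'
    # An explicit 'offload' or 'retain' overrides the cloud's default.
    return requested
```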


"How do we get users to use the new microversion?" I'm glad you asked.

Well, nova CLI defaults to using the latest available microversion 
negotiated between the client and the server, so by default, anyone 
using "nova stop" would get the 'auto' behavior (assuming the client and 
server are new enough to support it). Long-term, openstack client plans 
on doing the same version negotiation.


As for the server status changes, if the server is stopped and shelved, 
the status would be 'SHELVED_OFFLOADED' rather than 'SHUTDOWN'. I 
believe this is fine especially if a user is not being specific and 
doesn't care about the actual backend behavior. On start, the API would 
allow starting (unshelving) shelved offloaded (rather than just stopped) 
instances. Trying to hide shelved servers as stopped in the API would be 
overly complex IMO so I don't want to try and mask that.
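The corresponding start-side dispatch would be roughly as follows — the 
status strings are the ones used in this message, everything else is a 
hypothetical sketch of the proposal, not nova code:

```python
def handle_start(server_status):
    """Pick the compute action for a 'start' request.

    With the proposal, start must also accept servers that were shelve
    offloaded by a prior stop; unshelving goes back through the
    scheduler and may land the server on a new host.
    """
    if server_status == 'SHUTDOWN':
        return 'start'      # plain power-on on the same host
    if server_status == 'SHELVED_OFFLOADED':
        return 'unshelve'   # reschedule, then boot
    raise ValueError('Cannot start server in status %s' % server_status)
```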


It is possible that a user that stopped and shelved their server could 
hit a NoValidHost when starting (unshelving) the server, but that really 
shouldn't happen in a cloud that's configuring nova to shelve by default 
because if they are doing this, their SLA needs to reflect they have the 
capacity to unshelve the server. If you can't honor that SLA, don't 
shelve by default.


So, what are the general feelings on this before I go off and start 
writing up a spec?


[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681
[2] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791679
[3] https://blueprints.launchpad.net/nova/+spec/shelve-on-stop

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Hard fail if you try to rename an AZ with instances in it?

2018-09-14 Thread Matt Riedemann

On 3/28/2018 4:35 PM, Jay Pipes wrote:

On 03/28/2018 03:35 PM, Matt Riedemann wrote:

On 3/27/2018 10:37 AM, Jay Pipes wrote:


If we want to actually fix the issue once and for all, we need to 
make availability zones a real thing that has a permanent identifier 
(UUID) and store that permanent identifier in the instance (not the 
instance metadata).


Or we can continue to paper over major architectural weaknesses like 
this.


Stepping back a second from the rest of this thread, what if we do the 
hard fail bug fix thing, which could be backported to stable branches, 
and then we have the option of completely re-doing this with aggregate 
UUIDs as the key rather than the aggregate name? Because I think the 
former could get done in Rocky, but the latter probably not.


I'm fine with that (and was fine with it before, just stating that 
solving the problem long-term requires different thinking).


Best,
-jay


Just FYI for anyone that cared about this thread, we agreed at the Stein 
PTG to resolve the immediate bug [1] by blocking AZ renames while the AZ 
has instances in it. There won't be a microversion for that change and 
we'll be able to backport it (with a release note I suppose).


[1] https://bugs.launchpad.net/nova/+bug/1782539
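The agreed fix boils down to a guard along these lines — a sketch with 
invented names (nova would reject the aggregate metadata update with a 400 
rather than raise ValueError):

```python
def validate_az_rename(current_az, new_az, instance_count):
    """Reject renaming an availability zone that still has instances.

    Instances record the AZ *name*, so renaming a non-empty zone would
    leave them pointing at a zone that no longer exists.
    """
    if new_az != current_az and instance_count > 0:
        raise ValueError(
            'Cannot rename availability zone %r: %d instance(s) are '
            'still in it' % (current_az, instance_count))


# Allowed cases: no actual rename, or an empty zone.
validate_az_rename('az1', 'az1', instance_count=5)
validate_az_rename('az1', 'az2', instance_count=0)
```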

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:32 PM, Melvin Hillsman wrote:
We basically spent the day focusing on two things specific to what you 
bring up and are in agreement with you regarding action not just talk 
around feedback and outreach. [1]
We wiped the agenda clean, discussed our availability (set reasonable 
expectations), and revisited how we can be more diligent and successful 
around these two principles which target your first comment, "...get 
their RFE/bug list ranked from the operator community (because some of 
the requests are not exclusive to public cloud), and then put pressure 
on the TC to help project manage the delivery of the top issue..."


I will not get into much detail because again this response is specific 
to a portion of your email so in keeping with feedback and outreach the 
UC is making it a point to be intentional. We have already got action 
items [2] which target the concern you raise. We have agreed to hold 
each other accountable and adjusted our meeting structure to facilitate 
being successful.


Not that the UC (elected members) are the only ones who can do this but 
we believe it is our responsibility to; regardless of what anyone else 
does. The UC is also expected to enlist others and hopefully through our 
efforts others are encouraged to participate and enlist others.


[1] https://etherpad.openstack.org/p/uc-stein-ptg
[2] https://etherpad.openstack.org/p/UC-Election-Qualifications


Awesome, thank you Melvin and others on the UC.

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-sigs] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Matt Riedemann

On 9/12/2018 5:13 PM, Jeremy Stanley wrote:

Sure, and I'm saying that instead I think the influence of TC
members _can_ be more valuable in finding and helping additional
people to do these things rather than doing it all themselves, and
it's not just about the limited number of available hours in the day
for one person versus many. The successes goal champions experience,
the connections they make and the elevated reputation they gain
throughout the community during the process of these efforts builds
new leaders for us all.


Again, I'm not saying TC members should be doing all of the work 
themselves. That's not realistic, especially when critical parts of any 
major effort are going to involve developers from projects on which none 
of the TC members are active contributors (e.g. nova). I want to see TC 
members herd cats, for lack of a better analogy, and help out 
technically (with code) where possible.


Given the repeated mention of how the "help wanted" list continues to 
not draw in contributors, I think the recruiting role of the TC should 
take a back seat to actually stepping in and helping work on those items 
directly. For example, Sean McGinnis is taking an active role in the 
operators guide and other related docs that continue to be discussed at 
every face to face event since those docs were dropped from 
openstack-manuals (in Pike).


I think it's fair to say that the people generally elected to the TC are 
those most visible in the community (it's a popularity contest) and 
those people are generally the most visible because they have the luxury 
of working upstream the majority of their time. As such, it's their duty 
to oversee and spend time working on the hard cross-project technical 
deliverables that operators and users are asking for, rather than think 
of an infinite number of ways to try and draw *others* to help work on 
those gaps. As I think it's the role of a PTL within a given project to 
have a finger on the pulse of the technical priorities of that project 
and manage the developers involved (of which the PTL certainly may be 
one), it's the role of the TC to do the same across openstack as a 
whole. If a PTL doesn't have the time or willingness to do that within 
their project, they shouldn't be the PTL. The same goes for TC members IMO.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

