

On 11/5/2018 1:36 PM, Doug Hellmann wrote:

I think the lazy stuff was all about the API responses. The log
translations worked a completely different way.


Yeah maybe. And if so, I came across this in one of the blueprints:

https://etherpad.openstack.org/p/disable-lazy-translation

It says that, because of a critical bug, lazy translation was disabled 
in Havana, to be fixed in Icehouse. I don't think that ever happened 
before the IBM developers dropped it upstream, which is further 
justification for nuking this code from the various projects.


--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker


On 11/5/2018 1:17 PM, Matt Riedemann wrote:
I'm thinking of a case like: resize an instance, but rather than 
confirm/revert it, the user deletes the instance. That would clean up the 
allocations from the target node but potentially not from the source node.


Well this case is at least not an issue:

https://review.openstack.org/#/c/615644/

It took me a bit to sort out how that worked but it does and I've added 
a test to confirm it.


--

Thanks,

Matt


Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker


On 11/5/2018 12:28 PM, Mohammed Naser wrote:

Have you dug into any of the operations around these instances to
determine what might have gone wrong? For example, was a live migration
performed recently on these instances and if so, did it fail? How about
evacuations (rebuild from a down host).

To be honest, I have not. However, I suspect a lot of those happen because
the service which makes the claim is not necessarily the same one that
deletes it.

I'm not sure if this is possible, but say compute2 makes a claim for
migrating to compute1 and something fails there. The revert happens on
compute1, but compute1 is already borked, so it doesn't work.

This isn't necessarily the exact case that's happening but it's a summary
of what I believe happens.



The computes don't create the resource allocations in placement though, 
the scheduler does, unless this deployment still has at least one 
compute that is 

The compute service should only be removing allocations for things like 
server delete, a failed move operation (clean up the allocations created 
by the scheduler), or a successful move operation (clean up the 
allocations for the source node held by the migration record).


I wonder if you have migration records (from the cell DB migrations 
table) holding allocations in placement for some reason, even though the 
migration is complete. I know you have an audit script to look for 
allocations that are not held by instances, assuming those instances 
have been deleted and the allocations were leaked, but they could have 
also been held by the migration record and maybe leaked that way? 
Although if you delete the instance, the related migrations records are 
also removed (but maybe not their allocations?). I'm thinking of a case 
like: resize an instance, but rather than confirm/revert it, the user 
deletes the instance. That would clean up the allocations from the target 
node but potentially not from the source node.
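
The "held by an instance, held by a migration record, or leaked" question 
above can be sketched as a small classifier. This is only an 
illustration: the function name, the UUID-like strings and the input 
shapes are all hypothetical, and it assumes the allocations and the 
instance/migration UUIDs have already been fetched from placement and 
the cell DB.

```python
def classify_consumers(allocations, instance_uuids, migration_uuids):
    """Group allocation consumers by who (if anyone) still owns them.

    allocations: dict mapping consumer UUID -> resources (hypothetical shape)
    instance_uuids / migration_uuids: sets of UUIDs that still exist in the DB
    """
    report = {"instance": [], "migration": [], "orphaned": []}
    for consumer in sorted(allocations):
        if consumer in instance_uuids:
            report["instance"].append(consumer)
        elif consumer in migration_uuids:
            report["migration"].append(consumer)
        else:
            # Neither an instance nor a migration record owns this consumer,
            # so the allocation is a candidate leak.
            report["orphaned"].append(consumer)
    return report


# Made-up data: inst-2 was resized (mig-2 held the source node allocation)
# and then deleted before confirm/revert, removing both DB records.
allocations = {
    "inst-1": {"VCPU": 2},
    "mig-1": {"VCPU": 2},
    "mig-2": {"VCPU": 4},
}
report = classify_consumers(
    allocations, instance_uuids={"inst-1"}, migration_uuids={"mig-1"}
)
print(report)
```

An audit that only compares consumers against instances would report 
both mig-1 and mig-2 as leaked; checking migration records as well 
narrows that down to mig-2.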


--

Thanks,

Matt


[openstack-dev] Dropping lazy translation support

This is a follow up to a dev ML email [1] where I noticed that some 
implementations of the upgrade-checkers goal were failing because some 
projects still use the oslo_i18n.enable_lazy() hook for lazy log message 
translation (and maybe API responses?).


The very old blueprints related to this can be found here [2][3][4].

If memory serves me correctly from my time working at IBM on this, this 
was needed to:


1. Generate logs translated in other languages.

2. Return translated REST API responses if the "Accept-Language" header 
was used and a suitable translation existed for that language.
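
For anyone unfamiliar with item 2, the general idea of picking a 
translation from an "Accept-Language" header can be sketched as below. 
This is not the oslo.i18n code; the function names and the tiny message 
catalog are hypothetical, and the header parsing is deliberately 
simplified.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    choices = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value gets weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        choices.append((quality, tag.strip().lower()))
    # The sort is stable, so equal weights keep their header order.
    choices.sort(key=lambda choice: choice[0], reverse=True)
    return [tag for quality, tag in choices if quality > 0]


def pick_translation(header, catalog, default="en"):
    """Return the best available message, falling back to the default."""
    for tag in parse_accept_language(header):
        if tag in catalog:
            return catalog[tag]
        base = tag.split("-")[0]  # e.g. "de-de" falls back to "de"
        if base in catalog:
            return catalog[base]
    return catalog[default]


catalog = {"en": "Instance not found", "de": "Instanz nicht gefunden"}
print(pick_translation("fr;q=0.9, de-DE;q=0.8, en;q=0.5", catalog))
```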


#1 has been a dead horse since at least the Ocata summit, I think, when 
we agreed to no longer translate logs since no one used them.


#2 is probably something no one knows about. I can't find end-user 
documentation about it anywhere. It's not tested and therefore I have no 
idea if it actually works anymore.


I would like to (1) deprecate the oslo_i18n.enable_lazy() function so 
new projects don't use it and (2) start removing the enable_lazy() usage 
from existing projects like keystone, glance and cinder.


Are there any users, deployments or vendor distributions that still rely 
on this feature? If so, please speak up now.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-November/136285.html

[2] https://blueprints.launchpad.net/oslo-incubator/+spec/i18n-messages
[3] https://blueprints.launchpad.net/nova/+spec/i18n-messages
[4] https://blueprints.launchpad.net/nova/+spec/user-locale-api

--

Thanks,

Matt


Re: [openstack-dev] [nova] Announcing new Focal Point for s390x libvirt/kvm Nova


On 11/2/2018 3:47 AM, Andreas Scheuring wrote:

Dear Nova Community,
I want to announce the new focal point for Nova s390x libvirt/kvm.

Please welcome "Cathy Zhang" to the Nova team. She and her team will be 
responsible for maintaining the s390x libvirt/kvm third-party CI [1] and any 
s390x specific code in nova and os-brick.
I personally took a new opportunity a few months ago already but kept 
maintaining the CI as well as possible. With new manpower we can hopefully 
contribute more to the community again.

You can reach her via
* email:bjzhj...@linux.vnet.ibm.com
* IRC: Cathyz

Cathy, I wish you and your team all the best for this exciting role! I also 
want to say thank you for the last years. It was a great time, I learned a lot 
from you all, will miss it!

Cheers,

Andreas (irc: scheuran)


[1]https://wiki.openstack.org/wiki/ThirdPartySystems/IBM_zKVM_CI


Welcome Cathy.

Andreas - thanks for the update and good luck on the new position.

--

Thanks,

Matt


Re: [openstack-dev] [nova] about live-resize the instance


On 11/4/2018 10:17 PM, Chen CH Ji wrote:
Yes, this has been discussed for a long time, and if I remember 
correctly the S PTG also had some discussion on it (maybe the Public 
Cloud WG?). Claudiu has been pushing this for several cycles and he 
actually had some code at [1], but there has been no additional progress 
there...
[1] 
https://review.openstack.org/#/q/status:abandoned+topic:bp/instance-live-resize


It's a question of priorities. It's a complicated change and low 
priority, in my opinion. We've said several times before that we'd do 
it, but there are a lot of other higher priority efforts taking the 
attention of the core team. Getting agreement on the spec is the first 
step and then the runways process should be used to deal with actual 
code reviews, but I think the spec review has stalled (I know I am 
guilty of not looking at the latest updates to the spec).


--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] FYI on "TypeError: Message objects do not support addition." errors

If you are seeing this error when implementing and running the upgrade 
check command in your project:


Traceback (most recent call last):
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", line 184, in main
    return conf.command.action_fn()
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_upgradecheck/upgradecheck.py", line 134, in check
    print(t)
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", line 237, in __str__
    return self.__unicode__()
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", line 243, in __unicode__
    return self.get_string()
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", line 995, in get_string
    lines.append(self._stringify_header(options))
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", line 1066, in _stringify_header
    bits.append(" " * lpad + self._justify(fieldname, width, self._align[field]) + " " * rpad)
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/prettytable.py", line 187, in _justify
    return text + excess * " "
  File "/home/osboxes/git/searchlight/.tox/venv/lib/python3.5/site-packages/oslo_i18n/_message.py", line 230, in __add__
    raise TypeError(msg)
TypeError: Message objects do not support addition.

It is due to calling oslo_i18n.enable_lazy() somewhere in the command 
import path. That should be removed from the project since lazy 
translation is not supported in openstack and as an effort was abandoned 
several years ago. It is probably still called in a lot of "big 
tent/stackforge" projects because of initially copying it from the more 
core projects. Anyway, just remove it.


I'm talking with the oslo team about deprecating that interface so 
projects don't mistakenly use it and expect great things to happen.
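
To make the failure mode concrete, here is a minimal sketch of what the 
traceback shows. The Message class below is a hypothetical stand-in, 
not the oslo.i18n implementation; it only mimics the one behaviour 
visible in the traceback (addition raises TypeError). The justify() 
function mirrors the "text + excess * ' '" padding line from the 
prettytable frame.

```python
class Message(str):
    """Stand-in for a lazily translated message that refuses addition."""

    def __add__(self, other):
        raise TypeError("Message objects do not support addition.")

    def __radd__(self, other):
        return self.__add__(other)


def justify(text, width):
    """Left-justify text by padding it with spaces, as a table renderer would."""
    excess = width - len(text)
    return text + excess * " "


# A plain string pads fine.
print(repr(justify("Check", 10)))

# A lazy message object blows up on the same code path.
try:
    justify(Message("Check"), 10)
except TypeError as exc:
    print("TypeError:", exc)
```

With lazy translation turned off the column headers are ordinary 
strings, so the padding works, which is why removing the enable_lazy() 
call makes the error go away.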


--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-23 Update

There is not much news this week. There are several open changes which 
add the base command framework to projects [1]. Those need reviews from 
the related core teams. gmann and I have been trying to go through them 
first to make sure they are ready for core review.


There is one neutron change to note [2] which adds an extension point 
for neutron stadium projects (and ML2 plugins?) to hook in their own 
upgrade checks. Given the neutron architecture, this makes sense. My 
only worry is about making sure the interface is clearly defined, but I 
suspect this isn't the first time the neutron team has had to deal with 
something like this.


[1] https://review.openstack.org/#/q/topic:upgrade-checkers+status:open
[2] https://review.openstack.org/#/c/615196/

--

Thanks,

Matt


Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker


On 11/5/2018 5:52 AM, Chris Dent wrote:

* We need to have further discussion and investigation on
   allocations getting out of sync. Volunteers?


This is something I've already spent a lot of time on with the 
heal_allocations CLI, and have already started asking mnaser questions 
about this elsewhere in the thread.


--

Thanks,

Matt


Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker


On 11/4/2018 4:22 AM, Mohammed Naser wrote:

Just for information's sake, a clean state cloud which had no reported issues
over maybe a period of 2-3 months already has 4 allocations which are
incorrect and 12 allocations pointing to the wrong resource provider, so I
think this comes down to committing to either "self-healing" to fix those
issues or not.


Is this running Rocky or an older release?

Have you dug into any of the operations around these instances to 
determine what might have gone wrong? For example, was a live migration 
performed recently on these instances and if so, did it fail? How about 
evacuations (rebuild from a down host).


By "4 allocations which are incorrect" I assume that means they are 
pointing at the correct compute node resource provider but the values 
for allocated VCPU, MEMORY_MB and DISK_GB are wrong? If so, how do the 
allocations align with old/new flavors used to resize the instance? Did 
the resize fail?
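
The old/new flavor comparison asked about above could be done with 
something like the following. It is a sketch under simplifying 
assumptions: the flavor dicts and their keys are hypothetical, and disk 
is treated as a single "disk_gb" value rather than being derived from 
root, ephemeral and swap sizes.

```python
def flavor_resources(flavor):
    """Translate a (simplified, hypothetical) flavor dict into resource amounts."""
    return {
        "VCPU": flavor["vcpus"],
        "MEMORY_MB": flavor["memory_mb"],
        "DISK_GB": flavor["disk_gb"],
    }


def diagnose(allocation, old_flavor, new_flavor):
    """Say which flavor, if any, an instance's allocation lines up with."""
    if allocation == flavor_resources(new_flavor):
        return "matches new flavor"
    if allocation == flavor_resources(old_flavor):
        return "matches old flavor"
    return "matches neither flavor"


old_flavor = {"vcpus": 2, "memory_mb": 4096, "disk_gb": 40}
new_flavor = {"vcpus": 4, "memory_mb": 8192, "disk_gb": 80}

# Made-up allocation as it might be reported for one instance.
allocation = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 40}
print(diagnose(allocation, old_flavor, new_flavor))
```

An allocation that still matches the old flavor on an instance that now 
has the new one would point toward a resize that did not finish 
cleanly.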


Are there mixed compute versions at all, i.e. are you moving instances 
around during a rolling upgrade?


--

Thanks,

Matt


Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-02 Thread Matt Riedemann

 idea behind getting 
the latest traits is so the virt driver doesn't overwrite any traits set 
externally on the compute node root resource provider. I think that 
still stands and is probably OK, even though we have generations now 
which should keep us from overwriting if we don't have the latest 
traits, but I wanted to bring it up since it's related to the "why do we 
need provider aggregates in the compute?" question.


* Regardless of what we do, I think we should probably *at least* make 
that refresh associations config allow 0 to disable it so CERN (and 
others) can avoid continually forward-porting code to disable it.
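
The "0 disables it" behaviour could look roughly like this. It is a 
sketch, not nova code: the function and parameter names are 
hypothetical, and it assumes the associations are still fetched once 
when nothing has been cached yet.

```python
def should_refresh(last_refresh, now, interval):
    """Decide whether cached provider associations need to be re-fetched.

    last_refresh: timestamp of the previous refresh, or None if never done
    now: current timestamp, in the same units as last_refresh
    interval: seconds between refreshes; 0 disables periodic refresh
    """
    if last_refresh is None:
        return True  # nothing cached yet, so populate once (assumption)
    if interval == 0:
        return False  # periodic refresh disabled by the operator
    return now - last_refresh >= interval


print(should_refresh(None, 100.0, 0))    # first call
print(should_refresh(50.0, 100.0, 0))    # disabled afterwards
print(should_refresh(50.0, 400.0, 300))  # interval elapsed
```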


[1] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-mirror-host-aggregates.html

[2] https://bugs.launchpad.net/nova/+bug/1784020
[3] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/report-cpu-features-as-traits.html


--

Thanks,

Matt


Re: [openstack-dev] StoryBoard Forum Session: Remaining Blockers

2018-11-02 Thread Matt Riedemann


On 11/1/2018 7:22 PM, Kendall Nelson wrote:
We've made a lot of progress in StoryBoard-land over the last couple of 
releases cleaning up bugs, fixing UI annoyances, and adding features 
that people have requested. All along we've also continued to migrate 
projects as they've become unblocked. While there are still a few 
blockers on our to-do list, we want to make sure our list is complete[1].


We have a session at the upcoming forum to collect any remaining 
blockers that you may have encountered while messing around with the dev 
storyboard[2] site or using the real storyboard interacting with 
projects that have already migrated. If you encountered any issues that 
are blocking your project from migrating, please come share them with 
us[3]. Hope to see you there!


-Kendall (diablo_rojo) & the StoryBoard team

[1] https://storyboard.openstack.org/#!/worklist/493 
<https://storyboard.openstack.org/#%21/worklist/493>


I'm not sure why/how but you seem to have an encoded URL for this [1] 
link, which when I was using it redirected me to my own dashboard in 
storyboard. The real link, 
https://storyboard.openstack.org/#!/worklist/493, does work though. Just 
FYI for anyone else having the same problem.
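
For reference, the only difference between the two links is that the 
"!" in the fragment is percent-encoded as "%21". A quick way to see 
that with the standard library (this shows the encoding only, not why 
storyboard redirects the encoded form):

```python
from urllib.parse import quote, unquote

encoded = "https://storyboard.openstack.org/#%21/worklist/493"

# Decoding the percent-escape recovers the working link.
decoded = unquote(encoded)
print(decoded)

# Going the other way, "!" is what gets escaped to "%21".
print(quote("!"))
```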



[2] https://storyboard-dev.openstack.org/
[3] 
https://www.openstack.org/summit/berlin-2018/summit-schedule/events/22839/storyboard-migration-the-remaining-blockers 




--

Thanks,

Matt


[openstack-dev] [nova] Is anyone running their own script to purge old instance_faults table entries?

2018-11-01 Thread Matt Riedemann

I came across this bug [1] in triage today and I thought this was fixed 
already [2] but either something regressed or there is more to do here.


I'm mostly just wondering, are operators already running any kind of 
script which purges old instance_faults table records before an instance 
is deleted and archived/purged? Because if so, that might be something 
we want to add as a nova-manage command.
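
To give an idea of the kind of script I mean, here is a sketch against 
an in-memory SQLite database. The table layout is a heavily simplified, 
hypothetical version of instance_faults, and the policy (delete faults 
older than a cutoff but always keep the newest fault per instance) is 
just one assumption about what "purge old records" could mean.

```python
import sqlite3


def purge_old_faults(conn, cutoff):
    """Delete faults created before cutoff, keeping the newest per instance."""
    cursor = conn.execute(
        """
        DELETE FROM instance_faults
        WHERE created_at < ?
          AND id NOT IN (
              SELECT MAX(id) FROM instance_faults GROUP BY instance_uuid
          )
        """,
        (cutoff,),
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instance_faults ("
    "id INTEGER PRIMARY KEY, instance_uuid TEXT, created_at TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO instance_faults (instance_uuid, created_at, message) "
    "VALUES (?, ?, ?)",
    [
        ("inst-a", "2018-01-01", "old fault"),
        ("inst-a", "2018-06-01", "older fault"),
        ("inst-a", "2018-10-30", "latest fault"),
        ("inst-b", "2018-02-01", "only fault"),
    ],
)

deleted = purge_old_faults(conn, "2018-10-01")
remaining = [row[0] for row in conn.execute(
    "SELECT id FROM instance_faults ORDER BY id")]
print(deleted, remaining)
```

Dates are stored as ISO strings here so that a plain string comparison 
orders them correctly.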


[1] https://bugs.launchpad.net/nova/+bug/1800755
[2] https://review.openstack.org/#/c/409943/

--

Thanks,

Matt


Re: [openstack-dev] Zuul Queue backlogs and resource usage

2018-10-30 Thread Matt Riedemann


On 10/30/2018 11:03 AM, Clark Boylan wrote:

If you find any of this interesting and would like to help feel free to reach 
out to myself or the infra team.


I find this interesting and thanks for providing the update to the 
mailing list. That's mostly what I wanted to say.


FWIW I've still got https://review.openstack.org/#/c/606981/ and the 
related changes to drop the nova-multiattach job and enable the 
multiattach volume tests in the integrated gate, but am hung up on some 
test failures in the multi-node tempest job as a result of that (the 
nova-multiattach job is single-node). There must be something weird that 
tickles those tests in a multi-node configuration and I just haven't dug 
into it yet, but maybe one of our intrepid contributors can lend a hand 
and debug it.


--

Thanks,

Matt


[openstack-dev] [nova] FYI: change in semantics for virt driver update_provider_tree()

2018-10-29 Thread Matt Riedemann

This is a notice to any out of tree virt driver implementors of the 
ComputeDriver.update_provider_tree() interface that they now need to set 
the allocation_ratio and reserved amounts for VCPU, MEMORY_MB and 
DISK_GB inventory from the update_provider_tree() method assuming [1] 
merges. The patch below that one in the series shows how it's 
implemented for the libvirt driver.
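
Roughly, the driver ends up reporting inventory that carries those 
fields itself. The sketch below is not taken from the patch: 
FakeProviderTree is a hypothetical stand-in for the provider tree 
object, the function is a free function rather than a driver method, 
and the totals, ratios and reserved amounts are made-up values.

```python
class FakeProviderTree:
    """Minimal stand-in that just records inventory per provider name."""

    def __init__(self):
        self.inventories = {}

    def update_inventory(self, name, inventory):
        self.inventories[name] = inventory


def update_provider_tree(provider_tree, nodename, ratios, reserved):
    """Report inventory including allocation_ratio and reserved amounts."""
    totals = {"VCPU": 16, "MEMORY_MB": 32768, "DISK_GB": 1000}  # made up
    inventory = {
        rc: {
            "total": total,
            "allocation_ratio": ratios[rc],
            "reserved": reserved[rc],
        }
        for rc, total in totals.items()
    }
    provider_tree.update_inventory(nodename, inventory)


def capacity(inv):
    """Usable capacity: (total - reserved) * allocation_ratio."""
    return int((inv["total"] - inv["reserved"]) * inv["allocation_ratio"])


tree = FakeProviderTree()
update_provider_tree(
    tree,
    "compute1",
    ratios={"VCPU": 16.0, "MEMORY_MB": 1.5, "DISK_GB": 1.0},
    reserved={"VCPU": 0, "MEMORY_MB": 512, "DISK_GB": 0},
)
for rc, inv in tree.inventories["compute1"].items():
    print(rc, capacity(inv))
```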


[1] https://review.openstack.org/#/c/613991/

--

Thanks,

Matt


[openstack-dev] [nova] Volunteer needed to write reshaper FFU hook

2018-10-29 Thread Matt Riedemann

Given the outstanding results of my recruiting job last week [1] I have 
been tasked with recruiting one of our glorious and most talented 
contributors to work on the fast-forward-upgrade script changes needed 
for the reshape-provider-tree blueprint.


The work item is nicely detailed in the spec [2]. A few things to keep 
in mind:


1. There are currently no virt drivers which run the reshape routine. 
However, patches are up for review for libvirt [3] and xen [4]. There 
are also functional tests which exercise the ResourceTracker code with a 
faked out virt driver interface to test reshaping [5].


2. The FFU entry point will mimic the reshape routine that will happen 
on nova-compute service startup in the ResourceTracker [6].


3. The FFU script will need to run per-compute service rather than 
globally (or per cell) since it actually needs to call the virt driver's 
update_provider_tree() interface which might need to inspect the 
hardware (like for GPUs).


Given there is already a model to follow from the ResourceTracker, this 
should not be too hard. The work will likely mostly be writing tests.


What do you get if you volunteer? The usual: fame, fortune, the respect 
of your peers, etc.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-October/136075.html
[2] 
https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/reshape-provider-tree.html#offline-upgrade-script

[3] https://review.openstack.org/#/c/599208/
[4] https://review.openstack.org/#/c/521041/
[5] 
https://github.com/openstack/nova/blob/a0eacbf7f/nova/tests/functional/test_servers.py#L1839
[6] 
https://github.com/openstack/nova/blob/a0eacbf7f/nova/compute/resource_tracker.py#L917-L940


--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-24 Update

2018-10-26 Thread Matt Riedemann

There isn't much news this week except some of the base framework 
changes being proposed to projects are getting merged which is nice to see.


https://storyboard.openstack.org/#!/story/2003657

https://review.openstack.org/#/q/topic:upgrade-checkers+status:merged

And there are a lot of patches that are ready for review:

https://review.openstack.org/#/q/topic:upgrade-checkers+status:open

--

Thanks,

Matt


Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread Matt Riedemann


On 10/25/2018 2:55 PM, Chris Friesen wrote:
2) The main benefit (as I see it) of the quota class API is to allow 
dynamic adjustment of the default quotas without restarting services.


I could be making this up, but I want to say that back at the Pike PTG 
people were also complaining that having no API to change this, and only 
being able to do it via config, was not good. But if the keystone limits 
API solves that then it's a non-issue.


--

Thanks,

Matt


[openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-25 Thread Matt Riedemann


Hello OSA/TripleO people,

A plan/checklist was put in place at the Stein PTG for extracting 
placement from nova [1]. The first item in that list is done in grenade 
[2], which is the devstack-based upgrade project in the integrated gate. 
That should serve as a template for the necessary upgrade steps in 
deployment projects. The related devstack change for extracted placement 
on the master branch (Stein) is [3]. Note that change has some dependencies.


The second point in the plan from the PTG was getting extracted 
placement upgrade tooling support in a deployment project, notably 
TripleO (and/or OpenStackAnsible).


Given the grenade change is done and passing tests, TripleO/OSA should 
be able to start coding up and testing an upgrade step when going from 
Rocky to Stein. My question is who can we name as an owner in either 
project to start this work? Because we really need to be starting this 
as soon as possible to flush out any issues before they are too late to 
correct in Stein.


So if we have volunteers or better yet potential patches that I'm just 
not aware of, please speak up here so we know who to contact about 
status updates and if there are any questions with the upgrade.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html

[2] https://review.openstack.org/#/c/604454/
[3] https://review.openstack.org/#/c/600162/

--

Thanks,

Matt


Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Matt Riedemann


On 10/24/2018 6:55 PM, Sam Morrison wrote:

I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.


There was also the "read from searchlight" idea [1] but that died in Boston.

[1] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html


--

Thanks,

Matt


Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-24 Thread Matt Riedemann


On 10/24/2018 10:10 AM, Jay Pipes wrote:
I'd like to propose deprecating this API and getting rid of this 
functionality since it conflicts with the new Keystone /limits endpoint, 
is highly coupled with RAX's turnstile middleware and I can't seem to 
find anyone who has ever used it. Deprecating this API and functionality 
would make the transition to a saner quota management system much easier 
and straightforward.


I was trying to do this before it was cool:

https://review.openstack.org/#/c/411035/

I think it was the Pike PTG in ATL where people said, "meh, let's just 
wait for unified limits from keystone and let this rot on the vine".


I'd be happy to restore and update that spec.

--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-23 Thread Matt Riedemann


On 10/23/2018 1:41 PM, Sean McGinnis wrote:

Yeah, but part of the reason for placeholders was consistency across all of
the services. I guess if there are never going to be upgrade checks in
adjutant then I could see skipping it, but otherwise I would prefer to at
least get the framework in place.


+1

Even if there is nothing to check at this point, I think having the facility
there is a benefit for projects and scripts that are going to be consuming
these checks. Having nothing to check, but having the status check there, is
going to be better than everything needing to keep a list of which projects to
run the checks on and which not.



Sure, that works for me as well. I'm not against adding placeholder/noop 
checks knowing that nothing is immediately obvious to replace those in 
Stein; that could be done later when the opportunity arises. If it's 
debatable on a per-project basis, then I'd defer to the core team for 
the project.


--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-23 Thread Matt Riedemann


On 10/23/2018 8:09 AM, Ben Nemec wrote:
Can't we just add a noop command like we are for the services that don't 
currently need upgrade checks?


We could, but I was also hoping that for most projects we will actually 
be able to replace the noop / placeholder check with *something* useful 
in Stein.
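
For context, a noop/placeholder check amounts to very little code. The 
sketch below is a pure-Python stand-in and does not use the 
oslo.upgradecheck library; the Code values, the check tuple layout and 
run_checks() are assumptions modelled loosely on that idea, where the 
command's exit status is the worst result seen.

```python
import enum


class Code(enum.IntEnum):
    """Hypothetical result codes, ordered from best to worst."""

    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def check_placeholder():
    """Noop check: always succeeds until a real check replaces it."""
    return Code.SUCCESS, "No upgrade checks defined yet."


def check_always_fails():
    """Example of what a real failing check would return."""
    return Code.FAILURE, "Something needs operator attention."


def run_checks(checks):
    """Run every check and return the worst code as the exit status."""
    worst = Code.SUCCESS
    for name, func in checks:
        code, details = func()
        print("%s: %s (%s)" % (name, code.name, details))
        worst = max(worst, code)
    return int(worst)


PLACEHOLDER_CHECKS = (("Placeholder", check_placeholder),)
print(run_checks(PLACEHOLDER_CHECKS))
```

Swapping the placeholder for something useful later only means 
replacing the entry in the tuple; the command itself and anything 
calling it stay the same.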


--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-22 Thread Matt Riedemann


On 10/22/2018 4:35 PM, Adrian Turjak wrote:

The one other open question I have is about the Adjutant change [2]. I
know Adjutant is very new and I'm not sure what upgrades look like for
that project, so I don't really know how valuable adding the upgrade
check framework is to that project. Is it like Horizon where it's
mostly stateless and fed off plugins? Because we don't have an upgrade
check CLI for Horizon for that reason.

[1]
https://review.openstack.org/#/q/topic:upgrade-checkers+(status:open+OR+status:merged)
[2]https://review.openstack.org/#/c/611812/


Adjutant's codebase is also going to be a bit unstable for the next few
cycles while we refactor some internals (we're not marking it 1.0 yet).
Once the current set of ugly refactors planned for late Stein are done I
may look at building some upgrade checking, once we also work out what
our upgrade checking should look like. Probably mostly checking config
changes, database migration states, and plugin compatibility.

Adjutant already has a concept of startup checks at least, which while
not anywhere near as extensive as they should be, mostly amount to
making sure your config file looks 'mostly' sane regarding plugins
before starting up the service, and we do intend to expand on that, plus
we can reuse a large chunk of that for upgrade checking.


OK it seems there is not really any point in trying to satisfy the 
upgrade checkers goal for Adjutant in Stein then. Should we just abandon 
the change?


--

Thanks,

Matt


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann


On 10/22/2018 11:59 AM, Matt Riedemann wrote:
Thanks for this. Have you debugged to the point of knowing where the 
initial DB query is starting from?


Looking at history, my guess is this is the change which introduced it 
for all requests:


https://review.openstack.org/#/c/276861/


From that change, as far as I can tell, we only needed to pre-join on 
metadata because of setting the "launch_metadata" variable:


https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py@145

I don't see anything directly using system_metadata, although that one 
is sometimes tricky and could be lazy-loaded elsewhere.


I do know that starting in ocata we use system_metadata for dynamic 
vendor metadata:


https://github.com/openstack/nova/blob/stable/ocata/nova/api/metadata/vendordata_dynamic.py#L85

Added in change: https://review.openstack.org/#/c/417780/

But if you don't provide vendor data then that should not be a problem.

--

Thanks,

Matt


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Matt Riedemann

ociation_1` . `security_group_id`
                    AND `security_group_instance_association_1` . `deleted` = ?
                    AND `security_groups_1` . `deleted` = ? )
     ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . `instances_uuid`
    AND `anon_1` . `instances_deleted` = ?
   LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
     ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . `id`
    AND `security_group_rules_1` . `deleted` = ?
   LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
     ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
     ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
     ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
    AND `instance_metadata_1` . `deleted` = ?






Thanks for this. Have you debugged to the point of knowing where the 
initial DB query is starting from?


Looking at history, my guess is this is the change which introduced it 
for all requests:


https://review.openstack.org/#/c/276861/

--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-25 Update

2018-10-19 Thread Matt Riedemann

The big news this week is we have a couple of volunteer developers from 
NEC (Akhil Jain and Rajat Dhasmana) who are pushing the base framework 
changes across a lot of the projects [1]. I'm trying to review as many 
of these as I can. The request now is for the core teams on these 
projects to review them as well so we can keep moving, and then start 
thinking about non-placeholder specific checks for each project.


The one other open question I have is about the Adjutant change [2]. I 
know Adjutant is very new and I'm not sure what upgrades look like for 
that project, so I don't really know how valuable adding the upgrade 
check framework is to that project. Is it like Horizon where it's mostly 
stateless and fed off plugins? Because we don't have an upgrade check 
CLI for Horizon for that reason.


[1] 
https://review.openstack.org/#/q/topic:upgrade-checkers+(status:open+OR+status:merged)

[2] https://review.openstack.org/#/c/611812/

--

Thanks,

Matt


Re: [openstack-dev] [Openstack-operators] [goals][upgrade-checkers] Week R-26 Update

2018-10-19 Thread Matt Riedemann

Top posting just to try and summarize my thought that for the goal in
Stein, I think we should focus on getting the base framework in place
for each service project, along with any non-config (including policy)
specific upgrade checks that make sense for each project.

As Ben mentioned, there are existing tools for validating config (I know
BlueBox used to use the fatal_deprecations config in their CI/CD
pipeline to know when they needed to change their deploy scripts because
deploying new code from pre-prod would fail). Once we get the basics
covered we can work, as a community, to figure out how best to integrate
config validation into upgrade checks, because I don't really think we
want to have upgrade checks that dump warnings for all deprecated
options in addition to what is already provided by oslo.config/log. I
have a feeling that would get so noisy that no one would ever pay
attention to it. I'm mostly interested in the scenario that config is
removed from code but still being set in the config file which could
fail an upgrade on service restart (if an alias was removed for
example), but I also tend to think those types of issues are case-by-case.

On 10/15/2018 3:29 PM, Ben Nemec wrote:

On 10/15/18 3:27 AM, Jean-Philippe Evrard wrote:

On Fri, 2018-10-12 at 17:05 -0500, Matt Riedemann wrote:

The big update this week is version 0.1.0 of oslo.upgradecheck was
released. The documentation along with usage examples can be found here
[1]. A big thanks to Ben Nemec for getting that done since a few
projects were waiting for it.

In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about
the validity of upgrade checks looking for deleted configuration
options. The main scenario I'm thinking about here is FFU where someone
is going from Mitaka to Pike. Let's say a config option was deprecated
in Newton and then removed in Ocata. As the operator is rolling through
from Mitaka to Pike, they might have missed the deprecation signal in
Newton and removal in Ocata. Does that mean we should have upgrade
checks that look at the configuration for deleted options, or options
where the deprecated alias is removed? My thought is that if things will
not work once they get to the target release and restart the service
code, which would definitely impact the upgrade, then checking for those
scenarios is probably OK. If on the other hand the removed options were
just tied to functionality that was removed and are otherwise not
causing any harm then I don't think we need a check for that. It was
noted that oslo.config has a new validation tool [4] so that would take
care of some of this same work if run during upgrades. So I think
whether or not an upgrade check should be looking for config option
removal ultimately depends on the severity of what happens if the manual
intervention to handle that removed option is not performed. That's
pretty broad, but these upgrade checks aren't really set in stone for
what is applied to them. I'd like to get input from others on this,
especially operators and if they would find these types of checks useful.

[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3]
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17

[4]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html

Hey,

Nice topic, thanks Matt!

TL;DR: I would rather fail explicitly for all removals and warn on all
deprecations. My concern is that, by being more surgical, we'd have to
decide what's "not causing any harm" (and I think deployers/users are
best placed to determine what's not causing them any harm).
Also, it's probably more work to classify based on "severity".
The quick win here (for upgrade-checks) is not about being smart, but
about being an exhaustive, standardized across projects, and _always used_
source of truth for upgrades, which is complemented by release notes.

Long answer:

At some point in the past, I was working full time on upgrades using
OpenStack-Ansible.

Our process was the following:
1) Read all the projects' release notes to find upgrade documentation.
2) With said release notes, adapt our deploy tools to handle the
upgrade, and/or write extra documentation and release notes for
our deployers.
3) Try the upgrade manually, fail because some release note was missing
x or y. Find the root cause and retry from step 2 until success.

Here is where I see upgrade checkers improving things:
1) No need for deployment projects to parse all release notes for
configuration changes, as the upgrade check tooling would directly
output the things that need to change for scenario x or y that is
included in the deployment project. No need to iterate either.

2) Test real deployer use cases. The deployers using openstack-ansible
have ultimate flexibil

Re: [openstack-dev] [goals][upgrade-checkers] Call for Volunteers to work on upgrade-checkers stein goal

2018-10-17 Thread Matt Riedemann


On 10/16/2018 9:43 AM, Ghanshyam Mann wrote:

I was discussing with mriedem [1] the idea of building a volunteer team which 
can work with him on the upgrade-checkers goal [2]. There is a lot of work needed 
for this goal [3]: a few projects which do not have upgrade impact yet need the CLI 
framework with a placeholder only, and other projects with upgrade impact need 
actual upgrade check implementations.

The idea is to build a volunteer team who can work with the goal champion to finish 
the work early. This will help share some of the work from the goal champion as well 
as from the project side.

  - This email is a call for volunteers (upstream developers from any 
projects) who can work closely with mriedem on the upgrade-checkers goal.
  - Currently two developers have volunteered.
 1. Akhil Jain (IRC: akhil_jain, email:akhil.j...@india.nec.com)
 2. Rajat Dhasmana (IRC: whoami-rajat email:rajatdhasm...@gmail.com)
  - Anyone who would like to help with this work, feel free to reply to this email 
or ping mriedem on IRC.
  - As a next step, mriedem will plan the work distribution to volunteers.


Thanks Ghanshyam.

As can be seen from the cyborg [1] and congress [2] changes posted by 
Rajat and Akhil, the initial framework changes are pretty trivial. The 
harder part is working with core teams / PTLs to determine which real 
upgrade checks should be added based on the release notes. But having 
the framework done as a baseline across all service projects is a great 
start.


[1] https://review.openstack.org/#/c/611368/
[2] https://review.openstack.org/#/c/66/

--

Thanks,

Matt


Re: [openstack-dev] [horizon][nova][cinder][keystone][glance][neutron][swift] Horizon feature gaps

2018-10-17 Thread Matt Riedemann


On 10/17/2018 9:24 AM, Ivan Kolodyazhny wrote:


As you may know, unfortunately, Horizon doesn't support all features 
provided by APIs. That's why we created feature gaps list [1].


I had a lot of great conversations with project teams during the PTG 
and we tried to figure out what should be done to prioritize these tasks. 
It's really helpful for Horizon to get feedback from other teams to 
understand what features should be implemented next.


While I'm filling launchpad with new bugs and blueprints for [1], it 
would be good to review this list again and find some volunteers to 
decrease feature gaps.


[1] https://etherpad.openstack.org/p/horizon-feature-gap

Thanks everybody for any of your contributions to Horizon.


+openstack-sigs
+openstack-operators

I've left some notes for nova. This looks very similar to the compute 
API OSC gap analysis I did [1]. Unfortunately it's hard to prioritize 
what to really work on without some user/operator feedback - maybe we 
can get the user work group involved in trying to help prioritize what 
people really want that is missing from horizon, at least for compute?


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc

--

Thanks,

Matt


Re: [openstack-dev] [placement] devstack, grenade, database management

2018-10-16 Thread Matt Riedemann

On 10/16/2018 5:48 AM, Chris Dent wrote:

* We need to address database creation scripts and database migrations.

There's a general consensus that we should use alembic, and start
things from a collapsed state. That is, we don't need to represent
already existing migrations in the new repo, just the present-day
structure of the tables.

Right now the devstack code relies on a stubbed out command line
tool at https://review.openstack.org/#/c/600161/ to create tables
with a metadata.create_all(). This is a useful thing to have but
doesn't follow the "db_sync" pattern set elsewhere, so I haven't
followed through on making it pretty but can do so if people think
it is useful. Whether we do that or not, we'll still need some
kind of "db_sync" command. Do people want me to make a cleaned up
"create" command?

Ed has expressed some interest in exploring setting up alembic and
the associated tools but that can easily be a more than one person
job. Is anyone else interested?

It would be great to get all this stuff working sooner than later.
Without it we can't do two important tasks:

* Integration tests with the extracted placement [1].
* Hacking on extracted placement in/with devstack.

Another thing that came up today in IRC [1] which is maybe not as
obvious from this email is what happens with the one online data
migration we have for placement (create_incomplete_consumers). If we
drop that online data migration from the placement repo, then ideally
we'd have something to check it's done before people upgrade to stein
and the extracted placement repo. There are some options there:
placement-manage db sync could fail if there are missing consumers or we
could simply have a placement-status upgrade check for it.
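
As a rough sketch of what such a check could look for, the snippet below counts allocations whose consumer has no matching consumers record. It assumes a simplified two-table schema in SQLite (the `allocations.consumer_id` and `consumers.uuid` columns here are stand-ins, not the full placement schema), and the function names and the choice to report a failure rather than a warning are made up for illustration.

```python
import sqlite3


def count_missing_consumers(conn):
    """Return how many distinct allocation consumers lack a consumers row."""
    row = conn.execute("""
        SELECT COUNT(DISTINCT a.consumer_id)
        FROM allocations AS a
        LEFT OUTER JOIN consumers AS c ON c.uuid = a.consumer_id
        WHERE c.uuid IS NULL
    """).fetchone()
    return row[0]


def check_incomplete_consumers(conn):
    """Hypothetical upgrade check: fail while the data migration is pending."""
    missing = count_missing_consumers(conn)
    if missing:
        return ("FAILURE",
                "%d allocation consumer(s) have no consumers record; "
                "run the online data migration before upgrading" % missing)
    return ("SUCCESS", "")


def build_demo_db():
    """Create an in-memory database with one complete, one missing consumer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE consumers (uuid TEXT PRIMARY KEY);
        CREATE TABLE allocations (
            consumer_id TEXT, resource_class TEXT, used INTEGER);
        INSERT INTO consumers VALUES ('consumer-a');
        INSERT INTO allocations VALUES ('consumer-a', 'VCPU', 2);
        INSERT INTO allocations VALUES ('consumer-b', 'VCPU', 1);
        INSERT INTO allocations VALUES ('consumer-b', 'MEMORY_MB', 512);
    """)
    return conn
```

The same query could back either option mentioned above: the db sync command refusing to proceed, or a status check that reports the problem before the upgrade starts.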

Another issue that needs some attention, but is not quite as urgent
is the desire to support other databases during the upgrade,
captured in this change

https://review.openstack.org/#/c/604028/

I have a grenade patch to test the postgresql-migrate-db.sh script now. [2]

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-10-16.log.html#t2018-10-16T19:37:25

[2] https://review.openstack.org/#/c/611020/

Thanks,

Matt


Re: [openstack-dev] Forum Schedule - Seeking Community Review

2018-10-15 Thread Matt Riedemann


On 10/15/2018 3:01 PM, Jimmy McArthur wrote:
The Forum schedule is now up 
(https://www.openstack.org/summit/berlin-2018/summit-schedule/#track=262). 
If you see a glaring content conflict within the Forum itself, please 
let me know.


Not a conflict, but it looks like there is a duplicate for Lee talking 
about encrypted volumes:


https://www.openstack.org/summit/berlin-2018/summit-schedule/global-search?t=yarwood

Unless he just loves it so much he needs to talk about it twice.

--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-26 Update

2018-10-12 Thread Matt Riedemann

The big update this week is version 0.1.0 of oslo.upgradecheck was
released. The documentation along with usage examples can be found here
[1]. A big thanks to Ben Nemec for getting that done since a few
projects were waiting for it.

In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about
the validity of upgrade checks looking for deleted configuration
options. The main scenario I'm thinking about here is FFU where someone
is going from Mitaka to Pike. Let's say a config option was deprecated
in Newton and then removed in Ocata. As the operator is rolling through
from Mitaka to Pike, they might have missed the deprecation signal in
Newton and removal in Ocata. Does that mean we should have upgrade
checks that look at the configuration for deleted options, or options
where the deprecated alias is removed? My thought is that if things will
not work once they get to the target release and restart the service
code, which would definitely impact the upgrade, then checking for those
scenarios is probably OK. If on the other hand the removed options were
just tied to functionality that was removed and are otherwise not
causing any harm then I don't think we need a check for that. It was
noted that oslo.config has a new validation tool [4] so that would take
care of some of this same work if run during upgrades. So I think
whether or not an upgrade check should be looking for config option
removal ultimately depends on the severity of what happens if the manual
intervention to handle that removed option is not performed. That's
pretty broad, but these upgrade checks aren't really set in stone for
what is applied to them. I'd like to get input from others on this,
especially operators and if they would find these types of checks useful.
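
To make the severity idea concrete, here is a minimal sketch of a check that only fails for removed options which would break the service on restart. It uses only the Python standard library (not oslo.config or oslo.upgradecheck), and the `REMOVED_OPTIONS` table, the option names in it, and the `check_removed_options` function are all hypothetical.

```python
import configparser

# Hypothetical table of removed options. The boolean says whether leaving
# the option set is believed to break the service on restart.
REMOVED_OPTIONS = {
    ("DEFAULT", "old_alias"): True,      # alias removed, restart would fail
    ("api", "legacy_feature"): False,    # feature removed, otherwise harmless
}


def check_removed_options(config_text, removed=REMOVED_OPTIONS):
    """Return (code, details) for removed options still set in the config."""
    parser = configparser.ConfigParser(
        default_section="__none__",  # treat [DEFAULT] as an ordinary section
        interpolation=None,          # values may contain literal % characters
        strict=False)                # tolerate repeated option lines
    parser.read_string(config_text)
    blocking = []
    for (section, option), breaks_restart in sorted(removed.items()):
        # Harmless removals are skipped on purpose; only report the ones
        # that need manual intervention before the upgrade.
        if breaks_restart and parser.has_option(section, option):
            blocking.append("[%s] %s" % (section, option))
    if blocking:
        return "FAILURE", blocking
    return "SUCCESS", []


SAMPLE_CONFIG = """
[DEFAULT]
old_alias = 1

[api]
legacy_feature = true
"""
```

Running the check against `SAMPLE_CONFIG` reports only `[DEFAULT] old_alias`, since the other removed option is marked as harmless in the table.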

Thanks,

Matt

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann


On 10/10/2018 7:46 AM, Jay Pipes wrote:

2) in the old microversions change the blind allocation copy to gather
every resource from the nested source RPs too and try to allocate that
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However it will succeed if the server does not need nested
allocation on either the source or the destination host (a.k.a. the
legacy case), or if the server has nested allocation on the source host
but does not need nested allocation on the destination host (for
example the dest host does not have a nested RP tree yet).


I disagree on this. I'd rather just do a simple check for >1 provider in 
the allocations on the source and if True, fail hard.


The reverse (going from a non-nested source to a nested destination) 
will hard fail anyway on the destination because the POST /allocations 
won't work due to capacity exceeded (or failure to have any inventory at 
all for certain resource classes on the destination's root compute node).


I agree with Jay here. If we know the source has allocations on >1 
provider, just fail fast, why even walk the tree and try to claim those 
against the destination - the nested providers aren't going to be the 
same UUIDs on the destination, *and* trying to squash all of the source 
nested allocations into the single destination root provider and hope it 
works is super hacky and I don't think we should attempt that. Just fail 
if being forced and nested allocations exist on the source.
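
A minimal sketch of that fail-fast check, assuming the source allocations are available as a dict keyed by resource provider UUID (the shape used in the body of placement's allocations responses). The exception class and function name are made up for the example and are not existing nova code.

```python
class NestedAllocationsNotSupported(Exception):
    """Hypothetical error for forced moves of servers with >1 provider."""


def assert_single_provider(allocations):
    """Raise if the source allocations span more than one provider.

    :param allocations: dict mapping resource provider UUID to a dict
        like {"resources": {"VCPU": 2, "MEMORY_MB": 512}}
    """
    if len(allocations) > 1:
        raise NestedAllocationsNotSupported(
            "Source allocations span %d resource providers; forced "
            "live migrate/evacuate is not supported in that case."
            % len(allocations))


# Legacy case: everything is allocated from the root compute node provider.
legacy = {"root-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 512}}}

# Nested case: part of the allocation comes from a child provider.
nested = {
    "root-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 512}},
    "child-rp": {"resources": {"VGPU": 1}},
}
```

`assert_single_provider(legacy)` returns quietly, while `assert_single_provider(nested)` raises before any attempt is made to claim against the destination.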


--

Thanks,

Matt


Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Matt Riedemann


On 10/9/2018 10:08 AM, Balázs Gibizer wrote:

Question for you as well: if we remove (or change) the force flag in a
new microversion then how should the old microversions behave when
nested allocations would be required?


Fail fast if we can detect we have nested. We don't support forcing 
those types of servers.


--

Thanks,

Matt


Re: [openstack-dev] [nova] Rocky RC time regression analysis

2018-10-09 Thread Matt Riedemann


On 10/5/2018 6:59 PM, melanie witt wrote:
5) when live migration fails due to a internal error rollback is not 
handled correctly https://bugs.launchpad.net/nova/+bug/1788014


- Bug was reported on 2018-08-20
- The change that caused the regression landed on 2018-07-26, FF day 
https://review.openstack.org/434870

- Unrelated to a blueprint, the regression was part of a bug fix
- Was found because sean-k-mooney was doing live migrations and found 
that when a LM failed because of a QEMU internal error, the VM remained 
ACTIVE but the VM no longer had network connectivity.

- Question: why wasn't this caught earlier?
- Answer: We would need a live migration job scenario that intentionally 
initiates and fails a live migration, then verify network connectivity 
after the rollback occurs.

- Question: can we add something like that?


Not in Tempest, no, but we could run something in the 
nova-live-migration job since that executes via its own script. We could 
hack something in like what we have proposed for testing evacuate:


https://review.openstack.org/#/c/602174/

The trick is figuring out how to introduce a fault in the destination 
host without taking down the service, because if the compute service is 
down we won't schedule to it.




6) nova-manage db online_data_migrations hangs on instances with no host 
set https://bugs.launchpad.net/nova/+bug/1788115


- Bug was reported on 2018-08-21
- The patch that introduced the bug landed on 2018-05-30 
https://review.openstack.org/567878

- Unrelated to a blueprint, the regression was part of a bug fix
- Question: why wasn't this caught earlier?
- Answer: To hit the bug, you had to have had instances with no host set 
(that failed to schedule) in your database during an upgrade. This does 
not happen during the grenade job
- Question: could we add anything to the grenade job that would leave 
some instances with no host set to cover cases like this?


Probably - I'd think creating a server on the old side with some 
parameters that we know won't schedule would do it, maybe requesting an 
AZ that doesn't exist, or some other kind of scheduler hint that we know 
won't work so we get a NoValidHost. However, online_data_migrations in 
grenade probably don't run on the cell0 database, so I'm not sure we 
would have caught that case.


--

Thanks,

Matt


Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-09 Thread Matt Riedemann


On 10/9/2018 11:08 AM, Miguel Lavalle wrote:
Since it has been more than a week since this nomination was posted and 
we have received only positive feedback, can we move ahead and add 
Bernard Cafarelli to Neutron Stable core team?


Done:

https://review.openstack.org/#/admin/groups/539,members

--

Thanks,

Matt


Re: [openstack-dev] [cinder] [nova] Do we need a "force" parameter in cinder "re-image" API?

2018-10-09 Thread Matt Riedemann


On 10/9/2018 8:04 AM, Erlon Cruz wrote:
If you are planning to re-image an image on a bootable volume then yes 
you should use a force parameter. I missed the discussion about this 
at the PTG. What are the main use cases? This seems to me something that 
could be leveraged with the current revert-to-snapshot API, which would 
be even better. The flow would be:


1 - create a volume from image
2 - create a snapshot
3 - do whatever you want
4 - revert the snapshot

Would that help in your use cases?


As the spec mentions, this is for enabling re-imaging the root volume on 
a server when nova rebuilds the server. That is not allowed today 
because the compute service can't re-image the root volume. We don't 
want to jump through a bunch of gross alternative hoops to create a new 
root volume with the new image and swap them out (the reasons why are in 
the spec, and have been discussed previously in the ML). So nova is 
asking cinder to provide an API to change the image in a volume which 
the nova rebuild operation will use to re-image the root volume on a 
volume-backed server. I don't know if revert-to-snapshot solves that use 
case, but it doesn't sound like it. With the nova rebuild API, the user 
provides an image reference and that is used to re-image the root disk 
on the server. So it might not be a snapshot, it could be something new.


--

Thanks,

Matt


Re: [openstack-dev] [stable][octavia] Backport patch adding new configuration options

2018-10-08 Thread Matt Riedemann


On 10/8/2018 11:05 AM, Carlos Goncalves wrote:
The Octavia team merged a patch in master [1] that fixed an issue where 
load balancers could be deleted whenever queue_event_streamer driver is 
enabled and RabbitMQ goes down [2].


As this is a critical bug, we would like to backport it as far back as 
possible. The question is whether these backports comply with the stable 
policy because the patch adds two new configuration options and deprecates one. 
The patch was prepared so that the deprecated option takes precedence over 
the other two if set.


Reading the review guidelines [3], I only see "Incompatible config file 
changes" as relevant, but the patch doesn't seem to go against that. We 
had a patch that added a new config option backported to Queens that 
raised some concern, so we'd like to be on the safe side this time ;-)


We'd appreciate guidance as to whether such backports are acceptable or not.



Well, a few things:

* I would have introduced the new config options as part of the bug fix 
but *not* deprecated the existing option in the same change but rather 
as a follow up. Then the new options, which do nothing by default (?), 
could be backported and the deprecation would remain on master.


* The release note mentions the new options as a feature, but that's not 
really correct is it? They are for fixing a bug, not new feature 
functionality as much.


In general, as long as the new options don't introduce new behavior by 
default for existing configuration (as you said, the existing option 
takes precedence if set), and don't require configuration then it should 
be OK to backport those new options. But the sticky parts here are (1) 
deprecating an option on stable (we shouldn't do that) and (2) the 
release note mentioning a feature.


What I'd probably do is (1) change the 'feature' release note to a 
'fixes' release note on master and then (2) backport the change but (a) 
drop the deprecation and (b) fix the release note in the backport to not 
call out a feature (since it's not a feature I don't think?) - and just 
make it clear with a note in the backport commit message why the 
backport is different from the original change.
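
As a sketch of the difference between the master change and the backport, assume one existing option and two new ones. All of the names below are hypothetical, and mapping the existing option onto both new values is only an assumption made for the example, not how the Octavia patch works. The only behavioral difference between the two branches is whether a deprecation warning is emitted.

```python
import warnings


def resolve_options(legacy=None, new_a="default_a", new_b="default_b",
                    deprecate_legacy=True):
    """Return the effective (a, b) settings.

    The existing option takes precedence if it is set, so existing
    deployments keep their behavior without touching their config.
    """
    if legacy is not None:
        if deprecate_legacy:
            # On master the existing option is deprecated. The stable
            # backport would pass deprecate_legacy=False to drop this.
            warnings.warn("legacy option is deprecated",
                          DeprecationWarning, stacklevel=2)
        return legacy, legacy
    # Nothing set explicitly: the new options' defaults apply.
    return new_a, new_b
```

On a stable branch the call would look like `resolve_options(legacy="x", deprecate_legacy=False)`, which returns the same values as on master but stays silent.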


--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Oslo library status

2018-10-08 Thread Matt Riedemann


On 10/7/2018 4:10 AM, Slawomir Kaplonski wrote:

I started working on the „neutron-status upgrade check” tool with a noop operation for 
now. The patch is in [1].
I started using the new oslo_upgradecheck library in version 0.0.1.dev15, which 
is available on pypi.org, but I see that in master there are some changes 
already (like shortened names of base classes).
So my question here is: should I just wait a bit more for a kind of „stable” 
version of this lib and then push the neutron patch for review (do you have any ETA 
for that?), or maybe we shouldn’t rely on this oslo library in this release and 
implement it all on our own, like it is done currently in nova?

[1]https://review.openstack.org/#/c/608444/


I would wait. I think there are just a couple of changes we need to get 
into the library (one of which changes the interface) and then we can do 
a release. Sean McGinnis is waiting on the release for Cinder as well.


--

Thanks,

Matt


Re: [openstack-dev] [nova][stable] Preparing for ocata-em (extended maintenance)

2018-10-05 Thread Matt Riedemann


The ocata-em tag request is up for review:

https://review.openstack.org/#/c/608296/

On 9/28/2018 11:21 AM, Matt Riedemann wrote:
Per the other thread on this [1] I've created an etherpad [2] to track 
what needs to happen to get nova's stable/ocata branch ready for 
Extended Maintenance [3] which means we need to flush our existing Ocata 
backports that we want in the final Ocata release before tagging the 
branch as ocata-em, after which point we won't do releases from that 
branch anymore.


The etherpad lists each open ocata backport along with any of its 
related backports on newer branches like pike/queens/etc. Since we need 
the backports to go in order, we need to review and merge the changes on 
the newer branches first. With the state of the gate lately, we really 
can't sit on our hands here because it will probably take up to a week 
just to merge all of the changes for each branch.


Once the Ocata backports are flushed through, we'll cut the final 
release and tag the branch as being in extended maintenance.


Do we want to coordinate a review day next week for the 
nova-stable-maint core team, like Tuesday, or just trust that you all 
know who you are and will help out as necessary in getting these reviews 
done? Non-stable cores are also welcome to help review here to make sure 
we're not missing something, which is also a good way to get noticed as 
caring about stable branches and eventually get you on the stable maint 
core team.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/thread.html#134810 


[2] https://etherpad.openstack.org/p/nova-ocata-em
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance



--

Thanks,

Matt


Re: [openstack-dev] [goals][python3][heat][stable] how should we proceed with ocata branch


On 10/3/2018 11:21 AM, Zane Bitter wrote:

That patch is the only thing blocking the cleanup patch in
project-config, so I would like to get a definitive answer about what to
do. Should we close the branch, or does someone want to try to fix
things up?


I think we agreed on closing the branch, and Rico was looking into the 
procedure for how to actually do that.



Doug

[1]https://review.openstack.org/#/c/597272/


I'm assuming heat-agents is a service, not a library, since it doesn't 
show up in upper-constraints.


It's a guest agent, so neither :)

Based on that, does heat itself plan on putting its stable/ocata 
branch into extended maintenance mode and if 


Wearing my Red Hat hat, I would be happy to EOL it. But wearing my 
upstream hat, I'm happy to keep maintaining it, and I was not proposing 
that we EOL heat's stable/ocata as well.


so, does that mean EOLing the heat-agents stable/ocata branch could 
cause problems for the heat stable/ocata branch? In other words, will 
it be reasonable to run CI for stable/ocata heat changes against a 
heat-agents ocata-eol tag?


I don't think that's a problem. The guest agents rarely change, and I 
don't think there's ever been a patch backported by 4 releases.


OK, cool, sounds like killing the heat-agent ocata branch is the thing 
to do then.


--

Thanks,

Matt


Re: [openstack-dev] [cinder] Proposing Gorka Eguileor to Stable Core ...


On 10/3/2018 9:45 AM, Jay S. Bryant wrote:

Team,

We had discussed the possibility of adding Gorka to the stable core team 
during the PTG.  He does review a number of our backport patches and is 
active in that area.


If there are no objections in the next week I will add him to the list.

Thanks!

Jay (jungleboyj)


+1 from me in the stable-maint-core peanut gallery.

--

Thanks,

Matt


[openstack-dev] [nova] We should fail to boot a server if PF passthrough is requested and we don't honor it, right?

I came across [1] today while triaging a bug [2]. Unless I'm mistaken, 
the user has requested SR-IOV PF passthrough for their server and for 
whatever reason we can't find the PCI device for the PF passthrough port 
so we don't reflect the actual device MAC address on the port. Is that 
worth stopping the server create? Or is logging an ERROR enough here?


The reason being we get an IndexError here [3]. Ultimately if we found a 
PCI device but it's not whitelisted, we'll raise an exception anyway 
when building the port binding profile [4].


So is it reasonable to just raise PciDeviceNotFound whenever we can't 
find a PCI device on a compute host given a pci_request_id? In other 
words, it seems something failed earlier during scheduling and/or the 
PCI device resource claim if we get this far and things are still messed up.
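
To make the proposal concrete, here is a minimal sketch of failing fast on the lookup. It is not the nova code: the exception class is a local stand-in for nova's PciDeviceNotFound, and get_pci_device and the dict layout are hypothetical names for illustration only.

```python
class PciDeviceNotFound(Exception):
    """Local stand-in for nova's exception of the same name."""


def get_pci_device(pci_devices, pci_request_id):
    """Return the first device claimed for pci_request_id.

    pci_devices is a hypothetical list of dicts with a 'request_id' key.
    """
    matches = [dev for dev in pci_devices
               if dev.get('request_id') == pci_request_id]
    if not matches:
        # Fail the server create explicitly instead of hitting an
        # IndexError on matches[0] and only logging an ERROR.
        raise PciDeviceNotFound(
            'No PCI device found for request %s' % pci_request_id)
    return matches[0]
```

The point is only that an empty lookup becomes a typed failure the caller can act on, rather than an IndexError.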


[1] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1426

[2] https://bugs.launchpad.net/nova/+bug/1795064
[3] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1404
[4] 
https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1393


--

Thanks,

Matt


Re: [openstack-dev] [goals][python3][heat][stable] how should we proceed with ocata branch


On 10/3/2018 7:58 AM, Doug Hellmann wrote:

There is one more patch to import the zuul configuration for the
heat-agents repository's stable/ocata branch. That branch is apparently
broken, and Zane suggested on the review [1] that we abandon the patch
and close the branch.

That patch is the only thing blocking the cleanup patch in
project-config, so I would like to get a definitive answer about what to
do. Should we close the branch, or does someone want to try to fix
things up?

Doug

[1] https://review.openstack.org/#/c/597272/


I'm assuming heat-agents is a service, not a library, since it doesn't 
show up in upper-constraints. Based on that, does heat itself plan on 
putting its stable/ocata branch into extended maintenance mode and if 
so, does that mean EOLing the heat-agents stable/ocata branch could 
cause problems for the heat stable/ocata branch? In other words, will it 
be reasonable to run CI for stable/ocata heat changes against a 
heat-agents ocata-eol tag?


--

Thanks,

Matt


Re: [openstack-dev] [neutron][stable] Stable Core Team Update

2018-10-02 Thread Matt Riedemann


On 10/2/2018 10:41 AM, Miguel Lavalle wrote:

Hi Stable Team,

I want to nominate Bernard Cafarelli as a stable core reviewer for 
Neutron and related projects. Bernard has been increasing the number of 
stable reviews he is doing for the project [1]. Besides that, he is a 
stable maintainer downstream for his employer (Red Hat), so he can bring 
that valuable experience to the Neutron stable team.


Thanks and regards

Miguel

[1] 
https://review.openstack.org/#/q/(project:openstack/neutron+OR+openstack/networking-sfc+OR+project:openstack/networking-ovn)++branch:%255Estable/.*+reviewedby:%22Bernard+Cafarelli+%253Cbcafarel%2540redhat.com%253E%22 


+1 from me.

--

Thanks,

Matt


Re: [openstack-dev] Placement extraction update

2018-09-30 Thread Matt Riedemann


On 9/30/2018 11:02 AM, Matt Riedemann wrote:
Maybe that's some conditional branch logic we can hack into 
devstack-gate [7] like we do for neutron? [8]


I'm hoping this works:

https://review.openstack.org/#/c/606853/

--

Thanks,

Matt


[openstack-dev] Placement extraction update

2018-09-30 Thread Matt Riedemann

I finally got a passing neutron-grenade run in change [1]. That's the 
grenade change which populates the placement DB in Stein from the 
placement-related table contents of the nova_api DB from Rocky. It also 
writes out the placement.conf file for Stein before starting the Stein 
services.


As a result, I'm +2 on Dan's mysql-migrate-db.sh script [2].

The grenade change is also dependent on three other changes for neutron 
[3], ironic [4] and heat [5] grenade jobs to require the 
openstack/placement project when zuul/devstack-gate clones its required 
projects before running grenade.sh.


Those are just the related project grenade jobs that are hit as part of 
the grenade patch. There could be others I'm missing, which means 
projects might need to update their grenade job definitions after the 
grenade change merges. It looks like that could be quite a few projects 
[6]. If the infra/QA teams have a better idea of how to require 
openstack/placement in stein+ only, I'm all ears. Maybe that's some 
conditional branch logic we can hack into devstack-gate [7] like we do 
for neutron? [8]
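
The kind of conditional branch logic I have in mind could look roughly like the sketch below. It is written in Python for illustration rather than as a devstack-gate patch, and needs_placement and required_projects are hypothetical helper names, not anything that exists in devstack-gate.

```python
# Release names in order; only the ones relevant to this thread.
RELEASES = ['ocata', 'pike', 'queens', 'rocky', 'stein']


def needs_placement(branch):
    """True for master and for stable branches from stein onward."""
    if branch == 'master':
        return True
    name = branch.split('/', 1)[-1]
    if name not in RELEASES:
        return False
    return RELEASES.index(name) >= RELEASES.index('stein')


def required_projects(branch, projects):
    """Append openstack/placement only where the branch needs it."""
    extra = ['openstack/placement'] if needs_placement(branch) else []
    return list(projects) + extra
```

With something like this in one place, individual grenade job definitions would not each have to list openstack/placement themselves.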


[1] https://review.openstack.org/#/c/604454/
[2] https://review.openstack.org/#/c/603234/
[3] https://review.openstack.org/#/c/604458/
[4] https://review.openstack.org/#/c/606850/
[5] https://review.openstack.org/#/c/606851/
[6] 
http://codesearch.openstack.org/?q=export%20PROJECTS%3D%22openstack-dev%5C%2Fgrenade%20%5C%24PROJECTS%22=nope==
[7] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L138
[8] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L195


--

Thanks,

Matt


Re: [openstack-dev] [nova][python-novaclient] A Test issue in python-novaclient.

2018-09-30 Thread Matt Riedemann


On 9/29/2018 10:01 PM, Tao Li wrote:
I found this test is added about ten days ago in this patch 
https://review.openstack.org/#/c/599276/,


I checked it and don’t know why it failed. I think my commit shouldn’t 
cause this issue. Do you have any suggestions for me?




Yes it must be an intermittent race bug introduced by that change for 
the 2.66 microversion. Since it deals with filtering based on time, we 
might not have a time window that is big enough (we expect to get a 
result of changes < $before but are getting <= $before).
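
A toy illustration of the suspected race, assuming the filter is inclusive. The changes_before helper and the timestamps are made up for the example and are not the novaclient test code.

```python
from datetime import datetime, timedelta


def changes_before(actions, before):
    """Inclusive filter: keep actions updated at or before `before`."""
    return [a['action'] for a in actions if a['updated_at'] <= before]


created = datetime(2018, 9, 29, 12, 0, 0)
actions = [
    {'action': 'create', 'updated_at': created},
    # The stop lands exactly on the cutoff the test happens to sample.
    {'action': 'stop', 'updated_at': created + timedelta(seconds=1)},
]

# A cutoff taken too close to the second action picks it up as well.
tight_cutoff = created + timedelta(seconds=1)
# A cutoff with some slack before the second action does not.
safe_cutoff = created + timedelta(milliseconds=500)
```

Here changes_before(actions, tight_cutoff) includes the stop action while changes_before(actions, safe_cutoff) returns only the create, which is the same shape as the ['create'] != ['create', 'stop'] mismatch in the logstash query below.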


http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%7C%20%20%20%20%20testtools.matchers._impl.MismatchError%3A%20%5B'create'%5D%20!%3D%20%5B'create'%2C%20'stop'%5D%5C%22%20AND%20tags%3A%5C%22console%5C%22=7d

Please report a bug against python-novaclient.

The 2.66 test is based on a similar changes_since test, so we should see 
why they are behaving differently.


--

Thanks,

Matt


[openstack-dev] [nova][cinder][qa] Should we enable multiattach in tempest-full?

2018-09-29 Thread Matt Riedemann

Nova, cinder and tempest run the nova-multiattach job in their check and 
gate queues. The job was added in Queens and was a specific job because 
we had to change the ubuntu cloud archive we used in Queens to get 
multiattach working. Since Rocky, devstack defaults to a version of the 
UCA that works for multiattach, so there isn't really anything 
preventing us from running the tempest multiattach tests in the 
integrated gate. The job tries to be as minimal as possible by only 
running tempest.api.compute.* tests, but it still means spinning up a 
new node and devstack for testing.


Given the state of the gate recently, I'm thinking it would be good if 
we dropped the nova-multiattach job in Stein and just enable the 
multiattach tests in one of the other integrated gate jobs. I initially 
was just going to enable it in the nova-next job, but we don't run that 
on cinder or tempest changes. I'm not sure if tempest-full is a good 
place for this though since that job already runs a lot of tests and has 
been timing out a lot lately [1][2].


The tempest-slow job is another option, but cinder doesn't currently run 
that job (it probably should since it runs volume-related tests, 
including the only tempest tests that use encrypted volumes).


Are there other ideas/options for enabling multiattach in another job 
that nova/cinder/tempest already use so we can drop the now mostly 
redundant nova-multiattach job?


[1] http://status.openstack.org/elastic-recheck/#1686542
[2] http://status.openstack.org/elastic-recheck/#1783405

--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-28 Update

There isn't really anything to report this week. There are no new 
changes up for review that I'm aware of. If your team has posted changes 
for your project, please update the related task in the story [1].


I'm also waiting for some feedback from glance-minded people about [2].

[1] https://storyboard.openstack.org/#!/story/2003657
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/135025.html


--

Thanks,

Matt


Re: [openstack-dev] [all] Zuul job backlog


On 9/28/2018 3:12 PM, Clark Boylan wrote:

I was asked to write a followup to this as the long Zuul queues have persisted 
through this week. Largely because the situation from last week hasn't changed 
much. We were without the upgraded cloud region while we worked around a network 
configuration bug, then once that was addressed we ran into neutron port 
assignment and deletion issues. We think these are both fixed and we are 
running in this region again as of today.

Other good news is our classification rate is up significantly. We can use that 
information to go through the top identified gate bugs:

Network Connectivity issues to test nodes [2]. This is the current top of the 
list, but I think its impact is relatively small. What is happening here is 
jobs fail to connect to their test nodes early in the pre-run playbook and then 
fail. Zuul will rerun these jobs for us because they failed in the pre-run 
step. Prior to zuulv3 we had nodepool run a ready script before marking test 
nodes as ready, this script would've caught and filtered out these broken 
network nodes early. We now notice them late during the pre-run of a job.

Pip fails to find distribution for package [3]. Earlier in the week we had the in 
region mirror fail in two different regions for unrelated errors. These mirrors 
were fixed and the only other hits for this bug come from Ara which tried to 
install the 'black' package on python3.5 but this package requires python>=3.6.

yum, no more mirrors to try [4]. At first glance this appears to be an 
infrastructure issue because the mirror isn't serving content to yum. On 
further investigation it turned out to be a DNS resolution issue caused by the 
installation of designate in the tripleo jobs. Tripleo is aware of this issue 
and working to correct it.

Stackviz failing on py3 [5]. This is a real bug in stackviz caused by subunit 
data being binary not utf8 encoded strings. I've written a fix for this problem 
at https://review.openstack.org/606184, but in doing so found that this was a 
known issue back in March and there was already a proposed 
fix, https://review.openstack.org/#/c/555388/3. It would be helpful if the QA 
team could care for this project and get a fix in. Otherwise, we should 
consider disabling stackviz on our tempest jobs (though the output from 
stackviz is often useful).

There are other bugs being tracked by e-r. Some are bugs in the openstack 
software and I'm sure some are also bugs in the infrastructure. I have not yet 
had the time to work through the others though. It would be helpful if project 
teams could prioritize the debugging and fixing of these issues though.

[2] http://status.openstack.org/elastic-recheck/gate.html#1793370
[3] http://status.openstack.org/elastic-recheck/gate.html#1449136
[4] http://status.openstack.org/elastic-recheck/gate.html#1708704
[5] http://status.openstack.org/elastic-recheck/gate.html#1758054


Thanks for the update Clark.

Another thing this week is that logstash indexing is behind by at least 
half a day. That's because workers were hitting OOM errors on giant 
screen log files that aren't formatted in a way that lets us index only 
INFO+ level logs; the workers were instead trying to index the entire 
file, and some of those files are 33MB *compressed*. So indexing of the 
screen logs identified as problematic has been disabled:


https://review.openstack.org/#/c/606197/

I've reported bugs against each related project.
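
For anyone wondering why the formatting matters, the failure mode can be sketched like this. It is an assumption-laden toy, not the actual logstash worker code: the line format, LEVEL_RE and lines_to_index are all made up for illustration.

```python
import re

# Hypothetical line format: "<date> <time> <pid> <LEVEL> <message>".
LEVEL_RE = re.compile(r'^\S+ \S+ \d+ (DEBUG|INFO|WARNING|ERROR|CRITICAL) ')
INDEXED = {'INFO', 'WARNING', 'ERROR', 'CRITICAL'}


def lines_to_index(lines):
    """Keep INFO+ lines; fall back to everything if no line parses."""
    parsed = [(LEVEL_RE.match(line), line) for line in lines]
    if not any(match for match, _ in parsed):
        # Nothing matched the expected format, so no level filter can
        # be applied and the whole file ends up being indexed.
        return list(lines)
    return [line for match, line in parsed
            if match and match.group(1) in INDEXED]
```

A file in the expected format drops its DEBUG lines before indexing; a file in some other format falls through to the everything-gets-indexed branch, which is where the memory goes.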

--

Thanks,

Matt


Re: [openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?


On 9/21/2018 9:08 AM, Elõd Illés wrote:

Hi,

Here is an etherpad with the teams that have stable:follow-policy tag on 
their repos:


https://etherpad.openstack.org/p/ocata-final-release-before-em

On the links you can find reports about the open and unreleased changes, 
that could be a useful input for the before-EM/final release.
Please have a look at the report (and review the open patches if there 
are) so that a release can be made if necessary.


Thanks,

Előd


I've added nova's ocata-em tracking etherpad to the list.

https://etherpad.openstack.org/p/nova-ocata-em

--

Thanks,

Matt


[openstack-dev] [nova][stable] Preparing for ocata-em (extended maintenance)

Per the other thread on this [1] I've created an etherpad [2] to track 
what needs to happen to get nova's stable/ocata branch ready for 
Extended Maintenance [3] which means we need to flush our existing Ocata 
backports that we want in the final Ocata release before tagging the 
branch as ocata-em, after which point we won't do releases from that 
branch anymore.


The etherpad lists each open ocata backport along with any of its 
related backports on newer branches like pike/queens/etc. Since we need 
the backports to go in order, we need to review and merge the changes on 
the newer branches first. With the state of the gate lately, we really 
can't sit on our hands here because it will probably take up to a week 
just to merge all of the changes for each branch.
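
As a trivial sketch of the review order that implies, newest branch first, with only the branches named here and made-up change names:

```python
# Newest branch first; only the branches named in this thread.
MERGE_ORDER = ['stable/queens', 'stable/pike', 'stable/ocata']


def review_order(backports):
    """Sort (branch, change) pairs so newer branches come first."""
    return sorted(backports, key=lambda item: MERGE_ORDER.index(item[0]))
```

So for a fix proposed to all three branches, the queens change gets reviewed and merged before pike, and pike before ocata.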


Once the Ocata backports are flushed through, we'll cut the final 
release and tag the branch as being in extended maintenance.


Do we want to coordinate a review day next week for the 
nova-stable-maint core team, like Tuesday, or just trust that you all 
know who you are and will help out as necessary in getting these reviews 
done? Non-stable cores are also welcome to help review here to make sure 
we're not missing something, which is also a good way to get noticed as 
caring about stable branches and eventually get you on the stable maint 
core team.


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/thread.html#134810

[2] https://etherpad.openstack.org/p/nova-ocata-em
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance


--

Thanks,

Matt


Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter


On 9/27/2018 3:02 PM, Jay Pipes wrote:
A great example of this would be the proposed "deploy template" from 
[2]. This is nothing more than abusing the placement traits API in order 
to allow passthrough of instance configuration data from the nova flavor 
extra spec directly into the nodes.instance_info field in the Ironic 
database. It's a hack that is abusing the entire concept of the 
placement traits concept, IMHO.


We should have a way *in Nova* of allowing instance configuration 
key/value information to be passed through to the virt driver's spawn() 
method, much the same way we provide for user_data that gets exposed 
after boot to the guest instance via configdrive or the metadata service 
API. What this deploy template thing is is just a hack to get around the 
fact that nova doesn't have a basic way of passing through some collated 
instance configuration key/value information, which is a darn shame and 
I'm really kind of annoyed with myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do 
would be to have some kind of template/profile/config/whatever stored 
off in glare where schema could be registered on that thing, and then 
you pass a handle (ID reference) to that to nova when creating the 
(baremetal) server, nova pulls it down from glare and hands it off to 
the virt driver. It's just that no one is doing that work.


--

Thanks,

Matt


Re: [openstack-dev] [Openstack-sigs] [goals][tc][ptl][uc] starting goal selection for T series


On 9/27/2018 2:33 PM, Fox, Kevin M wrote:

If the project plugins were maintained by the OSC project still, maybe there 
would be incentive for the various other projects to join the OSC project, 
scaling things up?


Sure, I don't really care about governance. But I also don't really care 
about all of the non-compute API things in OSC either.


--

Thanks,

Matt


Re: [openstack-dev] [Openstack-sigs] [goals][tc][ptl][uc] starting goal selection for T series


On 9/27/2018 10:13 AM, Dean Troyer wrote:

On Thu, Sep 27, 2018 at 9:10 AM, Doug Hellmann wrote:

Monty Taylor writes:

Main difference is making sure these new deconstructed plugin teams
understand the client support lifecycle - which is that we don't drop
support for old versions of services in OSC (or SDK). It's a shift from
the support lifecycle and POV of python-*client, but it's important and
we just need to all be on the same page.

That sounds like a reason to keep the governance of the libraries under
the client tool project.

Hmmm... I think that may address a big chunk of my reservations about
being able to maintain consistency and user experience in a fully
split-OSC world.

dt


My biggest worry with splitting everything out into plugins with new 
core teams, even with python-openstackclient-core as a superset, is that 
those core teams will all start approving things that don't fit with the 
overall guidelines of how OSC commands should be written. I've had to go 
to the "Dean well" several times when reviewing osc-placement commands.


But the python-openstackclient-core team probably isn't going to scale 
to fit the need of all of these gaps that need closing from the various 
teams, either. So how does that get fixed? I've told Dean and Steve 
before that if they want me to review / ack something compute-specific 
in OSC that they can call on me, like a liaison. Maybe that's all we 
need to start? Because I've definitely disagreed with compute CLI 
changes in OSC that have a +2 from the core team because of a lack of 
understanding from both the contributor and the reviewers about what the 
compute API actually does, or how a microversion behaves. Or maybe we 
just do some kind of subteam thing where OSC core doesn't look at a 
change until the subteam has +1ed it. We have a similar concept in nova 
with virt driver subteams.


--

Thanks,

Matt


Re: [openstack-dev] [nova] Stein PTG summary

On 9/27/2018 5:23 AM, Sylvain Bauza wrote:

On Thu, Sep 27, 2018 at 2:46 AM Matt Riedemann <mriede...@gmail.com> wrote:

On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
 > So, during this day, we also discussed about NUMA affinity and we
 > said that we could possibly use nested resource providers for NUMA
 > cells in Stein, but given we don't yet have a specific Placement API
 > query, NUMA affinity should still be using the NUMATopologyFilter.
 > That said, when looking at how to use this filter for vGPUs, it looks
 > to me that I'd need to provide a new version for the NUMACell object
 > and modify the virt.hardware module. Are we also accepting this (given
 > it's a temporary question), or should we wait for the Placement API
 > support?
 >
 > Folks, what are your thoughts?

I'm pretty sure we've said several times already that modeling NUMA in
Placement is not something for which we're holding up the extraction.

It's not an extraction question. Just about knowing whether the Nova 
folks would accept us to modify some o.vo object and module just for a 
temporary time until Placement API has some new query parameter.
Whether Placement is extracted or not isn't really the problem, it's 
more about the time it will take for this query parameter ("numbered 
request groups to be in the same subtree") to be implemented in the 
Placement API.
The real problem we have with vGPUs is that if we don't have NUMA 
affinity, the performance would be around 10% less for vGPUs (if the 
pGPU isn't on the same NUMA cell as the pCPU). Not sure large 
operators would accept that :(

-Sylvain

I don't know how close we are to having whatever we need for modeling 
NUMA in the placement API, but I'll go out on a limb and assume we're 
not close. Given that, if we have to do something within nova for NUMA 
affinity for vGPUs for the NUMATopologyFilter, then I'd be OK with that 
since it's short term like you said (although our "short term" 
workarounds tend to last for many releases). Anyone that cares about 
NUMA today already has to enable the scheduler filter anyway.

--

Thanks,

Matt


Re: [openstack-dev] [nova] Stein PTG summary

2018-09-26 Thread Matt Riedemann


On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
So, during this day, we also discussed NUMA affinity and we said 
that we could possibly use nested resource providers for NUMA cells in 
Stein, but given we don't yet have a specific Placement API query, NUMA 
affinity should still be using the NUMATopologyFilter.
That said, when looking at how to use this filter for vGPUs, it looks 
to me that I'd need to provide a new version for the NUMACell object and 
modify the virt.hardware module. Are we also accepting this (given it's 
a temporary question), or should we wait for the Placement API 
support?


Folks, what are your thoughts?


I'm pretty sure we've said several times already that modeling NUMA in 
Placement is not something for which we're holding up the extraction.


--

Thanks,

Matt


Re: [openstack-dev] [goals][tc][ptl][uc] starting goal selection for T series

2018-09-26 Thread Matt Riedemann

s before OSC was created (nova/cinder/glance/keystone). For 
newer projects, like placement, it's not a problem because they never 
created any other CLI outside of OSC.


[1] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc
[2] https://etherpad.openstack.org/p/nova-ptg-stein (~L721)

--

Thanks,

Matt


Re: [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-25 Thread Matt Riedemann


On 9/25/2018 8:36 AM, John Garbutt wrote:

Another thing is about existing flavors configured for these
capabilities-scoped specs. Are you saying during the deprecation we'd
continue to use those even if the filter is disabled? In the review I
had suggested that we add a pre-upgrade check which inspects the flavors
and if any of these are found, we report a warning meaning those flavors
need to be updated to use traits rather than capabilities. Would that be
reasonable?
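
A minimal sketch of what such a pre-upgrade check could inspect, assuming flavors are available as plain dicts. The flavors_using_capabilities helper and the flavor layout are hypothetical and not taken from the review.

```python
def flavors_using_capabilities(flavors):
    """Return names of flavors with capabilities-scoped extra specs.

    `flavors` is a hypothetical list of dicts with 'name' and
    'extra_specs' keys.
    """
    return sorted(
        flavor['name'] for flavor in flavors
        if any(key.startswith('capabilities:')
               for key in flavor.get('extra_specs', {})))
```

A non-empty result would translate into a warning that lists the flavors to convert to traits.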


I like the idea of a warning, but there are features that have not yet 
moved to traits:

https://specs.openstack.org/openstack/ironic-specs/specs/juno-implemented/uefi-boot-for-ironic.html

There is a more general plan that will help, but it's not quite ready yet:
https://review.openstack.org/#/c/504952/

As such, I think we can't pull the plug on flavors including 
capabilities and passing them to Ironic, but (after a cycle of 
deprecation) I think we can now stop pushing capabilities from Ironic 
into Nova and using them for placement.


Forgive my ignorance, but if traits are not on par with capabilities, 
why are we deprecating the capabilities filter?


--

Thanks,

Matt


Re: [openstack-dev] [penstack-dev]Discussion about the future of OpenStack in China

2018-09-24 Thread Matt Riedemann


On 9/24/2018 12:12 PM, Jay Pipes wrote:
There were a couple points that I did manage to decipher, though. One 
thing that both articles seemed to say was that OpenStack doesn't meet 
public (AWS-ish) cloud use cases and OpenStack doesn't compare favorably 
to VMWare either.


Yeah I picked up on that also - trying to be all things to all people 
means we do less well at any single thing. No surprises there.




Is there a large contingent of Chinese OpenStack users that expect 
OpenStack to be a free (as in beer) version of VMware technology?


What are the 3 most important features that Chinese OpenStack users 
would like to see included in OpenStack projects?


Yeah I picked up on a few things as well. The article was talking about 
gaps in upstream services:


a) they did a bunch of work on trove for their dbaas solution, but did 
they contribute any of that work?


b) they mentioned a lack of DRS and HA support, but didn't mention the 
Watcher or Masakari projects - maybe they didn't know they exist?


--

Thanks,

Matt


Re: [openstack-dev] [glance][upgrade-checkers] Question about glance rocky upgrade release note

2018-09-24 Thread Matt Riedemann


On 9/24/2018 2:06 PM, Matt Riedemann wrote:
Are there more specific docs about how to configure the 'image import' 
feature so that I can be sure I'm careful? In other words, are there 
specific things a "glance-status upgrade check" check could look at and 
say, "your image import configuration is broken, here are details on how 
you should do this"?


I guess this answers the question about docs:

https://docs.openstack.org/glance/latest/admin/interoperable-image-import.html

Would a basic upgrade check be such that if glance-api.conf contains 
enable_image_import=False, you're going to have issues since that option 
is removed in Rocky?
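
Something like the following is what I have in mind for a basic check. It is a rough sketch that uses the stdlib configparser rather than oslo.config, and check_image_import and the (code, details) return shape are made up for illustration.

```python
import configparser


def check_image_import(conf_text):
    """Return (code, details) for a hypothetical glance upgrade check."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # Options in the [DEFAULT] section are exposed via parser.defaults().
    value = parser.defaults().get('enable_image_import')
    if value is not None and value.strip().lower() == 'false':
        return ('WARNING',
                'enable_image_import=False is set in glance-api.conf but '
                'is no longer honored; image import is always enabled.')
    return ('SUCCESS', None)
```

That only catches the explicit opt-out; whether the import feature itself is configured correctly would need more input from the glance team.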


--

Thanks,

Matt


[openstack-dev] [glance][upgrade-checkers] Question about glance rocky upgrade release note

2018-09-24 Thread Matt Riedemann

Looking at the upgrade-checkers goal [1] for glance and the Rocky 
upgrade release notes [2], one upgrade note says:


"As Image Import will be always enabled, care needs to be taken that it 
is configured properly from this release forward. The 
‘enable_image_import’ option is silently ignored."


Are there more specific docs about how to configure the 'image import' 
feature so that I can be sure I'm careful? In other words, are there 
specific things a "glance-status upgrade check" check could look at and 
say, "your image import configuration is broken, here are details on how 
you should do this"?


I'm willing to help write the upgrade check for glance, but need more 
details on that release note.


[1] https://storyboard.openstack.org/#!/story/2003657
[2] https://docs.openstack.org/releasenotes/glance/rocky.html#upgrade-notes

--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-22 Thread Matt Riedemann


On 9/21/2018 4:19 PM, Ben Nemec wrote:
* The only two projects that I'm aware of with patches up at this 
point are monasca [2] and designate [3]. The monasca one is tricky 
because as I've found going through release notes for some projects, 
they don't really have any major upgrade impacts so writing checks is 
not obvious. I don't have a great solution here. What monasca has done 
is add the framework with a noop check. If others are in the same 
situation, I'd like to hear your thoughts on what you think makes 
sense here. The alternative is these projects opt out of the goal for 
Stein and just add the check code later when it makes sense (but 
people might forget or not care to do that later if it's not a goal).


My inclination is for the command to exist with a noop check, the main 
reason being that if we create it for everyone this cycle then the 
deployment tools can implement calls to the status commands all at once. 
If we wait until checks are needed then someone has to not only 
implement it in the service but also remember to go update all of the 
deployment tools. Implementing a noop check should be pretty trivial 
with the library so it isn't a huge imposition.


Yeah, I agree, and I've left comments on the patch to give some ideas on 
how to write the noop check with a description that explains it's an 
initial check but doesn't really do anything. The alternative would be 
to dump the table header for the results but then not have any rows, 
which could be more confusing.
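
As a rough illustration of the noop check idea, here is a minimal stdlib-only sketch. It is not the monasca patch and not the oslo.upgradecheck API; the names Code, check_placeholder and run_checks are made up for this example, and the 0/1/2 result codes are only assumed to follow the nova-status convention.

```python
import enum


class Code(enum.IntEnum):
    """Result codes (assumed: 0 success, 1 warning, 2 failure)."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def check_placeholder():
    """Noop check: always passes, but the details explain why it exists."""
    return Code.SUCCESS, "Placeholder check; no upgrade impacts to verify yet."


def run_checks(checks):
    """Run (name, function) pairs; return the result rows and the worst code."""
    rows = []
    worst = Code.SUCCESS
    for name, func in checks:
        code, details = func()
        rows.append((name, code.name, details))
        worst = max(worst, code)
    return rows, worst


rows, exit_code = run_checks([("Placeholder", check_placeholder)])
for name, result, details in rows:
    print(f"{name}: {result} - {details}")
```

The point of returning one descriptive row is that the results table is never empty, and deployment tools can already wire up the command and its exit code.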


--

Thanks,

Matt


Re: [openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-21 Thread Matt Riedemann


On 9/21/2018 3:53 PM, Matt Riedemann wrote:
* The reference docs I wrote for writing upgrade checks are published now 
[4]. As I've been answering some questions in storyboard and IRC, it's 
obvious that I need to add some FAQs into those docs because I've taken 
some of this for granted on how it works in nova, so I'll push a docs 
change for some of that as well and link it back into the story.


https://review.openstack.org/#/c/604486/ for anyone that thinks I missed 
something.


--

Thanks,

Matt


[openstack-dev] [goals][upgrade-checkers] Week R-29 Update

2018-09-21 Thread Matt Riedemann


Updates for this week:

* As bnemec noted in the last update [1], he's making some progress with 
the oslo.upgradecheck library. He's retrofitting the nova-status upgrade 
check code to use the library and has a patch up for designate to use it.


* The only two projects that I'm aware of with patches up at this point 
are monasca [2] and designate [3]. The monasca one is tricky because as 
I've found going through release notes for some projects, they don't 
really have any major upgrade impacts so writing checks is not obvious. 
I don't have a great solution here. What monasca has done is add the 
framework with a noop check. If others are in the same situation, I'd 
like to hear your thoughts on what you think makes sense here. The 
alternative is these projects opt out of the goal for Stein and just add 
the check code later when it makes sense (but people might forget or not 
care to do that later if it's not a goal).


* The reference docs I wrote for writing upgrade checks are published now 
[4]. As I've been answering some questions in storyboard and IRC, it's 
obvious that I need to add some FAQs into those docs because I've taken 
some of this for granted on how it works in nova, so I'll push a docs 
change for some of that as well and link it back into the story.


As always, feel free to reach out to me with any questions or issues you 
might have with completing this goal (or just getting started).


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134972.html

[2] https://review.openstack.org/#/c/603465/
[3] https://review.openstack.org/#/c/604430/
[4] https://docs.openstack.org/nova/latest/reference/upgrade-checks.html

--

Thanks,

Matt


Re: [openstack-dev] OpenStack Project Navigator

2018-09-21 Thread Matt Riedemann


On 9/21/2018 1:11 PM, Michael Johnson wrote:

Thank you Jimmy for making this available for updates.

I was unable to find the code backing the project tags section of the
Project Navigator pages.
Our page is missing some upgrade tags and is showing duplicate "Stable
branch policy" tags.

https://www.openstack.org/software/releases/rocky/components/octavia

Is there a different repository for the tags code?


Those are down in the project details section of the page, look to the 
right and there is a 'tag details' column. The tags are descriptive and 
link to the details on each tag.


--

Thanks,

Matt


Re: [openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?


On 9/20/2018 12:08 PM, Előd Illés wrote:

Hi Matt,

About 1.: I think it is a good idea to cut a final release (especially 
as some vendors/operators would be glad to have releases even during 
Extended Maintenance, which most probably won't happen...), though I say 
that without knowing how much of a burden this final release would be 
for projects.
After that it sounds reasonable to tag the branches EM (as written 
in the mentioned resolution).


Do you have any plan about how to coordinate the 'final releases' and do 
the EM-tagging?


Thanks for raising these questions!

Cheers,

Előd


For anyone following along who cares about this (hopefully PTLs), 
Előd, Doug, Sean and I formulated a plan in IRC today [1].


[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-stable/%23openstack-stable.2018-09-20.log.html#t2018-09-20T17:10:56


--

Thanks,

Matt


Re: [openstack-dev] Forum Topic Submission Period


On 9/20/2018 10:23 AM, Jimmy McArthur wrote:
This is basically the CFP equivalent: 
https://www.openstack.org/summit/berlin-2018/vote-for-speakers
Voting isn't necessary, of course, but it should allow you to see 
submissions as they roll in.


Does this work for your purposes?


Yup, that should do it, thanks!

--

Thanks,

Matt


Re: [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter


On 9/20/2018 4:16 AM, John Garbutt wrote:
Following on from the PTG discussions, I wanted to bring everyone's 
attention to Nova's plans to deprecate ComputeCapabilitiesFilter, 
including most of the integration with Ironic Capabilities.


To be specific, this is my proposal in code form:
https://review.openstack.org/#/c/603102/

Once the code we propose to deprecate is removed we will stop using 
capabilities pushed up from Ironic for 'scheduling', but we would still 
pass capabilities request in the flavor down to Ironic (until we get 
some standard traits and/or deploy templates sorted for things like UEFI).


Functionally, we believe all use cases can be replaced by using the 
simpler placement traits (this is more efficient than post placement 
filtering done using capabilities):

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html

Please note the recent addition of forbidden traits that helps improve 
the usefulness of the above approach:

https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html

For example, a flavor request for GPUs >= 2 could be replaced by a 
custom trait that reports if a given Ironic node has 
CUSTOM_MORE_THAN_2_GPUS. That is a bad example (longer term we don't 
want to use traits for this, but that is a discussion for another day) 
but it is the example that keeps being raised in discussions on this topic.


The main reason for reaching out in this email is to ask if anyone has 
needs that the ResourceClass and Traits scheme does not currently 
address, or can think of a problem with a transition to the newer approach.


I left a few comments in the change, but I'm assuming as part of the 
deprecation we'd remove the filter from the default enabled_filters list 
so new installs don't automatically get warnings during scheduling?


Another thing is about existing flavors configured for these 
capabilities-scoped specs. Are you saying during the deprecation we'd 
continue to use those even if the filter is disabled? In the review I 
had suggested that we add a pre-upgrade check which inspects the flavors 
and if any of these are found, we report a warning meaning those flavors 
need to be updated to use traits rather than capabilities. Would that be 
reasonable?
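
To make the suggested pre-upgrade check a bit more concrete, here is a minimal sketch of what it could look like. It works on plain dicts rather than real nova flavor objects, and the function names, the flavor names and the "capabilities:gpus" key are hypothetical, chosen only for illustration.

```python
CAPABILITIES_PREFIX = "capabilities:"


def find_capabilities_flavors(flavors):
    """Return names of flavors whose extra specs use the capabilities scope."""
    flagged = []
    for flavor in flavors:
        extra_specs = flavor.get("extra_specs", {})
        if any(key.startswith(CAPABILITIES_PREFIX) for key in extra_specs):
            flagged.append(flavor["name"])
    return sorted(flagged)


def check_flavors(flavors):
    """Warn if any flavor still needs to be converted to traits."""
    flagged = find_capabilities_flavors(flavors)
    if flagged:
        return "WARNING", "Flavors still using capabilities: " + ", ".join(flagged)
    return "SUCCESS", "No flavors use capabilities-scoped extra specs."


# Hypothetical flavors: one capabilities-scoped, one already using a trait.
flavors = [
    {"name": "bm.gpu", "extra_specs": {"capabilities:gpus": ">= 2"}},
    {"name": "bm.traits",
     "extra_specs": {"trait:CUSTOM_MORE_THAN_2_GPUS": "required"}},
    {"name": "m1.small", "extra_specs": {}},
]
print(check_flavors(flavors))
```

A warning rather than a failure seems to fit here, since the flavors would keep working during the deprecation period and only need updating before the filter is removed.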


--

Thanks,

Matt


Re: [openstack-dev] Forum Topic Submission Period


On 9/17/2018 11:13 AM, Jimmy McArthur wrote:
The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Another question. In the before times, when we just had that simple form 
to submit forum sessions and then the TC/UC/Foundation reviewed the list 
and picked the sessions, it was very simple to see what other sessions 
were proposed and say, "oh good someone is covering this already, I 
don't need to worry about it". With the move to the CFP forms like the 
summit sessions, that is no longer available, as far as I know. There 
have been at least a few cases this week where someone has said, "this 
might be a good topic, but keystone is probably already covering it, or 
$FOO SIG is probably already covering it", but without herding the cats 
to ask and find out who is all doing what, it's hard to know.


Is there some way we can get back to having a public view of what has 
been proposed for the forum so we can avoid overlap, or at worst people 
not proposing something because they assume someone else is going to 
cover it?


--

Thanks,

Matt


Re: [openstack-dev] [all] Zuul job backlog


On 9/19/2018 2:45 PM, Matt Riedemann wrote:

Another one we need to make a decision on is:

https://bugs.launchpad.net/tempest/+bug/1783405

For that bug, I'm suggesting we mark more slow tests with the actual 
"slow" tag in Tempest so they only run in the tempest-slow 
job. gmann and I talked about this last week over IRC but I forgot to 
update the bug report with details. I think rather than increase the 
timeout of the tempest-full job we should be marking more slow tests as 
slow. Increasing timeouts gives some short-term relief but eventually we 
just have to look at these issues again, and a tempest run shouldn't 
take over 2 hours (remember when it used to take ~45 minutes?).


https://review.openstack.org/#/c/603900/

--

Thanks,

Matt


Re: [openstack-dev] [all] Zuul job backlog


On 9/19/2018 2:11 PM, Clark Boylan wrote:

Unfortunately, right now our classification rate is very poor (only 15%), which 
makes it difficult to know what exactly is causing these failures. Mriedem and 
I have quickly scanned the unclassified list, and it appears there is a db 
migration testing issue causing these tests to timeout across several projects. 
Mriedem is working to get this classified and tracked which should help, but we 
will also need to fix the bug. On top of that it appears that Glance has flaky 
functional tests (both python2 and python3) which are causing resets and should 
be looked into.

If you'd like to help, let mriedem or myself know and we'll gladly work with 
you to get elasticsearch queries added to elastic-recheck. We are likely to be 
less help when it comes to fixing functional tests in Glance, but I'm happy to point 
people in the right direction for that as much as I can. If you can take a few 
minutes to do this before/after you issue a recheck it does help quite a bit.


Things have gotten bad enough that I've started proposing changes to 
skip particularly high failure rate tests that are not otherwise getting 
attention to help triage and fix the bugs. For example:


https://review.openstack.org/#/c/602649/

https://review.openstack.org/#/c/602656/

Generally this is a last resort since it means we're losing test 
coverage, but when we hit a critical mass of random failures it becomes 
extremely difficult to merge code.


Another one we need to make a decision on is:

https://bugs.launchpad.net/tempest/+bug/1783405

For that bug, I'm suggesting we mark more slow tests with the actual 
"slow" tag in Tempest so they only run in the tempest-slow 
job. gmann and I talked about this last week over IRC but I forgot to 
update the bug report with details. I think rather than increase the 
timeout of the tempest-full job we should be marking more slow tests as 
slow. Increasing timeouts gives some short-term relief but eventually we 
just have to look at these issues again, and a tempest run shouldn't 
take over 2 hours (remember when it used to take ~45 minutes?).
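
The idea of tagging tests as slow and routing them to a separate job can be sketched roughly as follows. This is a stdlib-only illustration, not Tempest code: the attr decorator here is a hypothetical stand-in, and the test names are invented.

```python
def attr(**tags):
    """Attach tags to a test function (hypothetical stand-in decorator)."""
    def decorator(func):
        func.tags = dict(tags)
        return func
    return decorator


@attr(type="slow")
def test_volume_backed_live_migration():
    """Invented example of a long-running scenario test."""


def test_list_servers():
    """Invented example of a quick API test."""


def split_by_job(tests):
    """Return (full_job_tests, slow_job_tests) based on the 'slow' tag."""
    full, slow = [], []
    for test in tests:
        is_slow = getattr(test, "tags", {}).get("type") == "slow"
        (slow if is_slow else full).append(test.__name__)
    return full, slow


full, slow = split_by_job([test_volume_backed_live_migration, test_list_servers])
print("full job:", full)
print("slow job:", slow)
```

Moving a test between jobs is then a one-line change on the test itself, with no need to touch job timeouts.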


--

Thanks,

Matt


Re: [openstack-dev] Nominating Tetsuro Nakamura for placement-core


On 9/19/2018 10:25 AM, Chris Dent wrote:



I'd like to nominate Tetsuro Nakamura for membership in the
placement-core team. Throughout placement's development Tetsuro has
provided quality reviews; done the hard work of creating rigorous
functional tests, making them fail, and fixing them; and implemented
some of the complex functionality required at the persistence layer.
He's aware of and respects the overarching goals of placement and has
demonstrated pragmatism when balancing those goals against the
requirements of nova, blazar and other projects.

Please follow up with a +1/-1 to express your preference. No need to
be an existing placement core, everyone with an interest is welcome.


Soft +1 from me given I mostly have to defer to those that work more 
closely with Tetsuro. I agree he's a solid contributor, works hard, 
finds issues, fixes them before being asked, etc. That's awesome. 
Reminds me a lot of gibi when we nominated him.


--

Thanks,

Matt


Re: [openstack-dev] [nova] Super fun unshelve image_ref bugs


On 12/1/2017 2:47 PM, Matt Riedemann wrote:
Andrew Laski also mentioned in IRC that we didn't replace the original 
instance.image_ref with the shelved image id because the shelve 
operation should be transparent to the end user, they have the same 
image (not really), same volumes, same IPs, etc once they unshelve. And 
he mentioned that if you rebuild, for example, you'd then rebuild to the 
original image instead of the shelved snapshot image.


I'm not sure how much I agree with that rebuild argument. I understand 
it, but I'm not sure I agree with it. I think it's much easier to just 
track things for what they are, which means saying if you create a guest 
from a given image id, then track that in the instances table, don't lie 
about it being something else.


Dredging this back up since it will affect cross-cell resize which will 
rely on shelve/unshelve.


I had a thought recently (and noted in 
https://bugs.launchpad.net/nova/+bug/1732428) that the RequestSpec 
points at the image originally used to create the server, or last used to rebuild 
it (if the server was rebuilt with a new image). What if we used that 
during rebuilds rather than the instance.image_ref?


Then unshelve could leave the instance.image_ref pointing at the shelve 
snapshot image (since that's what is actually backing the server at the 
time of unshelve and should fix the resize qcow2 bug linked above) but 
rebuild could still rebuild from the original (or last rebuild) image 
rather than the shelve snapshot image?
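
A tiny sketch of that proposal, assuming plain dicts in place of the real Instance and RequestSpec objects (the "image_id" and "image_ref" keys and the function name are illustrative, not the actual nova object fields):

```python
def image_for_rebuild(instance, request_spec):
    """Pick the image a rebuild should use.

    Prefer the image recorded in the request spec (original or last
    rebuild image); fall back to whatever currently backs the instance.
    """
    return request_spec.get("image_id") or instance["image_ref"]


# After unshelve, image_ref points at the shelve snapshot image.
instance = {"image_ref": "shelve-snapshot-image"}
request_spec = {"image_id": "original-image"}
print(image_for_rebuild(instance, request_spec))
```

With this split, image_ref stays truthful about what backs the server, while rebuild behavior is unchanged from the user's point of view.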


The only hiccup I'm aware of is we then still need to *not* delete the 
snapshot image on unshelve that the instance is pointing at, which means 
shelve snapshot images could pile up over time, especially with 
cross-cell resize. Is that a problem? If so, could we have a periodic 
task that cleans up the old snapshot images based on some configured value?
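
One way such a cleanup could pick its candidates is sketched below. This is only an illustration under assumptions: snapshots are plain dicts, max_age stands in for the "configured value", and none of the names correspond to an existing nova option or API.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, in_use_image_ids, max_age, now):
    """Pick shelve snapshot images that are old and no longer referenced."""
    expired = []
    for snap in snapshots:
        if snap["id"] in in_use_image_ids:
            # Still backing an instance (instance.image_ref), never delete.
            continue
        if now - snap["created_at"] > max_age:
            expired.append(snap["id"])
    return expired


now = datetime(2018, 9, 20)
snapshots = [
    {"id": "snap-old", "created_at": datetime(2018, 6, 1)},
    {"id": "snap-in-use", "created_at": datetime(2018, 6, 1)},
    {"id": "snap-new", "created_at": datetime(2018, 9, 19)},
]
candidates = snapshots_to_delete(
    snapshots, {"snap-in-use"}, timedelta(days=30), now)
print(candidates)
```

The in-use check comes first so that an old snapshot still referenced by an instance is never removed, regardless of the age threshold.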


--

Thanks,

Matt


Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

On 9/18/2018 12:26 PM, Matt Riedemann wrote:

On 9/17/2018 9:41 PM, Ghanshyam Mann wrote:
On Tue, 18 Sep 2018 09:33:30 +0900 Alex Xu wrote:
  > That only means that after 599276, only the servers API and the 
  > os-instance-action API have stopped accepting undefined query parameters.
  > What I'm thinking about is checking all the APIs and adding 
  > json-query-param checking with additionalProperties=True where the API 
  > doesn't have it yet, then using another microversion to set 
  > additionalProperties to False, so the whole Nova API becomes consistent.

I too vote for doing it for all the other APIs together. Restricting 
unknown query or request params is very useful for API consistency. 
See item #1 in this etherpad: https://etherpad.openstack.org/p/nova-api-cleanup

If you would like, I can propose a quick spec for that. If there is a 
positive response to doing it all together, we skip doing it in 599276; 
otherwise we do it for GET servers in 599276.

-gmann

I don't care too much about changing all of the other APIs to 
additionalProperties=False in a single microversion given we're already 
kind of inconsistent with this in a few APIs. Consistency is ideal, but 
I thought we'd be lumping in other cleanups from the etherpad into the 
same microversion/spec which will likely slow it down during spec 
review. For example, I'd really like to get rid of the weird server 
response field prefixes like "OS-EXT-SRV-ATTR:". Would we put those into 
the same mass cleanup microversion / spec or split them into individual 
microversions? I'd prefer not to see an explosion of microversions for 
cleaning up oddities in the API, but I could see how doing them all in a 
single microversion could be complicated.

Just an update on https://review.openstack.org/#/c/599276/ - the change 
is approved. We left additionalProperties=True in the GET 
/servers(/detail) APIs for consistency with 2.5 and 2.26, and for 
expediency in just getting the otherwise pretty simple change approved.
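
For anyone not familiar with what the additionalProperties setting changes in practice, here is a minimal stdlib-only sketch. It does not use the jsonschema library or nova's real query schemas; the two schema dicts and the validate_query helper are simplified assumptions for illustration.

```python
LENIENT = {
    "properties": {"status": {}, "name": {}},
    "additionalProperties": True,   # unknown params are silently ignored
}
STRICT = {
    "properties": {"status": {}, "name": {}},
    "additionalProperties": False,  # unknown params are rejected
}


def validate_query(params, schema):
    """Return the known params, or raise if unknown ones are not allowed."""
    allowed = set(schema["properties"])
    unknown = sorted(set(params) - allowed)
    if unknown and not schema.get("additionalProperties", True):
        raise ValueError("unknown query parameters: " + ", ".join(unknown))
    return {key: value for key, value in params.items() if key in allowed}


query = {"status": "ACTIVE", "bogus": "1"}
print(validate_query(query, LENIENT))
try:
    validate_query(query, STRICT)
except ValueError as exc:
    print("rejected:", exc)
```

Flipping to the strict behavior changes what existing clients can send, which is why it needs its own microversion.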

--

Thanks,

Matt


Re: [openstack-dev] [python3] tempest and grenade conversion to python 3.6


On 9/18/2018 9:52 PM, Matt Riedemann wrote:

On 9/18/2018 12:28 PM, Doug Hellmann wrote:

What's probably missing is a version of the grenade job that allows us
to control that USE_PYTHON3 variable before and after the upgrade.

I see a few different grenade jobs (neutron-grenade,
neutron-grenade-multinode,
legacy-grenade-dsvm-neutron-multinode-live-migration, possibly others).
Which ones are "current" and would make a good candidate as a base for a
new job?


Grenade just runs devstack on the old side (e.g. stable/rocky) using the 
devstack stackrc file (which could have USE_PYTHON3 in it), runs tempest 
'smoke' tests to create some resources, saves off some information about 
those resources in a "database" (just an ini file), then runs devstack 
on the new side (e.g. master) using the new side stackrc file and 
verifies those saved off resources made it through the upgrade. It's all 
bash so there isn't anything python-specific about grenade.


I saw, but didn't comment on, the other thread about if it would be 
possible to create a grenade-2to3 job. I'd think that is pretty doable 
based on the USE_PYTHON3 variable. We'd just have that False on the old 
side, and True on the new side, and devstack will do its thing. Right 
now the USE_PYTHON3 variable is global in devstack-gate [1] (which is 
the thing that orchestrates the grenade run for the legacy jobs), but 
I'm sure we could hack that to be specific to the base (old) and target 
(new) release for the grenade run.


[1] 
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L434 





To answer Doug's original question, neutron-grenade-multinode is 
probably best to model for a new job if you want to test rolling 
upgrades, because that job has two compute nodes and leaves one on the 
'old' side so it would upgrade the controller services and one compute 
to Stein and leave the other compute at Rocky. So if you start with 
python2 on the old side and upgrade to python3 for everything except one 
compute, you'll have a pretty good idea of whether or not that rolling 
upgrade works through our various services and libraries, like the 
oslo.messaging stuff noted in the other thread.
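
To spell out what that rolling upgrade layout would look like, here is a small sketch of the per-node USE_PYTHON3 plan. Grenade and devstack-gate are bash, so this Python snippet is only a way to write down the intended matrix; the node names and the function are hypothetical.

```python
def plan_use_python3(nodes, upgraded_nodes):
    """Map each node to its USE_PYTHON3 value before and after the upgrade.

    Every node starts on python2; only the upgraded nodes are re-stacked
    with python3, the rest stay as they were.
    """
    plan = {}
    for node in nodes:
        plan[node] = {
            "before_upgrade": "False",
            "after_upgrade": "True" if node in upgraded_nodes else "False",
        }
    return plan


plan = plan_use_python3(
    ["controller", "compute1", "compute2"],
    upgraded_nodes={"controller", "compute1"},
)
for node, values in plan.items():
    print(node, values)
```

The interesting case is the node left behind: python2 services on it have to keep talking to python3 services on the upgraded nodes.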


--

Thanks,

Matt


Re: [openstack-dev] [python3] tempest and grenade conversion to python 3.6

On 9/18/2018 12:28 PM, Doug Hellmann wrote:

What's probably missing is a version of the grenade job that allows us
to control that USE_PYTHON3 variable before and after the upgrade.

I see a few different grenade jobs (neutron-grenade,
neutron-grenade-multinode,
legacy-grenade-dsvm-neutron-multinode-live-migration, possibly others).
Which ones are "current" and would make a good candidate as a base for a
new job?

Grenade just runs devstack on the old side (e.g. stable/rocky) using the
devstack stackrc file (which could have USE_PYTHON3 in it), runs tempest
'smoke' tests to create some resources, saves off some information about
those resources in a "database" (just an ini file), then runs devstack
on the new side (e.g. master) using the new side stackrc file and
verifies those saved off resources made it through the upgrade. It's all
bash so there isn't anything python-specific about grenade.

I saw, but didn't comment on, the other thread about if it would be
possible to create a grenade-2to3 job. I'd think that is pretty doable
based on the USE_PYTHON3 variable. We'd just have that False on the old
side, and True on the new side, and devstack will do its thing. Right
now the USE_PYTHON3 variable is global in devstack-gate [1] (which is
the thing that orchestrates the grenade run for the legacy jobs), but
I'm sure we could hack that to be specific to the base (old) and target
(new) release for the grenade run.

[1]
https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L434

Thanks,

Matt

Re: [openstack-dev] Forum Topic Submission Period


On 9/17/2018 11:13 AM, Jimmy McArthur wrote:

Hello Everyone!

The Forum Topic Submission session started September 12 and will run 
through September 26th.  Now is the time to wrangle the topics you 
gathered during your Brainstorming Phase and start pushing forum topics 
through. Don't rely only on a PTL to make the agenda... step on up and 
place the items you consider important front and center.


As you may have noticed on the Forum Wiki 
(https://wiki.openstack.org/wiki/Forum), we're reusing the normal CFP 
tool this year. We did our best to remove Summit specific language, but 
if you notice something, just know that you are submitting to the 
Forum.  URL is here:


https://www.openstack.org/summit/berlin-2018/call-for-presentations

Looking forward to seeing everyone's submissions!

If you have questions or concerns about the process, please don't 
hesitate to reach out.


Cheers,
Jimmy


Just a process question. I submitted a presentation for the normal 
marketing blitz part of the summit which wasn't accepted (I'm still 
dealing with this emotionally, btw...) but when I look at the CFP link 
for Forum topics, my thing shows up there as "Received". Does that 
mean my non-Forum-at-all submission is now automatically a candidate for 
the Forum? That would not be my intended audience (only suits and 
big wigs please).


--

Thanks,

Matt


[openstack-dev] Are we ready to put stable/ocata into extended maintenance mode?

The release page says Ocata is planned to go into extended maintenance 
mode on Aug 27 [1]. There really isn't much to this except it means we 
don't do releases for Ocata anymore [2]. There is a caveat that project 
teams that do not wish to maintain stable/ocata after this point can 
immediately end of life the branch for their project [3]. We can still 
run CI using tags, e.g. if keystone goes ocata-eol, devstack on 
stable/ocata can still continue to install from stable/ocata for nova 
and the ocata-eol tag for keystone. Having said that, if there is no 
undue burden on the project team keeping the lights on for stable/ocata, 
I would recommend not tagging the stable/ocata branch end of life at 
this point.
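
The CI point about mixing branches and tags can be sketched as a simple ref lookup. This is an illustration only, not how devstack actually resolves refs; the function name and the set of EOL projects are assumptions for the example.

```python
def checkout_ref(project, series, eol_projects):
    """Return the git ref to install a project from for a given series.

    Projects that have gone end of life are installed from the
    <series>-eol tag; everything else still uses the stable branch.
    """
    if project in eol_projects:
        return f"{series}-eol"
    return f"stable/{series}"


# Example from the text: keystone has gone ocata-eol, nova has not.
eol_projects = {"keystone"}
for project in ("nova", "keystone"):
    print(project, "->", checkout_ref(project, "ocata", eol_projects))
```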


So, questions that need answering are:

1. Should we cut a final release for projects with stable/ocata branches 
before going into extended maintenance mode? I tend to think "yes" to 
flush the queue of backports. In fact, [3] doesn't mention it, but the 
resolution said we'd tag the branch [4] to indicate it has entered the 
EM phase.


2. Are there any projects that would want to skip EM and go directly to 
EOL (yes this feels like a Monopoly question)?


[1] https://releases.openstack.org/
[2] 
https://docs.openstack.org/project-team-guide/stable-branches.html#maintenance-phases
[3] 
https://docs.openstack.org/project-team-guide/stable-branches.html#extended-maintenance
[4] 
https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#end-of-life


--

Thanks,

Matt


Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?