Re: [openstack-dev] StarlingX gap analysis to converge with OpenStack master

2018-11-21 Thread melanie witt

On Wed, 21 Nov 2018 21:23:51 +0100, Melanie Witt wrote:

On Wed, 21 Nov 2018 12:11:50 -0600, Miguel Lavalle wrote:

One of the key goals of StarlingX during the current cycle is to
converge with the OpenStack projects master branches. In order to
accomplish this goal, the Technical Steering Committee put together a
gap analysis that shows the specs and patches that need to merge in the
different OpenStack projects by the end of Stein. The attached PDF
document shows this analysis. Although other projects are involved, most
of the work has to be done in Nova, Neutron and Horizon. Hopefully all
the involved projects will help StarlingX achieve this important goal.

It has to be noted that work is still on-going to refine this gap
analysis, so there might be some updates to it in the near future.


Thanks for sending this out. I'm going to reply about what I know of the
status of nova-related planned upstream features.

On NUMA-aware live migration, it was identified as the top priority
issue in the NFV/HPC pain points forum session [1]. The spec has been
approved in the past, for the Rocky cycle, so it's a matter of
re-proposing it for re-approval in Stein. We need to confirm with artom
and/or stephenfin whether one of them can pick it up this cycle.


Turns out this spec has already been re-proposed for Stein as of Sep 4:

https://review.openstack.org/599587

and is under active review now. Apologies for missing this in my 
previous reply.



I don't know as much about the shared/dedicated vCPUs on a single host
or the shared vCPU extension, but the cited spec [2] has one +2 already.
If we can find a second approver, we can work on this too in Stein.

The vTPM support spec was merged about two weeks ago and we are awaiting
implementation patches from cfriesen.

The HPET support spec was merged about two weeks ago and the
implementation patch is under active review in a runway with one +2 now.

For vCPU model, I'm not aware of any new proposed spec for Stein from
the STX community as of today. Let us know if/when the spec is proposed.

For disk performance fixes, the specless blueprint patch is currently
under active review in a runway.

The extra spec validation spec [3] is under active review now.

For the bits that will be addressed using upstream features that are
already available, I assume the STX community will take care of this.
Please reach out to us in #openstack-nova or on the ML if there are
questions/issues.

For the bugs, again feel free to reach out to us for reviews/help.

Cheers,
-melanie

[1] https://etherpad.openstack.org/p/BER-nfv-hpc-pain-points
[2] https://review.openstack.org/555081
[3] https://review.openstack.org/618542











Re: [openstack-dev] StarlingX gap analysis to converge with OpenStack master

2018-11-21 Thread melanie witt

On Wed, 21 Nov 2018 12:11:50 -0600, Miguel Lavalle wrote:
One of the key goals of StarlingX during the current cycle is to 
converge with the OpenStack projects master branches. In order to 
accomplish this goal, the Technical Steering Committee put together a 
gap analysis that shows the specs and patches that need to merge in the 
different OpenStack projects by the end of Stein. The attached PDF 
document shows this analysis. Although other projects are involved, most 
of the work has to be done in Nova, Neutron and Horizon. Hopefully all 
the involved projects will help StarlingX achieve this important goal.


It has to be noted that work is still on-going to refine this gap 
analysis, so there might be some updates to it in the near future.


Thanks for sending this out. I'm going to reply about what I know of the 
status of nova-related planned upstream features.


On NUMA-aware live migration, it was identified as the top priority 
issue in the NFV/HPC pain points forum session [1]. The spec has been 
approved in the past, for the Rocky cycle, so it's a matter of 
re-proposing it for re-approval in Stein. We need to confirm with artom 
and/or stephenfin whether one of them can pick it up this cycle.


I don't know as much about the shared/dedicated vCPUs on a single host 
or the shared vCPU extension, but the cited spec [2] has one +2 already. 
If we can find a second approver, we can work on this too in Stein.


The vTPM support spec was merged about two weeks ago and we are awaiting 
implementation patches from cfriesen.


The HPET support spec was merged about two weeks ago and the 
implementation patch is under active review in a runway with one +2 now.


For vCPU model, I'm not aware of any new proposed spec for Stein from 
the STX community as of today. Let us know if/when the spec is proposed.


For disk performance fixes, the specless blueprint patch is currently 
under active review in a runway.


The extra spec validation spec [3] is under active review now.

For the bits that will be addressed using upstream features that are 
already available, I assume the STX community will take care of this. 
Please reach out to us in #openstack-nova or on the ML if there are 
questions/issues.


For the bugs, again feel free to reach out to us for reviews/help.

Cheers,
-melanie

[1] https://etherpad.openstack.org/p/BER-nfv-hpc-pain-points
[2] https://review.openstack.org/555081
[3] https://review.openstack.org/618542






Re: [openstack-dev] [nova] Stein forum session notes

2018-11-20 Thread melanie witt

On Mon, 19 Nov 2018 17:19:22 +0100, Surya Seetharaman wrote:



On Mon, Nov 19, 2018 at 2:39 PM Matt Riedemann wrote:


On 11/19/2018 3:17 AM, melanie witt wrote:
 > - Not directly related to the session, but CERN (hallway track) and
 > NeCTAR (dev ML) have both given feedback and asked that the
 > policy-driven idea for handling quota for down cells be avoided. Revived
 > the "propose counting quota in placement" spec to see if there's any way
 > forward here

Should this be abandoned then?

https://review.openstack.org/#/c/614783/

Since there is no microversion impact to that change, it could be added
separately as a bug fix for the down cell case if other operators want
that functionality. But maybe we don't know what other operators want
since no one else is at multi-cell cells v2 yet.


I thought the policy check was needed as a workaround until the "propose 
counting quota in placement" spec has been implemented, and that is what the 
"handling down cell" spec also proposed, unless the former spec is 
implemented within this cycle, in which case we do not need the 
policy check.


Right, I don't think that anyone _wants_ the policy check approach. That 
was just the workaround, last resort idea we had for dealing with down 
cells in the absence of being able to count quota usage from placement.


The operators we've discussed with (CERN, NeCTAR, Oath) would like quota 
counting not to depend on cell databases, if possible. But they are 
understanding and will accept the policy-driven workaround if we can't 
move forward with counting quota usage from placement.


If we can get agreement on the count quota usage from placement spec (I 
have updated it with new proposed details), then we should abandon the 
policy-driven behavior patch. I am eager to find out what everyone 
thinks of the latest proposal.


Cheers,
-melanie



[openstack-dev] [nova] Stein forum session notes

2018-11-19 Thread melanie witt

Hey all,

Here are some notes I took in forum sessions I attended -- feel free to 
add notes on sessions I missed.


Etherpad links: https://wiki.openstack.org/wiki/Forum/Berlin2018

Cheers,
-melanie


TUE
---

Cells v2 updates
================
- Went over the etherpad, no objections to anything
- Not directly related to the session, but CERN (hallway track) and 
NeCTAR (dev ML) have both given feedback and asked that the 
policy-driven idea for handling quota for down cells be avoided. Revived 
the "propose counting quota in placement" spec to see if there's any way 
forward here


Getting users involved in the project
=====================================
- Disconnect between SIGs/WGs and project teams
- Too steep a first step to get involved by subscribing to ML
- People confused about how to participate

Community outreach when culture, time zones, and language differ
=================================================================
- Most discussion around how to synchronize real-time communication 
considering different time zones
- Best to emphasize asynchronous communication. Discussion on ML and 
gerrit reviews
- Helpful to create weekly meeting agenda in advance so contributors 
from other time zones can add notes/response to discussion items


WED
---

NFV/HPC pain points
===================
Top issues for immediate action: NUMA-aware live migration (spec just 
needs re-approval), improved scheduler logging (resurrect cfriesen's 
patch and clean it up), distant third is SRIOV live migration


BFV improvements
================
- Went over the etherpad, no major objections to anything
- Agree: we should expose boot_index from the attachments API
- Unclear what to do about post-create delete_on_termination. Being able 
to specify it for attach sounds reasonable, but is it enough for those 
asking? Or would it end up serving no one?


Better expose what we produce
=============================
- Project teams should propose patches to openstack/openstack-map to 
improve their project pages
- Would be ideal if project pages included a longer paragraph explaining 
the project, have a diagram, list SIGs/WGs related to the project, etc


Blazar reservations to new resource types
==========================================
- For nova compute hosts, reservations are done by putting reserved 
hosts into "blazar" host aggregate and then a special scheduler filter 
is used to exclude those hosts from scheduling. But how to extend that 
concept to other projects?
- Note: the nova approach will change from scheduler filter => placement 
request filter


Edge use cases and requirements
===============================
- Showed the reference architectures again
- Most popular use case was "Mobile service provider 5G/4G virtual RAN 
deployment and Edge Cloud B2B2X" with seven +1s on the etherpad


Deletion of project and project resources
=========================================
- What is wanted: a delete API per service that takes a project_id and 
force deletes all resources owned by it, with a --dry-run option
- Challenge to work out the dependencies for the order of deletion of 
all resources in all projects. Disable project, then delete things in 
order of dependency
- Idea: turn os-purge into a REST API and each project implement a 
plugin for it


Getting operators' bug fixes upstreamed
=======================================
- Problem: operator reports a bug and provides a solution, for example, 
pastes a diff in launchpad or otherwise describes how to fix the bug. 
How can we increase the chances of those fixes making it to gerrit?
- Concern: are there legal issues with accepting patches pasted into 
launchpad by someone who hasn't signed the ICLA?
- Possible actions: create a best practices guide tailored for operators 
and socialize it among the ops docs/meetup/midcycle group. Example: 
guidance on how to indicate you don't have time to add test coverage, 
etc when you propose a patch


THU
---

Bug triage: why not all the community?
======================================
- Cruft and mixing tasks with defect reports makes triage more difficult 
to manage. Example: difference between a defect reported by a user vs an 
effective TODO added by a developer. If 'New' bugs were reliably from end 
users, would we be more likely to triage?

- Bug deputy weekly ML reporting could help
- Action: copy the generic portion of the nova bug triage wiki doc into 
the contributor guide docs. The idea/hope being that easy-to-understand 
instructions available to the wider community might increase the chances 
of people outside of the project team being capable of triaging bugs, so 
all of it doesn't fall on project teams
- Idea: should we remove the bug supervisor requirement from nova to 
allow people who haven't joined the bug team to set Status and Importance?


Current state of volume encryption
==================================
- Feedback: public clouds can't offer encryption because keys are stored 
in the cloud. 

[openstack-dev] [nova] no meeting the next two weeks

2018-11-08 Thread melanie witt

Howdy all,

This is a heads up that we will not hold a nova meeting next week 
November 15 because of summit week. And we will also not hold a nova 
meeting the following week November 22 because of the US holiday of 
Thanksgiving, as we're unlikely to find anyone to run it during the 2100 
UTC time slot.


Meetings will resume on November 29 at 1400 UTC.

I'm looking forward to seeing all of you at the summit next week!

Cheers,
-melanie



Re: [openstack-dev] [nova] Announcing new Focal Point for s390x libvirt/kvm Nova

2018-11-05 Thread melanie witt

On Fri, 2 Nov 2018 09:47:42 +0100, Andreas Scheuring wrote:

Dear Nova Community,
I want to announce the new focal point for Nova s390x libvirt/kvm.

Please welcome "Cathy Zhang” to the Nova team. She and her team will be 
responsible for maintaining the s390x libvirt/kvm Thirdparty CI  [1] and any s390x 
specific code in nova and os-brick.
I personally took a new opportunity already a few month ago but kept 
maintaining the CI as good as possible. With new manpower we can hopefully 
contribute more to the community again.

You can reach her via
* email: bjzhj...@linux.vnet.ibm.com
* IRC: Cathyz

Cathy, I wish you and your team all the best for this exciting role! I also 
want to say thank you for the last years. It was a great time; I learned a lot 
from you all and will miss it!

Cheers,

Andreas (irc: scheuran)


[1] https://wiki.openstack.org/wiki/ThirdPartySystems/IBM_zKVM_CI


Thanks Andreas, for sending this note. It has been a pleasure working 
with you over these years. We wish you the best of luck in your new 
opportunity!


Welcome to the Nova community, Cathy! We look forward to working with 
you. Please feel free to reach out to us on IRC in the #openstack-nova 
channel and on this mailing list with the [nova] tag to ask questions 
and share info.


Best,
-melanie







[openstack-dev] [nova] Stein summit forum sessions and presentations of interest

2018-11-01 Thread melanie witt

Howdy all,

We've made a list of cross-project forum sessions and nova-related 
sessions/presentations that you might be interested in attending at the 
summit and added them to our forum etherpad:


https://etherpad.openstack.org/p/nova-forum-stein

The "Cross-project Forum sessions that should include nova 
participation" section contains a list of community sessions where it 
would be nice to have a nova representative in attendance. Please feel 
free to add your name to sessions you think you could attend and bring 
back any interesting info to the team.


Let me know if I've missed any cross-project sessions or nova-related 
sessions/presentations and I can add them.


Looking forward to seeing you all at the summit!

Cheers,
-melanie



Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread melanie witt

On Thu, 25 Oct 2018 15:06:38 -0500, Matt Riedemann wrote:

On 10/25/2018 2:55 PM, Chris Friesen wrote:

2) The main benefit (as I see it) of the quota class API is to allow
dynamic adjustment of the default quotas without restarting services.


I could be making this up, but I want to say back at the Pike PTG people
were also complaining that not having an API to change this, and only do
it via config, was not good. But if the keystone limits API solves that
then it's a non-issue.


Right, the default limits are "registered limits" [1] in the keystone 
API. And "project limits" can be set to override "registered limits".


So the keystone limits API does solve that case.
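
As a rough sketch of what that looks like against the keystone unified limits 
API (the endpoint, token, and IDs below are placeholders, not values from this 
thread):

    import requests

    KEYSTONE = 'http://keystone.example.com/identity/v3'  # placeholder endpoint
    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}             # placeholder admin token

    # Registered limit: the default that applies to every project.
    requests.post(KEYSTONE + '/registered_limits', headers=HEADERS, json={
        'registered_limits': [{
            'service_id': 'NOVA_SERVICE_ID',   # placeholder
            'resource_name': 'instances',
            'default_limit': 10,
        }]
    })

    # Project limit: overrides the registered limit for a single project.
    requests.post(KEYSTONE + '/limits', headers=HEADERS, json={
        'limits': [{
            'service_id': 'NOVA_SERVICE_ID',   # placeholder
            'project_id': 'PROJECT_ID',        # placeholder
            'resource_name': 'instances',
            'resource_limit': 20,
        }]
    })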

-melanie

[1] 
https://docs.openstack.org/keystone/latest/admin/identity-unified-limits.html#registered-limits









Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread melanie witt

On Thu, 25 Oct 2018 14:00:08 -0400, Jay Pipes wrote:

On 10/25/2018 01:38 PM, Chris Friesen wrote:

On 10/24/2018 9:10 AM, Jay Pipes wrote:

Nova's API has the ability to create "quota classes", which are
basically limits for a set of resource types. There is something
called the "default quota class" which corresponds to the limits in
the CONF.quota section. Quota classes are basically templates of
limits to be applied if the calling project doesn't have any stored
project-specific limits.

Has anyone ever created a quota class that is different from "default"?


The Compute API specifically says:

"Only ‘default’ quota class is valid and used to set the default quotas,
all other quota class would not be used anywhere."

What this API does provide is the ability to set new default quotas for
*all* projects at once rather than individually specifying new defaults
for each project.


It's a "defaults template", yes.

The alternative is, you know, to just set the default values in
CONF.quota, which is what I said above. Or, if you want project X to
have different quota limits from those CONF-driven defaults, then set
the quotas for the project to some different values via the
os-quota-sets API (or better yet, just use Keystone's /limits API when
we write the "limits driver" into Nova). The issue is that the
os-quota-classes API is currently blocking *me* writing that "limits
driver" in Nova because I don't want to port nova-specific functionality
(like quota classes) to a limits driver when the Keystone /limits
endpoint doesn't have that functionality and nobody I know of has ever
used it.


When you say it's blocking you from writing the "limits driver" in nova, 
are you saying you're picking up John's unified limits spec [1]? It's 
been in -W mode and hasn't been updated in 4 weeks. In the spec, 
migration from quota classes => registered limits and deprecation of the 
existing quota API and quota classes is described.


Cheers,
-melanie

[1] https://review.openstack.org/602201






Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread melanie witt

On Thu, 25 Oct 2018 10:55:15 +1100, Sam Morrison wrote:




On 24 Oct 2018, at 4:01 pm, melanie witt  wrote:

On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:

Hi nova devs,
Have been having a good look into cellsv2 and how we migrate to them (we’re 
still on cellsv1 and about to upgrade to queens and still run cells v1 for now).
One of the problems I have is that now all our nova cell database servers need 
to respond to API requests.
With cellsv1 our architecture was to have a big powerful DB cluster (3 physical 
servers) at the API level to handle the API cell and then a smallish non HA DB 
server (usually just a VM) for each of the compute cells.
This architecture won’t work with cells V2 and we’ll now need to have a lot of 
highly available and responsive DB servers for all the cells.
It will also mean that our nova-apis which reside in Melbourne, Australia will 
now need to talk to database servers in Auckland, New Zealand.
The biggest issue we have is when a cell is down. We sometimes have cells go 
down for an hour or so planned or unplanned and with cellsv1 this does not 
affect other cells.
Looks like some good work going on here 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
But what about quota? If a cell goes down then it would seem that a user all of 
a sudden would regain some quota from the instances that are in the down cell?
Just wondering if anyone has thought about this?


Yes, we've discussed it quite a bit. The current plan is to offer a policy-driven 
behavior as part of the "down" cell handling which will control whether nova 
will:

a) Reject a server create request if the user owns instances in "down" cells

b) Go ahead and count quota usage "as-is" if the user owns instances in "down" 
cells and allow quota limit to be potentially exceeded

We would like to know if you think this plan will work for you.

Further down the road, if we're able to come to an agreement on a consumer 
type/owner or partitioning concept in placement (to be certain we are counting 
usage our instance of nova owns, as placement is a shared service), we could 
count quota usage from placement instead of querying cells.


OK great, always good to know other people are thinking for you :-) , I don’t 
really like a or b but the idea about using placement sounds like a good one to 
me.


Your honesty is appreciated. :) We do want to get to where we can use 
placement for quota usage. There is a significant amount of higher 
priority placement-related work in flight right now (getting nested 
resource providers working end-to-end, for one) for it to receive 
adequate attention at this moment. We've been discussing it on the spec 
[1] the past few days, if you're interested.



I guess our architecture is pretty unique in a way, but I wonder if other people 
are also a little scared about all of the DB servers needing to be up to serve API 
requests?


You are not alone. At CERN, they are experiencing the same challenges. 
They too have an architecture where they had deployed less powerful 
database servers in cells and also have cell sites that are located 
geographically far away. They have been driving the "handling of a down 
cell" work.



I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.

We’d still have syncing issues but we have that with placement now and that is 
more frequent than nova-cells-v1 is for us.


I have had similar thoughts, but keep ending up at the syncing/racing 
issues, like you said. I think it's something we'll need to discuss and 
explore more, to see if we can come up with a reasonable way to address 
the increased demand on cell databases as it's been a considerable pain 
point for deployments like yours and CERN's.


Cheers,
-melanie

[1] https://review.openstack.org/509042




Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread melanie witt

On Thu, 25 Oct 2018 14:12:51 +0900, ボーアディネシュ[bhor Dinesh] wrote:
We have a similar use case to *Preemptible Instances*, called *Rich-VMs*, 
which are high in resources and are deployed one per hypervisor. We have 
custom code in production which tracks the quota for such instances 
separately, and for the same reason we have a *rich_instances* custom quota 
class analogous to the *instances* quota class.


Please see the last reply I recently sent on this thread. I have been 
thinking the same as you about how we could use quota classes to 
implement the quota piece of preemptible instances. I think we can 
achieve the same thing using unified limits, specifically registered 
limits [1], which span across all projects. So, I think we are covered 
moving forward with migrating to unified limits and deprecation of quota 
classes. Let me know if you spot any issues with this idea.


Cheers,
-melanie

[1] 
https://developer.openstack.org/api-ref/identity/v3/?expanded=create-registered-limits-detail,create-limits-detail#create-registered-limits







Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread melanie witt

On Wed, 24 Oct 2018 12:54:00 -0700, Melanie Witt wrote:

On Wed, 24 Oct 2018 13:57:05 -0500, Matt Riedemann wrote:

On 10/24/2018 10:10 AM, Jay Pipes wrote:

I'd like to propose deprecating this API and getting rid of this
functionality since it conflicts with the new Keystone /limits endpoint,
is highly coupled with RAX's turnstile middleware and I can't seem to
find anyone who has ever used it. Deprecating this API and functionality
would make the transition to a saner quota management system much easier
and straightforward.

I was trying to do this before it was cool:

https://review.openstack.org/#/c/411035/

I think it was the Pike PTG in ATL where people said, "meh, let's just
wait for unified limits from keystone and let this rot on the vine".

I'd be happy to restore and update that spec.


Yeah, we were thinking the presence of the API and code isn't harming
anything and sometimes we talk about situations where we could use them.

Quota classes come up occasionally whenever we talk about preemptible
instances. Example: we could create and use a quota class "preemptible"
and decorate preemptible flavors with that quota_class in order to give
them unlimited quota. There's also talk of quota classes in the "Count
quota based on resource class" spec [1] where we could have leveraged
quota classes to create and enforce quota limits per custom resource
class. But I think the consensus there was to hold off on quota by
custom resource class until we migrate to unified limits and oslo.limit.

So, I think my concern in removing the internal code that is capable of
enforcing quota limit per quota class is the preemptible instance use
case. I don't have my mind wrapped around if/how we could solve it using
unified limits yet.

And I was just thinking, if we added a project_id column to the
quota_classes table and correspondingly added it to the
os-quota-class-sets API, we could pretty simply implement quota by
flavor, which is a feature operators like Oath need. An operator could
create a quota class limit per project_id and then decorate flavors with
quota_class to enforce them per flavor.

I recognize that maybe it would be too confusing to solve use cases with
quota classes given that we're going to migrate to unified limits. At the
same time, I'm hesitant to close the door on a possibility before we
have some idea about how we'll solve them without quota classes. Has
anyone thought about how we can solve the use cases with unified limits
for things like preemptible instances and quota by flavor?

[1] https://review.openstack.org/569011


After I sent this, I realized that I _have_ thought about how to solve 
these use cases with unified limits before and commented about it on the 
"Count quota based on resource class" spec some months ago.


For preemptible instances, we could leverage registered limits in 
keystone [2] (registered limits span across all projects) by creating a 
limit with resource_name='preemptible', for example. Then we could 
decorate a flavor with quota_resource_name='preemptible' which would 
designate a preemptible instance type. Then we use the 
quota_resource_name from the flavor to check the quota for the 
corresponding registered limit in keystone. This way, preemptible 
instances can be assigned their own special quota (probably unlimited).


And for quota by flavor, same concept. I think we could use registered 
limits and project limits [3] by creating limits with 
resource_name='flavorX', for example. We could decorate flavors with 
quota_resource_name='flavorX' and check quota against the special limit for flavorX.
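
To make that idea concrete, a purely hypothetical sketch (nothing below exists 
today: 'quota_resource_name' is only a proposed extra spec, and the endpoint, 
token, and IDs are placeholders):

    import requests

    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}   # placeholder

    # Tag a flavor so the quota check would know which keystone limit to consult.
    requests.post(
        'http://nova.example.com/v2.1/flavors/FLAVOR_ID/os-extra_specs',
        headers=HEADERS,
        json={'extra_specs': {'quota_resource_name': 'preemptible'}})

    # At server-create time, nova would look up the registered limit whose
    # resource_name matches the flavor's quota_resource_name ('preemptible',
    # 'flavorX', ...) and enforce that limit instead of the generic
    # 'instances' quota.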


Unified limits provide all of the same ability as quota classes, as far 
as I can tell. Given that, I think we are OK to deprecate quota classes.


Cheers,
-melanie

[2] 
https://developer.openstack.org/api-ref/identity/v3/?expanded=create-registered-limits-detail,create-limits-detail#create-registered-limits
[3] 
https://developer.openstack.org/api-ref/identity/v3/?expanded=create-registered-limits-detail,create-limits-detail#create-limits







Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-24 Thread melanie witt

On Wed, 24 Oct 2018 13:57:05 -0500, Matt Riedemann wrote:

On 10/24/2018 10:10 AM, Jay Pipes wrote:

I'd like to propose deprecating this API and getting rid of this
functionality since it conflicts with the new Keystone /limits endpoint,
is highly coupled with RAX's turnstile middleware and I can't seem to
find anyone who has ever used it. Deprecating this API and functionality
would make the transition to a saner quota management system much easier
and straightforward.

I was trying to do this before it was cool:

https://review.openstack.org/#/c/411035/

I think it was the Pike PTG in ATL where people said, "meh, let's just
wait for unified limits from keystone and let this rot on the vine".

I'd be happy to restore and update that spec.


Yeah, we were thinking the presence of the API and code isn't harming 
anything and sometimes we talk about situations where we could use them.


Quota classes come up occasionally whenever we talk about preemptible 
instances. Example: we could create and use a quota class "preemptible" 
and decorate preemptible flavors with that quota_class in order to give 
them unlimited quota. There's also talk of quota classes in the "Count 
quota based on resource class" spec [1] where we could have leveraged 
quota classes to create and enforce quota limits per custom resource 
class. But I think the consensus there was to hold off on quota by 
custom resource class until we migrate to unified limits and oslo.limit.


So, I think my concern in removing the internal code that is capable of 
enforcing quota limit per quota class is the preemptible instance use 
case. I don't have my mind wrapped around if/how we could solve it using 
unified limits yet.


And I was just thinking, if we added a project_id column to the 
quota_classes table and correspondingly added it to the 
os-quota-class-sets API, we could pretty simply implement quota by 
flavor, which is a feature operators like Oath need. An operator could 
create a quota class limit per project_id and then decorate flavors with 
quota_class to enforce them per flavor.
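
For context, a minimal sketch of what today's os-quota-class-sets API looks 
like (endpoint and token are placeholders); the per-project and per-flavor 
behavior described above would be a hypothetical extension of it:

    import requests

    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}   # placeholder

    # Update the 'default' quota class, i.e. the defaults applied to every
    # project that has no project-specific override.
    requests.put(
        'http://nova.example.com/v2.1/os-quota-class-sets/default',
        headers=HEADERS,
        json={'quota_class_set': {'instances': 50, 'cores': 100}})

    # The idea above would add a project_id dimension and per-flavor quota
    # classes; neither exists in the current API.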


I recognize that maybe it would be too confusing to solve use cases with 
quota classes given that we're going to migrate to unified limits. At the 
same time, I'm hesitant to close the door on a possibility before we 
have some idea about how we'll solve them without quota classes. Has 
anyone thought about how we can solve the use cases with unified limits 
for things like preemptible instances and quota by flavor?


Cheers,
-melanie

[1] https://review.openstack.org/569011






Re: [openstack-dev] [cinder] [nova] Problem of Volume(in-use) Live Migration with ceph backend

2018-10-24 Thread melanie witt

On Tue, 23 Oct 2018 10:01:42 -0400, Jon Bernard wrote:

* melanie witt  wrote:

On Mon, 22 Oct 2018 11:45:55 +0800 (GMT+08:00), Boxiang Zhu wrote:

I created a new vm and a new volume with type 'ceph' [so the volume
will be created on one of two hosts; I assume the volume was created on
host dev@rbd-1#ceph this time]. The next step is to attach the volume to the
vm. Finally I want to migrate the volume from host dev@rbd-1#ceph to
host dev@rbd-2#ceph, but it failed with the exception
'NotImplementedError(_("Swap only supports host devices"))'.

So my real problem is: is there any work to migrate a
volume (*in-use*, *ceph rbd*) from one host (pool) to another host (pool)
in the same ceph cluster?
The only difference between the spec [2] and my scope is that one is
*available* (the spec) and the other is *in-use* (my scope).


[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
[2] https://review.openstack.org/#/c/296150


Ah, I think I understand now, thank you for providing all of those details.
And I think you explained it in your first email, that cinder supports
migration of ceph volumes if they are 'available' but not if they are
'in-use'. Apologies that I didn't get your meaning the first time.

I see now the code you were referring to is this [3]:

if volume.status not in ('available', 'retyping', 'maintenance'):
    LOG.debug('Only available volumes can be migrated using backend '
              'assisted migration. Falling back to generic migration.')
    return refuse_to_migrate

So because your volume is not 'available', 'retyping', or 'maintenance',
it's falling back to generic migration, which will end up with an error in
nova because the source_path is not set in the volume config.

Can anyone from the cinder team chime in about whether the ceph volume
migration could be expanded to allow migration of 'in-use' volumes? Is there
a reason not to allow migration of 'in-use' volumes?


Generally speaking, Nova must facilitate the migration of a live (or
in-use) volume.  A volume attached to a running instance requires code
in the I/O path to correctly route traffic to the correct location - so
Cinder must refuse (or defer) a migrate operation if the volume is
attached.  Until somewhat recently Qemu and Libvirt did not support the
migration to non-block (RBD) targets which is the reason for lack of
support.  I believe we now have all of the pieces to perform this
operation successfully, but I suspect it will require a setup with
correct versions of all the related software.  I will try to verify this
during the current release cycle and report back.


OK, thanks for this info, Jon. I'll be interested in your findings.

Cheers,
-melanie






Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-23 Thread melanie witt

On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:

Hi nova devs,

Have been having a good look into cellsv2 and how we migrate to them 
(we’re still on cellsv1 and about to upgrade to queens and still run 
cells v1 for now).


One of the problems I have is that now all our nova cell database 
servers need to respond to API requests.
With cellsv1 our architecture was to have a big powerful DB cluster (3 
physical servers) at the API level to handle the API cell and then a 
smallish non HA DB server (usually just a VM) for each of the compute 
cells.


This architecture won’t work with cells V2 and we’ll now need to have a 
lot of highly available and responsive DB servers for all the cells.


It will also mean that our nova-apis which reside in Melbourne, 
Australia will now need to talk to database servers in Auckland, New 
Zealand.


The biggest issue we have is when a cell is down. We sometimes have 
cells go down for an hour or so planned or unplanned and with cellsv1 
this does not affect other cells.
Looks like some good work going on here 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell


But what about quota? If a cell goes down then it would seem that a user 
all of a sudden would regain some quota from the instances that are in 
the down cell?

Just wondering if anyone has thought about this?


Yes, we've discussed it quite a bit. The current plan is to offer a 
policy-driven behavior as part of the "down" cell handling which will 
control whether nova will:


a) Reject a server create request if the user owns instances in "down" cells

b) Go ahead and count quota usage "as-is" if the user owns instances in 
"down" cells and allow quota limit to be potentially exceeded


We would like to know if you think this plan will work for you.

Further down the road, if we're able to come to an agreement on a 
consumer type/owner or partitioning concept in placement (to be certain 
we are counting usage our instance of nova owns, as placement is a 
shared service), we could count quota usage from placement instead of 
querying cells.


Cheers,
-melanie




Re: [openstack-dev] [cinder] [nova] Problem of Volume(in-use) Live Migration with ceph backend

2018-10-22 Thread melanie witt

On Mon, 22 Oct 2018 11:45:55 +0800 (GMT+08:00), Boxiang Zhu wrote:
I created a new vm and a new volume with type 'ceph' [so the volume 
will be created on one of two hosts; I assume the volume was created on 
host dev@rbd-1#ceph this time]. The next step is to attach the volume to the 
vm. Finally I want to migrate the volume from host dev@rbd-1#ceph to 
host dev@rbd-2#ceph, but it failed with the exception 
'NotImplementedError(_("Swap only supports host devices"))'.


So my real problem is: is there any work to migrate a 
volume (*in-use*, *ceph rbd*) from one host (pool) to another host (pool) 
in the same ceph cluster?
The only difference between the spec [2] and my scope is that one is 
*available* (the spec) and the other is *in-use* (my scope).



[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
[2] https://review.openstack.org/#/c/296150


Ah, I think I understand now, thank you for providing all of those 
details. And I think you explained it in your first email, that cinder 
supports migration of ceph volumes if they are 'available' but not if 
they are 'in-use'. Apologies that I didn't get your meaning the first time.


I see now the code you were referring to is this [3]:

if volume.status not in ('available', 'retyping', 'maintenance'):
    LOG.debug('Only available volumes can be migrated using backend '
              'assisted migration. Falling back to generic migration.')
    return refuse_to_migrate

So because your volume is not 'available', 'retyping', or 'maintenance', 
it's falling back to generic migration, which will end up with an error 
in nova because the source_path is not set in the volume config.


Can anyone from the cinder team chime in about whether the ceph volume 
migration could be expanded to allow migration of 'in-use' volumes? Is 
there a reason not to allow migration of 'in-use' volumes?


[3] 
https://github.com/openstack/cinder/blob/c42fdc470223d27850627fd4fc9d8cb15f2941f8/cinder/volume/drivers/rbd.py#L1618-L1621


Cheers,
-melanie








Re: [openstack-dev] [nova] spec review day is ON for next tuesday oct 23

2018-10-22 Thread melanie witt

On Tue, 16 Oct 2018 10:56:38 -0700, Melanie Witt wrote:

Thanks everyone for your replies on the thread to help organize this.

Looks like most of the team is available to participate, so we will have
a spec review day next week on Tuesday October 23.


Just wanted to remind everybody that the spec review day is tomorrow!

Cheers,
-melanie






Re: [openstack-dev] [cinder] [nova] Problem of Volume(in-use) Live Migration with ceph backend

2018-10-19 Thread melanie witt

On Fri, 19 Oct 2018 23:21:01 +0800 (GMT+08:00), Boxiang Zhu wrote:


The version of my cinder and nova is Rocky. The scope of the cinder spec [1]
is only available-volume migration between two pools in the same 
ceph cluster.
If the volume is in 'in-use' status [2], it will call the generic migration 
function. So, as you describe it, on the nova side it raises 
NotImplementedError(_("Swap only supports host devices")).

The get_config of the net volume [3] has no source_path.


Ah, OK, so you're trying to migrate a volume across two separate ceph 
clusters, and that is not supported.


So has anyone succeeded in migrating an in-use volume with the ceph 
backend, or is anyone working on it?


Hopefully someone can share their experience with trying to migrate 
volumes across separate ceph clusters. I unfortunately don't know 
anything about it.


Best,
-melanie


[1] https://review.openstack.org/#/c/296150
[2] https://review.openstack.org/#/c/256091/23/cinder/volume/drivers/rbd.py
[3] 
https://github.com/openstack/nova/blob/stable/rocky/nova/virt/libvirt/volume/net.py#L101








Re: [openstack-dev] [cinder] [nova] Problem of Volume(in-use) Live Migration with ceph backend

2018-10-19 Thread melanie witt

On Fri, 19 Oct 2018 11:33:52 +0800 (GMT+08:00), Boxiang Zhu wrote:
When I use the LVM backend to create a volume and then attach it to a vm, 
I can migrate the volume (in-use) from one host to another. The nova 
libvirt driver will call 'rebase' to finish it. But when using the ceph backend, 
it raises the exception 'Swap only supports host devices', so migrating an 
in-use volume is not supported right now. Is anyone working on this? Or 
is there any way to let me migrate an in-use volume with the ceph backend?


What version of cinder and nova are you using?

I found this question/answer on ask.openstack.org:

https://ask.openstack.org/en/question/112954/volume-migration-fails-notimplementederror-swap-only-supports-host-devices/

and it looks like there was some work done on the cinder side [1] to 
enable migration of in-use volumes with ceph semi-recently (Queens).


On the nova side, the code looks for the source_path in the volume 
config, and if there is not one present, it raises 
NotImplementedError(_("Swap only supports host devices"). So in your 
environment, the volume configs must be missing a source_path.
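
Roughly, a sketch of that check (paraphrased; not the exact nova source):

    # Paraphrased sketch of why swap_volume refuses network-backed volumes:
    # the libvirt disk config built from an rbd ('net') volume's
    # connection_info has no source_path set.
    def can_swap(volume_config):
        if not getattr(volume_config, 'source_path', None):
            raise NotImplementedError("Swap only supports host devices")
        return True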


If you are using at least Queens version, then there must be something 
additional missing that we would need to do to make the migration work.


[1] https://blueprints.launchpad.net/cinder/+spec/ceph-volume-migrate

Cheers,
-melanie







[openstack-dev] [nova] spec review day is ON for next tuesday oct 23

2018-10-16 Thread melanie witt

Thanks everyone for your replies on the thread to help organize this.

Looks like most of the team is available to participate, so we will have 
a spec review day next week on Tuesday October 23.


See ya at the nova-specs gerrit,
-melanie



[openstack-dev] [nova] shall we do a spec review day next tuesday oct 23?

2018-10-15 Thread melanie witt

Hey all,

Milestone s-1 is coming up next week on Thursday Oct 25 [1] and I was 
thinking it would be a good idea to have a spec review day next week on 
Tuesday Oct 23 to spend some focus on spec reviews together.


Spec freeze is s-2 Jan 10, so the review day isn't related to any 
deadlines, but would just be a way to organize and make sure we have 
initial review on the specs that have been proposed so far.


How does Tuesday Oct 23 work for everyone? Let me know if another day 
works better.


So far, efried and mriedem are on board when I asked in the 
#openstack-nova channel. I'm sending this mail to gather more responses 
asynchronously.


Cheers,
-melanie

[1] https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule



Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread melanie witt

On Tue, 9 Oct 2018 10:35:23 -0700, Melanie Witt wrote:

On Tue, 9 Oct 2018 07:23:03 -0400, Jay Pipes wrote:

That explains where the source of the problem comes from (it's the use
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling
code in the Rocky release).

Small correction, the SELECT FOR UPDATE was removed from Nova's
quota-handling code in the Pike release.


Elaboration: the calls to quota reserve/commit/rollback were removed in 
the Pike release, so with_lockmode('update') is not called for quota 
operations, even though the reserve/commit/rollback methods are still 
there for use by old (Ocata) computes during an Ocata => Pike upgrade. 
Then, the reserve/commit/rollback methods were removed in Queens once no 
old computes could be calling them.


-melanie






Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread melanie witt

On Tue, 9 Oct 2018 07:23:03 -0400, Jay Pipes wrote:

That explains where the source of the problem comes from (it's the use
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling
code in the Rocky release).


Small correction, the SELECT FOR UPDATE was removed from Nova's 
quota-handling code in the Pike release.


-melanie






[openstack-dev] [nova] Rocky RC time regression analysis

2018-10-05 Thread melanie witt

Hey everyone,

During our Rocky retrospective discussion at the PTG [1], we talked 
about the spec freeze deadline (milestone 2, historically it had been 
milestone 1) and whether or not it was related to the hectic 
late-breaking regression RC time we had last cycle. I had an action item 
to go through the list of RC time bugs [2] and dig into each one, 
examining: when the patch that introduced the bug landed vs when the bug 
was reported, why it wasn't caught sooner, and report back so we can 
take a look together and determine whether they were related to the spec 
freeze deadline.


I used this etherpad to make notes [3], which I will [mostly] copy-paste 
here. These are all after RC1 and I'll paste them in chronological order 
of when the bug was reported.


Milestone 1 (r-1) was 2018-04-19.
Spec freeze was milestone 2 (r-2), on 2018-06-07.
Feature freeze (FF) was on 2018-07-26.
RC1 was on 2018-08-09.

1) Broken live migration bandwidth minimum => maximum based on neutron 
event https://bugs.launchpad.net/nova/+bug/1786346


- Bug was reported on 2018-08-09, the day of RC1
- The patch that caused the regression landed on 2018-03-30 
https://review.openstack.org/497457

- Unrelated to a blueprint, the regression was part of a bug fix
- Was found because prometheanfire was doing live migrations and noticed 
they seemed to be stuck at 1MiB/s for linuxbridge VMs

- The bug was due to a race, so the gate didn't hit it
- Comment on the regression bug from dansmith: "The few hacked up gate 
jobs we used to test this feature at merge time likely didn't notice the 
race because the migrations finished before the potential timeout and/or 
are on systems so loaded that the neutron event came late enough for us 
to win the race repeatedly."


2) Docs for the zvm driver missing

- All zvm driver code changes were merged by 2018-07-17, but the 
documentation was overlooked; the gap was noticed near RC time

- Blueprint was approved on 2018-02-12

3) Volume status remains "detaching" after a failure to detach a volume 
due to DeviceDetachFailed https://bugs.launchpad.net/nova/+bug/1786318


- Bug was reported on 2018-08-09, the day of RC1
- The change that introduced the regression landed on 2018-02-21 
https://review.openstack.org/546423

- Unrelated to a blueprint, the regression was part of a bug fix
- Question: why wasn't this caught earlier?
- Answer: Unit tests were not asserting the call to the roll_detaching 
volume API. Coverage has since been added along with the bug fix 
https://review.openstack.org/590439


4) OVB overcloud deploy fails on nova placement errors 
https://bugs.launchpad.net/nova/+bug/1787910


- Bug was reported on 2018-08-20
- Change that caused the regression landed on 2018-07-26, FF day 
https://review.openstack.org/517921

- Blueprint was approved on 2018-05-16
- Was found because of a failure in the 
legacy-periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master 
CI job. The ironic-inspector CI upstream also failed because of this, as 
noted by dtantsur.
- Question: why did it take nearly a month for the failure to be 
noticed? Is there any way we can cover this in our 
ironic-tempest-dsvm-ipa-wholedisk-bios-agent_ipmitool-tinyipa job?


5) when live migration fails due to a internal error rollback is not 
handled correctly https://bugs.launchpad.net/nova/+bug/1788014


- Bug was reported on 2018-08-20
- The change that caused the regression landed on 2018-07-26, FF day 
https://review.openstack.org/434870

- Unrelated to a blueprint, the regression was part of a bug fix
- Was found because sean-k-mooney was doing live migrations and found 
that when a LM failed because of a QEMU internal error, the VM remained 
ACTIVE but the VM no longer had network connectivity.

- Question: why wasn't this caught earlier?
- Answer: We would need a live migration job scenario that intentionally 
initiates and fails a live migration, then verify network connectivity 
after the rollback occurs.

- Question: can we add something like that?

6) nova-manage db online_data_migrations hangs on instances with no host 
set https://bugs.launchpad.net/nova/+bug/1788115


- Bug was reported on 2018-08-21
- The patch that introduced the bug landed on 2018-05-30 
https://review.openstack.org/567878

- Unrelated to a blueprint, the regression was part of a bug fix
- Question: why wasn't this caught earlier?
- Answer: To hit the bug, you had to have had instances with no host set 
(that failed to schedule) in your database during an upgrade. This does 
not happen during the grenade job
- Question: could we add anything to the grenade job that would leave 
some instances with no host set to cover cases like this?


7) release notes erroneously say that nova-consoleauth doesn't have to 
run in Rocky https://bugs.launchpad.net/nova/+bug/1788470


- Bug was reported on 2018-08-22
- The patches that conveyed the wrong information for the docs landed on 
2018-05-07 https://review.openstack.org/565367

- Blueprint was 

Re: [openstack-dev] [placement] update 18-40

2018-10-05 Thread melanie witt

On Fri, 5 Oct 2018 14:31:05 +0100 (BST), Chris Dent wrote:

*
Propose counting quota usage from placement and API database
(A bit out of date but may be worth resurrecting)


I'd like to resurrect this spec but it really depends on being able to 
ask for usage scoped only to a particular instance of Nova (GET /usages 
for NovaA vs GET /usages for NovaB). From what I understand, we don't 
have a concrete plan for being able to differentiate ownership of 
allocations yet.
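
For reference, the per-project usage query that exists today looks roughly 
like this (endpoint and token are placeholders); what is missing is a way to 
scope the result to a single nova deployment's allocations:

    import requests

    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',             # placeholder
               'OpenStack-API-Version': 'placement 1.9'}  # first version with GET /usages

    resp = requests.get('http://placement.example.com/usages',
                        headers=HEADERS,
                        params={'project_id': 'PROJECT_ID'})  # placeholder
    # Example response shape:
    # {"usages": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 80}}
    print(resp.json())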


Until then, we will be using a policy-based switch to control the quota 
behavior in the event of down/slow cells in a multi-cell deployment 
(fail build if project has servers in down/slow cells vs allow 
potentially violating quota limits if project has servers in down/slow 
cells). So, being able to leverage the placement API for /usages is not 
considered critical, since we have an interim plan.


-melanie






[openstack-dev] [nova][xenapi] can we deprecate the xenapi-specific 'nova-console' service?

2018-10-03 Thread melanie witt

Greetings Devs and Ops,

Today I noticed that our code does not handle the 'nova-console' service 
properly in a multi-cell deployment and given that no one has complained 
or reported bugs about it, we're wondering if anyone still uses the 
nova-console service. The documentation [1] says that the nova-console 
service is a "XenAPI-specific service that most recent VNC proxy 
architectures do not use."


Can anyone from xenapi land shed some light on whether the nova-console 
service is still useful in deployments using the xenapi driver, or is it 
an old relic that we should deprecate and remove?


Thanks for your help,
-melanie

[1] https://docs.openstack.org/nova/latest/admin/remote-console-access.html



[openstack-dev] [nova] team photos from the Stein PTG

2018-10-03 Thread melanie witt

Hey all,

Sorry for not posting this earlier but here's a direct link to our photo 
folder on dropbox for the Stein PTG team photos:


https://www.dropbox.com/sh/2pmvfkstudih2wf/AAB--3TRAFaU2qN7GKDj_eZha/Nova?dl=0_nav_tracking=1

You can view and download them from there ^. I think they came out 
really nice and the funny ones gave me a chuckle. :)


Cheers,
-melanie



Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-09-28 Thread melanie witt

On Fri, 28 Sep 2018 15:42:23 -0500, Eric Fried wrote:

On 09/28/2018 09:41 AM, Balázs Gibizer wrote:



On Fri, Sep 28, 2018 at 3:25 PM, Eric Fried  wrote:

It's time somebody said this.

Every time we turn a corner or look under a rug, we find another use
case for provider traits in placement. But every time we have to have
the argument about whether that use case satisfies the original
"intended purpose" of traits.

That's only reason I've ever been able to glean: that it (whatever "it"
is) wasn't what the architects had in mind when they came up with the
idea of traits. We're not even talking about anything that would require
changes to the placement API. Just, "Oh, that's not a *capability* -
shut it down."

Bubble wrap was originally intended as a textured wallpaper and a
greenhouse insulator. Can we accept the fact that traits have (many,
many) uses beyond marking capabilities, and quit with the arbitrary
restrictions?


How far are we willing to go? Is an arbitrary (key: value) pair
encoded in a trait name like key_`str(value)` (e.g. CURRENT_TEMPERATURE:
85 encoded as CUSTOM_TEMPERATURE_85) something we would be OK to see in
placement?


Great question. Perhaps TEMPERATURE_DANGEROUSLY_HIGH is okay, but
TEMPERATURE_ is not. This thread isn't about setting
these parameters; it's about getting us to a point where we can discuss
a question just like this one without running up against:

"That's a hard no, because you shouldn't encode key/value pairs in traits."

"Oh, why's that?"

"Because that's not what we intended when we created traits."

"But it would work, and the alternatives are way harder."

"-1"

"But..."

"-1"

I think it's not so much about the intention when traits were created 
and more about what UX callers of the API are left with, if we were to 
recommend representing everything with traits and not providing another 
API for key-value use cases. We need to think about what the maintenance 
of their deployments will look like if traits are the only tool we provide.


I get that we don't want to put artificial restrictions on how API 
callers can and can't use the traits API, but will they be left with a 
manageable experience if that's all that's available?


I don't have time right now to come up with a really great example, but 
I'm thinking along the lines of, can this get out of hand (a la "flavor 
explosion") for an operator using traits to model what their compute 
hosts can do?


Please forgive the oversimplified example I'm going to try to use to 
illustrate my concern:


We all agree we can have traits for resource providers like:

* HAS_SSD
* HAS_GPU
* HAS_WINDOWS

But things get less straightforward when we think of traits like:

* HAS_OWNER_CINDER
* HAS_OWNER_NEUTRON
* HAS_OWNER_CYBORG
* HAS_RAID_0
* HAS_RAID_1
* HAS_RAID_5
* HAS_RAID_6
* HAS_RAID_10
* HAS_NUMA_CELL_0
* HAS_NUMA_CELL_1
* HAS_NUMA_CELL_2
* HAS_NUMA_CELL_3

I'm concerned about a lot of repetition here and a maintenance headache 
for operators. That's where the thought comes in of whether we should 
provide something like a key-value construct to API callers, so they can 
instead say:


* OWNER=CINDER
* RAID=10
* NUMA_CELL=0

for each resource provider.
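
As a rough illustration of the repetition concern (purely a sketch, not
an actual placement or nova interface), compare encoding key-value facts
into trait names versus a native key-value construct:

    # Sketch only: flattening key-value data into trait names multiplies
    # the number of traits an operator has to create and manage.
    def traits_for(owner, raid_level, numa_cells):
        """Encode key-value facts as boolean trait names, one per value."""
        traits = {"CUSTOM_OWNER_%s" % owner.upper(),
                  "CUSTOM_RAID_%s" % raid_level}
        traits |= {"CUSTOM_NUMA_CELL_%d" % cell for cell in range(numa_cells)}
        return traits

    print(sorted(traits_for("cinder", "10", 4)))
    # ['CUSTOM_NUMA_CELL_0', 'CUSTOM_NUMA_CELL_1', 'CUSTOM_NUMA_CELL_2',
    #  'CUSTOM_NUMA_CELL_3', 'CUSTOM_OWNER_CINDER', 'CUSTOM_RAID_10']

    # A hypothetical key-value construct keeps one key per fact instead:
    provider_metadata = {"owner": "cinder", "raid": "10", "numa_cell": "0"}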

If I'm off base with my example, please let me know. I'm not a placement 
expert.


Anyway, I hope that gives an idea of what I'm thinking about in this 
discussion. I agree we need to pick a direction and go with it. I'm just 
trying to look out for the experience of the operators who are going to 
be using this and maintaining it in their deployments.


Cheers,
-melanie



Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-27 Thread melanie witt

On Thu, 27 Sep 2018 17:23:26 -0500, Matt Riedemann wrote:

On 9/27/2018 3:02 PM, Jay Pipes wrote:

A great example of this would be the proposed "deploy template" from
[2]. This is nothing more than abusing the placement traits API in order
to allow passthrough of instance configuration data from the nova flavor
extra spec directly into the nodes.instance_info field in the Ironic
database. It's a hack that is abusing the entire concept of
placement traits, IMHO.

We should have a way *in Nova* of allowing instance configuration
key/value information to be passed through to the virt driver's spawn()
method, much the same way we provide for user_data that gets exposed
after boot to the guest instance via configdrive or the metadata service
API. What this deploy template thing is is just a hack to get around the
fact that nova doesn't have a basic way of passing through some collated
instance configuration key/value information, which is a darn shame and
I'm really kind of annoyed with myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do
would be to have some kind of template/profile/config/whatever stored
off in glare where a schema could be registered on that thing, and then
you pass a handle (ID reference) to that to nova when creating the
(baremetal) server; nova pulls it down from glare and hands it off to
the virt driver. It's just that no one is doing that work.


If I understood correctly, that discussion was around adding a way to 
pass a desired hardware configuration to nova when booting an ironic 
instance. And that it's something that isn't yet possible to do using 
the existing ComputeCapabilitiesFilter. Someone please correct me if I'm 
wrong there.


That said, I still don't understand why we are talking about deprecating 
the ComputeCapabilitiesFilter if there's no supported way to replace it 
yet. If boolean traits are not enough to replace it, then we need to 
hold off on deprecating it, right? Would the 
template/profile/config/whatever in glare approach replace what the 
ComputeCapabilitiesFilter is doing or no? Sorry, I'm just not clearly 
understanding this yet.
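
For readers less familiar with the filter, here is a rough sketch of the
difference being discussed; the extra spec keys are only examples and the
exact syntax depends on the deployment and release:

    # Illustrative flavor extra specs only -- not a statement about what
    # the eventual replacement will look like.

    # ComputeCapabilitiesFilter style: key/value matching against node
    # capabilities (e.g. an ironic node's capabilities).
    capabilities_style = {
        "capabilities:raid_level": "10",
    }

    # Placement traits style: boolean "the provider has this" markers a
    # flavor can require; there is no value to match against.
    traits_style = {
        "trait:CUSTOM_RAID_10": "required",
    }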


-melanie






[openstack-dev] [nova] Stein PTG summary

2018-09-26 Thread melanie witt

Hello everybody,

I've written up a high level summary of the discussions we had at the 
PTG -- please feel free to reply to this thread to fill in anything I've 
missed.


We used our PTG etherpad:

https://etherpad.openstack.org/p/nova-ptg-stein

as an agenda and each topic we discussed was filled in with agreements, 
todos, and action items during the discussion. Please check out the 
etherpad to find notes relevant to your topics of interest, and reach 
out to us on IRC in #openstack-nova, on this mailing list with the 
[nova] tag, or by email to me if you have any questions.


Now, onto the high level summary:

Rocky retrospective
===
We began Wednesday morning with a retro on the Rocky cycle and captured 
notes on this etherpad:


https://etherpad.openstack.org/p/nova-rocky-retrospective

The runways review process was seen as overall positive and helped get 
some blueprint implementations merged that had languished in previous 
cycles. We agreed to continue with the runways process as-is in Stein 
and use it for approved blueprints. We did note that we could do better 
at queuing important approved work into runways, such as 
placement-related efforts that were not added to runways last cycle.


We discussed whether or not to move the spec freeze deadline back to 
milestone 1 (we used milestone 2 in Rocky). I have an action item to dig 
into whether or not the late breaking regressions we found at RC time:


https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo

were related to the later spec freeze at milestone 2. The question we 
want to answer is: did a later spec freeze lead to implementations 
landing later and resulting in the late detection of regressions at 
release candidate time?


Finally, we discussed a lot of things around project management, 
end-to-end themes for a cycle, and people generally not feeling they had 
clarity throughout the cycle about which efforts and blueprints were 
most important, aside from runways. We got a lot of work done in Rocky, 
but not as much of it materialized into user-facing features and 
improvements as it did in Queens. Last cycle, we had thought runways 
would capture what is a priority at any given time, but looking back, we 
determined it would be helpful if we still had over-arching 
goals/efforts/features written down for people to refer to throughout 
the cycle. We dove deeper into that discussion on Friday during the hour 
before lunch, where we came up with user-facing themes we aim to 
accomplish in the Stein cycle:


https://etherpad.openstack.org/p/nova-ptg-stein-priorities

Note that these are _not_ meant to preempt anything in runways, these 
are just 1) for my use as a project manager and 2) for everyone's use to 
keep a bigger picture of our goals for the cycle in their heads, to aid 
in their work and review outside of runways.


Themes
==
With that, I'll briefly mention the themes we came up with for the cycle:

* Compute nodes capable of upgrading to and coexisting with nested resource 
providers for multiple GPU types


* Multi-cell operational enhancements: resilience to "down" or 
poor-performing cells and cross-cell instance migration


* Volume-backed user experience and API hardening: ability to specify 
volume type during boot-from-volume, detach/attach of root volume, and 
volume-backed rebuild


These are the user-visible features and functionality we aim to deliver 
and we'll keep tabs on these efforts throughout the cycle to keep them 
making progress.


Placement
=
As usual, we had a lot of discussions on placement-related topics, so 
I'll try to highlight the main things that stand out to me. Please see 
the "Placement" section of our PTG etherpad for all the details and 
additional topics we discussed.


We discussed the regression in behavior that happened when we removed 
the Aggregate[Core|Ram|Disk]Filters from the scheduler filters -- these 
filters allowed operators to set overcommit allocation ratios per 
aggregate instead of per host. We agreed on the importance of restoring 
this functionality and hashed out a concrete plan, with two specs needed 
to move forward:


https://review.openstack.org/552105
https://review.openstack.org/544683
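
(For context, the removed filters consumed overcommit ratios attached to
an aggregate's metadata rather than to each host, roughly like the
following, with example values:)

    # Per-aggregate overcommit metadata as the old
    # Aggregate[Core|Ram|Disk]Filters consumed it; values are examples.
    aggregate_metadata = {
        "cpu_allocation_ratio": "16.0",   # AggregateCoreFilter
        "ram_allocation_ratio": "1.5",    # AggregateRamFilter
        "disk_allocation_ratio": "1.0",   # AggregateDiskFilter
    }
    # Without the filters, the ratios fall back to per-host configuration,
    # which is the regression the two specs above aim to address.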

The other standout discussions were around the placement extraction and 
closing the gaps in nested resource providers. For the placement 
extraction, we are focusing on full support of an upgrade from 
integrated placement => extracted placement, including assisting with 
making sure deployment tools like OpenStack-Ansible and TripleO are able 
to support the upgrade. For closing the gaps in nested resource 
providers, there are many parts to it that are documented on the 
aforementioned PTG etherpads. By closing the gaps with nested resource 
providers, we'll open the door for being able to support minimum 
bandwidth scheduling as well.


Cells
=
On cells, the main discussions were around resiliency "down" and 
poor-performing cells and 

[openstack-dev] [nova] review runways for Stein are open

2018-09-26 Thread melanie witt
Just wanted to remind everyone that review runways for Stein are OPEN. 
Please feel free to add your approved, ready-for-review blueprints to 
the queue:


https://etherpad.openstack.org/p/nova-runways-stein

Cheers,
-melanie



[openstack-dev] [nova] summit forum topic submission deadline Wed Sep 26

2018-09-24 Thread melanie witt

Hey everyone,

This is a reminder that the deadline for submitting summit forum topics 
is coming up soon in two days, Wed Sep 26. Please see our forum topic 
etherpad for links to forum information and instructions on how to 
submit a topic:


https://etherpad.openstack.org/p/nova-forum-stein

Cheers,
-melanie



[openstack-dev] [nova][ptl] reduced PTL availability next week Sep 17

2018-09-14 Thread melanie witt

Hey all,

This is just a heads up that I'll be off-site in Boston for work next 
week, so I won't be available on IRC (but I will be replying 
asynchronously to IRC messages and emails when I can).


Gibi will be running the nova meeting on Thursday Sep 20 at 1400 UTC.

I'm going to work on the PTG session summaries for the ML and 
documenting Stein cycle themes next week. I'm thinking to document the 
themes as part of the cycle priorities doc [1].


We've updated the PTG etherpad [2] with action items and agreements for 
all of the topics we covered. Please take a look at the etherpad to find 
the actions and agreements relevant to your topics of interest.


We'll also kick off runways for Stein [3] next week. So, please feel 
free to start adding approved, ready-for-review items to the queue. And 
nova-core can start populating runways.


If you have any questions about PTG topics or runways, just ask us in 
#openstack-nova on IRC or send a mail to the dev mailing list.


Cheers,
-melanie

[1] 
https://specs.openstack.org/openstack/nova-specs/priorities/stein-priorities.html

[2] https://etherpad.openstack.org/p/nova-ptg-stein
[3] https://etherpad.openstack.org/p/nova-runways-stein




[openstack-dev] [nova] 2018 User Survey results

2018-09-11 Thread melanie witt

Hey all,

The foundation sent me a copy of 2018 user survey responses to the 
following question about Nova:


"How important is it to be able to customize Nova in your deployment, 
e.g. classload your own managers/drivers, use hooks, plug in API 
extensions, etc?"


Note: this question populates for any user who indicates they are in 
production or testing with the Nova project. It is not a required 
question, so these responses do not necessarily include every user.


There were a total of 373 responses.

The number of responses per multiple choice answer were:

- "Not important; I use pretty much stock Nova with maybe some small 
patches or bug fixes that aren't upstream.": 173 (46.4%)


- "Somewhat important; I have some custom scheduler filters and other 
small patches but nothing major.": 144 (38.6%)


- "Very important; my Nova deployment is heavily customized and 
hooks/plugins/custom APIs are a major part of my operation.": 56 (15.0%)


And I made a google sheets chart out of the responses which you can view 
here:


https://docs.google.com/spreadsheets/d/e/2PACX-1vSFG4ev8VsMMsYXgQHC7Y24WXfdSp6YdwiGX3MGvCsYZ50qG8Po-2i7vOCppJEq8051skxzvb42GIUV/pubhtml?gid=584107382=true

Cheers,
-melanie



[openstack-dev] [nova][placement] openstack/placement governance switch plan

2018-09-10 Thread melanie witt

Howdy everyone,

Those of us involved in the placement extraction process sat down 
together today to discuss the plan for openstack/placement governance. 
We agreed on a set of criteria which we will use to determine when we 
will switch the openstack/placement governance from the compute project 
to its own project. I'd like to update everyone with a summary of the 
plan we agreed upon.


Attendees: Balázs Gibizer, Chris Dent, Dan Smith, Ed Leafe, Eric Fried, 
Jay Pipes, Matt Riedemann, Melanie Witt, Mohammed Naser, Sylvain Bauza


The targets we have set are:

- Devstack/grenade job that executes an upgrade which deploys the 
extracted placement code
- Support in one of the deployment tools to deploy extracted placement 
code (Tripleo)
- An upgrade job using any deployment tool (this might have to be a 
manual test by a deployment tool team member if none of the deployment 
tools have an upgrade job)
- Implementation of nested vGPU resource support in the xenapi and 
libvirt drivers
- Functional test with vGPU resources that verifies reshaping of flat 
vGPU resources to nested vGPU resources and successful scheduling to the 
same compute host after reshaping

- Lab test with real hardware of the same ^ (xenapi and libvirt)

Once we have achieved these targets, we will switch openstack/placement 
governance from the compute project to its own project. The 
placement-core team will flatten nova-core into individual members of 
placement-core so it may evolve, the PTL of openstack/placement will be 
the same as the openstack/nova PTL for the remainder of the release 
cycle, and the electorate for the openstack/placement PTL election for 
the next release cycle will be determined by the commit history of the 
extracted placement code repo, probably by date, to include contributors 
from the previous two release cycles, as per usual.


Thank you to Mohammed for facilitating the discussion, we really 
appreciate it.


Cheers,
-melanie







[openstack-dev] [nova] Stein Forum brainstorming

2018-09-06 Thread melanie witt

Greetings all,

Apparently, we have 6 days left [1] to brainstorm topic ideas for the 
Forum at the Berlin summit and the submission period begins on September 12.


Please feel free to use this etherpad as a place to capture topic 
ideas [2]. I've added it to the list of etherpads on the forum wiki [3].


Cheers,
-melanie

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134336.html

[2] https://etherpad.openstack.org/p/nova-forum-stein
[3] 
https://wiki.openstack.org/wiki/Forum/Berlin2018#Etherpads_from_Teams_and_Working_Groups




Re: [openstack-dev] [nova] No weekly meeting on Thursday September 13

2018-09-05 Thread melanie witt

On Tue, 4 Sep 2018 20:36:45 -0500, Matt Riedemann wrote:

On 9/4/2018 4:13 PM, melanie witt wrote:

The next meeting will be on Thursday September 20 at 1400 UTC [1].

I'm assuming we're going to have a meeting *this* week though, right?


Yes, sorry if I worded that in a confusing way. We have a meeting 
tomorrow September 6 at 1400 UTC, we will _not_ meet during PTG week on 
Thursday September 13, and then we resume meeting on Thursday September 
20 at 1400 UTC.


-melanie






Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-04 Thread melanie witt

On Tue, 4 Sep 2018 16:16:31 -0500, Eric Fried wrote:

030 is okay as long as nothing goes wrong. If something does, it
raises exceptions which would currently fail, as the exceptions are
not there. See below for more about exceptions.

Maybe I'm misunderstanding what these migration thingies are supposed to
be doing, but 030 [1] seems like it's totally not applicable to
placement and should be removed. The placement database doesn't (and
shouldn't) have 'flavors', 'cell_mappings', or 'host_mappings' tables in
the first place.

What am I missing?


* Presumably we can trim the placement DB migrations to just stuff
   that is relevant to placement

Yah, I would hope so. What possible reason could there be to do otherwise?


Yes, we should definitely trim the placement DB migrations to only 
things relevant to placement. And we can use this opportunity to get rid 
of cruft too and squash all of the placement migrations together to 
start at migration 1 for the placement repo. If anyone can think of a 
problem with doing that, please shout it out.


-melanie






[openstack-dev] [nova] No weekly meeting on Thursday September 13

2018-09-04 Thread melanie witt

Hi everyone,

This is just a reminder we won't have a weekly Nova meeting on Thursday 
September 13 because of PTG week. The next meeting will be on Thursday 
September 20 at 1400 UTC [1].


Cheers,
-melanie

[1] https://wiki.openstack.org/wiki/Meetings/Nova



Re: [openstack-dev] Nominating Chris Dent for placement-core

2018-08-31 Thread melanie witt

On Fri, 31 Aug 2018 10:45:14 -0500, Eric Fried wrote:

The openstack/placement project [1] and its core team [2] have been
established in gerrit.

I hereby nominate Chris Dent for membership in the placement-core team.
He has been instrumental in the design, implementation, and stewardship
of the placement API since its inception and has shown clear and
consistent leadership.

As we are effectively bootstrapping placement-core at this time, it
would seem appropriate to consider +1/-1 responses from heavy placement
contributors as well as existing cores (currently nova-core).

[1]https://review.openstack.org/#/admin/projects/openstack/placement
[2]https://review.openstack.org/#/admin/groups/1936,members


+1






Re: [openstack-dev] [neutron][nova] Small bandwidth demo on the PTG

2018-08-30 Thread melanie witt

On Thu, 30 Aug 2018 12:43:06 -0500, Miguel Lavalle wrote:

Gibi, Bence,

In fact, I added the demo explicitly to the Neutron PTG agenda from 1:30 
to 2, to give it visibility


I'm interested in seeing the demo too. Will the demo be shown in the 
Neutron room or the Nova room? Historically, lunch has ended at 1:30, so 
this will be at the same time as the Neutron/Nova cross project 
session. Should we just co-locate for the demo and the session? I 
expect anyone watching the demo will want to participate in the 
Neutron/Nova session as well. Either room is fine by me.


-melanie

On Thu, Aug 30, 2018 at 3:55 AM, Balázs Gibizer 
mailto:balazs.gibi...@ericsson.com>> wrote:


Hi,

Based on the Nova PTG planning etherpad [1] there is a need to talk
about the current state of the bandwidth work [2][3]. Bence
(rubasov) has already planned to show a small demo to Neutron folks
about the current state of the implementation. So Bence and I are
wondering about bringing that demo close to the nova - neutron cross
project session. That session is currently planned to happen
Thursday after lunch. So we are think about showing the demo right
before that session starts. It would start 30 minutes before the
nova - neutron cross project session.

Are Nova folks also interested in seeing such a demo?

If you are interested in seeing the demo, please drop us a line or
ping us in IRC so we know who we should wait for.

Cheers,
gibi

[1] https://etherpad.openstack.org/p/nova-ptg-stein

[2]

https://specs.openstack.org/openstack/neutron-specs/specs/rocky/minimum-bandwidth-allocation-placement-api.html


[3]

https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/bandwidth-resource-provider.html


















Re: [openstack-dev] [nova][vmware] need help triaging a vmware driver bug

2018-08-29 Thread melanie witt

On Wed, 29 Aug 2018 13:59:26 +0300, Radoslav Gerganov wrote:

On 23.08.2018 23:27, melanie witt wrote:

So, I think we could add something to the launchpad bug template to link to a doc that explains 
tips about reporting VMware related bugs. I suggest linking to a doc because the bug template 
is already really long and it looks like it would be best to have something short, like, "For 
tips on reporting VMware virt driver bugs, see this doc: " and provide a link 
to, for example, an OpenStack wiki about the VMware virt driver (is there one?). The question 
is, where can we put the doc? Wiki? Or maybe here at the bottom [1]? Let me know what you think.


Sorry for the late reply, I was on PTO last week.  I have posted a patch which adds a 
"Troubleshooting" section to the VMware documentation in Nova:

https://review.openstack.org/#/c/597446

If this is OK then we can add a link to this particular paragraph in the bug 
template.


Thanks, Rado. The doc patch has merged and the change propagated to the 
published docs, so I've added a link to it in the bug template.


Cheers,
-melanie






Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-08-27 Thread melanie witt

On Mon, 27 Aug 2018 10:31:50 -0500, Matt Riedemann wrote:

On 8/24/2018 7:36 AM, Chris Dent wrote:


Over the past few days a few of us have been experimenting with
extracting placement to its own repo, as has been discussed at
length on this list, and in some etherpads:

https://etherpad.openstack.org/p/placement-extract-stein
https://etherpad.openstack.org/p/placement-extraction-file-notes

As part of that, I've been doing some exploration to tease out the
issues we're going to hit as we do it. None of this is work that
will be merged, rather it is stuff to figure out what we need to
know to do the eventual merging correctly and efficiently.

Please note that doing that is just the near edge of a large
collection of changes that will cascade in many ways to many
projects, tools, distros, etc. The people doing this are aware of
that, and the relative simplicity (and fairly immediate success) of
these experiments is not misleading people into thinking "hey, no
big deal". It's a big deal.

There's a strategy now (described at the end of the first etherpad
listed above) for trimming the nova history to create a thing which
is placement. From the first run of that Ed created a github repo
and I branched that to eventually create:

https://github.com/EdLeafe/placement/pull/2

In that, all the placement unit and functional tests are now
passing, and my placecat [1] integration suite also passes.

That work has highlighted some gaps in the process for trimming
history which will be refined to create another interim repo. We'll
repeat this until the process is smooth, eventually resulting in an
openstack/placement.


We talked about the github strategy a bit in the placement meeting today
[1]. Without being involved in this technical extraction work for the
past few weeks, I came in with a different perspective on the end-game,
and it was not aligned with what Chris/Ed thought as far as how we get
to the official openstack/placement repo.

At a high level, Ed's repo [2] is a fork of nova with large changes on
top using pull requests to do things like remove the non-placement nova
files, update import paths (because the import structure changes from
nova.api.openstack.placement to just placement), and then changes from
Chris [3] to get tests working. Then the idea was to just use that to
seed the openstack/placement repo and rather than review the changes
along the way*, people that care about what changed (like myself) would
see the tests passing and be happy enough.

However, I disagree with this approach since it bypasses our community
code review system of using Gerrit and relying on a core team to approve
changes for the sake of expediency.

What I would like to see are the changes that go into making the seed
repo and what gets it to passing tests done in gerrit like we do for
everything else. There are a couple of options on how this is done though:

1. Seed the openstack/placement repo with the filter_git_history.sh
script output as Ed has done here [4]. This would include moving the
placement files to the root of the tree and dropping nova-specific
files. Then make incremental changes in gerrit like with [5] and the
individual changes which make up Chris's big pull request [3]. I am
primarily interested in making sure there are not content changes
happening, only mechanical tree-restructuring type changes, stuff like
that. I'm asking for more changes in gerrit so they can be sanely
reviewed (per normal).

2. Eric took a slightly different tack in that he's OK with just a
couple of large changes (or even large patch sets within a single
change) in gerrit rather than ~30 individual changes. So that would be
more like at most 3 changes in gerrit for [4][5][3].

3. The 3rd option is we just don't use gerrit at all and seed the
official repo with the results of Chris and Ed's work in Ed's repo in
github. Clearly this would be the fastest way to get us to a new repo
(at the expense of bucking community code review and development process
- is an exception worth it?).

Option 1 would clearly be a drain on at least 2 nova cores to go through
the changes. I think Eric is on board for reviewing options 1 or 2 in
either case, but he prefers option 2. Since I'm throwing a wrench in the
works, I also need to stand up and review the changes if we go with
option 1 or 2. Jay said he'd review them but consider these reviews
lower priority. I expect we could get some help from some other nova
cores though, maybe not on all changes, but at least some (thinking
gibi, alex_xu, sfinucan).

Any CI jobs would be non-voting while going through options 1 or 2 until
we get to a point that tests should finally be passing and we can make
them voting (it should be possible to control this within the repo
itself using zuul v3).

I would like to know from others (nova core or otherwise) what they
would prefer, and if you are a nova core that wants option 1 (or 2) are
you willing to help review those incremental changes knowing it will 

[openstack-dev] [nova][neutron] cross project time at the PTG

2018-08-27 Thread melanie witt

Howdy everyone,

We've scheduled cross project time for Neutron/Nova at the PTG from 
~1:30pm-3pm after lunch on Thursday in the Nova room. Please add topics 
you'd like to discuss during our cross project time to the etherpad in 
the Neutron section at L136:


https://etherpad.openstack.org/p/nova-ptg-stein

Based on the number of topics added, we can add more time to the session 
before lunch (~11:20am - lunch) if needed and do part 1 before lunch and 
part 2 after lunch.


Cheers,
-melanie



[openstack-dev] [nova][ironic] cross project time at the PTG

2018-08-27 Thread melanie witt

Howdy everyone,

We've scheduled cross project time for Ironic/Nova at the PTG from 
~3:30pm-5pm on Thursday in the Nova room. Please add topics you'd like 
to discuss during our cross project time to the etherpad in the Ironic 
section at L139:


https://etherpad.openstack.org/p/nova-ptg-stein

Cheers,
-melanie



[openstack-dev] [nova][cinder] cross project time at the PTG

2018-08-27 Thread melanie witt

Howdy everyone,

We've scheduled cross project time for Cinder/Nova at the PTG from 
9am-11am on Thursday in the Nova room. Please add topics you'd like to 
discuss during our cross project time to the etherpad at L133:


https://etherpad.openstack.org/p/nova-ptg-stein

Cheers,
-melanie



Re: [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-08-27 Thread melanie witt

On Fri, 24 Aug 2018 10:44:16 -0500, Jay S Bryant wrote:

I haven't checked the PTG agenda yet, but is there a meeting on this?
Because we may want to have one to try to understand the requirements
and figure out if there's a way to do it with current Cinder
functionality or if we'd need something new.

Gorka,

I don't think that this has been put on the agenda yet.  Might be good
to add.  I don't think we have a cross project time officially planned
with Nova.  I will start that discussion with Melanie so that we can
cover the couple of cross projects subjects we have.


Just to update everyone, we've scheduled Cinder/Nova cross project time 
for Thursday 9am-11am at the PTG. Please add topics starting at L134 in 
the Cinder section:


https://etherpad.openstack.org/p/nova-ptg-stein

Cheers,
-melanie






Re: [openstack-dev] [nova][vmware] need help triaging a vmware driver bug

2018-08-23 Thread melanie witt

On Fri, 17 Aug 2018 10:50:30 +0300, Radoslav Gerganov wrote:

Hi,

On 17.08.2018 04:10, melanie witt wrote:


Can anyone help triage this bug?



I have requested more info from the person who submitted this and provided some 
tips on how to correlate nova-compute logs to vCenter logs in order to better 
understand what went wrong.
Would it be possible to include this kind of information in the Launchpad bug 
template for VMware related bugs?


Thank you for your help, Rado.

So, I think we could add something to the launchpad bug template to link 
to a doc that explains tips about reporting VMware related bugs. I 
suggest linking to a doc because the bug template is already really long 
and it looks like it would be best to have something short, like, "For tips 
on reporting VMware virt driver bugs, see this doc: " and provide 
a link to, for example, an OpenStack wiki about the VMware virt driver 
(is there one?). The question is, where can we put the doc? Wiki? Or 
maybe here at the bottom [1]? Let me know what you think.


-melanie

[1] 
https://docs.openstack.org/nova/latest/admin/configuration/hypervisor-vmware.html








Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-22 Thread melanie witt

On Wed, 22 Aug 2018 09:49:13 -0400, Doug Hellmann wrote:

Excerpts from melanie witt's message of 2018-08-21 15:05:00 -0700:

On Tue, 21 Aug 2018 16:41:11 -0400, Doug Hellmann wrote:

Excerpts from melanie witt's message of 2018-08-21 12:53:43 -0700:

On Tue, 21 Aug 2018 06:50:56 -0500, Matt Riedemann wrote:

At this point, I think we're at:

1. Should placement be extracted into it's own git repo in Stein while
nova still has known major issues which will have dependencies on
placement changes, mainly modeling affinity?

2. If we extract, does it go under compute governance or a new project
with a new PTL.

As I've said, I personally believe that unless we have concrete plans
for the big items in #1, we shouldn't hold up the extraction. We said in
Dublin we wouldn't extract to a new git repo in Rocky but we'd work up
to that point so we could do it in Stein, so this shouldn't surprise
anyone. The actual code extraction and re-packaging and all that is
going to be the biggest technical issue with all of this, and will
likely take all of stein to complete it after all the bugs are shaken out.

For #2, I think for now, in the interim, while we deal with the
technical headache of the code extraction itself, it's best to leave the
new repo under compute governance so the existing team is intact and we
don't conflate the people issue with the technical issue at the same
time. Get the hard technical part done first, and then we can move it
out of compute governance. Once it's in its own git repo, we can change
the core team as needed but I think it should be initialized with
existing nova-core.


I'm in support of extracting placement into its own git repo because
Chris has done a lot of work to reduce dependencies in placement and
moving it into its own repo would help in not having to keep chasing
that. As has been said before, I think all of us agree that placement
should be separate as an end goal. The question is when to fully
separate it from governance.

It's true that we don't have concrete plans for affinity modeling and
shared storage modeling. But I think we do have concrete plans for vGPU
enhancements (being able to have different vGPU types on one compute
host and adding support for traits). vGPU support is an important and
highly sought after feature for operators and users, as we witnessed at
the last Summit in Vancouver. vGPU support is currently using a flat
resource provider structure that needs to be migrated to nested in order
to do the enhancement work, and that's how the reshaper work came about.
(Reshaper work will migrate a flat resource provider structure to a
nested one.)

We have the nested resource provider support in placement but we need to
integrate the Nova side, leveraging the reshaper code. The reshaper code
is still going through code review, then next we have the integration to
do. I think things are bound to break when we integrate it, just because
nothing is ever perfect, as much as we scrutinize it and the real test
is when we start using it for real. I think going through this
integration would be best done *before* extraction to a new repo. But
given that there is never a "good" time to extract something to a new
repo, I am OK with the idea of doing the extraction first, if that is
what most people want to do.

What I'm concerned about on the governance piece is how things look as
far as project priorities between the two projects if they are split.
Affinity modeling and shared storage support are compute features
OpenStack operators and users need. Operators need affinity modeling in
placement to achieve parity for affinity scheduling with
multiple cells. That means affinity scheduling in Nova with multiple
cells is susceptible to races and does *not* work as well as the
previous single cell support. Shared storage support is something
operators have badly needed for years now and was envisioned to be
solved with placement.

Given all of that, I'm not seeing how *now* is a good time to separate
the placement project under separate governance with separate goals and
priorities. If operators need things for compute, that are well-known
and that placement was created to solve, how will placement have a
shared interest in solving compute problems, if it is not part of the
compute project?



Who are candidates to be members of a review team for the placement
repository after the code is moved out of openstack/nova?

How many of them are also members of the nova-core team?


I assume you pose this question in the proposed situation I described
where placement is a repo under compute. I expect the review team to be


No, not at all. I'm trying to understand how you think a completely
separate team is going to cause problems. Because it seems like at
least a large portion, if not all, of the contributors want it, and
I need to have a very good reason for denying their request, if we
do. Right now, I understand that there are concerns, but I don't
understand 

Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-21 Thread melanie witt

On Tue, 21 Aug 2018 17:36:18 -0500, Eric Fried wrote:

Affinity modeling and shared storage support are compute features
OpenStack operators and users need. Operators need affinity modeling in
placement to achieve parity for affinity scheduling with
multiple cells. That means affinity scheduling in Nova with multiple
cells is susceptible to races and does *not* work as well as the
previous single cell support.

Sorry, I'm confused - are we talking about NUMA cells or cellsv2 cells?
If the latter, what additional placement-side support is needed to
support it?


Cells v2 cells. We were thinking about native affinity modeling in 
placement for this one because the single-cell and legacy case relied on 
compute calling up to the API database to do one last check, after the 
instance landed on the compute host, on whether the affinity policy had 
been violated in a race. If the check failed, the build was aborted and 
sent back for rescheduling. With multiple cells and split message 
queues, compute can no longer call up to the API database to do that 
late affinity check (it cannot reach the API database via the message 
queue), so we are susceptible to affinity policy violations during races.
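
To make the race concrete, here is a minimal sketch of the late check the
legacy path relied on (this is not the actual nova code, just the shape
of the idea):

    class RescheduleRequired(Exception):
        pass


    def late_affinity_check(chosen_host, group_policy, group_member_hosts):
        """Re-check the server group policy after scheduling.

        Two instances in the same group can be scheduled concurrently and
        race each other; this check catches that, but it needs a view of
        the whole group (the API database), which a compute in one cell
        can no longer reach over its split message queue.
        """
        if group_policy == "anti-affinity" and chosen_host in group_member_hosts:
            raise RescheduleRequired("policy violated, send back to scheduler")
        if (group_policy == "affinity" and group_member_hosts
                and chosen_host not in group_member_hosts):
            raise RescheduleRequired("policy violated, send back to scheduler")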


If we were able to model affinity in placement, placement could tell us 
which compute host to place the instance on, satisfying the affinity policy 
and protecting against races (via the claiming we already do in placement).


-melanie








Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-21 Thread melanie witt

On Tue, 21 Aug 2018 14:55:26 -0600, Chris Friesen wrote:

On 08/21/2018 01:53 PM, melanie witt wrote:


Given all of that, I'm not seeing how *now* is a good time to separate the
placement project under separate governance with separate goals and priorities.
If operators need things for compute, that are well-known and that placement was
created to solve, how will placement have a shared interest in solving compute
problems, if it is not part of the compute project?


As someone who is not involved in the governance of nova, this seems like kind
of an odd statement for an open-source project.

From the outside, it seems like there is a fairly small pool of active
placement developers.  And either the placement developers are willing to
implement the capabilities desired by compute or else they're not.  And if
they're not, I don't see how being under compute governance would resolve that
since the only official hard leverage the compute governance has is refusing to
review/merge placement patches (which wouldn't really help implement compute's
desires anyways).


I'm not sure I follow. As of now, placement developers are participating 
in the same priorities and goals setting as the rest of compute, each 
cycle. We discuss work that needs to be done and how to prioritize it, 
in the context of compute. We are one group.


If we separate into two different groups, all of the items I discussed 
in my previous reply will become cross-project efforts. To me, this 
means that the placement group will have their own priorities and goal 
setting process and if their priorities and goals happen to align with 
ours on certain items, we can agree to work on those in collaboration. 
But I won't make assumptions about how much alignment we will have. The 
placement group, as a hypothetical example, won't necessarily find 
helping us fix issues with compute functionality like vGPUs as important 
as we do, if we need additional work in placement to support it.


That's how I'm thinking about it, from a practical standpoint. I'm 
thinking about what it will look like delivering the functionality I 
discussed in my previous reply, for operators and users. I think it 
helps to be one group.


-melanie






Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-21 Thread melanie witt

On Tue, 21 Aug 2018 16:41:11 -0400, Doug Hellmann wrote:

Excerpts from melanie witt's message of 2018-08-21 12:53:43 -0700:

On Tue, 21 Aug 2018 06:50:56 -0500, Matt Riedemann wrote:

At this point, I think we're at:

1. Should placement be extracted into it's own git repo in Stein while
nova still has known major issues which will have dependencies on
placement changes, mainly modeling affinity?

2. If we extract, does it go under compute governance or a new project
with a new PTL.

As I've said, I personally believe that unless we have concrete plans
for the big items in #1, we shouldn't hold up the extraction. We said in
Dublin we wouldn't extract to a new git repo in Rocky but we'd work up
to that point so we could do it in Stein, so this shouldn't surprise
anyone. The actual code extraction and re-packaging and all that is
going to be the biggest technical issue with all of this, and will
likely take all of stein to complete it after all the bugs are shaken out.

For #2, I think for now, in the interim, while we deal with the
technical headache of the code extraction itself, it's best to leave the
new repo under compute governance so the existing team is intact and we
don't conflate the people issue with the technical issue at the same
time. Get the hard technical part done first, and then we can move it
out of compute governance. Once it's in its own git repo, we can change
the core team as needed but I think it should be initialized with
existing nova-core.


I'm in support of extracting placement into its own git repo because
Chris has done a lot of work to reduce dependencies in placement and
moving it into its own repo would help in not having to keep chasing
that. As has been said before, I think all of us agree that placement
should be separate as an end goal. The question is when to fully
separate it from governance.

It's true that we don't have concrete plans for affinity modeling and
shared storage modeling. But I think we do have concrete plans for vGPU
enhancements (being able to have different vGPU types on one compute
host and adding support for traits). vGPU support is an important and
highly sought after feature for operators and users, as we witnessed at
the last Summit in Vancouver. vGPU support is currently using a flat
resource provider structure that needs to be migrated to nested in order
to do the enhancement work, and that's how the reshaper work came about.
(Reshaper work will migrate a flat resource provider structure to a
nested one.)

We have the nested resource provider support in placement but we need to
integrate the Nova side, leveraging the reshaper code. The reshaper code
is still going through code review, then next we have the integration to
do. I think things are bound to break when we integrate it, just because
nothing is ever perfect, as much as we scrutinize it and the real test
is when we start using it for real. I think going through this
integration would be best done *before* extraction to a new repo. But
given that there is never a "good" time to extract something to a new
repo, I am OK with the idea of doing the extraction first, if that is
what most people want to do.

What I'm concerned about on the governance piece is how things look as
far as project priorities between the two projects if they are split.
Affinity modeling and shared storage support are compute features
OpenStack operators and users need. Operators need affinity modeling in
placement to achieve parity for affinity scheduling with
multiple cells. That means affinity scheduling in Nova with multiple
cells is susceptible to races and does *not* work as well as the
previous single cell support. Shared storage support is something
operators have badly needed for years now and was envisioned to be
solved with placement.

Given all of that, I'm not seeing how *now* is a good time to separate
the placement project under separate governance with separate goals and
priorities. If operators need things for compute, that are well-known
and that placement was created to solve, how will placement have a
shared interest in solving compute problems, if it is not part of the
compute project?



Who are candidates to be members of a review team for the placement
repository after the code is moved out of openstack/nova?

How many of them are also members of the nova-core team?


I assume you pose this question in the proposed situation I described 
where placement is a repo under compute. I expect the review team to be 
nova-core as a start with consideration for new additions or removals 
based on our usual process of discussion and consensus as a group. I 
expect there to be members of one group who are not members of the other 
group. But all are members of the compute project and have a shared 
interest in achieving shared goals for operators and users.



What do you think those folks are more interested in working on than the
things you listed as needing to be done to support the 

Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-21 Thread melanie witt

On Tue, 21 Aug 2018 06:50:56 -0500, Matt Riedemann wrote:

At this point, I think we're at:

1. Should placement be extracted into it's own git repo in Stein while
nova still has known major issues which will have dependencies on
placement changes, mainly modeling affinity?

2. If we extract, does it go under compute governance or a new project
with a new PTL.

As I've said, I personally believe that unless we have concrete plans
for the big items in #1, we shouldn't hold up the extraction. We said in
Dublin we wouldn't extract to a new git repo in Rocky but we'd work up
to that point so we could do it in Stein, so this shouldn't surprise
anyone. The actual code extraction and re-packaging and all that is
going to be the biggest technical issue with all of this, and will
likely take all of stein to complete it after all the bugs are shaken out.

For #2, I think for now, in the interim, while we deal with the
technical headache of the code extraction itself, it's best to leave the
new repo under compute governance so the existing team is intact and we
don't conflate the people issue with the technical issue at the same
time. Get the hard technical part done first, and then we can move it
out of compute governance. Once it's in its own git repo, we can change
the core team as needed but I think it should be initialized with
existing nova-core.


I'm in support of extracting placement into its own git repo because 
Chris has done a lot of work to reduce dependencies in placement and 
moving it into its own repo would help in not having to keep chasing 
that. As has been said before, I think all of us agree that placement 
should be separate as an end goal. The question is when to fully 
separate it from governance.


It's true that we don't have concrete plans for affinity modeling and 
shared storage modeling. But I think we do have concrete plans for vGPU 
enhancements (being able to have different vGPU types on one compute 
host and adding support for traits). vGPU support is an important and 
highly sought after feature for operators and users, as we witnessed at 
the last Summit in Vancouver. vGPU support is currently using a flat 
resource provider structure that needs to be migrated to nested in order 
to do the enhancement work, and that's how the reshaper work came about. 
(Reshaper work will migrate a flat resource provider structure to a 
nested one.)
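
(As a rough picture of what "flat to nested" means here, not the exact
placement data model: the reshape moves VGPU inventory off the root
compute node provider and onto child providers, one per physical GPU, so
different vGPU types can be modeled and requested independently.)

    # Illustration of the flat -> nested reshape for vGPU inventory;
    # provider names and inventory amounts are examples, not real data.
    flat_layout = {
        "compute-node-1": {"VCPU": 32, "MEMORY_MB": 131072, "VGPU": 8},
    }

    nested_layout = {
        "compute-node-1": {
            "inventory": {"VCPU": 32, "MEMORY_MB": 131072},
            "children": {
                "compute-node-1_pgpu_0": {"VGPU": 4},
                "compute-node-1_pgpu_1": {"VGPU": 4},
            },
        },
    }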


We have the nested resource provider support in placement but we need to 
integrate the Nova side, leveraging the reshaper code. The reshaper code 
is still going through code review, then next we have the integration to 
do. I think things are bound to break when we integrate it, just because 
nothing is ever perfect, as much as we scrutinize it and the real test 
is when we start using it for real. I think going through this 
integration would be best done *before* extraction to a new repo. But 
given that there is never a "good" time to extract something to a new 
repo, I am OK with the idea of doing the extraction first, if that is 
what most people want to do.


What I'm concerned about on the governance piece is how things look as 
far as project priorities between the two projects if they are split. 
Affinity modeling and shared storage support are compute features 
OpenStack operators and users need. Operators need affinity modeling in 
placement to achieve parity for affinity scheduling with 
multiple cells. That means affinity scheduling in Nova with multiple 
cells is susceptible to races and does *not* work as well as the 
previous single cell support. Shared storage support is something 
operators have badly needed for years now and was envisioned to be 
solved with placement.


Given all of that, I'm not seeing how *now* is a good time to separate 
the placement project under separate governance with separate goals and 
priorities. If operators need things for compute, that are well-known 
and that placement was created to solve, how will placement have a 
shared interest in solving compute problems, if it is not part of the 
compute project?


I understand that placement wants to appeal to more consumers (by way of 
splitting governance) but at present, Nova is the only consumer. And by 
consumer, I mean Nova is the only one consuming data *from* placement 
and relying on it to do something. I don't understand why it's really 
important *right now* to separate priorities before there are other 
viable consumers. I would like to share priorities and goals, for now, 
under the compute program to best serve operators and users in solving 
the specific problems I've mentioned in my replies to this thread.


Best,
-melanie






Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-21 Thread melanie witt

On Tue, 21 Aug 2018 10:28:26 +0100 (BST), Chris Dent wrote:

On Mon, 20 Aug 2018, Matt Riedemann wrote:


Here is an example of the concern. In Sydney we talked about adding types to
the consumers resource in placement so that nova could use placement for
counting quotas [1]. Chris considered it a weird hack but it's pretty
straight-forward from a nova consumption point of view. So if placement were
separately governed with let's say Chris as PTL, would something like that
become a holy war type issue because it's "weird" and convolutes the desire
for a minimalist API? I think Chris' stance on this particular item has
softened over time as more of a "meh" but it's a worry about extracting with
a separate team that is against changes because they are not ideal for
Placement yet are needed for a consumer of Placement. I understand this is
likely selfish on the part of the nova people that want this (including
myself) and maybe close-minded to alternative solutions to the problem (I'm
not sure if it's all been thought out end-to-end yet, Mel would likely know
the latest on this item). Anyway, I like to have examples when I'm stating
something to gain understanding, so that's what I'm trying to do here -
explain, with an example, what I said in the tc channel discussion today.


Since we're airing things out (which I think is a good thing, at
least in the long run), I'll add to this.

I think that's a pretty good example of where I did express some
resistance, especially since were it to come up again, I still would
express some (see below). But let's place that resistance in some
context.

In the epic irc discussion you mentioned that one fear is that I
might want to change the handling of microversions [2] because I'm
somewhat famously ambivalent about them. That's correct, I am.
However, I would hope that the fact that placement has one of the
easier and more flexible microversions systems around (written by
me) and I went to the trouble to extract it to a library [3] and I'm
the author of the latest revision on how to microversion [4] is
powerful evidence that once consensus is reached I will do my utmost
to make things align with our shared plans and goals.

So, with the notion of allocation or consumer types (both have been
discussed): If we start from the position that I've been with
placement from very early on and am cognizant of its several goals
and at the same time also aware of its limited "human resources" it
seems normal and appropriate to me that at least some members of the
group responsible for making it must make sure that we work to
choose the right things (of several choices) to do, in part by 
rigorously questioning additional features when existing planned
features are not yet done. In this case we might ask: is it right to
focus on incompletely thought out consumer type management for the
eventual support of quota handling (and other introspection) when we
haven't yet satisfied what has been described by some downstream
people (eglynn is example, to be specific) as job 1: getting shared
disk working correctly (which we still haven't managed across the
combined picture of nova and placement)?


On this, my recollection of what happened was that I had a topic for the 
PTG to discuss *how* we could solve the problem of quota resource 
counting by querying placement for resource usage information, given 
that one instance of placement can be shared among multiple nova 
deployments, for example. As we know, there is no way to differentiate 
in placement, which resources Nova A PUT /allocations into placement vs 
which resources Nova B PUT /allocations into placement. I was looking 
for input from the placement experts on how that could possibly be 
supported, how Nova A could GET /usages for only itself and not all 
other Novas. From what I remember, the response was that the idea of 
being able to differentiate between the owners of resource allocations 
was disliked and I felt I had no direction to go forward after the 
discussion, even to do the legwork myself to research or contribute 
support to placement.
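
For illustration only (this is a sketch, not something from that discussion): 
the usage query placement offers today only lets you filter by project and 
user, so two Nova deployments sharing one placement service have no dimension 
there to tell their allocations apart. Assuming a token and the placement 
endpoint (the /usages endpoint needs microversion 1.9 or later), the call 
looks roughly like:

  curl -s -H "X-Auth-Token: $TOKEN" \
       -H "OpenStack-API-Version: placement 1.9" \
       "$PLACEMENT_ENDPOINT/usages?project_id=$PROJECT_ID&user_id=$USER_ID"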


I never thought we should *focus* on the lower priority quota handling 
work vs a higher priority item like shared storage support. But I had 
hoped to get some direction on what work or research I could do on my 
own to make progress toward being able to solve my quota problem, after 
a PTG discussion about it.


Not looking for a response here -- just sharing my experience since the 
quota handling discussion was brought up.



 From my perspective questioning additional features, so that they
are well proven, is simply part of the job and we all should be
doing it. If we are never hitting questions and disagreements we are
almost certainly running blind and our results will be less good.

Once we've hashed things out, I'll help make what we've chosen
happen. The evidence of this is everywhere. Consider this: I've
known (at least subconsciously) about the big reveal in 

Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-17 Thread melanie witt

On Fri, 17 Aug 2018 13:37:41 -0500, Sean Mcginnis wrote:

On Fri, Aug 17, 2018 at 12:47:10PM -0500, Ed Leafe wrote:

On Aug 17, 2018, at 12:30 PM, Dan Smith  wrote:


Splitting it out to another repository within the compute umbrella (what
do we call it these days?) satisfies the _technical_ concern of not
being able to use placement without installing the rest of the nova code
and dependency tree. Artificially creating more "perceived" distance
sounds really political to me, so let's be sure we're upfront about the
reasoning for doing that if so :)


Characterizing the proposed separation as “artificial” seems to be quite 
political in itself.



Other than currently having a common set of interested people, is there
something about placement that makes it something that should be under the
compute umbrella?

I explained why I think placement belongs under the compute umbrella for 
now in my reply [1]. My reply might have been missed in the shuffle.


-melanie

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-August/133452.html





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-17 Thread melanie witt

On Fri, 17 Aug 2018 16:51:10 +0100 (BST), Chris Dent wrote:


Earlier I posted a message about a planning etherpad for the
extraction of placement

http://lists.openstack.org/pipermail/openstack-dev/2018-August/133319.html
https://etherpad.openstack.org/p/placement-extract-stein

One of the goals of doing the planning and having the etherpad was
to be able to get to the PTG with some of the issues resolved so
that what little time we had at the PTG could be devoted to
resolving any difficult technical details we uncovered in the lead
up.

One of the questions that has come up on the etherpad is about how
placement should be positioned, as a project, after the extraction.
The options are:

* A repo within the compute project
* Its own project, either:
  * working towards being official and governed
  * official and governed from the start

The etherpad has some discussion about this, but since that etherpad
is primarily for listing out the technical concerns I thought it
might be useful to bring the discussion out into a wider audience,
in a medium more oriented towards discussion. As placement is a
service targeted to serving the entire OpenStack community, talking
about it widely seems warranted.

The outcome I'd like to see happen is the one that makes sure
placement becomes useful to the most people and is worked on by the
most people, as quickly as possible. If how it is arranged as a
project will impact that, now is a good time to figure that out.

If you have thoughts about this, please share them in response.


Thanks for kicking off this discussion, Chris.

I'd like to see placement extracted as a repo within the compute 
project, as a start. My thinking is, placement was developed to solve 
several long-standing problems and limitations in Nova (including poor 
filter scheduler performance, parallel scheduling races, resource 
tracker issues, and shared storage accounting, just to name a few).


We've seen exciting progress in finally solving a lot of these issues as 
we've been developing placement. But, there is still a significant 
amount of important work to do in Nova that depends on placement. For 
example, we need to integrate nested resource providers into the virt 
drivers in Nova to leverage it for vGPUs and NUMA modeling. We need 
affinity modeling in placement to properly handle affinity with multiple 
cells. We need shared storage accounting to properly handle disk usage 
for deployments on shared storage.


As we've worked to develop placement and use it in Nova, we've found in 
most cases that we've had to develop the Nova side and the placement 
side together, at the same time, to make things work. This isn't really 
surprising, as with any brand new functionality, it's difficult to 
fulfill a use case completely without integrating things together and 
iterating until everything works. Given that, I'd rather see placement 
stay under compute so we can iterate quickly, as we still need to 
develop new features in placement and exercise them for the first time, 
in Nova. Once the major aforementioned efforts have been figured out and 
landed with close coordination, I think it would make more sense to look 
at placement being outside of the compute project.


Cheers,
-melanie





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][vmware] need help triaging a vmware driver bug

2018-08-16 Thread melanie witt

Hello VMware peeps,

I've been trying to triage a bug in New status for the VMware driver 
without success:


https://bugs.launchpad.net/nova/+bug/1744182 - can not create instance 
when using vmware nova driver


I tend to think the problem is not related to nova because the instance 
create fails with a message that sounds related to the VMware backend:


2018-01-18 06:40:01.738 7 ERROR nova.compute.manager 
[req-bc40738a-a3ee-4d9c-bd67-32e6fb32df08 
32e0ed602bc549f48f7caf401420b628 7179dd1be7ef4cf2906b41b97970a0f6 - 
default default] [instance: b4b7cabe-f78b-40d9-8856-3b6c213efd73] 
Instance failed to spawn: VimFaultException: An error occurred during 
host configuration.

Faults: ['PlatformConfigFault']

And VMware CI has been running in the gate and successfully creating 
instances during the tempest tests.


Can anyone help triage this bug?

Thanks in advance.

Best,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] nova-specs is open for Stein

2018-08-16 Thread melanie witt

Hey all,

Just wanted to give a quick heads up that the nova-specs repo [1] is now 
open for Stein spec proposals. Here's a link to the docs on the spec 
process:


https://specs.openstack.org/openstack/nova-specs/readme.html

Cheers,
-melanie

[1] https://github.com/openstack/nova-specs

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Stein subteam tracking etherpad now available

2018-08-15 Thread melanie witt

Hi all,

For those of us who use the global team etherpad for helping organize 
reviews for subteams, I just wanted to give a heads up that I've copied 
over the content from the Rocky etherpad [1] and created a new etherpad 
for us to use for Stein [2]. I've removed some things that appeared 
completely unused.


Please feel free to start using the Stein etherpad to help organize 
review for your subteam (note: this is separate from runways and is just 
a way for subteams to coordinate review of non-runway work, like bug 
fixes, etc). If you have a subteam or topic that is missing from the 
etherpad, feel free to add it and use the space for organizing your 
subteam reviews.


Let me know if you have any questions.

Best,
-melanie

[1] https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
[2] https://etherpad.openstack.org/p/stein-nova-subteam-tracking

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Rocky blueprint burndown chart

2018-08-15 Thread melanie witt

Howdy everyone,

Keeping with the tradition of Matt's burndown charts from previous 
cycles [1][2], I have a burndown chart for the Rocky cycle [3] to share 
with you. Apologies for the gap in the data -- I had an issue keeping up 
with the count for that time period. I also focused on only Approved vs 
Completed counts this time. And finally, there are overlapping labels 
for "Spec Review Sprint" on June 5 and "r-2, spec freeze" on June 7 that 
are hard to read, and I didn't find a way to adjust their position in 
google sheets.


Comparing final numbers to Queens
---------------------------------
Max approved for Queens: 53
Max approved for Rocky: 72
Final completed for Queens: 42
Final completed for Rocky: 59

Our completion percentage of approved blueprints in Queens was 79.2% and 
in Rocky it was 81.9%. We approved far more blueprints in Rocky than we 
did in Queens, but the completion rate was similar.
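
(For reference, those percentages come straight from the counts above: 42 / 53 
is about 79.2% for Queens and 59 / 72 is about 81.9% for Rocky.)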


With runways, we were looking to increase our completion percentage by 
focusing on reviewing the same approved things at the same time but we 
simultaneously were more ambitious with what we approved. So we ended up 
with a similar completion percentage. This doesn't seem like a bad thing 
in that, we completed more blueprints than we did last cycle (and 
presumably got more done overall), but we're still having trouble with 
our approval rate of blueprints that we can realistically finish in one 
cycle.


I think part of the miss on the number of approvals might be because we 
extended the spec freeze date to milestone r-2 because of runways, 
thinking that if we completed enough things, we could approve more 
things. We didn't predict that accurately but with the experience, my 
hope is we can do better in Stein. We could consider moving spec freeze 
back to milestone s-1 or have rough criteria on whether to approve more 
blueprints close to s-2 (for example, if 30%? of approved blueprints 
have been completed, OK to approve more).


If you have feedback or thoughts on any of this, feel free to reply to 
this thread or add your comments to the Rocky retrospective etherpad [4] 
and we can discuss at the PTG.


Cheers,
-melanie

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-September/121875.html
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2018-February/127402.html
[3] 
https://docs.google.com/spreadsheets/d/e/2PACX-1vQicKStmnQFcOdnZU56ynJmn8e0__jYsr4FWXs3GrDsDzg1hwHofvJnuSieCH3ExbPngoebmEeY0waH/pubhtml?gid=128173249=true

[4] https://etherpad.openstack.org/p/nova-rocky-retrospective

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] about live-resize down the instance

2018-08-13 Thread melanie witt

On Mon, 13 Aug 2018 09:46:41 -0600, Chris Friesen wrote:

On 08/13/2018 02:07 AM, Rambo wrote:

Hi all,

I find it important to be able to live-resize an instance in a production
environment, especially to live downsize the disk. We have talked about this
for many years, but I don't know why the bp [1] wasn't approved. Can you tell
me more about this? Thank you very much.

[1]https://review.openstack.org/#/c/141219/



It's been reviewed a number of times...I thought it was going to get approved
for Rocky, but I think it didn't quite make it in...you'd have to ask the nova
cores why not.

It should be noted though that the above live-resize spec explicitly did not
cover resizing smaller, only larger.


From what I find in the PTG notes [1] and the spec, it looks like this 
didn't go forward for lack of general interest. We have a lot of work to 
review every cycle and we generally focus on functionality that impacts 
operators the most and look for +1s on specs from operators who are 
interested in the features. From what I can tell from the 
comments/votes, there isn't much/any operator interest in live-resize.


As has been mentioned, resize down is hypervisor-specific whether or not 
it's supported. For example, in the libvirt driver, resize down of 
ephemeral disk is not allowed at all and resize down of root disk is 
only allowed if the instance is boot-from-volume [2]. The xenapi driver 
disallows resize down of ephemeral disk [3], the vmware driver disallows 
resize down of root disk [4], the hyperv driver disallows resize down of 
root disk [5].


So, allowing only live-resize up would be a way to behave consistently 
across virt drivers.


-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L690
[2] 
https://github.com/openstack/nova/blob/afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d/nova/virt/libvirt/driver.py#L8243-L8246
[3] 
https://github.com/openstack/nova/blob/afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d/nova/virt/xenapi/vmops.py#L1357-L1359
[4] 
https://github.com/openstack/nova/blob/afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d/nova/virt/vmwareapi/vmops.py#L1421-L1427
[5] 
https://github.com/openstack/nova/blob/afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d/nova/virt/hyperv/migrationops.py#L107-L114


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][ptg] Stein PTG planning and Rocky retrospective etherpads

2018-08-03 Thread melanie witt

Howdy folks,

I think I forgot to send an email to alert everyone that we have a 
planning etherpad [1] for the Stein PTG where we're collecting topics of 
interest for discussion at the PTG.


Please add your topics and include your nick with your topics and 
comments so we know who to talk to about the topics.


In usual style, we also have a Rocky retrospective etherpad [2] where we 
can fill in "what went well" and "what went not so well" to discuss at 
the PTG and see if we've made improvements in areas of concern from last 
time and gather concrete actions we can take to improve going forward 
for things we are not doing as well as we could.


Cheers,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-stein
[2] https://etherpad.openstack.org/p/nova-rocky-retrospective

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread melanie witt

On Thu, 2 Aug 2018 13:20:43 -0500, Eric Fried wrote:

And we could do the same kind of approach with the non-granular request
groups by reducing the single large SQL statement that is used for all
resources and all traits (and all agg associations) into separate SELECT
statements.

It could be slightly less performance-optimized but more readable and
easier to output debug logs like those above.


Okay, but first we should define the actual problem(s) we're trying to
solve, as Chris says, so we can assert that it's worth the (possible)
perf hit and (definite) dev resources, not to mention the potential for
injecting bugs.


The problem is an infamous one, which is, your users are trying to boot 
instances and they get "No Valid Host" and an instance in ERROR state. 
They contact support, and now support is trying to determine why 
NoValidHost happened. In the past, they would turn on DEBUG log level on 
the nova-scheduler, try another request, and take a look at the 
scheduler logs. They'd see a message, for example, "DiskFilter [start: 
2, end: 0]" (there were 2 candidates before DiskFilter ran and there 
were 0 after it ran) when the scheduling fails, indicating that 
scheduling failed because no computes were reporting enough disk to 
fulfill the request. The key thing here is they could see which resource 
was not available in their cluster.


Now, with placement, all the resources are checked in one go and support 
can't tell which resource or trait was rejected, assuming it wasn't all 
of them. They want to know which resource or trait was rejected in order 
to help them find the problematic compute host, configuration, or other 
cause and fix it.


At present, I think the only approach support could take is to query a 
view of resource providers with their resource and trait availability 
and compare against the request flavor that failed, to figure out which 
resources or traits don't pass what's reported as available.
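
For example, with the osc-placement plugin installed, that manual comparison 
might look something like the following (the provider UUID and the resource 
amounts from the failed flavor are placeholders, so treat this as a sketch 
only):

  openstack resource provider list
  openstack resource provider inventory list <provider_uuid>
  openstack resource provider usage show <provider_uuid>
  openstack allocation candidate list --resource VCPU=4 \
      --resource MEMORY_MB=8192 --resource DISK_GB=80 \
      --os-placement-api-version 1.10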


Hope that helps.

-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread melanie witt

On Wed, 1 Aug 2018 12:17:36 -0500, Ben Nemec wrote:



On 08/01/2018 11:23 AM, Chris Friesen wrote:

On 08/01/2018 09:58 AM, Andrey Volkov wrote:

Hi,

It seems you need first to check what placement knows about resources
of your cloud.
This can be done either with REST API [1] or with osc-placement [2].
For osc-placement you could use:

pip install osc-placement
openstack allocation candidate list --resource DISK_GB=20 --resource
MEMORY_MB=2048 --resource VCPU=1 --os-placement-api-version 1.10

And you can explore placement state with other commands like openstack
resource
provider list, resource provider inventory list, resource provider
usage show.



Unfortunately this doesn't help figure out what the missing resources
were *at the time of the failure*.

The fact that there is no real way to get the equivalent of the old
detailed scheduler logs is a known shortcoming in placement, and will
become more of a problem if/when we move more complicated things like
CPU pinning, hugepages, and NUMA-awareness into placement.

The problem is that getting useful logs out of placement would require
significant development work.


Yeah, in my case I only had one compute node so it was obvious what the
problem was, but if I had a scheduling failure on a busy cloud with
hundreds of nodes I don't see how you would ever track it down.  Maybe
we need to have a discussion with operators about how often they do
post-mortem debugging of this sort of thing?


I think it's definitely a significant issue that troubleshooting "No 
allocation candidates returned" from placement is so difficult. However, 
it's not straightforward to log detail in placement when the request for 
allocation candidates is essentially "SELECT * FROM nodes WHERE cpu 
usage < needed and disk usage < needed and memory usage < needed" and 
the result is returned from the API.


I think better logging is something we want to have, so if anyone has 
ideas around it, do share them.


-melanie




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][ptl][election] PTL candidacy for the Stein cycle

2018-07-30 Thread melanie witt

Hello Stackers,

I'd like to announce my candidacy for PTL of Nova for the Stein cycle. I 
feel like Rocky has been a whirlwind of a cycle with a lot of active 
participation by developers, operators, and users. Thank you all for 
bearing with me for the past cycle as I have learned the ropes of being 
a PTL.


We accomplished a lot in Rocky, with some highlights including:

* Experimented with a new review process, "review runways"
* Began using the new Neutron port binding API to minimize network 
downtime during live migrations
* Completed the placement side of nested resource providers (Nova 
integration work still remains)
* Volume-backed instances will no longer claim root_gb for new instances 
and existing instances will heal during move operations

* Made progress on removing nova-network-specific REST APIs
* Added a nova-manage command to purge archived shadow table data
* Doing more pre-filtering in placement before we iterate over compute 
host candidates with FilterScheduler filters

* Added the ability to boot instances with trusted virtual functions
* Added the ability to disable a cell in cells v2
* Added a way to mitigate Meltdown/Spectre performance degradation via 
cpu flags


Looking toward Stein, we have more work to do with integrating placement 
nested resource providers into Nova, implementing migration of flat 
resource providers => nested tree-based resource providers in placement, 
adding more resiliency in cells v2 for handling "down" or poorer 
performing cells, removing nova-network, and more to be discussed and 
prioritized at the PTG [1].


It would be my privilege to serve the Nova community for another cycle 
and if elected, I endeavor to do a better job using what I have learned 
during the Rocky cycle. I am always trying to improve, so please feel 
free to share your feedback with me.


Thank you for your consideration.

Best,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-stein

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [requirements][release] FFE for os-vif 1.11.1

2018-07-26 Thread melanie witt

On Thu, 26 Jul 2018 13:01:18 -0500, Matthew Thode wrote:

On 18-07-26 10:43:05, melanie witt wrote:

Hello,

I'd like to ask for an exception to add os-vif 1.11.1 to stable/rocky. The
current release for rocky, 1.11.0, added a new feature: the NoOp Plugin, but
it's not actually usable (it's not being loaded) because we missed adding a
file to the setup.cfg.

We have fixed the problem in a one liner add to setup.cfg [1] and we would
like to be able to do another release 1.11.1 for rocky to include this fix.
That way, the NoOp Plugin feature advertised in the release notes [2] for
rocky would be usable for consumers.

[1] https://review.openstack.org/585530
[2] 
https://docs.openstack.org/releasenotes/os-vif/unreleased.html#relnotes-1-11-0



Yep, we talked about it in the release channel.

+----------------------+-----------------------+------+------------------------------------+
| Repository           | Filename              | Line | Text                               |
+----------------------+-----------------------+------+------------------------------------+
| kuryr-kubernetes     | requirements.txt      |   18 | os-vif!=1.8.0,>=1.7.0 # Apache-2.0 |
| nova                 | requirements.txt      |   59 | os-vif!=1.8.0,>=1.7.0 # Apache-2.0 |
| nova-lxd             | requirements.txt      |    7 | os-vif!=1.8.0,>=1.9.0 # Apache-2.0 |
| networking-bigswitch | requirements.txt      |    6 | os-vif>=1.1.0 # Apache-2.0         |
| networking-bigswitch | test-requirements.txt |   25 | os-vif>=1.1.0 # Apache-2.0         |
| networking-midonet   | test-requirements.txt |   40 | os-vif!=1.8.0,>=1.7.0 # Apache-2.0 |
+----------------------+-----------------------+------+------------------------------------+

All these projects would need re-releases if you plan on raising the
minimum.  They would also need reviews submitted individually for that.
An upper-constraint-only fix would not need that, but would also still
allow consumers to encounter the bug; up to you to decide.
LGTM otherwise.


We don't need to raise the minimum -- this will just be a small update 
to fix the existing 1.11.0 release. Thanks!


-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [requirements][release] FFE for os-vif 1.11.1

2018-07-26 Thread melanie witt

Hello,

I'd like to ask for an exception to add os-vif 1.11.1 to stable/rocky. 
The current release for rocky, 1.11.0, added a new feature: the NoOp 
Plugin, but it's not actually usable (it's not being loaded) because we 
missed adding a file to the setup.cfg.


We have fixed the problem in a one liner add to setup.cfg [1] and we 
would like to be able to do another release 1.11.1 for rocky to include 
this fix. That way, the NoOp Plugin feature advertised in the release 
notes [2] for rocky would be usable for consumers.
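
For anyone curious, the fix is the kind of one-line packages addition sketched 
below, with the last line being the sort of entry that was missing; this is 
only an illustration of the shape of the change, and the exact line is in the 
review [1]:

  [files]
  packages =
      vif_plug_linux_bridge
      vif_plug_ovs
      vif_plug_noop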


Cheers,
-melanie

[1] https://review.openstack.org/585530
[2] 
https://docs.openstack.org/releasenotes/os-vif/unreleased.html#relnotes-1-11-0





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] keypair quota usage info for user

2018-07-26 Thread melanie witt

On Thu, 26 Jul 2018 09:19:38 -0600, Chris Friesen wrote:

On 07/25/2018 06:21 PM, Alex Xu wrote:



2018-07-26 0:29 GMT+08:00 William M Edmonds <edmon...@us.ibm.com>:


 Ghanshyam Mann <gm...@ghanshyammann.com>
 wrote on 07/25/2018 05:44:46 AM:
 ... snip ...
 > 1. is it ok to show the keypair used info via API ? any original
 > rational not to do so or it was just like that from starting.

 keypairs aren't tied to a tenant/project, so how could nova track/report a
 quota for them on a given tenant/project? Which is how the API is
 constructed... note the "tenant_id" in GET 
/os-quota-sets/{tenant_id}/detail


Keypair usage is only valid for the API 'GET
/os-quota-sets/{tenant_id}/detail?user_id={user_id}'


The objection is that keypairs are tied to the user, not the tenant, so it
doesn't make sense to specify a tenant_id in the above query.

And for Pike at least I think the above command does not actually show how many
keypairs have been created by that user...it still shows zero.


Yes, for Pike during the re-architecting of quotas to count resources 
instead of tracking usage separately, we kept the "always zero" count 
for usage of keypairs, server group members, and security group rules, 
so as not to change the behavior. It's been my understanding that we 
would need a microversion to change any of those to actually return a 
count. It's true the counts would not make sense under the 'tenant_id' 
part of the URL though.
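
For anyone wanting to poke at the API being discussed, the call in question 
looks roughly like the following (the endpoint and IDs are placeholders), and 
today the keypairs entry in the detail response always reports zero usage, as 
noted above:

  curl -s -H "X-Auth-Token: $OS_TOKEN" \
       "http://controller:8774/v2.1/os-quota-sets/$TENANT_ID/detail?user_id=$USER_ID"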


-melanie




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread melanie witt

On Wed, 18 Jul 2018 15:14:55 -0500, Matt Riedemann wrote:

On 7/18/2018 1:13 PM, melanie witt wrote:

Can we get rid of multi-create?  It keeps causing complications, and
it already
has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but
it should
arguably start up X instances.)

We've discussed that before but I think users do use it and appreciate
the ability to boot instances in batches (one request). The behavior you
describe could be changed with a microversion, though I'm not sure if
that would mean we have to preserve old behavior with the previous
microversion.

Correct, we can't just remove it since that's a backward incompatible
microversion change. Plus, NFV people *love* it.


Sorry, I think I might have caused confusion with my question about a 
microversion. I was saying that to change the min_count=X and 
max_count=Y behavior of raising NoValidHost if X can be satisfied but Y 
can't, I thought we could change that in a microversion. And I wasn't 
sure if that would also mean we would have to keep the old behavior for 
previous microversions (and thus maintain both behaviors).


-melanie





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread melanie witt

On Wed, 18 Jul 2018 12:05:13 -0600, Chris Friesen wrote:

On 07/18/2018 10:14 AM, Matt Riedemann wrote:

As can be seen from logstash [1] this bug is hurting us pretty bad in the check
queue.

I thought I originally had this fixed with [2] but that turned out to only be
part of the issue.

I think I've identified the problem but I have failed to write a recreate
regression test [3] because (I think) it's due to random ordering of which
request spec we select to send to the scheduler during a multi-create request
(and I tried making that predictable by sorting the instances by uuid in both
conductor and the scheduler but that didn't make a difference in my test).


Can we get rid of multi-create?  It keeps causing complications, and it already
has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but it should
arguably start up X instances.)


We've discussed that before but I think users do use it and appreciate 
the ability to boot instances in batches (one request). The behavior you 
describe could be changed with a microversion, though I'm not sure if 
that would mean we have to preserve old behavior with the previous 
microversion.
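
For context, a multi-create request is just a normal boot with a count range, 
roughly like the following sketch (the flavor, image, and server names are 
placeholders), and the behavior being discussed is what happens when only the 
minimum, not the maximum, can be scheduled:

  nova boot --flavor m1.small --image cirros \
      --min-count 2 --max-count 5 test-server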



After talking with Sean Mooney, we have another fix which is self-contained to
the scheduler [5] so we wouldn't need to make any changes to the RequestSpec
handling in conductor. It's admittedly a bit hairy, so I'm asking for some eyes
on it since either way we go, we should get going soon before we hit the FF and
RC1 rush which *always* kills the gate.


One of your options mentioned using RequestSpec.num_instances to decide if it's
in a multi-create.  Is there any reason to persist RequestSpec.num_instances?
It seems like it's only applicable to the initial request, since after that each
instance is managed individually.

Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev







__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Denver Stein ptg planning

2018-07-11 Thread melanie witt

Hello Devs and Ops,

I've created an etherpad where we can start collecting ideas for topics 
to cover at the Stein PTG. Please feel free to add your comments and 
topics with your IRC nick next to it to make it easier to discuss with you.


https://etherpad.openstack.org/p/nova-ptg-stein

Cheers,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] review runway status

2018-07-11 Thread melanie witt

Howdy everyone,

Here is the current review runway [1] status for blueprints in runways. 
Milestone r-3 (feature freeze) is coming up soon July 26, so this will 
be the last runway before FF unless we can complete some earlier than 
their end dates.


* Allow abort live migrations in queued status 
https://blueprints.launchpad.net/nova/+spec/abort-live-migration-in-queued-status 
(Kevin Zheng) [END DATE: 2018-07-25] starts here 
https://review.openstack.org/563505


* Add z/VM driver 
https://blueprints.launchpad.net/nova/+spec/add-zvm-driver-rocky 
(jichen) [END DATE: 2018-07-25] starts here 
https://review.openstack.org/523387


* Support traits in Glance 
https://blueprints.launchpad.net/nova/+spec/glance-image-traits 
(arvindn05) [END DATE: 2018-07-25] last patch 
https://review.openstack.org/569498


Best,
-melanie

[1] https://etherpad.openstack.org/p/nova-runways-rocky

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Rocky blueprint status tracking

2018-07-11 Thread melanie witt

On Fri, 15 Jun 2018 16:12:21 -0500, Matt Riedemann wrote:

On 6/15/2018 11:23 AM, melanie witt wrote:

Similar to last cycle, we have an etherpad for tracking the status of
approved nova blueprints for Rocky here:

https://etherpad.openstack.org/p/nova-rocky-blueprint-status

that we can use to help us review patches. If I've missed any blueprints
or if anything needs an update, please add a note on the etherpad and
we'll get it sorted.


Thanks for doing this, I find it very useful to get an overall picture
of where we're sitting in the final milestone.


Milestone r-3 (feature freeze) is just around the corner July 26 and 
I've refreshed the status tracking etherpad, mostly because some of the 
wayward blueprints are now ready for review. There are 3 blueprints 
which have only one patch left to merge before they're complete.


Please check out the etherpad and use it as a guide for your reviews, so 
we can complete as many blueprints as we can before FF. And please add 
notes or move/add blueprints that I might have missed.


Thanks all,
-melanie





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] [nova][api] Novaclient redirect endpoint https into http

2018-07-05 Thread melanie witt

+openstack-dev@

On Wed, 4 Jul 2018 14:50:26 +, Bogdan Katynski wrote:

But, I cannot use the nova command; the nova endpoint has been redirected from https 
to http. Here: http://prntscr.com/k2e8s6 (command: nova --insecure service list)

First of all, it seems that the nova client is hitting /v2.1 instead of /v2.1/ 
URI and this seems to be triggering the redirect.

Since openstack CLI works, I presume it must be using the correct URL and hence 
it’s not getting redirected.

  
And this is the error log: Unable to establish connection to http://192.168.30.70:8774/v2.1/: ('Connection aborted.', BadStatusLine("''",))
  

Looks to me that nova-api does a redirect to an absolute URL. I suspect SSL is 
terminated on the HAProxy and nova-api itself is configured without SSL so it 
redirects to an http URL.

In my opinion, nova would be more load-balancer friendly if it used a relative 
URI in the redirect but that’s outside of the scope of this question and since 
I don’t know the context behind choosing the absolute URL, I could be wrong on 
that.


Thanks for mentioning this. We do have a bug open in python-novaclient 
around a similar issue [1]. I've added comments based on this thread and 
will consult with the API subteam to see if there's something we can do 
about this in nova-api.
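
If anyone wants to check what their own deployment does here, the redirect 
described above is easy to see with a plain request against the compute 
endpoint (host and port are placeholders); this should show whether the 
Location header comes back as an absolute http URL:

  curl -si http://controller:8774/v2.1 | grep -i '^location:'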


-melanie

[1] https://bugs.launchpad.net/python-novaclient/+bug/1776928




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Rocky blueprint status tracking

2018-06-15 Thread melanie witt

Howdy everyone,

Similar to last cycle, we have an etherpad for tracking the status of 
approved nova blueprints for Rocky here:


https://etherpad.openstack.org/p/nova-rocky-blueprint-status

that we can use to help us review patches. If I've missed any blueprints 
or if anything needs an update, please add a note on the etherpad and 
we'll get it sorted.


Thanks,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] nova-cells-v1 intermittent gate failure

2018-06-14 Thread melanie witt

On Wed, 13 Jun 2018 15:47:33 -0700, Melanie Witt wrote:

Hi everybody,

Just a heads up that we have an intermittent gate failure of the
nova-cells-v1 job happening right now [1] and a revert of the tempest
change related to it has been approved [2] and will be making its way
through the gate. The nova-cells-v1 job will be failing until [2] merges.

-melanie

[1] https://bugs.launchpad.net/nova/+bug/1776684
[2] https://review.openstack.org/575132


The fix [2] has merged, so it is now safe to recheck your changes that 
were caught up in the nova-cells-v1 gate failure.


Thanks,
-melanie





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] nova-cells-v1 intermittent gate failure

2018-06-13 Thread melanie witt

Hi everybody,

Just a heads up that we have an intermittent gate failure of the 
nova-cells-v1 job happening right now [1] and a revert of the tempest 
change related to it has been approved [2] and will be making its way 
through the gate. The nova-cells-v1 job will be failing until [2] merges.


-melanie

[1] https://bugs.launchpad.net/nova/+bug/1776684
[2] https://review.openstack.org/575132

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] review runways check-in and feedback

2018-06-13 Thread melanie witt

Howdy everyone,

We've been experimenting with a new process this cycle, Review Runways 
[1] and we're about at the middle of the cycle now as we had the r-2 
milestone last week June 7.


I wanted to start a thread and gather thoughts and feedback from the 
nova community about how they think runways have been working or not 
working and lend any suggestions to change or improve as we continue on 
in the rocky cycle.


We decided to try the runways process to increase the chances of core 
reviewers converging on the same changes and thus increasing reviews and 
merges on approved blueprint work. As of today, we have 69 blueprints 
approved and 28 blueprints completed; we just passed r-2 on June 7, r-3 
is July 26, and rc1 is August 9 [2].


Do people feel like they've been receiving more review on their 
blueprints? Does it seem like we're completing more blueprints earlier? 
Is there feedback or suggestions for change that you can share?


Thanks all,
-melanie

[1] https://etherpad.openstack.org/p/nova-runways-rocky
[2] https://wiki.openstack.org/wiki/Nova/Rocky_Release_Schedule

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] review runway status

2018-06-11 Thread melanie witt

Hi everybody,

This is just a brief status about the blueprints currently occupying
review runways [0] and an ask for the nova-core team to give these
reviews priority for their code review focus.

* Certificate Validation - 
https://blueprints.launchpad.net/nova/+spec/nova-validate-certificates 
(bpoulos) [END DATE: 2018-06-15] 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nova-validate-certificates


* Neutron new port binding API for live migration: 
https://blueprints.launchpad.net/nova/+spec/neutron-new-port-binding-api 
(mriedem) [END DATE: 2018-06-20] Starts here: 
https://review.openstack.org/#/c/558001/


* XenAPI: improve the image handler 
configure: https://blueprints.launchpad.net/nova/+spec/xenapi-image-handler-option-improvement 
(naichuans) [END DATE: 2018-06-20] Starts here: 
https://review.openstack.org/#/c/486475/


Best,
-melanie

[0] https://etherpad.openstack.org/p/nova-runways-rocky

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-07 Thread melanie witt

Hello Stackers,

Recently, we've received interest in increasing the maximum number of 
volumes allowed to attach to a single instance beyond 26. The limit of 26 
comes from a historical limitation in libvirt (if I remember correctly) 
that is no longer enforced at the libvirt level in the present day. So, 
we're looking at providing a way to attach more than 26 volumes to a 
single instance and we want your feedback.


We'd like to hear from operators and users about their use cases for 
wanting to be able to attach a large number of volumes to a single 
instance. If you could share your use cases, it would help us greatly in 
moving forward with an approach for increasing the maximum.


Some ideas that have been discussed so far include:

A) Selecting a new, higher maximum that still yields reasonable 
performance on a single compute host (64 or 128, for example). Pros: 
helps prevent the potential for poor performance on a compute host from 
attaching too many volumes. Cons: doesn't let anyone opt-in to a higher 
maximum if their environment can handle it.


B) Creating a config option to let operators choose how many volumes 
allowed to attach to a single instance. Pros: lets operators opt-in to a 
maximum that works in their environment. Cons: it's not discoverable for 
those calling the API.


C) Create a configurable API limit for maximum number of volumes to 
attach to a single instance that is either a quota or similar to a 
quota. Pros: lets operators opt-in to a maximum that works in their 
environment. Cons: it's yet another quota?
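
As a purely hypothetical illustration of what option B could look like (no 
such option exists today, the name below is made up for this sketch):

  [compute]
  # hypothetical option, for illustration only
  max_volumes_per_instance = 64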


Please chime in with your use cases and/or thoughts on the different 
approaches.


Thanks for your help,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein

2018-06-06 Thread melanie witt

On Thu, 31 May 2018 15:04:53 -0500, Matt Riedemann wrote:

On 5/31/2018 1:35 PM, melanie witt wrote:


This cycle at the PTG, we had decided to start making some progress
toward removing nova-network [1] (thanks to those who have helped!) and
so far, we've landed some patches to extract common network utilities
from nova-network core functionality into separate utility modules. And
we've started proposing removal of nova-network REST APIs [2].

At the cells v2 sync with operators forum session at the summit [3], we
learned that CERN is in the middle of migrating from nova-network to
neutron and that holding off on removal of nova-network core
functionality until Stein would help them out a lot to have a safety net
as they continue progressing through the migration.

If we recall correctly, they did say that removal of the nova-network
REST APIs would not impact their migration and Surya Seetharaman is
double-checking about that and will get back to us. If so, we were
thinking we can go ahead and work on nova-network REST API removals this
cycle to make some progress while holding off on removing the core
functionality of nova-network until Stein.

I wanted to send this to the ML to let everyone know what we were
thinking about this and to receive any additional feedback folks might
have about this plan.

Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3]
https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators
L30


As a reminder, this is the etherpad I started to document the nova-net
specific compute REST APIs which are candidates for removal:

https://etherpad.openstack.org/p/nova-network-removal-rocky


Update: In the cells meeting today [4], Surya confirmed that CERN is 
okay with nova-network REST API pieces being removed this cycle while 
leaving the core functionality of nova-network intact, as they continue 
their migration from nova-network to neutron. We're tracking the 
nova-net REST API removal candidates on the aforementioned 
nova-network-removal etherpad.


-melanie

[4] 
http://eavesdrop.openstack.org/meetings/nova_cells/2018/nova_cells.2018-06-06-17.00.html






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] spec review day next week Tuesday 2018-06-05

2018-06-04 Thread melanie witt

On Wed, 30 May 2018 12:22:20 -0700, Melanie Witt wrote:

Howdy all,

This cycle, we have our spec freeze later than usual at milestone r-2
June 7 because of the review runways system we've been trying out. We
wanted to allow more time for spec approvals as blueprints were
completed via runways.

So, ahead of the spec freeze, let's have a spec review day next week
Tuesday June 5 to ensure we get what spec approvals we can over the line
before the freeze. Please try to make some time on Tuesday to review
some specs and thanks in advance for participating!


Reminder: the spec review day is TODAY Tuesday June 5 (or tomorrow 
depending on your time zone). Please take some time to review some nova 
specs today if you can!


Cheers,
-melanie





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein

2018-05-31 Thread melanie witt

Hello Operators and Devs,

This cycle at the PTG, we had decided to start making some progress 
toward removing nova-network [1] (thanks to those who have helped!) and 
so far, we've landed some patches to extract common network utilities 
from nova-network core functionality into separate utility modules. And 
we've started proposing removal of nova-network REST APIs [2].


At the cells v2 sync with operators forum session at the summit [3], we 
learned that CERN is in the middle of migrating from nova-network to 
neutron and that holding off on removal of nova-network core 
functionality until Stein would help them out a lot to have a safety net 
as they continue progressing through the migration.


If we recall correctly, they did say that removal of the nova-network 
REST APIs would not impact their migration and Surya Seetharaman is 
double-checking about that and will get back to us. If so, we were 
thinking we can go ahead and work on nova-network REST API removals this 
cycle to make some progress while holding off on removing the core 
functionality of nova-network until Stein.


I wanted to send this to the ML to let everyone know what we were 
thinking about this and to receive any additional feedback folks might 
have about this plan.


Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3] 
https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators 
L30


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] core team update

2018-05-30 Thread melanie witt

Howdy everyone,

As I'm sure many of you have noticed, Sean Dague has shifted his focus 
onto other projects outside of Nova for some time now, and with that, 
I'm removing him from the core team at this time.


I consider our team fortunate to have had the opportunity to work with 
Sean over the years and he is certainly welcome back to the core team if 
he returns to active reviewing someday in the future. Thank you Sean, 
for all of your contributions!


Best,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] spec review day next week Tuesday 2018-06-05

2018-05-30 Thread melanie witt

Howdy all,

This cycle, we have our spec freeze later than usual at milestone r-2 
June 7 because of the review runways system we've been trying out. We 
wanted to allow more time for spec approvals as blueprints were 
completed via runways.


So, ahead of the spec freeze, let's have a spec review day next week 
Tuesday June 5 to ensure we get what spec approvals we can over the line 
before the freeze. Please try to make some time on Tuesday to review 
some specs and thanks in advance for participating!


Cheers,
-melanie

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] review runway status

2018-05-29 Thread melanie witt

Hi everybody,

This is just a brief status about the blueprints currently occupying
review runways [0] and an ask for the nova-core team to give these
reviews priority for their code review focus.

Note that these 3 blueprints were in runways during summit week with end 
dates of 2018-05-28 and 2018-05-30. Because of significantly reduced 
review attention during the summit as core team members were in 
attendance and busy, we have extended the end date for these blueprints 
by one week to EOD next Tuesday 2018-06-05.


* PowerVM Driver 
https://blueprints.launchpad.net/nova/+spec/powervm-vscsi (esberglu) 
[END DATE: 2018-06-05] vSCSI Cinder Volume Driver: 
https://review.openstack.org/526094


* Granular Placement Policy 
https://blueprints.launchpad.net/nova/+spec/granular-placement-policy 
(mriedem) [END DATE: 2018-06-05] 
https://review.openstack.org/#/q/topic:bp/granular-placement-policy+status:open


* vGPU work in rocky 
https://blueprints.launchpad.net/nova/+spec/vgpu-rocky (naichuans) [END 
DATE: 2018-06-05] series starting at https://review.openstack.org/520313


Best,
-melanie

[0] https://etherpad.openstack.org/p/nova-runways-rocky

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] summit sessions of interest

2018-05-18 Thread melanie witt

Howdy everyone,

Here's a last-minute (sorry) list of sessions you might find interesting 
from a nova perspective. Some of these are cross-project sessions of 
general interest.


-melanie


Forum sessions
--------------

Monday
------

* Default Roles Mon 21, 11:35am - 12:15pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21761/default-roles

* Building the path to extracting Placement from Nova Mon 21, 3:10pm - 
3:50pm

https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21716/building-the-path-to-extracting-placement-from-nova

* Ops/Devs: One community Mon 21, 4:20pm - 5:00pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21747/opsdevs-one-community

* Planning to use Placement in Cinder Mon 21, 4:20pm - 5:00pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21718/planning-to-use-placement-in-cinder

* Python 2 Deprecation Timeline Mon 21, 5:10pm - 5:50pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21741/python-2-deprecation-timeline

Tuesday
-------

* Multi-attach introduction and future direction Tue 22, 11:50am - 12:30pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21732/multi-attach-introduction-and-future-direction

* Pre-emptible instances - the way forward Tue 22, 1:50pm - 2:30pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21787/pre-emptible-instances-the-way-forward

* nova/neutron + ops cross-project session Tue 22, 3:30pm - 4:10pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21754/novaneutron-ops-cross-project-session

* CellsV2 migration process sync with operators Tue 22, 4:40pm - 5:20pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21755/cellsv2-migration-process-sync-with-operators

Wednesday
---------

* Making NFV features easier to use Wed 23, 11:00am - 11:40am
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21776/making-nfv-features-easier-to-use

* Nova - Project Onboarding Wed 23, 1:50pm - 2:30pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21641/nova-project-onboarding

* Missing features in OpenStack for public clouds Wed 23, 2:40pm - 3:20pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21749/missing-features-in-openstack-for-public-clouds

* API Debt Cleanup Wed 23, 4:40pm - 5:20pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21881/api-debt-cleanup

Thursday
--------

* Extended Maintenance part I: past, present and future 9:00am - 9:40am
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21721/extended-maintenance-part-i-past-present-and-future

* Extended Maintenance part II: EM and release cycles 9:50am - 10:30am
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21745/extended-maintenance-part-ii-em-and-release-cycles

* S Release Goals Thu 24, 11:50am - 12:30pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21738/s-release-goals

* Unified Limits Thu 24, 2:40pm - 3:20pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21760/unified-limits


Presentations
-------------

Monday
------

* Moving from CellsV1 to CellsV2 at CERN Mon 21, 11:35am - 12:15pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20667/moving-from-cellsv1-to-cellsv2-at-cern

* Call it real : Virtual GPUs in Nova Mon 21, 3:10pm - 3:50pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20802/call-it-real-virtual-gpus-in-nova

* The multi-release, multi-project road to volume multi-attach Mon 21, 
5:10pm - 5:50pm

https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20850/the-multi-release-multi-project-road-to-volume-multi-attach

Tuesday
-------

* Placement, Present and Future, in Nova and Beyond Tue 22, 4:40pm - 5:20pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20813/placement-present-and-future-in-nova-and-beyond

Wednesday
---------

* Nova - Project Update Wed 23, 11:50am - 12:30pm
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21598/nova-project-update



Re: [openstack-dev] [nova] review runway status

2018-05-15 Thread melanie witt

On Tue, 15 May 2018 14:27:12 +0800, Chen Ch Ji wrote:
Thanks for sharing. The z/VM driver spec review is marked as END DATE: 
2018-05-15.
Thanks to the couple of folks who helped a lot on the review; we still need 
more review activity on the patch sets. Can I apply to extend the end date 
for the runway?


We haven't done any extensions on end dates for blueprints in runways. 
One of the main ideas of runways is to set a consistent time box for each 
item and to highlight a variety of blueprints throughout the release 
cycle. We have other blueprints in the queue that are waiting for their 
two-week time box in a runway too.


Authors can add their blueprints back to the end of the queue, and the 
blueprint will be added to a runway again when its turn arrives. So 
please feel free to do that if more review time is needed.


Best,
-melanie



[openstack-dev] [nova] review runway status

2018-05-14 Thread melanie witt

Howdy everyone,

This is just a brief status about the blueprints currently occupying
review runways [0] and an ask for the nova-core team to give these
reviews priority for their code review focus.

* Add z/VM driver
https://blueprints.launchpad.net/nova/+spec/add-zvm-driver-rocky 
(jichen) [END DATE: 2018-05-15] spec amendment

https://review.openstack.org/562154 and implementation series starting
at https://review.openstack.org/523387

* Local disk serial numbers
https://blueprints.launchpad.net/nova/+spec/local-disk-serial-numbers
(mdbooth) [END DATE: 2018-05-16] series starting at
https://review.openstack.org/526346

* PowerVM Driver (esberglu) [END DATE: 2018-05-28]
  * Snapshot 
https://blueprints.launchpad.net/nova/+spec/powervm-snapshot: 
https://review.openstack.org/#/c/543023/
  * DiskAdapter parent class 
https://blueprints.launchpad.net/nova/+spec/powervm-localdisk: 
https://review.openstack.org/#/c/549053/
  * Localdisk 
https://blueprints.launchpad.net/nova/+spec/powervm-localdisk: 
https://review.openstack.org/#/c/549300/


Cheers,
-melanie

[0] https://etherpad.openstack.org/p/nova-runways-rocky



Re: [openstack-dev] [nova] nova-manage cell_v2 map_instances uses invalid UUID as marker in the db

2018-05-10 Thread melanie witt

On Thu, 10 May 2018 11:48:31 -0700, Dan Smith wrote:

We already store values in this field that are not 8-4-4-4-12, and the
oslo field warning is just a warning. If people feel like we need to do
something, I propose we just do this:

https://review.openstack.org/#/c/567669/

It is one of those "we normally wouldn't do this with object schemas,
but we know this is okay" sort of situations.


I'm in favor of this "solution" because, as you mentioned earlier, 
project_id/user_id aren't supposed to be restricted to UUID-only or 36 
characters anyway -- they come from the identity service and could be 
any string. We've been good about keeping with String(255) in the 
database schema for project_id/user_id originating from the identity 
service.


And, I noticed Instance.project_id is a StringField too [1]. Really, 
IMHO we should be consistent with this field type among the various 
objects for project_id/user_id.


Best,
-melanie

[1] 
https://github.com/openstack/nova/blob/e35e8d7/nova/objects/instance.py#L121
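
To illustrate the field-type distinction being discussed, here is a minimal 
sketch using oslo.versionedobjects. The object and field names are 
hypothetical, purely for illustration; this is not nova's actual object 
definition:

    from oslo_versionedobjects import base
    from oslo_versionedobjects import fields


    class ExampleMapping(base.VersionedObject):
        # Hypothetical object, only to show the two field types side by side.
        VERSION = '1.0'
        fields = {
            # UUIDField emits a warning (not an error) when the value is not
            # a canonical 8-4-4-4-12 UUID, which is why non-UUID marker
            # values can still land in such a field today.
            'marker': fields.UUIDField(),
            # StringField imposes no format, which matches project_id/user_id
            # coming from the identity service as arbitrary strings (kept as
            # String(255) in the database schema).
            'project_id': fields.StringField(nullable=True),
        }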




Re: [openstack-dev] [nova][third-party-ci]zKVM (s390x) CI broken

2018-05-09 Thread melanie witt

On Wed, 9 May 2018 10:28:42 -0700, Melanie Witt wrote:

On Wed, 9 May 2018 13:16:45 +0200, Andreas Scheuring wrote:

The root cause seems to be bug [1]. It’s related to nova cells v2
configuration in devstack. Stephen Finucane already promised to have a
look later today (thx!!!). I'll keep the CI running for now...

[1] https://bugs.launchpad.net/devstack/+bug/1770143


Thanks for opening the bug about it. I'm going to investigate it too as
it's related to my recent patch to devstack.


Update: I've proposed a fix for the bug at 
https://review.openstack.org/567298


-melanie






Re: [openstack-dev] [nova][third-party-ci]zKVM (s390x) CI broken

2018-05-09 Thread melanie witt

On Wed, 9 May 2018 13:16:45 +0200, Andreas Scheuring wrote:
The root cause seems to be bug [1]. It’s related to nova cells v2 
configuration in devstack. Stephen Finucane already promised to have a 
look later today (thx!!!). I'll keep the CI running for now...


[1] https://bugs.launchpad.net/devstack/+bug/1770143


Thanks for opening the bug about it. I'm going to investigate it too as 
it's related to my recent patch to devstack.


-melanie



[openstack-dev] [nova] review runway status

2018-05-07 Thread melanie witt

Howdy everyone,

This is just a brief status about the blueprints currently occupying
review runways [0] and an ask for the nova-core team to give these
reviews priority for their code review focus.

* XenAPI: Support a new image handler for non-FS based SRs
https://blueprints.launchpad.net/nova/+spec/xenapi-image-handler-option-improvement 
(jianghuaw_) [END DATE: 2018-05-11] series starting at
https://review.openstack.org/497201

* Add z/VM driver
https://blueprints.launchpad.net/nova/+spec/add-zvm-driver-rocky
(jichen) [END DATE: 2018-05-15] spec amendment
https://review.openstack.org/562154 and implementation series starting
at https://review.openstack.org/523387

* Local disk serial numbers 
https://blueprints.launchpad.net/nova/+spec/local-disk-serial-numbers 
(mdbooth) [END DATE: 2018-05-16] series starting at 
https://review.openstack.org/526346


Cheers,
-melanie

[0] https://etherpad.openstack.org/p/nova-runways-rocky



Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-02 Thread melanie witt

On Wed, 2 May 2018 17:45:37 -0500, Matt Riedemann wrote:

On 5/2/2018 5:39 PM, Jay Pipes wrote:

My personal preference is to add less technical debt and go with a
solution that checks if image traits have changed in nova-api and if so,
simply refuse to perform a rebuild.


So, what if when I created my server, the image I used, let's say
image1, had required trait A and that fit the host.

Then some external service removes (or somehow changes) trait A from the
compute node resource provider (because people can and will do this,
there are a few vmware specs up that rely on being able to manage traits
out of band from nova), and then I rebuild my server with image2 that
has required trait A. That would match the original trait A in image1
and we'd say, "yup, lgtm!" and do the rebuild even though the compute
node resource provider wouldn't have trait A anymore.

Having said that, this could technically have happened before traits existed,
if the operator changed something on the underlying compute host which
invalidated instances running on that host. But I'd think that if that
happened, the operator would be migrating everything off the host and
disabling it from scheduling before making whatever that change would be,
let's say changing the hypervisor or something less drastic that still
invalidates image properties.


This is a scenario I was thinking about too. In the land of software 
licenses, this would be analogous to removing a license from a compute 
host, say. The instance is already there but should we let a rebuild 
proceed that is going to violate the image traits currently supported by 
that host? Do we potentially prolong the life of that instance by 
letting it be re-imaged?


I'm late to this thread, but I finally went through the replies, and my 
thought is we should do a pre-flight check to verify with placement 
whether the image traits requested are 1) supported by the compute host 
the instance is residing on and 2) consistent with the already-existing 
allocations, instead of making an assumption based on "last image" vs 
"new image" and artificially limiting a rebuild that should be allowed to 
go ahead. I can imagine scenarios where a user is trying to do a rebuild 
that their cloud admin says should be perfectly valid on their 
hypervisor, but it's getting rejected because old image traits != new 
image traits. That seems like unnecessary user and admin pain.


It doesn't seem correct to reject the request if the current compute 
host can fulfill it, and if I understood correctly, we have placement 
APIs we can call from the conductor to verify the image traits requested 
for the rebuild can be fulfilled. Is there a reason not to do that?
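
As a rough sketch of the kind of pre-flight check being suggested 
(illustrative only, not nova's actual conductor code; the endpoint, token 
handling, and helper names are assumptions, and verifying against the 
already-existing allocations is omitted), given the UUID of the compute node 
resource provider the instance lives on, placement's GET 
/resource_providers/{uuid}/traits API (microversion 1.6+) can be compared 
against the traits the new image requires:

    import requests

    PLACEMENT = 'http://placement.example.com'    # assumed endpoint
    HEADERS = {
        'X-Auth-Token': 'REDACTED',               # assumed token handling
        'OpenStack-API-Version': 'placement 1.6'  # traits API needs >= 1.6
    }


    def provider_traits(rp_uuid):
        # Ask placement which traits the instance's current compute node
        # resource provider exposes right now.
        resp = requests.get(
            '%s/resource_providers/%s/traits' % (PLACEMENT, rp_uuid),
            headers=HEADERS)
        resp.raise_for_status()
        return set(resp.json()['traits'])


    def required_traits(image_props):
        # Image metadata expresses required traits as trait:<NAME>=required.
        return set(name[len('trait:'):] for name, value in image_props.items()
                   if name.startswith('trait:') and value == 'required')


    def rebuild_allowed(rp_uuid, new_image_props):
        # Allow the rebuild only if every trait the new image requires is
        # currently exposed by the host the instance is residing on.
        return required_traits(new_image_props) <= provider_traits(rp_uuid)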


-melanie







[openstack-dev] [nova] Virtuozzo CI status

2018-05-01 Thread melanie witt

Hi Stackers,

Lately, I've noticed the Virtuozzo CI has been having some problems. For 
example, on a recent run [0], the job link is broken:


"The requested URL 
/22/563722/4/check/check-dsvm-tempest-vz7-exe-minimal/d1d1707 was not 
found on this server."


Prior to that, I noticed that the image the job was using wasn't passing 
the ImagePropertiesFilter, which prevented the job from having a 
successful run.


Can anyone from the Virtuozzo subteam comment on the status of the third 
party CI?


Thanks,
-melanie

[0] https://review.openstack.org/563722



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-30 Thread melanie witt

On Mon, 30 Apr 2018 12:42:40 -0400, Matthew Treinish wrote:

On Mon, Apr 30, 2018 at 09:21:22AM -0700, melanie witt wrote:

On Fri, 27 Apr 2018 17:40:20 +0800, Chen Ch Ji wrote:

According to the requirements and comments, we have now enabled the CI runs with
run_validation = True.
And according to [1] below, for example, [2] needs to pass the ssh validation
test.

There are also a couple of comments asking for enhancements to the CI logs,
such as formatting and legacy incorrect log links; the newest log sample can
be found at [3] (take n-cpu as an example; those logs end with _white.html).

Also, the blueprint [4] requested in the previous discussion is posted here
again for reference.


Thank you for alerting us about the completion of the work on the z/VM CI.
The logs look much improved, and ssh connectivity and metadata functionality
via config drive are being verified by tempest.

The only strange thing I noticed is it appears tempest starts multiple times
in the log [0]. Do you know what's going on there?


This is normal; it's an artifact of a few things. The first time the config is
dumped to the logs is because of tempest verify-config being run as part of
devstack:

https://github.com/openstack-dev/devstack/blob/master/lib/tempest#L590

You also see the API requests this command makes being logged. Then, when
the tempest tests are actually run, the config is dumped to the logs once per
test worker process. Basically, every time we parse the config file at debug
log level, it gets printed to the log file.

FWIW, you can see this in a gate run too:
http://logs.openstack.org/90/539590/10/gate/tempest-full/4b0a136/controller/logs/tempest_log.txt

A-ha, thanks for sharing all of that info. I have learned something new. :)

-melanie







[openstack-dev] [nova] review runway status

2018-04-30 Thread melanie witt

Howdy everyone,

This is just a brief status about the blueprints currently occupying 
review runways [0] and an ask for the nova-core team to give these 
reviews priority for their code review focus.


* XenAPI: Support a new image handler for non-FS based SRs 
https://blueprints.launchpad.net/nova/+spec/xenapi-image-handler-option-improvement 
(jianghuaw_) [END DATE: 2018-05-11] series starting at 
https://review.openstack.org/497201


* Consoles database backend: 
https://blueprints.launchpad.net/nova/+spec/convert-consoles-to-objects 
(melwitt) [END DATE: 2018-05-01] series starting at 
https://review.openstack.org/325414


* Add z/VM driver 
https://blueprints.launchpad.net/nova/+spec/add-zvm-driver-rocky 
(jichen) [END DATE: 2018-05-15] spec amendment 
https://review.openstack.org/562154 and implementation series starting 
at https://review.openstack.org/523387


Cheers,
-melanie

[0] https://etherpad.openstack.org/p/nova-runways-rocky



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-30 Thread melanie witt

On Fri, 27 Apr 2018 17:40:20 +0800, Chen Ch Ji wrote:
According to the requirements and comments, we have now enabled the CI runs 
with run_validation = True.
And according to [1] below, for example, [2] needs to pass the ssh validation 
test.


There are also a couple of comments asking for enhancements to the CI logs, 
such as formatting and legacy incorrect log links; the newest log sample can 
be found at [3] (take n-cpu as an example; those logs end with _white.html).


Also, the blueprint [4] requested in the previous discussion is posted here 
again for reference.


Thank you for alerting us about the completion of the work on the z/VM 
CI. The logs look much improved, and ssh connectivity and metadata 
functionality via config drive are being verified by tempest.

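For reference, the tempest.conf knob in play here looks roughly like this (a 
minimal illustration; the rest of the [validation] section is assumed to be 
left at its defaults):

    [validation]
    # With run_validation enabled, tempest ssh-validates the guests it boots,
    # which is what exercises config drive / metadata functionality end to end.
    run_validation = True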

The only strange thing I noticed is it appears tempest starts multiple 
times in the log [0]. Do you know what's going on there?


That said, since things are looking good with z/VM CI now, we've added 
the z/VM patch series back into a review runway today.


Cheers,
-melanie

[0] 
http://extbasicopstackcilog01.podc.sl.edst.ibm.com/test_logs/jenkins-check-nova-master-17444/logs/tempest.log 
from https://review.openstack.org/527658







Re: [openstack-dev] [nova] Rocky forum topics brainstorming

2018-04-18 Thread melanie witt

On Fri, 13 Apr 2018 08:00:31 -0700, Melanie Witt wrote:

+openstack-operators (apologies that I forgot to add originally)

On Mon, 9 Apr 2018 10:09:12 -0700, Melanie Witt wrote:

Hey everyone,

Let's collect forum topic brainstorming ideas for the Forum sessions in
Vancouver in this etherpad [0]. Once we've brainstormed, we'll select
and submit our topic proposals for consideration at the end of this
week. The deadline for submissions is Sunday April 15.

Thanks,
-melanie

[0] https://etherpad.openstack.org/p/YVR-nova-brainstorming


Just a reminder that we're collecting forum topic ideas to propose for
Vancouver and input from operators is especially important. Please add
your topics and/or comments to the etherpad [0] and we'll submit
proposals before the Sunday deadline.


Here's a list of nova-related sessions that have been proposed:

* CellsV2 migration process sync with operators:
  http://forumtopics.openstack.org/cfp/details/125

* nova/neutron + ops cross-project session:
  http://forumtopics.openstack.org/cfp/details/124

* Planning to use Placement in Cinder:
  http://forumtopics.openstack.org/cfp/details/89

* Building the path to extracting Placement from Nova:
  http://forumtopics.openstack.org/cfp/details/88

* Multi-attach introduction and future direction:
  http://forumtopics.openstack.org/cfp/details/101

* Making NFV features easier to use:
  http://forumtopics.openstack.org/cfp/details/146

A list of all proposed forum topics can be seen here:

http://forumtopics.openstack.org

Cheers,
-melanie






Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-17 Thread melanie witt

On Tue, 17 Apr 2018 06:40:35 -0700, Dan Smith wrote:

I propose that we remove the z/VM driver blueprint from the runway at
this time and place it back into the queue while work on the driver
continues. At a minimum, we need to see z/VM CI running with
[validation]run_validation = True in tempest.conf before we add the
z/VM driver blueprint back into a runway in the future.


Agreed. I also want to see the CI reporting cleaned up so that it's
readable and consistent. Yesterday I pointed out some issues with the
fact that the actual config files being used are not the ones being
uploaded. There are also duplicate (but not actually identical) logs
from all services being uploaded, including things like a full compute
log from starting with the libvirt driver.


Yes, we definitely need to see all of these issues fixed.


I'm also pretty troubled by the total lack of support for the metadata
service. I know it's technically optional on our matrix, but it's a
pretty important feature for a lot of scenarios, and it's also a
dependency for other features that we'd like to have wider support for
(like attached device metadata).

Going back to the spec, I see very little detail on some of the things
raised here, and very (very) little review back when it was first
approved. I'd also like to see more detail be added to the spec about
all of these things, especially around required special changes like
this extra AE agent.


Agreed, can someone from the z/VM team please propose an update to the 
driver spec to document these details?


Thanks,
-melanie










