Re: [openstack-dev] [nova] [placement] placement update 18-30

2018-07-27 Thread Matt Riedemann

On 7/27/2018 8:07 AM, Chris Dent wrote:

# Questions

I wrote up some analysis of the way the [resource tracker talks to
placement](https://anticdent.org/novas-use-of-placement.html). It
identifies some redundancies. Actually it reinforces that some
redundancies we've known about are still there. Fixing some of these
things might count as bug fixes. What do you think?


Performance issues are definitely bugs so I think that's fair. How big 
an impact the solution has is another thing.




* "How to deploy / model shared disk. Seems fairly straight-forward,
     and we could even maybe create a multi-node ceph job that does
     this - wouldn't that be awesome?!?!", says an enthusiastic Matt
     Riedemann.


Two updates here:

1. We've effectively disabled the shared storage provider stuff in the 
libvirt driver:


https://bugs.launchpad.net/nova/+bug/1784020

Because of the reasons listed in the bug. That's going to require a spec 
in Stein if we're going to fully support shared storage providers and 
the work items from that bug would be a good start for a spec.


2. Coincidentally, I *just* got a ceph (single-node) CI job run working 
with a shared storage provider providing DISK_GB for the single compute 
node provider:


https://review.openstack.org/#/c/586363/

Fleshing that out for a multi-node job shouldn't be too hard.

All of that is now entered in the Stein PTG etherpad for discussion in 
Denver.
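
For anyone who has not modelled a sharing provider before, here is a rough sketch of the placement requests involved: a separate resource provider holding the DISK_GB inventory, tagged with the MISC_SHARES_VIA_AGGREGATE trait, and placed in the same aggregate as the compute node provider. It only builds the request bodies and sends nothing. The function name, the sizes, and the way generations are counted up are assumptions for illustration, and this is not what the CI job above actually runs.

```python
import uuid

# Trait that marks a provider as sharing its inventory with the other
# providers in its aggregates.
SHARING_TRAIT = "MISC_SHARES_VIA_AGGREGATE"


def sharing_provider_requests(name, total_gb, compute_rp_uuid,
                              compute_generation=0):
    """Build (method, path, body) tuples for a sharing DISK_GB provider.

    Hypothetical helper. Bodies assume a placement microversion of 1.19
    or later, where the aggregates PUT takes a generation. A real client
    would read each new generation from the response; here it is simply
    counted up, which is an assumption.
    """
    rp_uuid = str(uuid.uuid4())
    agg_uuid = str(uuid.uuid4())
    return [
        ("POST", "/resource_providers", {"name": name, "uuid": rp_uuid}),
        ("PUT", f"/resource_providers/{rp_uuid}/inventories", {
            "resource_provider_generation": 0,
            "inventories": {"DISK_GB": {"total": total_gb}},
        }),
        ("PUT", f"/resource_providers/{rp_uuid}/traits", {
            "resource_provider_generation": 1,
            "traits": [SHARING_TRAIT],
        }),
        ("PUT", f"/resource_providers/{rp_uuid}/aggregates", {
            "resource_provider_generation": 2,
            "aggregates": [agg_uuid],
        }),
        # This PUT replaces the compute node's whole aggregate list, so a
        # real caller would merge in the aggregates it already has.
        ("PUT", f"/resource_providers/{compute_rp_uuid}/aggregates", {
            "resource_provider_generation": compute_generation,
            "aggregates": [agg_uuid],
        }),
    ]


if __name__ == "__main__":
    for method, path, body in sharing_provider_requests(
            "ceph-pool", 1000, "COMPUTE-NODE-UUID"):
        print(method, path, body)
```

With the shared pool reporting DISK_GB, the compute node provider itself would not report a DISK_GB inventory of its own.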




* The whens and wheres of re-shaping and VGPUs.


I'm not sure anything about this has to be documented for Rocky since we 
didn't get /reshaper done so nothing regarding VGPUs in nova changed, 
right? Except I think Sylvain fixed one VGPU gap in the libvirt driver 
which was updated in the docs, but unrelated to /reshaper.


--

Thanks,

Matt



[openstack-dev] [nova] [placement] placement update 18-30

2018-07-27 Thread Chris Dent


HTML: https://anticdent.org/placement-update-18-30.html

This is placement update 18-30, a weekly update of ongoing development 
related to the [OpenStack](https://www.openstack.org/) [placement 
service](https://developer.openstack.org/api-ref/placement/).


# Most Important

This week is feature freeze for the Rocky cycle, so the important
stuff is watching already approved code to make sure it actually
merges, bug fixes and testing.

# What's Changed

At yesterday's meeting it was decided the pending work on the
/reshaper will be punted to early Stein. Though the API level is
nearly ready, the code that exercises it from the nova side is very
new and the calculus of confidence, review bandwidth and gate
slowness works against doing an FFE. Some references:

* 

* 


Meanwhile, pending work to get the report client using consumer
generations is also on hold:

* 

As far as I understand it no progress has been made on "Effectively
managing nested and shared resource providers when managing
allocations (such as in migrations)."

Some functionality has merged recently:

* Several changes to make the placement functional tests more
  placement oriented (use placement context, not be based on
  nova.test.TestCase).
* Add 'nova-manage placement sync_aggregates'
* Consumer generation is being used in heal allocations CLI
* Allocations schema no longer allows extra fields
* The report client is more robust about checking and retrying
  provider generations.
* If force_hosts or force_nodes is being used, don't set a limit
  when requesting allocation candidates.
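
To illustrate what "checking and retrying provider generations" means in general terms, here is a minimal sketch of the optimistic-concurrency pattern: a write carries the generation the writer last saw, a mismatch is rejected, and the writer refreshes and tries again. The `FakeProvider` class and `set_inventories` function are hypothetical stand-ins, not the report client's actual code.

```python
class GenerationConflict(Exception):
    """Stand-in for the 409 placement returns on a stale generation."""


class FakeProvider:
    """In-memory stand-in for one resource provider (hypothetical)."""

    def __init__(self):
        self.generation = 0
        self.inventories = {}

    def put_inventories(self, generation, inventories):
        # Reject the write unless the caller saw the latest state.
        if generation != self.generation:
            raise GenerationConflict(self.generation)
        self.inventories = dict(inventories)
        self.generation += 1


def set_inventories(provider, inventories, cached_generation, retries=3):
    """Write inventories, refreshing the cached generation on conflict."""
    generation = cached_generation
    for _ in range(retries):
        try:
            provider.put_inventories(generation, inventories)
            return provider.generation
        except GenerationConflict:
            # Someone else wrote first: re-read the generation and retry.
            generation = provider.generation
    raise GenerationConflict("gave up after %d attempts" % retries)


if __name__ == "__main__":
    rp = FakeProvider()
    rp.put_inventories(0, {"VCPU": 4})  # another writer gets in first
    print(set_inventories(rp, {"VCPU": 8}, cached_generation=0))
```

A real client would also need to decide whether its intended write still makes sense after re-reading, which this sketch skips.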

# Questions

I wrote up some analysis of the way the [resource tracker talks to
placement](https://anticdent.org/novas-use-of-placement.html). It
identifies some redundancies. Actually it reinforces that some
redundancies we've known about are still there. Fixing some of these
things might count as bug fixes. What do you think?

# Bugs

* Placement related [bugs not yet in progress](https://goo.gl/TgiPXb):
   14, -1 from last week.
* [In progress placement bugs](https://goo.gl/vzGGDQ) 13, -2 on last
   week.

# Main Themes

## Documentation

Now that we are feature frozen we'd better document all the stuff. And
more than likely we'll find some bugs while doing that documenting.

This is a section for reminding us to document all the fun stuff we
are enabling. Open areas include:

* "How to deploy / model shared disk. Seems fairly straight-forward,
and we could even maybe create a multi-node ceph job that does
this - wouldn't that be awesome?!?!", says an enthusiastic Matt
Riedemann.

* The whens and wheres of re-shaping and VGPUs.

* Please add more here by responding to this email.

## Consumer Generations

These are in place on the placement side. There's pending work on
the client side, and a semantic fix on the server side, but neither
is going to merge this cycle.

* 
   return 404 when no consumer found in allocs

* 
   Use placement 1.28 in scheduler report client
   (1.28 is consumer gens)
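
As a reminder of what a consumer generation looks like on the wire, here is a small sketch of the body for `PUT /allocations/{consumer_uuid}` at microversion 1.28. The helper name and the example identifiers are made up for illustration. A brand new consumer sends `null` for `consumer_generation`, and later writes send the generation last read.

```python
import json


def allocation_body(allocations, project_id, user_id,
                    consumer_generation=None):
    """Build a 1.28-style allocations body (hypothetical helper).

    `allocations` maps a resource provider uuid to a dict of
    resource class -> amount. `consumer_generation` stays None (JSON
    null) for a consumer that placement has not seen before.
    """
    return {
        "allocations": {
            rp_uuid: {"resources": dict(resources)}
            for rp_uuid, resources in allocations.items()
        },
        "project_id": project_id,
        "user_id": user_id,
        "consumer_generation": consumer_generation,
    }


if __name__ == "__main__":
    body = allocation_body(
        {"RP-UUID": {"VCPU": 1, "MEMORY_MB": 512}},
        project_id="PROJECT-ID",
        user_id="USER-ID",
    )
    print(json.dumps(body, indent=2))
```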

## Reshape Provider Trees

On hold, but still in progress as we hope to get it merged as soon
as there is an opportunity to do so:

It's all at: 

## Mirror Host Aggregates

The command line tool merged, so this is done. It allows
aggregate-based limitation of allocation candidates, a nice little
feature that will speed things up for people.
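
A rough sketch of what aggregate-based limitation looks like from the caller's side: the `member_of` query parameter on `GET /allocation_candidates` restricts candidates to providers in the named aggregates. The helper below only builds the request path. Its name and the example values are assumptions for illustration, and it is not nova's scheduler code.

```python
from urllib.parse import urlencode


def allocation_candidates_path(resources, aggregate_uuids=(), limit=None):
    """Build a GET /allocation_candidates path (hypothetical helper).

    `resources` maps resource class -> amount. When aggregate uuids are
    given, candidates are limited to providers in any of them.
    """
    query = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())),
    }
    if aggregate_uuids:
        query["member_of"] = "in:" + ",".join(aggregate_uuids)
    if limit is not None:
        query["limit"] = limit
    # Keep ':' and ',' unescaped so the path stays readable.
    return "/allocation_candidates?" + urlencode(query, safe=":,")


if __name__ == "__main__":
    print(allocation_candidates_path(
        {"VCPU": 1, "DISK_GB": 10}, ["AGG-UUID-1", "AGG-UUID-2"]))
```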

## Extraction

I wrote up a second [blog
post](https://anticdent.org/placement-extraction-2.html) on some of
the issues associated with placement extraction. There are several
topics on the [PTG
etherpad](https://etherpad.openstack.org/p/nova-ptg-stein) related
to extraction.

# Other

Since we're at feature freeze I'm going to only include things in
the list that were already there and that might count as bug fixes
or potentially relevant for near term review.

So: 11, down from 29.

* 
Add unit test for non-placement resize

* 
Use placement.inventory.inuse in report client

* 
[placement] api-ref: add traits parameter

* 
Convert 'placement_api_docs' into a Sphinx extension

* 
   Add placement.concurrent_update to generation pre-checks

* 
   Delete allocations when it is re-allocated
   (This is addressing a TODO in the report client)

* 
   local disk inventory reporting related

*