Here's update 33.

RC2 went to the presses. The result is that we've now got claims
happening earlier and using better information. This ought to mean
that there are fewer retries and failed builds. There's some
cruftiness in the code that manages allocations that will need to be
cleaned up, and bugs and buglets keep getting found in some edge cases,
but overall there has been much forward progress. Nice work everyone.

Once alternate destinations is done, the next things coming up are
getting shared providers working on the nova side, incorporating
traits in resource requests, and, eventually, nested resource providers.

Presumably at the PTG we'll decide the if/when/how of extracting
placement to its own repo.

This week I've added a section that references bugs that have not yet
seen much action.

# Most Important

Besides reviewing all the stuff in this document, another important
thing to do is to make additions and edits on the PTG etherpad (see
help wanted).

The ongoing work on allocation-related functional tests (many listed
below) and on getting alternate destinations working is also
important.

# Help Wanted

There's a swathe of placement related stuff on the PTG planning
etherpad. Please add to that or make some adjustments if you think
something is missing or incomplete:

     https://etherpad.openstack.org/p/nova-ptg-queens

An important aspect of this is determining what kind of dependency
tree is involved with the work.

Also see the new section that follows.

# Bugs needing attention

(Bugs which are not yet in progress or beyond.)

## Current

* https://bugs.launchpad.net/nova/+bug/1712411
  Allocations may not be removed from dest node during failed migrations

* https://bugs.launchpad.net/nova/+bug/1679750
  Allocations are not cleaned up in placement for instance 'local delete' case

* https://bugs.launchpad.net/nova/+bug/1427772
  Instance that uses force-host still needs to run some filters
  (old bug, but newly relevant in a placement world)

## Old (need to be flushed or refreshed?)

* https://bugs.launchpad.net/nova/+bug/1683858
  Allocation records do not contain overhead information

* https://bugs.launchpad.net/nova/+bug/1652099
  placement requests from n-cpu logs not found in placement-api logs

* https://bugs.launchpad.net/nova/+bug/1674694
  In placement api error responses choose poor default content-type
  (This was partially fixed in the resource tracker, but not generally.
  As described in the bug, this ought to be relatively straightforward
  to make go.)

* https://bugs.launchpad.net/nova/+bug/1662867
  update_available_resource_for_node racing instance deletion
  (Is this one still relevant after all the recent changes to claim
  handling?)

# Docs

There's a stack that documents (with visual aids!) the flow of
scheduler and placement. It is pretty much ready:

    https://review.openstack.org/#/c/475810/

# Main Themes

## Alternate Destinations

There's a stack beginning at https://review.openstack.org/#/c/486215/
which proposes the bits necessary to return alternate destinations
besides the claimed destination. These will be used to do within-cell
(v2) retries in case a build can't be done on the claimed destination.

The spec revision for that work: https://review.openstack.org/#/c/471927/

Ed has some concerns about the complexity being created, so he wrote
up some issues at:

    https://blog.leafe.com/handling-unstructured-data/

In his response to https://review.openstack.org/#/c/495854/3 Jay
suggests a named tuple:

    I'm struck that instead of a two-tuple, both elements of the tuple
    having lists of lists, would it not be clearer to have the return
    value from select_destinations() instead be a single list of
    namedtuple elements, where the namedtuple would have a
    chosen_host, alternate_hosts, and allocation_requests attribute
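
To make that suggestion concrete, here is a rough sketch of the shape
being described. Only the three attribute names come from the quoted
comment; the `Selection` type name, the `hosts_to_try` helper, and the
sample data are made up for illustration and are not what is in the
proposed patches.

```python
from collections import namedtuple

# "Selection" is a hypothetical name; the three attributes are the ones
# named in the quoted review comment.
Selection = namedtuple(
    "Selection", ["chosen_host", "alternate_hosts", "allocation_requests"])


def hosts_to_try(selection):
    """Return the chosen host followed by its alternates, in retry order.

    Hypothetical helper, only here to show how a caller might consume
    one element of the list.
    """
    return [selection.chosen_host] + list(selection.alternate_hosts)


# Illustrative data only: one element per requested instance.
selections = [
    Selection("host1", ["host2", "host3"], [{"allocations": []}]),
    Selection("host4", ["host5"], [{"allocations": []}]),
]

for selection in selections:
    print(hosts_to_try(selection))
```

The appeal of a single list is that each element carries everything
about one instance, so a caller does not have to keep two parallel
lists of lists lined up by index.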

## Traits

Work continues apace on getting filtering by traits working:

      https://review.openstack.org/#/c/489206/

This has some overlap with shared provider handling (below).
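
As a very loose illustration of what "filtering by traits" means, the
idea is that a provider only qualifies if it has every trait the
request requires. The sketch below is not the placement implementation
(which does this in the database); the `filter_by_traits` function, the
provider names, and the data layout are assumptions for the example.

```python
def filter_by_traits(providers, required_traits):
    """Keep the providers whose trait set includes every required trait.

    providers: dict mapping a provider name to a set of trait names
    (a hypothetical in-memory layout, used only for this example).
    """
    required = set(required_traits)
    return [name for name, traits in providers.items()
            if required <= set(traits)]


# Illustrative data only.
providers = {
    "rp1": {"HW_CPU_X86_AVX2", "STORAGE_DISK_SSD"},
    "rp2": {"HW_CPU_X86_AVX2"},
}

print(filter_by_traits(providers, ["HW_CPU_X86_AVX2", "STORAGE_DISK_SSD"]))
```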

## Shared Resource Providers

There's some support for shared resource providers on the placement
side of the scheduling equation, but the resource tracker is not yet
ready to support it. There is some work in progress, starting with
functional tests:

     https://review.openstack.org/#/c/490733/

## Nested Resource Providers

This will start back up after we clean off the windscreen. The stack
begins at https://review.openstack.org/#/c/470575/5

# Other Code

* https://review.openstack.org/#/c/493865/
  functional tests for live migrate

* https://review.openstack.org/#/c/494136/
  Allow shuffling of best weighted hosts

* https://review.openstack.org/#/c/495159/
  tests for resource allocation during soft delete

* https://review.openstack.org/#/c/485209/
  gabbi tests for shared custom resource class

* https://review.openstack.org/#/c/495891/
  WIP: test allocation handling during scheduler retry

* https://review.openstack.org/#/c/480379/
  ensure RP maps to those RPs that share with it
  This is a requirement for getting shared providers working
  correctly.

* https://review.openstack.org/#/c/496853/
  Spec for minimal cache-headers in placement
  poc: https://review.openstack.org/#/c/495380/

* https://review.openstack.org/#/c/469048/
  Update the placement deployment instructions
  This has been around for nearly 4 months.

* https://review.openstack.org/#/c/489633/
  Update RT aggregate map less frequently

* https://review.openstack.org/#/c/494206/
  Remove the Pike migration code for flavor migration

* https://review.openstack.org/#/c/468797/
  Spec for requesting traits in flavors

* https://review.openstack.org/#/c/496933/
  Add uuid to migration table
  (This is relevant to placement and scheduling because it ought to
  make the "doubling" currently used for doing moves cleaner, by
  having two different allocations, one of them identified by the
  migration uuid. Aren't uuids awesome?)

* https://review.openstack.org/#/c/428481/
  Request zero root disk for boot-from-volume instances
  (Relevant for making sure that disk allocations are correct.)

* https://review.openstack.org/#/c/452006/
  Add functional test for two-cell scheduler behaviors

* https://review.openstack.org/#/c/496202/
  Add functional migrate force_complete test

* https://review.openstack.org/#/c/497399/
  WIP: Test server movings with custom resources

* https://review.openstack.org/#/c/497733/
  WIP spec Report CPU features to placement service by traits API

* https://review.openstack.org/#/c/496976/
  Centralize allocation deletion in ComputeManager

* https://review.openstack.org/#/c/496803/
  Add missing unit tests for FilterScheduler._get_all_host_states

* https://review.openstack.org/#/c/496847/
  Add missing tests for _remove_deleted_instances_allocations

* https://review.openstack.org/#/c/492247/
  Use ksa adapter for placement conf & requests

* https://review.openstack.org/#/c/492571/
  Make compute log less verbose with allocs autocorrection

* https://review.openstack.org/#/c/496936/
  De-duplicate two delete_allocation_for_* methods

--
Chris Dent                      (⊙_⊙')         https://anticdent.org/
freenode: cdent                                         tw: @anticdent