We had a productive PTG and were able to discuss a great many scheduler-related topics. I've put together an etherpad [0] with a summary, reproduced below.

Expect follow-up emails about each priority item in the scheduler track from the contributors working in that area.

Best,
-jay

Placement/scheduler: Rocky PTG Summary

== Key topics ==

- Aggregates
  - How we messed up operators using nova host aggregates for allocation ratios
  - Placement currently doesn't "auto-create" placement aggregates when nova host aggregates change

- Standardizing trait handling for virt drivers

- Placement REST API
  - Partial allocation patching
  - Removing assumptions around generation 0

- Supporting policy/RBAC

- NUMA
  - Supporting both shared and dedicated CPUs on the same host, as well as within the same instance

- vGPU handling

- Tracking ingress/egress bandwidth resources using placement

- Finally supporting live migration of CPU-pinned instances

== Agreements and decisions ==

- dansmith's "placement request filters" work is an important enabler of a number of use cases, particularly around aggregate filtering. Spec is already approved here: https://review.openstack.org/#/c/544585/

- We need a method of filtering providers that do NOT have a certain trait. This is tentatively being called "forbidden traits". Spec review here: https://review.openstack.org/548915
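
To make the idea concrete (the exact syntax is tentative pending the spec review), a forbidden trait would most likely be expressed as a negated member of the existing required query parameter, e.g.:

    GET /allocation_candidates?resources=VCPU:1,MEMORY_MB:1024&required=!CUSTOM_SLOW_DISK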

- For parity/consistency reasons, we should add the in_tree=<RP_UUID> query parameter to GET /resource_providers
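
For example (UUID invented), listing only the providers in the tree rooted at a given provider:

    GET /resource_providers?in_tree=30d4b29b-5d72-42e8-8d26-1059a3e3e6d7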

- To assist operators, add some new osc-placement CLI commands for applying traits and allocation ratios to batches of resource providers in an aggregate
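
Something along these lines (command shapes are purely illustrative; the real names and flags will come out of review):

    # set a trait on every resource provider in an aggregate
    openstack resource provider trait set --aggregate <AGG_UUID> \
        --trait CUSTOM_FAST_NIC
    # set an allocation ratio on every provider in an aggregate
    openstack resource provider inventory set --aggregate <AGG_UUID> \
        --resource VCPU:allocation_ratio=16.0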

- We should allow image metadata to specify required traits in the same fashion as flavor extra specs. Spec review here: https://review.openstack.org/#/c/541507/
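
For reference, the flavor extra spec syntax today is the first command below; the image property would presumably mirror it, with the exact key format being what the spec review settles:

    openstack flavor set io-heavy --property trait:HW_CPU_X86_AVX2=required
    openstack image set fedora-27 --property trait:HW_CPU_X86_AVX2=required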

- Virt drivers should begin reporting their CPU features as traits. Spec review here: https://review.openstack.org/#/c/497733/
  - Furthermore, virt drivers should respect the cpu_model CONF option for overriding CPU-related traits

- We will eventually want to provide the ability to patch an already-existing allocation
  - Hot-attaching a network interface is the canonical use case here. We want to add the new NIC resources to the existing allocation for the instance consumer without needing to re-PUT the entire allocation

- In order to do this, we will need to add a generation field to the consumers table, allowing multiple allocation writers to ensure their view of the consumer is consistent (TODO: need a blueprint/spec for this)
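
A hypothetical shape for such a write, assuming a consumer_generation field in the allocation payload (the field name and its placement are guesses; as noted, the blueprint/spec is still TODO):

    PUT /allocations/<instance_uuid>
    {
        "allocations": {
            "<compute_rp_uuid>": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
            "<nic_rp_uuid>": {"resources": {"SRIOV_NET_VF": 1}}
        },
        "consumer_generation": 3,
        "project_id": "<project_id>",
        "user_id": "<user_id>"
    }

If another writer updated the consumer's allocations in the meantime, the server would reject the request with a conflict and the client would re-read and retry.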

- We should extricate the standard resource classes currently defined in `nova.objects.fields.ResourceClass` into a small `os-resource-classes` library (TODO: need a blueprint/spec for this)
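
A sketch of what the extracted library's surface might look like, following the os-traits pattern (module layout and symbol names here are assumptions; again, the blueprint is still TODO):

    # hypothetical os_resource_classes module
    VCPU = 'VCPU'
    MEMORY_MB = 'MEMORY_MB'
    DISK_GB = 'DISK_GB'
    # ordering must match the existing nova.objects.fields.ResourceClass
    # enumeration, since the index is what identifies standard classes
    STANDARDS = [VCPU, MEMORY_MB, DISK_GB]  # remaining classes elided

    CUSTOM_NAMESPACE = 'CUSTOM_'

    def is_custom(name):
        # standard classes are unprefixed; operator-defined ones carry CUSTOM_
        return name.startswith(CUSTOM_NAMESPACE)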

- We should use oslo.policy in the placement API (TODO: specless blueprint for this)
  - The use case here is making the transition to placement easy for operators that currently use the os-aggregates interface for managing compute resources
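
A minimal sketch of the wiring, using standard oslo.policy calls (the rule name and default are invented; nothing below is settled since this is a specless blueprint):

    from oslo_policy import policy

    # invented rule name/default, for illustration only
    RULES = [
        policy.RuleDefault('placement:resource_providers:list',
                           'role:admin',
                           description='List resource providers.'),
    ]

    def get_enforcer(conf):
        enforcer = policy.Enforcer(conf)
        enforcer.register_defaults(RULES)
        return enforcer

A handler would then call enforcer.authorize('placement:resource_providers:list', target={}, creds=context.to_policy_values()) before doing any work.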

- Calling code should not assume the initial generation for a resource provider is zero. Spec review here: https://review.openstack.org/#/c/548903/
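
In client terms the rule is simply "echo back whatever generation the server reports, never a literal 0". A sketch, assuming a keystoneauth-style session and a prepared inventories dict named inv:

    # wrong: assumes a fresh provider starts at generation 0
    # body = {'inventories': inv, 'resource_provider_generation': 0}

    # right: read the provider and echo back its current generation
    rp = session.get('/resource_providers/%s' % rp_uuid).json()
    body = {'inventories': inv,
            'resource_provider_generation': rp['generation']}
    session.put('/resource_providers/%s/inventories' % rp_uuid, json=body)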

- Extracting placement into separate packages is not a priority, but we think incremental progress toward extraction can be made in Rocky
  - Placement's microversion handling should be extracted into a separate library
  - Trimming nova imports

- We should add some support to nova-manage to assist operators using the caching scheduler to migrate to placement (and get rid of the caching scheduler)

- The VGPU_DISPLAY_HEAD resource class should be removed and replaced with a set of traits in os-traits that indicate the maximum supported number of display heads for the vGPU type

- A new PCPU resource class should be created to describe physical CPUs (logical processors in the hardware). Virt drivers will be able to set inventories of PCPU on resource providers representing NUMA nodes and therefore use placement to track dedicated CPU resources (TODO: need a blueprint/spec for this)
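
The resulting provider tree for a host with two NUMA nodes might look something like this (shape and names are assumptions pending that spec):

    compute node RP (root)
      +-- NUMA node 0 RP: PCPU=8, MEMORY_MB=65536
      +-- NUMA node 1 RP: PCPU=8, MEMORY_MB=65536

Where the shared VCPU inventory lives (the root vs. the NUMA providers) is one of the questions the spec will need to answer.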

- artom is going to write a spec for supporting live migration of CPU-pinned instances (and abandon the complicated old patches)

- Multiple agreements were reached about the strict minimum bandwidth support feature in nova. The spec has already been updated accordingly: https://review.openstack.org/#/c/502306/
  - For now we keep the hostname as the information connecting the nova-compute and the neutron agent on the same host, but we are aiming to use an FQDN as the hostname to avoid possible ambiguity
  - We agreed not to make this feature dependent on moving nova's port creation to the conductor. The current scope is to support pre-created neutron ports only
  - Neutron will provide the resource request in the port API, so this feature does not depend on the neutron port binding API work (see the sketch after this list)
  - Neutron will create resource providers in placement under the compute RP, and will also report inventories on those RPs
  - Nova will claim the port-related resources in placement, with the instance UUID as the consumer_id
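
To make the division of labor concrete, the port-side data might look something like this (the attribute name, resource classes, and traits are all placeholders; the final wire format comes from the neutron side):

    "resource_request": {
        "resources": {
            "NET_BW_EGR_KILOBIT_PER_SEC": 10000,
            "NET_BW_IGR_KILOBIT_PER_SEC": 10000
        },
        "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_NORMAL"]
    }

Nova would then fold these amounts into the instance's allocation during scheduling, with the instance UUID as the consumer.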

- We should mirror nova host aggregate information to placement using an online data migration technique on the add/remove_host methods of nova.objects.Aggregate and a `nova-manage db online_migration` command
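
A rough sketch of where the nova-side hook would live (the placement client method is hypothetical; the real interface is part of the priority work listed below):

    # sketch: nova/objects/aggregate.py
    def add_host(self, host):
        _host_add(self._context, self.id, host)  # existing DB write
        # new: mirror membership into a placement aggregate keyed on
        # this nova aggregate's UUID (hypothetical client method)
        self.placement_client.aggregate_add_host(self.uuid, host)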

== Priorities for Rocky release cycle ==

1. Merge the update_provider_tree patch series (efried)

2. Placement request filters (dansmith)

3. Mirror aggregate information from nova to placement (jaypipes)

4. Forbidden traits (cdent)

== Non-priority Items for Rocky ==

- Add consumers.generation field and related API plumbing (efried and cdent)

- Support requested traits in image metadata (arvind)

- Provide CLI functionality to set traits and things like allocation ratios for a batch of resource providers via aggregate (ttsurya)

- Migrating off of the caching scheduler and on to placement (mriedem)

- Create the `os-resource-classes` library and write migration code to replace `nova.objects.fields.ResourceClass` usage with calls to os_resource_classes

- Policy/RBAC support in Placement REST API (mriedem)

- Extract placement's microversion handling into separate library (cdent)

- CPU-pinned instance live migration support (stephenfin and artom)

[0] https://etherpad.openstack.org/p/rocky-ptg-scheduler-placement-summary
