Steps to reproduce with devstack, on devstack master commit
9be4ceeaa10f6ed92291e77ec52794acfb67c147.
The `AggregateInstanceExtraSpecsFilter` is only added to trigger a log
message and/or scheduling failures from the stale aggregate info. Extra
debug logging in `_update_aggregate` will show the inconsistent state
even without the added filter.
### Adding logging to the host_manager helps to see what's going on:
```
diff --git a/nova/scheduler/host_manager.py b/nova/scheduler/host_manager.py
index 8cb775a923..c9894c79fa 100644
--- a/nova/scheduler/host_manager.py
+++ b/nova/scheduler/host_manager.py
@@ -392,6 +392,8 @@ class HostManager(object):
     def _update_aggregate(self, aggregate):
         self.aggs_by_id[aggregate.id] = aggregate
+
+        LOG.debug(f"update for {aggregate.id} called with {aggregate.hosts}")
         for host in aggregate.hosts:
             self.host_aggregates_map[host].add(aggregate.id)
         # Refreshing the mapping dict to remove all hosts that are no longer
```
### local.conf:
```
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
VIRT_DRIVER=fake
NUMBER_FAKE_NOVA_COMPUTE=10
[[post-config|$NOVA_CONF]]
# just addition of AggregateInstanceExtraSpecsFilter to exercise the issue
[filter_scheduler]
enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,SameHostFilter,DifferentHostFilter,AggregateInstanceExtraSpecsFilter
```
### aggregate and flavor setup for AggregateInstanceExtraSpecsFilter
```
openstack aggregate create test_agg
openstack aggregate set --property "test=true" test_agg
openstack flavor create --ram 512 --disk 1 --vcpus 1 test_flavor
openstack flavor set --property "aggregate_instance_extra_specs:test=true" test_flavor
```
### add hosts to aggregate in parallel
This is not guaranteed to trigger the issue, so several attempts may be needed.
The debug logs from the host manager will show whether the last applied RPC
carried an incomplete list of hosts for the aggregate.
The issue seems easier to trigger the more closely spaced in time the requests
are, for example by issuing them via openstacksdk with a reused session, which
avoids the Python startup time of each CLI call.
```
openstack hypervisor list -c "Hypervisor Hostname" -f value \
| xargs -I {} -P 10 -n 1 \
openstack aggregate add host test_agg -c hosts -f value {}
```
This will show responses like the following:
```
['devstack8']
['devstack8', 'devstack1']
['devstack8', 'devstack1', 'devstack2']
['devstack8', 'devstack3']
['devstack8', 'devstack1', 'devstack7']
['devstack8', 'devstack4']
['devstack8', 'devstack6']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack10']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack9']
['devstack8', 'devstack1', 'devstack3', 'devstack4', 'devstack2', 'devstack6', 'devstack5']
```
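The more closely spaced variant mentioned earlier (openstacksdk with a reused session) could look roughly like the following. This is a minimal sketch and not what was used for this report: `add_hosts_in_parallel` and its `add_host` argument are hypothetical names, and the openstacksdk wiring in the trailing comment is an untested assumption, including the `devstack-admin` cloud name.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Barrier


def add_hosts_in_parallel(add_host, hosts):
    """Call add_host(host) for every host, releasing all threads at once.

    add_host is any callable taking a hostname; results are returned in the
    same order as hosts.
    """
    hosts = list(hosts)
    if not hosts:
        return []
    # The barrier holds every worker until all of them are ready, so the
    # calls start as close together in time as the interpreter allows.
    barrier = Barrier(len(hosts))

    def worker(host):
        barrier.wait()
        return add_host(host)

    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return list(pool.map(worker, hosts))


# Hypothetical wiring to openstacksdk (assumption, not taken from the report):
#
#   import openstack
#   conn = openstack.connect(cloud="devstack-admin")
#   agg = conn.compute.find_aggregate("test_agg")
#   names = [h.name for h in conn.compute.hypervisors()]
#   add_hosts_in_parallel(
#       lambda name: conn.compute.add_host_to_aggregate(agg, name), names)
```

Passing the API call in as a callable keeps the threading part independent of the SDK, so the same helper can be pointed at a stub when checking the script itself.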
At this point, viewing the aggregate info directly does show the correct
membership:
```
$ openstack aggregate show test_agg --max-width=80
+-------------------+----------------------------------------------------------+
| Field             | Value                                                    |
+-------------------+----------------------------------------------------------+
| availability_zone | None                                                     |
| created_at        | 2024-04-25T15:43:45.000000                               |
| deleted_at        | None                                                     |
| hosts             | devstack1, devstack10, devstack2, devstack3, devstack4,  |
|                   | devstack5, devstack6, devstack7, devstack8, devstack9    |
| id                | 1                                                        |
| is_deleted        | False                                                    |
| name              | test_agg                                                 |
| properties        | test='true'                                              |
| updated_at        | None                                                     |
| uuid              | 6700b896-34fb-4e49-9057-e1d40ce185ec                     |
+-------------------+----------------------------------------------------------+
```
If the extra logging was applied, we will now see the following in the nova
scheduler debug logs:
```
...
Apr 25 15:48:01 devstack nova-scheduler[172360]: DEBUG
nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172360)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG
nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172320)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172316]: DEBUG
nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172316)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172326]: DEBUG
nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack10'] {{(pid=172326) _update_aggregate
/opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG
nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack10'] {{(pid=172273) _update_aggregate
/opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172360]: DEBUG
nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172360)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG
nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172320)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG
nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172273)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172326]: DEBUG
nova.scheduler.host_manager [None req-3126c7d9-b7f3-4408-aaec-800a78236bb6
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack9'] {{(pid=172326)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172320]: DEBUG
nova.scheduler.host_manager [None req-6d438ed6-e35d-48c3-b618-d3c62de50ac0
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack10'] {{(pid=172320) _update_aggregate
/opt/stack/nova/nova/scheduler/host_manager.py:396}}
Apr 25 15:48:01 devstack nova-scheduler[172273]: DEBUG
nova.scheduler.host_manager [None req-37a6f8bd-f1b1-4fb9-af2a-b0f66aff54cf
admin admin] update for 1 called with ['devstack8', 'devstack1', 'devstack3',
'devstack4', 'devstack2', 'devstack6', 'devstack5'] {{(pid=172273)
_update_aggregate /opt/stack/nova/nova/scheduler/host_manager.py:396}}
```
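With many such entries it can help to reduce them to the last update each scheduler process applied. A rough sketch follows, assuming each log entry has first been rejoined onto a single line (the entries above are wrapped by the mail client) and contains the debug message added by the patch. `last_update_per_pid` is a hypothetical helper name.

```python
import ast
import re

# Matches the debug message added by the patch, e.g.
#   ... update for 1 called with ['devstack8'] {{(pid=172360) ...
UPDATE_RE = re.compile(
    r"update for (?P<agg>\d+) called with (?P<hosts>\[.*?\]) "
    r"\{\{\(pid=(?P<pid>\d+)\)"
)


def last_update_per_pid(lines):
    """Return {pid: (aggregate_id, hosts)} for the last update each pid logged."""
    last = {}
    for line in lines:
        match = UPDATE_RE.search(line)
        if match is None:
            continue  # not one of the added debug messages
        hosts = ast.literal_eval(match.group("hosts"))
        # Later lines overwrite earlier ones, leaving the last update per pid.
        last[int(match.group("pid"))] = (int(match.group("agg")), hosts)
    return last
```

Any pid whose last host list is shorter than the real membership is a process holding stale aggregate info.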
If we now schedule some instances, we'll see log entries indicating that
the host_state is still inconsistent:
```
openstack server create \
--image cirros-0.6.2-x86_64-disk \
--network private \
--min=10 --max=10 \
--flavor test_flavor \
instance1
```
```
Apr 25 15:50:56 devstack nova-scheduler[172268]: DEBUG nova.filters [None
req-4afd37c0-ec15-4aae-8f2b-a5ae7920aac8 admin admin] Filter
AggregateInstanceExtraSpecsFilter returned 7 host(s) {{(pid=172268)
get_filtered_objects /opt/stack/nova/nova/filters.py:102}}
```
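The count of 7 is consistent with a scheduler process keeping whichever host list it applied last. As a toy illustration, the sketch below replays the three updates in the order pid 172273 logged them above. `ToyHostManager` is a hypothetical stand-in, not nova's `HostManager`. Its `update_aggregate` only mimics the add-then-prune behaviour suggested by the patched method and its comment, and it takes plain ids and lists rather than aggregate objects.

```python
from collections import defaultdict


class ToyHostManager:
    """Simplified model of one scheduler process's aggregate cache."""

    def __init__(self):
        self.host_aggregates_map = defaultdict(set)

    def update_aggregate(self, agg_id, hosts):
        # Record membership for every host named in this update...
        for host in hosts:
            self.host_aggregates_map[host].add(agg_id)
        # ...then drop the aggregate from hosts this update does not mention.
        for host, aggs in self.host_aggregates_map.items():
            if host not in hosts:
                aggs.discard(agg_id)

    def hosts_in(self, agg_id):
        return sorted(h for h, aggs in self.host_aggregates_map.items()
                      if agg_id in aggs)


base = ["devstack8", "devstack1", "devstack3", "devstack4"]
# Host lists in the order pid 172273 logged them.
updates = [
    base + ["devstack10"],
    base + ["devstack2", "devstack6", "devstack9"],
    base + ["devstack2", "devstack6", "devstack5"],
]

manager = ToyHostManager()
for hosts in updates:
    manager.update_aggregate(1, hosts)

cached = manager.hosts_in(1)
all_hosts = {f"devstack{i}" for i in range(1, 11)}
missing = sorted(all_hosts - set(cached))
print(len(cached), missing)
```

In this model the cache ends up with 7 hosts, and devstack7, devstack9 and devstack10 are absent, even though the API reports all 10 as members.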
There's still something I'm not understanding about the nova_fake driver and/or
the AggregateInstanceExtraSpecsFilter, as all 10 hosts still become "active",
even though the filter is excluding 3 of them.
--
https://bugs.launchpad.net/bugs/1542491
Title:
Scheduler update_aggregates race causes incorrect aggregate
information