[Yahoo-eng-team] [Bug 2043707] Re: Cannot separately enable cpu_power_management and cpu pinning
Reviewed: https://review.opendev.org/c/openstack/nova/+/901188
Committed: https://opendev.org/openstack/nova/commit/b1a0aee1abca0ed61c156dd99544adeaebaf0960
Submitter: "Zuul (22348)"
Branch: master

commit b1a0aee1abca0ed61c156dd99544adeaebaf0960
Author: Balazs Gibizer
Date: Thu Nov 16 18:01:29 2023 +0100

    Allow enabling cpu_power_management with 0 dedicated CPUs

    The CPU power management feature of the libvirt driver, enabled with
    [libvirt]cpu_power_management, only manages dedicated CPUs and does not
    touch shared CPUs. Today nova-compute refuses to start if configured with
    [libvirt]cpu_power_management=true and [compute]cpu_dedicated_set=None.
    While this is not functionally limiting, it does prevent enabling the
    power management and defining the cpu_dedicated_set independently. E.g.
    there might be a need to enable the former across the whole cloud in a
    single step, while not all nodes of the cloud will have dedicated CPUs
    configured.

    This patch removes the strict config check. The implementation already
    handles each PCPU individually, so if the list of PCPUs is empty it
    simply does nothing.

    Closes-Bug: #2043707
    Change-Id: Ib070e1042c0526f5875e34fa4f0d569590ec2514

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
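The per-PCPU handling the commit relies on can be sketched roughly like this (a minimal sketch with made-up names; `dedicated_set` stands in for [compute]cpu_dedicated_set and `powered_on` for the host's online CPU set, neither is nova's actual API):

```python
def power_down_all_dedicated_cpus(dedicated_set, powered_on):
    """Power down each dedicated PCPU individually.

    With an empty dedicated set the loop body never runs, so no
    strict config check is needed for the empty case.
    """
    for pcpu in dedicated_set:
        powered_on.discard(pcpu)  # each PCPU is handled on its own
    return powered_on
```

An empty dedicated set leaves the host untouched, which is exactly why the patch can drop the startup check.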
https://bugs.launchpad.net/bugs/2043707

Title: Cannot separately enable cpu_power_management and cpu pinning

Status in OpenStack Compute (nova): Fix Released

Bug description:
If [libvirt]cpu_power_management is set to true but [compute]cpu_dedicated_set is empty, nova-compute fails to start with:

2023-11-16 10:42:42.444 2 ERROR oslo_service.service [None req-56dbf76c-524c-455d-9c64-d3474509e8d0 - - - - - -] Error starting thread.: nova.exception.InvalidConfiguration: '[compute]/cpu_dedicated_set' is mandatory to be set if '[libvirt]/cpu_power_management' is set. Please provide the CPUs that can be pinned or don't use the power management if you only use shared CPUs.
2023-11-16 10:42:42.444 2 ERROR oslo_service.service Traceback (most recent call last):
2023-11-16 10:42:42.444 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 806, in run_service
2023-11-16 10:42:42.444 2 ERROR oslo_service.service     service.start()
2023-11-16 10:42:42.444 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/service.py", line 162, in start
2023-11-16 10:42:42.444 2 ERROR oslo_service.service     self.manager.init_host(self.service_ref)
2023-11-16 10:42:42.444 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 1608, in init_host
2023-11-16 10:42:42.444 2 ERROR oslo_service.service     self.driver.init_host(host=self.host)
2023-11-16 10:42:42.444 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 831, in init_host
2023-11-16 10:42:42.444 2 ERROR oslo_service.service     libvirt_cpu.power_down_all_dedicated_cpus()
2023-11-16 10:42:42.444 2 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/cpu/api.py", line 122, in power_down_all_dedicated_cpus
2023-11-16 10:42:42.444 2 ERROR oslo_service.service     raise exception.InvalidConfiguration(msg)
2023-11-16 10:42:42.444 2 ERROR oslo_service.service
nova.exception.InvalidConfiguration: '[compute]/cpu_dedicated_set' is mandatory to be set if '[libvirt]/cpu_power_management' is set. Please provide the CPUs that can be pinned or don't use the power management if you only use shared CPUs.

This is not a functional bug, but it is a UX bug. I would like to enable the CPU power management feature independently of configuring pinned CPU cores, even if it means that no CPU cores are power managed while cpu_dedicated_set is empty.

Imagine a deployment engine that would like to enable cpu_power_management automatically by default. It cannot define the list of pinned CPU cores at the same time, as that is hypervisor hardware dependent. The current strict validation prevents enabling cpu_power_management before defining the list of PCPUs. The actual power management logic can gracefully handle the case when zero PCPUs are defined, simply by managing all the PCPUs, i.e. managing no PCPUs in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2043707/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2044235] [NEW] nova-conductor puts instance in error during live-migration due to remote error MessagingTimeout
Public bug reported:

Description
===========
Nova-conductor puts the instance in the error state if an exception is not known in _build_live_migrate_task during live migration. [1] The exception comes from _call_livem_checks_on_host, which raises exception.MigrationPreCheckError when it hits messaging.MessagingTimeout. [2] However, the function check_can_live_migrate_destination also runs a check on the source host via check_can_live_migrate_source [3], and that check can likewise hit MessagingTimeout. This one is not caught properly, because it arrives as a remote error ("Remote error: MessagingTimeout"): the destination host tries to contact the source host and the source host does not reply.

[1] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L523
[2] https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L363
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8546

Steps to reproduce
==================
# Deploy devstack multinode
# Create an instance
openstack server create --image a3cf22ec-3e24-404c-83cd-47a95874e164 --flavor m1.small --network dd824883-17b8-4ecd-881d-6b3cbd758bb6 test-check_can_live_migrate_source-on-dest-node
# On the dest node, add a sleep in check_can_live_migrate_source (nova/compute/rpcapi.py) to leave time to stop nova-compute on the source node:

% git diff nova/compute/rpcapi.py
diff --git a/nova/compute/rpcapi.py b/nova/compute/rpcapi.py
index b58004c6e6..00ca0bd109 100644
--- a/nova/compute/rpcapi.py
+++ b/nova/compute/rpcapi.py
@@ -608,6 +608,8 @@ class ComputeAPI(object):
         client = self.router.client(ctxt)
         source = _compute_host(None, instance)
         cctxt = client.prepare(server=source, version=version)
+        import time
+        time.sleep(600)
         return cctxt.call(ctxt, 'check_can_live_migrate_source', instance=instance, dest_check_data=dest_check_data)

# Stop nova-compute and wait
# After a few minutes the instance goes to the error state
# The following error can be found in the nova super conductor log:
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]:
ERROR nova.conductor.manager
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: WARNING nova.scheduler.utils [None req-8795982e-8a37-4d87-9695-806039a3d89b admin admin] [instance: 4969fe65-11ec-495f-a036-386f83d404b0] Setting instance to ERROR state.: oslo_messaging.rpc.client.RemoteError: Remote error: MessagingTimeout Timed out waiting for a reply to message ID c685b202642c469eac1dc06ac187a49c
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server [None req-8795982e-8a37-4d87-9695-806039a3d89b admin admin] Exception during message handling: nova.exception.MigrationError: Migration error: Remote error: MessagingTimeout Timed out waiting for a reply to message ID c685b202642c469eac1dc06ac187a49c
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/conductor/manager.py", line 505, in _live_migrate
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server     task.execute()
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/conductor/tasks/base.py", line 25, in wrap
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server     with excutils.save_and_reraise_exception():
Nov 21 16:40:58 devstack2-multi-node-1-cp nova-conductor[143072]: ERROR oslo_messaging.rpc.server   File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo
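One way the remote timeout could be mapped to the already-handled pre-check error is sketched below. RemoteError and MigrationPreCheckError are simplified stand-ins for the oslo.messaging and nova exceptions, and the handler shape is an assumption for illustration, not nova's actual fix:

```python
class RemoteError(Exception):
    """Stand-in for oslo_messaging.rpc.client.RemoteError, which carries
    the remote exception's type name in exc_type."""
    def __init__(self, exc_type, value=""):
        self.exc_type = exc_type
        super().__init__(f"Remote error: {exc_type} {value}")

class MigrationPreCheckError(Exception):
    """Stand-in for nova.exception.MigrationPreCheckError."""

def call_livem_checks_on_host(do_check):
    try:
        return do_check()
    except RemoteError as e:
        # The destination's call to check_can_live_migrate_source can time
        # out remotely; surface it as a pre-check failure instead of letting
        # the conductor treat it as unknown and set the instance to ERROR.
        if e.exc_type == "MessagingTimeout":
            raise MigrationPreCheckError(str(e)) from e
        raise
```

With this shape, a remote MessagingTimeout aborts the migration as a pre-check failure rather than putting the instance in the error state.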
[Yahoo-eng-team] [Bug 2044215] [NEW] Designate in openstack kolla ansible latest version has issues with dns-integration-domain-keywords. Keyword is replaced by project_name instead of project_id even
You have been subscribed to a public bug:

Designate in the latest openstack-kolla-ansible version has issues with dns-integration-domain-keywords: the keyword is replaced by project_name instead of project_id, even when project_id is written as the keyword.

I have keycloak SSO integration enabled in OpenStack, and the user email_id is configured as the project_name. In this situation the email id is being added in the A records, for example test.myem...@gmail.com.xyz.com! This should not happen ever!

** Affects: neutron
   Importance: Undecided
   Status: New

--
Designate in openstack kolla ansible latest version has issues with dns-integration-domain-keywords. Keyword is replaced by project_name instead of project_id even when it's written project_id as the keyword
https://bugs.launchpad.net/bugs/2044215
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2044215] Re: Designate in openstack kolla ansible latest version has issues with dns-integration-domain-keywords. Keyword is replaced by project_name instead of project_id even w
From the description this affects the dns-integration in neutron; this is independent of designate.

** Project changed: designate => neutron

** Tags added: dns

** Summary changed:
- Designate in openstack kolla ansible latest version has issues with dns-integration-domain-keywords. Keyword is replaced by project_name instead of project_id even when it's written project_id as the keyword
+ dns: Keyword is replaced by project_name instead of project_id

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/2044215

Title: dns: Keyword is replaced by project_name instead of project_id

Status in neutron: New

Bug description:
Designate in the latest openstack-kolla-ansible version has issues with dns-integration-domain-keywords: the keyword is replaced by project_name instead of project_id, even when project_id is written as the keyword. I have keycloak SSO integration enabled in OpenStack, and the user email_id is configured as the project_name. In this situation the email id is being added in the A records, for example test.myem...@gmail.com.xyz.com! This should not happen ever!

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2044215/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
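For reference, the intended substitution works roughly like this (a simplified sketch: the keyword names follow the dns-integration-domain-keywords extension, but the function, its signature, and the `ctx` dict shape are illustrative assumptions):

```python
def expand_dns_domain(template, ctx):
    """Replace supported keywords in a dns_domain template.

    Per the extension's intent, <project_id> must expand to the
    project ID, never the project name; `ctx` keys are illustrative.
    """
    mapping = {
        "<project_id>": ctx["project_id"],
        "<project_name>": ctx["project_name"],
        "<user_id>": ctx["user_id"],
        "<user_name>": ctx["user_name"],
    }
    for keyword, value in mapping.items():
        template = template.replace(keyword, value)
    return template
```

The reported bug is the opposite behavior: a domain template asking for <project_id> ends up carrying the project name (here, an email address) into the published A records.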
[Yahoo-eng-team] [Bug 2040172] Re: [OVN] OvnSbSynchronizer - clean/delete segmenthostmappings for unrelated hosts
Reviewed: https://review.opendev.org/c/openstack/neutron/+/899077
Committed: https://opendev.org/openstack/neutron/commit/3aafeefc8553fd637bad238ee236b1767d8548ea
Submitter: "Zuul (22348)"
Branch: master

commit 3aafeefc8553fd637bad238ee236b1767d8548ea
Author: Harald Jensås
Date: Mon Oct 23 17:29:00 2023 +0200

    [OVN] DB sync host/physnet - filter on agent_type

    When syncing hostname and physical networks, filter neutron hosts on
    agent_type. Only segmenthostmappings for hosts with agent
    'OVN Controller agent' should be cleaned up.

    Since change: I935186b6ee95f0cae8dc05869d9742c8fb3353c3 there is
    de-duplication of segmenthostmapping updates from agents. If the OVN DB
    sync clears/deletes mappings for hosts owned by other agents/plugins,
    the mappings are never re-created.

    Closes-Bug: #2040172
    Change-Id: Iaf15e560e1b1ec31618b2ebc6206a938463c1094
    Signed-off-by: Harald Jensås

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/2040172

Title: [OVN] OvnSbSynchronizer - clean/delete segmenthostmappings for unrelated hosts

Status in neutron: Fix Released

Bug description:
neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_db_sync.OvnSbSynchronizer

When the `sync_hostname_and_physical_networks` method runs, it compares host physical_networks to the mappings in neutron's `segmenthostmappings` table. Any mapping in neutron that is not seen on an OVN chassis is then treated as "stale" and cleaned up. This is problematic when other plug-ins/agents are involved, for example ML2 networking-baremetal[2]. Any segment-host mappings created for baremetal nodes are deleted from the database. And since there is de-duplication[1] on updates from agents, the segment-host mappings are not re-created unless services are restarted or the baremetal node is deleted and re-created in the ironic service.
The OvnSbSynchronizer should not remove mappings unless they belong to OVN hosts.

[1] https://opendev.org/openstack/neutron/commit/176503e610aee16cb5799a77466579bc55129450
[2] https://opendev.org/openstack/networking-baremetal/src/branch/master/networking_baremetal/agent/ironic_neutron_agent.py

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2040172/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
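The fix's filtering logic can be approximated as follows (data shapes and names are invented for illustration; neutron's real sync code works on DB and OVSDB objects):

```python
OVN_CONTROLLER_AGENT = 'OVN Controller agent'

def stale_mappings(neutron_mappings, chassis_hosts, agent_type_by_host):
    """Return (host, segment) mappings that are safe to clean up.

    A mapping is stale only if its host is absent from the OVN chassis
    list AND the host is actually managed by the OVN controller agent;
    mappings owned by other agents (e.g. networking-baremetal's
    ironic-neutron-agent) are left alone.
    """
    return [
        (host, segment)
        for host, segment in neutron_mappings
        if host not in chassis_hosts
        and agent_type_by_host.get(host) == OVN_CONTROLLER_AGENT
    ]
```

Without the agent_type filter, the baremetal host's mapping would also be returned as stale and deleted, which is the reported bug.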
[Yahoo-eng-team] [Bug 2044171] Re: External shared networks may not be seen by other projects
** Changed in: neutron
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.

https://bugs.launchpad.net/bugs/2044171

Title: External shared networks may not be seen by other projects

Status in neutron: Invalid

Bug description:
External shared networks each create their own RBAC entry. If a project accesses the network through the shared attribute, it may not work. It depends on the order in which mysql returns the records: with the GROUP BY clause, the first row returned wins, meaning that if access_as_external is the first record returned, the network will not be treated as shared, as it won't match here:
https://opendev.org/openstack/neutron/src/commit/cbca72195ae5976d6f8b10bbbd58bde3542956bf/neutron/pecan_wsgi/hooks/ownership_validation.py#L45

This is a regression caused by https://review.opendev.org/c/openstack/neutron-lib/+/884878/1/neutron_lib/db/model_query.py

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2044171/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
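The non-determinism described above can be demonstrated with any engine that allows non-aggregated columns under GROUP BY (here sqlite3 for a self-contained example; the table and column names are invented for illustration and are not neutron's schema):

```python
import sqlite3

# Two RBAC-style rows for the same network: one grants shared access,
# the other external access. Grouping by network while selecting the
# non-aggregated 'action' column yields whichever row the engine picks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rbac (network TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO rbac VALUES (?, ?)",
    [("net1", "access_as_shared"), ("net1", "access_as_external")],
)
network, action = conn.execute(
    "SELECT network, action FROM rbac GROUP BY network").fetchone()
# 'action' is engine-dependent here; code that expects 'access_as_shared'
# breaks whenever the other row happens to come back first.
```

This is why the grouped query behind the regression could surface access_as_external first and make the shared check fail.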
[Yahoo-eng-team] [Bug 2044272] [NEW] Inconsistent IGMP configuration across drivers
Public bug reported:

Currently there is only one configuration option available for IGMP in Neutron: [ovs]/igmp_snooping_enable. Enabling it yields different behaviors on ML2/OVN and ML2/OVS, because the rest of the IGMP configuration ("mcast-snooping-flood", "mcast-snooping-flood-reports" and "mcast-snooping-disable-flood-unregistered") is hard coded with different values in the two drivers.

For example, the help string for [ovs]/igmp_snooping_enable says [0]:

"""
... Setting this option to True will also enable the Open vSwitch mcast-snooping-disable-flood-unregistered flag...
"""

But that is only true for ML2/OVN nowadays: it was changed in 2020 [1] to match the behavior of ML2/OVS, but in 2021 ML2/OVS changed this behavior again [2], and this has now caused another issue for one of our customers. Right now, ML2/OVN disables flooding to unregistered ports while ML2/OVS enables it.

This back-and-forth changing of IGMP values is not new [3]; that patch, for example, disables "mcast-snooping-flood-reports" in ML2/OVN where it was hard coded as enabled before.

The fact is that, since Neutron exposes only one configuration option for IGMP while the backend offers a total of 4, we will never get it right. There will always be a use case that has problems with these hard-coded settings, and we will have to keep changing them indefinitely.

This LP proposes making a definitive and final change for IGMP in Neutron by exposing all these knobs to operators via config options. I know that in OpenStack nowadays we strive to have fewer configuration options where possible, but I think this is one case where that should not apply, because of the many ways multicast can be configured on each deployment. As part of this work, though, we will have to change the defaults of one of the drivers to make them consistent again, and I would argue, given the help string for igmp_snooping_enable, that everything should be disabled by default.
[0] https://github.com/openstack/neutron/blob/2be4343756863f252c8289e2ca3e7afe71f566c4/neutron/conf/agent/ovs_conf.py#L41-L46 [1] https://review.opendev.org/c/openstack/neutron/+/762818 [2] https://review.opendev.org/c/openstack/neutron/+/766360 [3] https://review.opendev.org/c/openstack/neutron/+/888127 ** Affects: neutron Importance: High Assignee: Lucas Alvares Gomes (lucasagomes) Status: New ** Tags: ovn ovs ** Changed in: neutron Importance: Undecided => High ** Description changed: Currently there's only one configuration available for IGMP in Neutron: [ovs]/igmp_snooping_enable. By enabling this we will get different behaviors on ML2/OVN and ML2/OVS because the rest of the igmp configuration: "mcast-snooping-flood", "mcast-snooping-flood-reports" and "mcast-snooping-disable-flood- unregistered" are hard coded with different values in both drivers. For example, in the help string for the [ovs]/igmp_snooping_enable it says [0]: """ ... Setting this option to True will also enable the Open vSwitch mcast-snooping-disable-flood-unregistered flag... """ But that's only true for ML2/OVN nowadays where it was changed in 2020 [1] to match the behavior of ML2/OVS. But, in 2021, ML2/OVS changed this behavior again [2] and now this has caused another issue with one of our customers. - Right now, ML2/OVN will disable the flooding to unregistered nodes and + Right now, ML2/OVN will disable the flooding to unregistered ports and ML2/OVS will enable it. This back and forth changing IGMP values is not new [3], this patch for example disables the "mcast-snooping-flood-reports" in ML2/OVN where it was hard coded as enabled before. The fact is that, since Neutron exposes only one configuration for IGMP but the backend offers a total of 4 config options we will never get it right. There will always be a use case that will have problems with these hard coded settings and we will have to keep changing it indefinitely. 
This LP is proposing making a definitive and final change for IGMP in Neutron by exposing all these knobs to the operators via config options. I know in OpenStack nowadays we strive to have fewer configuration options where possible but, I think this is one case where this should not be applicable because of the many ways multicast can be configured on each deployment. As part of this work tho, we will have to change the defaults of one of the drivers to make them consistent again and I would argue, given the help string for igmp_snooping_enable, that everything should be disabled by default. [0] https://github.com/openstack/neutron/blob/2be4343756863f252c8289e2ca3e7afe71f566c4/neutron/conf/agent/ovs_conf.py#L41-L46 [1] https://review.opendev.org/c/openstack/neutron/+/762818 [2] https://review.opendev.org/c/openstack/neutron/+/766360 [3] https://review.opendev.org/c/openstack/neutron/+/888127 -- You receiv
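One possible shape for the proposed knobs, as a config-file fragment (the three new option names below are illustrative assumptions, not an agreed-upon design):

```ini
[ovs]
# Existing option:
igmp_snooping_enable = false
# Proposed: expose the three currently hard-coded multicast settings so
# operators can tune them per deployment, all defaulting to disabled:
igmp_flood = false
igmp_flood_reports = false
igmp_flood_unregistered = false
```

Whatever the final names, the point of the proposal is that each OVS/OVN multicast flag gets its own operator-visible option instead of a per-driver hard-coded value.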
[Yahoo-eng-team] [Bug 2044331] [NEW] Allocation audit error during live migration
Public bug reported:

Description
===========
During live migration, an allocation is created on the destination node, and `nova-manage placement audit` can accidentally delete it.

Steps to reproduce
==================
()[root@busybox-openstack-db97db44f-vddg8 /]# openstack server list --all-project --long
| ID                                   | Name    | Status | Task State | Power State | Networks                | Image Name | Image ID | Availability Zone | Host              | Properties |
| 31af6b71-df56-4f56-87fb-64d75d321285 | test-vm | ACTIVE | None       | Running     | share_net=192.168.111.9 |            |          | default-az        | node-6.domain.tld |            |

()[root@busybox-openstack-db97db44f-vddg8 /]# openstack compute service list --service nova-compute
| ID  | Binary       | Host              | Zone       | Status  | State | Updated At             |
| 133 | nova-compute | node-3.domain.tld | default-az | enabled | up    | 2023-11-23T06:22:15.00 |
| 148 | nova-compute | node-1.domain.tld | ddd        | enabled | up    | 2023-11-23T06:22:15.00 |
| 151 | nova-compute | node-2.domain.tld | ddd        | enabled | up    | 2023-11-23T06:22:15.00 |
| 226 | nova-compute | node-6.domain.tld | default-az | enabled | up    | 2023-11-23T06:22:15.00 |
| 587 | nova-compute | node-7.domain.tld | ddd        | enabled | up    | 2023-11-23T06:22:15.00 |

()[root@busybox-openstack-db97db44f-vddg8 /]# openstack server migrate 31af6b71-df56-4f56-87fb-64d75d321285 --live node-3.domain.tld

()[root@nova-maintenance-77f7cf548f-p6rrv /]# nova-manage placement audit --verbose --delete
Deprecated: Option "notification_format" from group "DEFAULT" is deprecated. Use option "notification_format" from group "notifications".
Allocations were set against consumer UUID 31af6b71-df56-4f56-87fb-64d75d321285 but no existing instances or active migrations are related.
2023-11-23 14:23:49.356 144 INFO nova.scheduler.client.report [req-99bf4509-367b-480c-97d9-0c99676a93be - - - - -] Deleted allocation for unknown 31af6b71-df56-4f56-87fb-64d75d321285
Deleted allocations for consumer UUID 31af6b71-df56-4f56-87fb-64d75d321285 on Resource Provider 301d0960-9bc4-4a88-9860-6286b525: {'MEMORY_MB': 1024, 'VCPU': 1}
Processed 1 allocation.

Expected result
===============
`nova-manage placement audit` detects active migrations and skips them.

Actual result
=============
`nova-manage placement audit` considers the allocation created on the target node during live migration an orphaned allocation and deletes it. (The reason is that the nova database currently lists the original node as the host of this instance; the record is updated to reflect the target node only once the live migration completes.)

Environment
===========
1. Exact version of OpenStack you are running: OpenStack Wallaby
2. Which hypervisor did you use? Libvirt + KVM
3. Which storage type did you use? Ceph
4. Which networking type did you use? Neutron

** Affects: nova
   Importance: Undecided
   Assignee: Haidong Pang (haidong-pang)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

https://bugs.launchpad.net/bugs/2044331

Title: Allocation audit error during live migration

Status in OpenStack Compute (nova): In Progress
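The expected behavior amounts to checking in-flight migrations before reclaiming an allocation. A simplified stand-in for that decision (all data shapes, statuses, and names are invented for illustration; the real audit works against the nova and placement databases):

```python
ACTIVE_MIGRATION_STATUSES = {"queued", "preparing", "running", "post-migrating"}

def is_orphaned(consumer_uuid, provider, instance_host_by_uuid, migrations):
    """Decide whether an allocation on `provider` held by `consumer_uuid`
    is safe to delete.

    Unlike the buggy behavior, an allocation tied to an active migration
    touching this provider is never flagged, even though the instance
    record still points at the source host mid-migration.
    """
    if instance_host_by_uuid.get(consumer_uuid) == provider:
        return False  # the instance record points at this host
    for mig in migrations:
        if (mig["instance_uuid"] == consumer_uuid
                and mig["status"] in ACTIVE_MIGRATION_STATUSES
                and provider in (mig["source"], mig["dest"])):
            return False  # an in-flight migration owns this allocation
    return True
```

In the reproduction above, the destination allocation on node-3 would survive the audit because a running migration references it, instead of being deleted as "unknown".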