[Yahoo-eng-team] [Bug 2053061] Re: Unexpected API Error.
As the exception says, your certificates are invalid. Please double-check your configuration. This isn't a Nova bug but rather a configuration problem, so I'm closing the report.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2053061

Title:
  Unexpected API Error.

Status in OpenStack Compute (nova):
  Invalid

Bug description:

2024-02-13 20:01:27.785 2124 ERROR nova.api.openstack.wsgi keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 20:01:27.785 2124 ERROR nova.api.openstack.wsgi
2024-02-13 20:01:27.794 2124 INFO nova.api.openstack.wsgi [req-7cd9c353-06b6-45c4-b294-5994b63094c0 a694ed41a63240a982b3110fde0248be 6e523b33312a4ac79590df56f886ceb2 - default default] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
2024-02-13 20:01:27.795 2124 INFO nova.osapi_compute.wsgi.server [req-7cd9c353-06b6-45c4-b294-5994b63094c0 a694ed41a63240a982b3110fde0248be 6e523b33312a4ac79590df56f886ceb2 - default default] 127.0.0.1 "POST /v2.1/6e523b33312a4ac79590df56f886ceb2/servers HTTP/1.1" status: 500 len: 651 time: 0.0623181

[root@controller nova(keystone)]# cat nova-api.log | grep SSLError
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi [req-18342d99-d0c7-436b-a6c8-1c6fa8966ae9 a694ed41a63240a982b3110fde0248be 6e523b33312a4ac79590df56f886ceb2 - default default] Unexpected exception in API method: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi     raise SSLError(e, request=request)
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi requests.exceptions.SSLError: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi     raise exceptions.SSLError(msg)
2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi [req-9c284a0d-7109-4c9b-a74b-c653e030a6b6 a694ed41a63240a982b3110fde0248be 6e523b33312a4ac79590df56f886ceb2 - default default] Unexpected exception in API method: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi
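A quick way to confirm this diagnosis is to inspect the certificate the Glance endpoint actually presents and verify it against the CA that Nova is supposed to trust. A minimal sketch, assuming the host and port from the log above; the CA bundle path /etc/ssl/certs/my-ca.pem is a placeholder for your deployment:

```
# Save the certificate the Glance endpoint serves on port 9292
openssl s_client -connect controller:9292 </dev/null 2>/dev/null \
  | openssl x509 > /tmp/glance-cert.pem

# Inspect who issued it and whether it has expired
openssl x509 -in /tmp/glance-cert.pem -noout -subject -issuer -dates

# Verify it against the CA bundle you expect Nova to trust
openssl verify -CAfile /etc/ssl/certs/my-ca.pem /tmp/glance-cert.pem
```

If the verification fails with the system trust store but succeeds with your internal CA bundle, pointing the `cafile` option in the `[glance]` section of nova.conf at that bundle should let keystoneauth verify the connection.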
[Yahoo-eng-team] [Bug 2054329] Re: orphan allocations cause orphan resource providers and prevents compute service deletion
This is a known issue that we recently fixed by ensuring that you can't change the hostname silently: https://specs.openstack.org/openstack/nova-specs/specs/2023.1/implemented/stable-compute-uuid.html

That series won't be backported to Zed, so I'd recommend upgrading to Antelope. In the meantime, you can do some cleanup of the orphaned resources using the 'nova-manage placement audit' command, which will tell you which placement resources are zombies.

** Changed in: nova
   Status: New => Won't Fix

https://bugs.launchpad.net/bugs/2054329

Title:
  orphan allocations cause orphan resource providers and prevents compute service deletion

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:

Description
===========
It can happen that there are orphan allocations against a resource provider, e.g. when something went wrong during a migration. During the deletion of a nova-compute service, the nova-api tries to delete the resource provider in placement as well. When the resource provider still has allocations against it, the deletion of the resource provider will fail, but the deletion of the nova-compute service will be successful. This causes orphan resource providers. This is based on the try-catch around the deletion of the resource provider: https://opendev.org/openstack/nova/src/commit/6e510eb62e00c34e98a5245a6de2dd2955ffb57a/nova/api/openstack/compute/services.py#L321

If a new nova-compute service with the same hostname gets created, it will not create a new resource provider, as there is already one with the correct hostname. This causes a mismatch between the ID of the nova-compute service and the ID of the resource provider. If you now try to delete the new nova-compute service, it will raise a ValueError due to this mismatch. This also happens for all other requests to placement where the resource provider is referenced via the UUID instead of the name.

Steps to reproduce
==================
1. Generate orphaned allocations on a resource provider. Can be done by generating a random allocation:
```
openstack resource provider allocation set --allocation="rp=,VCPU=2" --project-id --user-id
```
2. Delete the nova-compute service via the nova-api.
3. Restart the nova-compute service, so a new nova-compute service is created.
4. You will start to see errors in the logs of placement/nova-api regarding not finding the resource provider with the old UUID.
5. Delete the nova-compute service via the nova-api; this will generate a 500 error and the nova-compute service is not deleted.

Expected result
===============
No errors in the logs regarding not finding a resource provider based on its ID. The deletion of the recreated nova-compute service should be successful.

Actual result
=============
We see errors in the log regarding not finding the resource provider:
```
An error occurred while updating COMPUTE_STATUS_DISABLED trait on compute node resource provider d5d7cf1c-51ea-4139-9fc3-6007ba58441e. The trait will be synchronized when the update_available_resource periodic task runs. Error: Failed to get traits for resource provider with UUID d5d7cf1c-51ea-4139-9fc3-6007ba58441e
```
We are not able to delete the newly created nova-compute service, due to a ValueError, as it is not able to find the resource provider based on the nova-compute service UUID.

Environment
===========
We are running OpenStack Zed, but based on the code the issue should still be present on the master branch.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2054329/+subscriptions
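For reference, the audit command mentioned in the triage comment can both report and delete leaked allocations. A minimal session, run on a controller with access to nova.conf; the resource provider UUID is a placeholder:

```
# List allocations that are not tied to any existing instance or migration
nova-manage placement audit --verbose

# Restrict the audit to one resource provider and delete what it finds
nova-manage placement audit --resource_provider <rp-uuid> --delete
```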
[Yahoo-eng-team] [Bug 2055419] Re: network autoallocation fails for non-admin user
This seems to me to be a Neutron issue; moving the bug report to the proper project.

** Also affects: neutron
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2055419

Title:
  network autoallocation fails for non-admin user

Status in neutron:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:

Description
===========
Automatic allocation of network topologies (https://docs.openstack.org/neutron/latest/admin/config-auto-allocation.html) causes an unexpected API error when requested by a user without the admin role. Tempest test affected: tempest.api.compute.admin.test_auto_allocate_network.AutoAllocateNetworkTest.test_server_multi_create_auto_allocate is failing.

Steps to reproduce
==================
* Request server creation with network autoallocation as a user without the admin role:

$ openstack --os-compute-api-version 2.37 server create --flavor --image --nic auto vm1

Expected result
===============
Forbidden response (if I understand the documentation correctly) or creation of network and router (if it is allowed).

Actual result
=============
Unexpected API Error.

ERROR nova.api.openstack.wsgi [None req- - - default default] Unexpected exception in API method: neutronclient.common.exceptions.NotFound: The resource could not be found.
Neutron server returns request_ids: ['req-']
ERROR nova.api.openstack.wsgi Traceback (most recent call last):
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py", line 110, in wrapper
ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
ERROR nova.api.openstack.wsgi   [Previous line repeated 11 more times]
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/compute/servers.py", line 786, in create
ERROR nova.api.openstack.wsgi     instances, resv_id = self.compute_api.create(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 2207, in create
ERROR nova.api.openstack.wsgi     return self._create_instance(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 1683, in _create_instance
ERROR nova.api.openstack.wsgi     ) = self._validate_and_build_base_options(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 1081, in _validate_and_build_base_options
ERROR nova.api.openstack.wsgi     max_network_count = self._check_requested_networks(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 543, in _check_requested_networks
ERROR nova.api.openstack.wsgi     return self.network_api.validate_networks(context, requested_networks,
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", line 2648, in validate_networks
ERROR nova.api.openstack.wsgi     ports_needed_per_instance = self._ports_needed_per_instance(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", line 2509, in _ports_needed_per_instance
ERROR nova.api.openstack.wsgi     if not self._can_auto_allocate_network(context, neutron):
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", line 2438, in _can_auto_allocate_network
ERROR nova.api.openstack.wsgi     neutron.validate_auto_allocated_topology_requirements(
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
ERROR nova.api.openstack.wsgi     ret = obj(*args, **kwargs)
ERROR nova.api.openstack.wsgi   File "/var/lib/kolla/venv/lib/python3.10/site-packages/debtcollector/renames.py", line 41, in decorator
ERROR nova.api.openstack.wsgi     return wrapped(*args, **kwargs)
ERROR nova.api.openstack.wsgi   File
[Yahoo-eng-team] [Bug 2058248] Re: Bugs in python files
The exception comes from OSC; moving the bug report to that project.

** Also affects: python-openstackclient
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2058248

Title:
  Bugs in python files

Status in OpenStack Compute (nova):
  Invalid
Status in python-openstackclient:
  New

Bug description:

Description
===========
A Python error pops up during instance creation.

Steps to reproduce
==================
I used the demo script to launch an instance:

$ . demo-openrc
$ openstack server --debug create --flavor m1.nano --image cirros --nic net-id=698c77d5-49cb-47f2-8e26-766b2be3783d --security-group default --key-name mykey selfservice-instance

Expected result
===============
An instance is created and its status is shown as in https://docs.openstack.org/install-guide/launch-instance-selfservice.html

Actual result
=============
An instance is created, but a Python error pops up: Resource.get() takes 1 positional argument but 2 were given

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cliff/app.py", line 410, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3/dist-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3/dist-packages/cliff/display.py", line 117, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3/dist-packages/openstackclient/compute/v2/server.py", line 1964, in take_action
    details = _prep_server_detail(compute_client, image_client, server)
  File "/usr/lib/python3/dist-packages/openstackclient/compute/v2/server.py", line 147, in _prep_server_detail
    server = utils.find_resource(compute_client.servers, info['id'])
  File "/usr/lib/python3/dist-packages/osc_lib/utils/__init__.py", line 271, in find_resource
    if (resource.get('id') == name_or_id or
TypeError: Resource.get() takes 1 positional argument but 2 were given

Environment
===========
1. OpenStack version is 2023.2
2. Hypervisor is Libvirt + KVM, storage type is LVM
3. Networking type is Neutron with OpenVSwitch

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2058248/+subscriptions
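The traceback points at osc_lib's find_resource() calling resource.get('id') with a positional argument, which newer SDK Resource objects no longer accept, i.e. a version mismatch between the client libraries. Checking and upgrading them together is a reasonable first step; this is a generic diagnostic sketch, not a confirmed fix, and the right versions depend on your distribution:

```
# Inspect the client-side libraries involved in the traceback
pip3 show python-openstackclient osc-lib openstacksdk | grep -E '^(Name|Version)'

# Upgrading the client stack together usually resolves this kind of
# incompatibility between openstackclient, osc-lib and openstacksdk
pip3 install --upgrade python-openstackclient osc-lib openstacksdk
```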
[Yahoo-eng-team] [Bug 2052915] Re: "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed
As discussed in the Nova meeting, nova-grenade-multinode is no longer failing, so I'll close this bug report only for Nova.

** Changed in: nova
   Status: Confirmed => Invalid

** Changed in: nova
   Status: Invalid => Won't Fix

** Changed in: nova
   Importance: Critical => Low

https://bugs.launchpad.net/bugs/2052915

Title:
  "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed

Status in neutron:
  Triaged
Status in OpenStack Compute (nova):
  Won't Fix

Bug description:

The issue seems to be in the neutron-lib version installed:

2024-02-07 16:19:35.155231 | compute1 | ERROR: neutron 21.2.1.dev38 has requirement neutron-lib>=3.1.0, but you'll have neutron-lib 2.20.2 which is incompatible.

That leads to an error when starting the Neutron API (an API definition is not found) [1]:

Feb 07 16:13:54.385467 np0036680724 neutron-server[67288]: ERROR neutron ImportError: cannot import name 'port_mac_address_override' from 'neutron_lib.api.definitions' (/usr/local/lib/python3.8/dist-packages/neutron_lib/api/definitions/__init__.py)

Setting priority to Critical because this affects the CI.

[1] https://9faad8159db8d6994977-b587eccfce0a645f527dfcbc49e54bb4.ssl.cf2.rackcdn.com/891397/4/check/neutron-ovs-grenade-multinode/ba47cef/controller/logs/screen-q-svc.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052915/+subscriptions
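The failure above is a plain dependency conflict, so it can be confirmed on the affected node independently of grenade. A small sketch using generic pip/python commands (not a grenade-specific fix):

```
# Confirm the installed neutron-lib is older than what neutron requires
pip3 show neutron-lib | grep Version

# The missing API definition is the direct symptom of the old version
python3 -c "from neutron_lib.api.definitions import port_mac_address_override" \
  && echo "definition present" || echo "definition missing"

# Installing the required minimum restores the missing definition
pip3 install 'neutron-lib>=3.1.0'
```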
[Yahoo-eng-team] [Bug 2054404] Re: Self Signed Certs Cause Metadata cert errors seemingly
This doesn't look like a Nova bug to me, maybe a Kolla one. Moving this report then.

** Also affects: kolla
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2054404

Title:
  Self Signed Certs Cause Metadata cert errors seemingly

Status in kolla:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:

==> /var/log/kolla/nova/nova-metadata-error.log <==
2024-02-18 00:58:15.029954 AH01909: tunninet-server-noel.ny5.lan.tunninet.com:8775:0 server certificate does NOT include an ID which matches the server name
2024-02-18 00:58:16.360069 AH01909: tunninet-server-noel.ny5.lan.tunninet.com:8775:0 server certificate does NOT include an ID which matches the server name

I have no cert issues elsewhere, just this. What could cause this? Elsewhere it usually has an IP and the FQDN as SANs. How can I troubleshoot the root cause?

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla/+bug/2054404/+subscriptions
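AH01909 is an Apache warning meaning the certificate's identity does not match the configured ServerName, so a good starting point is to dump the SANs of the certificate actually served on the metadata port and compare them with what Apache thinks its name is. Host and port are taken from the log above; the config path is an assumption for a Kolla host and requires a reasonably recent OpenSSL for the `-ext` option:

```
# Dump the Subject Alternative Names of the cert served on 8775
openssl s_client -connect tunninet-server-noel.ny5.lan.tunninet.com:8775 </dev/null 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName

# Compare with the ServerName Apache is configured with
grep -r ServerName /etc/kolla/nova-metadata/ 2>/dev/null
```

If the SAN list lacks the FQDN (or the IP) that Apache uses, regenerating the self-signed certificate with both as SANs should silence the warning.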
[Yahoo-eng-team] [Bug 2054409] Re: (HTTP 500) after upgrade
Are you sure you also upgraded Neutron? Apparently, the Neutron metadata agent tries to call the Neutron server over RPC, but the server doesn't support the RPC version used by the agent. This doesn't look like a Nova problem, so I'll close this bug report, but please reopen it if you find a problem in Nova.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2054409

Title:
  (HTTP 500) after upgrade

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Description
===========
After upgrading from Antelope to Bobcat, I'm unable to manage instances.

Steps to reproduce
==================
Following the upgrade guide https://docs.openstack.org/nova/2023.2/admin/upgrades.html
* Pre-existing installation, Antelope 2023.1
* Upgraded the controller node to Bobcat 2023.2
* Ran 'nova-manage api_db sync' and 'nova-manage db sync'
* Upgraded other services; 'nova-status upgrade check' reports everything as successful.
* 'nova service-list' does not report orphaned records
* Ran 'nova-manage db online_data_migrations'

Expected result
===============
Upgrade successful, able to use the OpenStack deployment.

Actual result
=============
The last step does not complete successfully. Even after running it multiple times, there is still a pending operation:

# nova-manage db online_data_migrations
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
Running batches of 50 until complete
1 rows matched query populate_instance_compute_id, 0 migrated
+--------------------------------------+--------------+-----------+
| Migration                            | Total Needed | Completed |
+--------------------------------------+--------------+-----------+
| fill_virtual_interface_list          | 0            | 0         |
| migrate_empty_ratio                  | 0            | 0         |
| migrate_quota_classes_to_api_db      | 0            | 0         |
| migrate_quota_limits_to_api_db       | 0            | 0         |
| migration_migrate_to_uuid            | 0            | 0         |
| populate_dev_uuids                   | 0            | 0         |
| populate_instance_compute_id         | 1            | 0         |
| populate_missing_availability_zones  | 0            | 0         |
| populate_queued_for_delete           | 0            | 0         |
| populate_user_id                     | 0            | 0         |
| populate_uuids                       | 0            | 0         |
+--------------------------------------+--------------+-----------+

Listing instances on the dashboard now fails with the error:

Error: Unable to retrieve instances. Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-2fa94a10-3168-41a5-8b0f-6d7499465ff1)

Environment
===========
1. Exact version of OpenStack you are running? Bobcat 2023.2
2. Which hypervisor did you use? Compute nodes are running qemu-kvm 8.0.0 and libvirt 9.5.0
3. Which storage type did you use? Cinder has two backends, NFS and dCache
4. Which networking type did you use? Neutron with OpenVSwitch

Logs & Configs
==============
/var/log/nova/nova-api.log, trying to list instances:

2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi [None req-2fa94a10-3168-41a5-8b0f-6d7499465ff1 458ee6e3adf142048041c8d24fabeb85 c824412ae5904653a037e893827aa693 - - default default] Unexpected exception in API method: neutronclient.common.exceptions.InternalServerError: Request Failed: internal server error while processing your request.
Neutron server returns request_ids: ['req-8894501a-070a-49be-8e46-e9bac7f1afb0']
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3.9/site-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi     return f(*args, **kwargs)
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3.9/site-packages/nova/api/validation/__init__.py", line 192, in wrapper
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3.9/site-packages/nova/api/validation/__init__.py", line 192, in wrapper
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi     return func(*args, **kwargs)
2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File
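Since the triage comment suspects a Neutron RPC version mismatch between the metadata agent and the server after the upgrade, a quick sanity check is to confirm that every Neutron component was actually upgraded and is still reporting in. These are generic commands, not a confirmed root-cause procedure:

```
# Confirm the neutron package version on each node matches the controller
pip3 show neutron | grep Version

# Verify all agents are alive and reporting against the upgraded server;
# a dead metadata agent here corroborates the RPC mismatch theory
openstack network agent list
```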
[Yahoo-eng-team] [Bug 2054502] Re: shutdowning rabbitmq causes nova-compute.service down
This isn't a Nova bug; it's maybe an oslo.messaging problem. In any case, since the nova-compute service will be reported as down, the servicegroup API won't offer it to the scheduler, so this shouldn't be a problem.

** Also affects: oslo.messaging
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2054502

Title:
  shutdowning rabbitmq causes nova-compute.service down

Status in OpenStack Compute (nova):
  Invalid
Status in oslo.messaging:
  New

Bug description:

Description
===========
We have an OpenStack deployment with a RabbitMQ cluster of 3 nodes and dozens of nova-compute nodes. When we shut down 1 out of the 3 RabbitMQ nodes, Nagios alerted that nova-compute.service was down on 2 nova-compute nodes. Upon checking, we found that nova-compute.service is running:

nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2024-02-16 00:42:47 UTC; 4 days ago
 Main PID: 10130 (nova-compute)
    Tasks: 32 (limit: 463517)
   Memory: 248.2M
      CPU: 55min 5.217s
   CGroup: /system.slice/nova-compute.service
           ├─10130 /usr/bin/python3 /usr/bin/nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf --log-file=/var/log/nova/nova-compute.log
           ├─11527 /usr/bin/python3 /bin/privsep-helper --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpc0sosqey/privsep.sock
           └─11702 /usr/bin/python3 /bin/privsep-helper --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf --privsep_context nova.privsep.sys_admin_pctxt --privsep_sock_path /tmp/tmp2ik7rchu/privsep.sock

Feb 16 00:42:53 node002 sudo[11540]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=64060)
Feb 16 00:42:54 node002 sudo[11540]: pam_unix(sudo:session): session closed for user root
Feb 20 04:55:31 node002 nova-compute[10130]: Traceback (most recent call last):
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Feb 20 04:55:31 node002 nova-compute[10130]:     timer()
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Feb 20 04:55:31 node002 nova-compute[10130]:     cb(*args, **kw)
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
Feb 20 04:55:31 node002 nova-compute[10130]:     waiter.switch()
Feb 20 04:55:31 node002 nova-compute[10130]: greenlet.error: cannot switch to a different thread

I guess it's possible that when a RabbitMQ node is shut down, nova-compute experiences contention or state inconsistencies while processing connection recovery; restarting nova-compute.service resolves the problem.

Logs & Configs
==============
The nova-compute.log:

2024-02-20 04:55:28.675 10130 ERROR oslo.messaging._drivers.impl_rabbit [-] [0aefd459-297a-48e8-8b15-15c763531431] AMQP server on 10.10.10.59:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2024-02-20 04:55:29.677 10130 ERROR oslo.messaging._drivers.impl_rabbit [-] [0aefd459-297a-48e8-8b15-15c763531431] AMQP server on 10.10.10.59:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: ConnectionRefusedError: [Errno 111] ECONNREFUSED
2024-02-20 04:55:30.682 10130 INFO oslo.messaging._drivers.impl_rabbit [-] [0aefd459-297a-48e8-8b15-15c763531431] Reconnected to AMQP server on 10.10.10.52:5672 via [amqp] client with port 35346.
2024-02-20 04:55:31.361 10130 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer

Then 'systemctl status nova-compute' shows:

Feb 20 04:55:31 node002 nova-compute[10130]: Traceback (most recent call last):
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Feb 20 04:55:31 node002 nova-compute[10130]:     timer()
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Feb 20 04:55:31 node002 nova-compute[10130]:     cb(*args, **kw)
Feb 20 04:55:31 node002 nova-compute[10130]:   File "/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
Feb 20 04:55:31 node002 nova-compute[10130]:     waiter.switch()
Feb 20 04:55:31 node002
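Note that this report is exactly a case where systemd and Nova's servicegroup API disagree: the process is alive, but it has stopped heartbeating over RPC. Comparing the two views confirms the symptom and, as the reporter found, a restart re-establishes the AMQP connection. Generic commands:

```
# What Nova's servicegroup API thinks (see the State column: up/down)
openstack compute service list --service nova-compute

# What systemd thinks on the affected hypervisor
systemctl status nova-compute.service

# Restarting rebuilds the AMQP connection and resumes heartbeats
systemctl restart nova-compute.service
```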
[Yahoo-eng-team] [Bug 2055004] Re: unable to create vm - Could not find versioned identity endpoints
As you can see in the exception, this is not a Nova bug. Keystone just tells you "sorry, it's forbidden", so maybe you have some wrong configuration for it. As this is not a Nova bug, I'm closing this report.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2055004

Title:
  unable to create vm - Could not find versioned identity endpoints

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Hi,

Description
===========
After installing OpenStack, I've been trying to create a VM with the command below, but I'm getting this error:

Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-4e85e38c-f6b8-43ce-ae29-4b96b41da25e)

Command: openstack server create --flavor x1.test --image cirros my-instance

I've double-checked the nova conf, and so far everything looks OK.

Logs
====
Logs in /var/log/nova/nova-api.log:

2024-02-26 01:04:14.853 23820 INFO nova.osapi_compute.wsgi.server [None req-fc792d37-360b-4cbc-8259-e3ad051c2816 01ac288623ee4fcf844338f25b8edb5e 831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] 192.168.16.20 "GET /v2.1/flavors HTTP/1.1" status: 200 len: 1288 time: 0.0286248
2024-02-26 01:04:14.868 23820 INFO nova.osapi_compute.wsgi.server [None req-58facd70-3a50-4a22-8155-b83d18336ecc 01ac288623ee4fcf844338f25b8edb5e 831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] 192.168.16.20 "GET /v2.1/flavors/1 HTTP/1.1" status: 200 len: 753 time: 0.0087411
2024-02-26 01:04:14.907 23820 WARNING oslo_config.cfg [None req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Deprecated: Option "api_servers" from group "glance" is deprecated for removal ( Support for image service configuration via standard keystoneauth1 Adapter options was added in the 17.0.0 Queens release. The api_servers option was retained temporarily to allow consumers time to cut over to a real load balancing solution. ). Its value may be silently ignored in the future.
2024-02-26 01:04:15.366 23820 WARNING keystoneauth.identity.generic.base [None req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Failed to discover available identity versions when contacting http://openstackcs/identity. Attempting to parse version from URL.: keystoneauth1.exceptions.http.Forbidden: Forbidden (HTTP 403)
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi [None req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Unexpected exception in API method: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Forbidden (HTTP 403)
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/identity/generic/base.py", line 136, in _do_create_plugin
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     disc = self.get_discovery(session,
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/identity/base.py", line 608, in get_discovery
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     return discover.get_discovery(session=session, url=url,
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 1460, in get_discovery
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     disc = Discover(session, url, authenticated=authenticated)
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 540, in __init__
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     self._data = get_version_data(session, url,
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 107, in get_version_data
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     resp = session.get(url, headers=headers, authenticated=authenticated)
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File "/usr/lib/python3/dist-packages/keystoneauth1/session.py", line 1141, in get
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi     return self.request(url, 'GET', **kwargs)
2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File
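The discovery failure means the configured auth_url (http://openstackcs/identity, per the log) returns 403 before keystoneauth can even list identity API versions, which points at a web server or endpoint problem rather than credentials. Probing it directly usually locates the mismatch; the URL is taken from the log above:

```
# Unauthenticated version discovery should return 300 or 200, never 403;
# a 403 here points at the web server/vhost in front of keystone
curl -i http://openstackcs/identity

# Compare against the identity endpoints registered in the catalog
openstack endpoint list --service identity
```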
[Yahoo-eng-team] [Bug 2055700] Re: server rebuild with reimage-boot-volume and is_volume_backed fails with BuildAbortException
Fabian, do you want some kind of root cause analysis? If so, I'd prefer you ping us in the Nova channel rather than creating a bug report. Once you know why you get this exception, you can reopen this bug report if you want to explain the problem, but for the moment I'll close it.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2055700

Title:
  server rebuild with reimage-boot-volume and is_volume_backed fails with BuildAbortException

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Description
===========
More specifically, the following tempest test in master fails:
tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server

Even with the patch for https://review.opendev.org/c/openstack/nova/+/910627

Technically though, it should be unrelated to the driver implementation, as...

`ComputeManager._rebuild_default_impl` first calls destroy on the VM in both branches:
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3695-L3701

And in the case of a volume-backed VM with `reimage_boot_volume=True`, it calls `ComputeManager._rebuild_volume_backed_instance` here:
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3710-L3715

That function tries to detach the volume from the destroyed instance, and at least in the VMware driver this raises an `InstanceNotFound`, which I'd argue would be expected:
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3596-L3607

Steps to reproduce
==================
* Install Devstack from master
* Run tempest test `tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server`

Or as a bash script:
```
IMAGE=$(openstack image list -c ID -f value)
ID1=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 rebuild-1 -c id -f value)
ID2=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 rebuild-2 -c id -f value)
# Wait for servers to be ready
# Works
openstack server rebuild --os-compute-api-version 2.93 --image $IMAGE $ID1
# Fails
openstack server rebuild --os-compute-api-version 2.93 --reimage-boot-volume --image $IMAGE $ID1
```

Expected result
===============
The test succeeds.

Actual result
=============

Environment
===========
1. Patch proposed in https://review.opendev.org/c/openstack/nova/+/909474 + patch proposed in https://review.opendev.org/c/openstack/nova/+/910627
2. Which hypervisor did you use? What's the version of that? vmwareapi (VSphere 7.0.3 & ESXi 7.0.3)
3. Which storage type did you use? vmdk on NFS 4.1
4. Which networking type did you use? networking-nsx-t (https://github.com/sapcc/networking-nsx-t)

Logs & Configs
==============

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2055700/+subscriptions
[Yahoo-eng-team] [Bug 2056149] Re: Inconsistent volume naming when create instance (from volume)
This looks to me more like a feature request than a bug. What problem are you having?

** Changed in: nova
   Status: New => Opinion

https://bugs.launchpad.net/bugs/2056149

Title:
  Inconsistent volume naming when create instance (from volume)

Status in OpenStack Compute (nova):
  Opinion

Bug description:

Description:
When creating an instance from a volume, there are inconsistent behaviours and a usability issue for users. Using Yoga, and confirmed with older versions as well; it is likely present on newer versions too.

Cases:

- Using the CLI and the --boot-from-volume flag:
Naming: The instance gets created, and the volume does not get any name; it is just blank "".
Problems: By default, on instance deletion the volume does not get deleted. If a user wants to reuse the root volume, finding the right volume is just impossible.
Suggestion: It would be nice to call the volume "volume-$INSTANCEUUID" to have a direct correspondence between a VM and its root volume. (See the sketch after this list for a possible workaround.)

- Using Horizon and selecting the CREATE VOLUME button:
Naming: The volume this time gets a name, which is equal to the volume UUID.
Problems: The behaviour is different from the CLI, and users (and admins) get confused.
Suggestion: As above, name the root volume "volume-$INSTANCEUUID" when booting an instance from volume.

Overall, it would be nice to have an option to template the volume naming when creating from volume, something like:
boot_from_volume_naming_template: volume-%UUID

The blank naming behaviour of case 1 should be fixed as a bug <---

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2056149/+subscriptions
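Until something like the proposed naming template exists, one workaround is to rename the root volume yourself right after boot, using the server's attachment info. A sketch, assuming a recent OSC whose JSON output exposes a volumes_attached field (older releases call it os-extended-volumes:volumes_attached) and that jq is available:

```
SERVER_ID=$(openstack server show vm1 -f value -c id)

# Grab the first attached volume's ID from the server's JSON view
VOLUME_ID=$(openstack server show "$SERVER_ID" -f json \
  | jq -r '.volumes_attached[0].id')

# Give the root volume the name scheme the report suggests
openstack volume set --name "volume-${SERVER_ID}" "$VOLUME_ID"
```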
[Yahoo-eng-team] [Bug 2056756] Re: A source_type=blank instance was unexpectedly scheduled to the ironic node
Ironic nodes are seen by the scheduler exactly like libvirt nova-compute nodes. If you want to avoid them, you need to use aggregates.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2056756

Title:
  A source_type=blank instance was unexpectedly scheduled to the ironic node

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Description
===========
I executed the following command to boot an instance with a source_type=blank root volume. My OpenStack environment has many nodes, including nova and ironic ones; the instance was unexpectedly scheduled to an ironic node, and when I checked the node's remaining resources I found them exceeded. Could anyone give me some advice on how to avoid this? Thanks a lot.

nova boot --flavor 10 --block-device source=blank,dest=volume,size=1,bootindex=0,volume_type=hdd --nic net-name=share_net test

Steps to reproduce
==================
Execute the command to boot the instance:
nova boot --flavor 10 --block-device source=blank,dest=volume,size=1,bootindex=0,volume_type=hdd --nic net-name=share_net test

Expected result
===============
No node is scheduled, and the instance status becomes ERROR.

Actual result
=============
The instance was unexpectedly scheduled to an ironic node.

Environment
===========
Wallaby

Logs & Configs
==============
1. nova-scheduler:
2024-03-11 20:00:03.632 17 INFO nova.scheduler.manager [req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 b00eb18beb7647dba928b26485606784 - default default] Starting to schedule for instances: ['953d0c4c-b53e-4739-8444-80ac7442f612']
2024-03-11 20:00:03.788 17 INFO nova.filters [req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 b00eb18beb7647dba928b26485606784 - default default] Starting with 14 host(s)
2024-03-11 20:00:03.789 17 INFO nova.filters [...] Filter AvailabilityZoneFilter returned 14 host(s)
2024-03-11 20:00:03.789 17 INFO nova.filters [...] Filter ComputeFilter returned 14 host(s)
2024-03-11 20:00:03.790 17 INFO nova.filters [...] Filter ComputeCapabilitiesFilter returned 14 host(s)
2024-03-11 20:00:03.791 17 INFO nova.filters [...] Filter ImagePropertiesFilter returned 14 host(s)
2024-03-11 20:00:03.791 17 INFO nova.filters [...] Filter ServerGroupAntiAffinityFilter returned 14 host(s)
2024-03-11 20:00:03.791 17 INFO nova.filters [...] Filter ServerGroupAffinityFilter returned 14 host(s)
2024-03-11 20:00:03.819 17 INFO nova.filters [...] Filter NUMATopologyFilter returned 14 host(s)
2024-03-11 20:00:03.820 17 INFO nova.filters [...] Filter AggregateVolumeTypeFilter returned 14 host(s)
2024-03-11 20:00:03.822 17 INFO nova.filters [...] Filter SriovPciFilter returned 14 host(s)
2024-03-11 20:00:03.823 17 INFO nova.filters [...] Filter GPUFilter returned 14 host(s)
2024-03-11 20:00:03.823 17 INFO nova.filters [...] Filter VGPUFilter returned 14 host(s)
2024-03-11 20:00:03.824 17 INFO nova.scheduler.filter_scheduler [req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 b00eb18beb7647dba928b26485606784 - default default] Filtered [(ironic.compute.domain.tld.2, fb729bc5-8d29-47b8-8d0f-cbeb47ba57a8) ram: 4096MB disk: 30720MB io_ops: 0 instances: 0, (ironic.compute.domain.tld.2, 19bbf021-76b4-4222-a633-4539a2c70225) ram: 4096MB disk: 30720MB io_ops: 0 instances: 0,
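A common way to implement the aggregate-based isolation the triage comment suggests is aggregate/flavor property matching via the AggregateInstanceExtraSpecsFilter. A sketch, assuming that filter is enabled in the scheduler's enabled_filters, that "compute1" is one of your KVM hosts, and that "10" is the flavor from the report:

```
# Group the virtualized hypervisors into an aggregate with a property
openstack aggregate create kvm-hosts
openstack aggregate set --property hypervisor_type=kvm kvm-hosts
openstack aggregate add host kvm-hosts compute1

# Pin the flavor to hosts in aggregates carrying the same property,
# so ironic nodes (outside the aggregate) are filtered out
openstack flavor set --property aggregate_instance_extra_specs:hypervisor_type=kvm 10
```

The property name hypervisor_type=kvm is arbitrary here; the filter only matches the aggregate property against the flavor extra spec.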
[Yahoo-eng-team] [Bug 2056195] Re: Return 409 at neutron-client conflict
This appears to me to be a configuration issue, as said in the exception:

  Error Cannot apply both stateful and stateless security groups on the same port at the same time while attempting the operation., Neutron server returns request_ids: ['req-1007ffaa-3501-4566-9ad9-c540931138f0']

I don't think this is a bug in Nova, so I'm closing the bug accordingly, but feel free to reopen it if you can prove the contrary.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2056195

Title:
  Return 409 at neutron-client conflict

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Description
===========
When attaching a stateless and a stateful security group to a VM, Nova returns a 500 error, but it's a user issue and a 409 conflict error should be returned.

Steps to reproduce
==================
1. create network
2. create VM "test-vm" attached to the network
3. you may create a stateful security group, but the default group should already do
4. openstack security group create --stateless stateless-group
5. openstack server add security group test-vm stateless-group

Expected result
===============
Nova forwards the 409 error from Neutron with the error description from Neutron.

Actual result
=============
Nova returns:
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-c6bbaf50-99b7-4108-98f0-808dfee84933)

Environment
===========
1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/
# nova-api --version
26.2.2 (Zed)
3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)
Neutron with OVN

Logs & Configs
==============
Stacktrace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/nova/api/openstack/wsgi.py", line 658, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/nova/api/openstack/compute/security_groups.py", line 437, in _addSecurityGroup
    return security_group_api.add_to_instance(context, instance,
  File "/usr/local/lib/python3.10/site-packages/nova/network/security_group_api.py", line 653, in add_to_instance
    raise e
  File "/usr/local/lib/python3.10/site-packages/nova/network/security_group_api.py", line 648, in add_to_instance
    neutron.update_port(port['id'], {'port': updated_port})
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 828, in update_port
    return self._update_resource(self.port_path % (port), body=body,
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 2548, in _update_resource
    return self.put(path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 365, in put
    return self.retry_request("PUT", action, body=body,
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 333, in retry_request
    return self.do_request(method, action, body=body,
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 297, in do_request
    self._handle_fault_response(status_code, replybody, resp)
  File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", line 196, in wrapper
    ret = obj(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 272, in _handle_fault_response
    exception_handler_v20(status_code, error_body)
  File "/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 90, in exception_handler_v20
    raise client_exc(message=error_message,
neutronclient.common.exceptions.Conflict: Error Cannot apply both stateful and stateless security groups on the same port at the same time while attempting the operation., Neutron server returns request_ids: ['req-1007ffaa-3501-4566-9ad9-c540931138f0']

To manage notifications about this bug go to:
[Yahoo-eng-team] [Bug 1943934] Re: report extra gpu device when config one enabled_vgpu_types
Fixed by https://review.opendev.org/c/openstack/nova/+/899406/2

** Changed in: nova
   Status: Triaged => Fix Released

https://bugs.launchpad.net/bugs/1943934

Title:
  report extra gpu device when config one enabled_vgpu_types

Status in OpenStack Compute (nova):
  Fix Released

Bug description:

If there are two GPU devices virtualized in the environment and only one of them is configured via enabled_vgpu_types and device_addresses, Nova will still report both GPU devices to Placement. We should only report the configured device_addresses to Placement.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1943934/+subscriptions
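For context, the configuration in question pairs a vGPU type with the PCI addresses it applies to. A sketch using the option names as they were spelled at the time of this report (newer releases renamed them to enabled_mdev_types and [mdev_<type>]); the type and address are placeholders:

```
# Append the vGPU configuration to the compute node's config
# (excerpt written via a shell heredoc for illustration)
cat >> /etc/nova/nova-compute.conf <<'EOF'
[devices]
enabled_vgpu_types = nvidia-35

[vgpu_nvidia-35]
device_addresses = 0000:84:00.0
EOF
```

The bug above is that a GPU device not listed in device_addresses was still reported to Placement.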
[Yahoo-eng-team] [Bug 2049121] Re: Boot one VM with two GPU(in same numa)by pci passthrough cannot have GPUDirect P2P capability
Please reopen the bug report by changing the status back to New if you think it's related to Nova.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2049121

Title:
  Boot one VM with two GPU(in same numa)by pci passthrough cannot have GPUDirect P2P capability

Status in OpenStack Compute (nova):
  Invalid

Bug description:

Hi, I have two GPU cards, both connected to the same NUMA CPU socket, as shown here: https://paste.opendev.org/show/b7Qi8qCnbLVxO2W0JdQw/

I can boot a Nova instance successfully with the two GPU cards via PCI passthrough, but inside the booted instance, running deviceQuery returns the message below:

Peer access from NVIDIA RTX 6000(GPU0) -> NVIDIA RTX 6000(GPU1): NO
Peer access from NVIDIA RTX 6000(GPU1) -> NVIDIA RTX 6000(GPU0): NO

The expected output would be:

Peer access from NVIDIA RTX 6000(GPU0) -> NVIDIA RTX 6000(GPU1): YES
Peer access from NVIDIA RTX 6000(GPU1) -> NVIDIA RTX 6000(GPU0): YES

so that memory can be shared between the two GPUs.

I'm running the OpenStack Xena release on an Intel Xeon Gold 5220R CPU.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2049121/+subscriptions
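P2P capability between passthrough GPUs typically depends on the PCIe topology the guest sees and on IOMMU/ACS isolation on the host, so inspecting the topology from both sides is a reasonable first step. A sketch, assuming nvidia-smi is installed in the guest; the PCI address is a placeholder for a switch or root port above the GPUs:

```
# Inside the guest: show the GPU-to-GPU link type
# (PIX/PXB is P2P-friendly; SYS usually is not)
nvidia-smi topo -m

# On the host: check whether ACS is enabled on the bridge above the
# GPUs, which forces traffic through the root complex and breaks P2P
lspci -vvv -s 3b:00.0 | grep -i -A2 'Access Control'
```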
[Yahoo-eng-team] [Bug 2044515] Re: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
As you said, this is due to a connection error: https://controller/identity appears to be unreachable from your environment. Are you sure this URL is right? Anyway, this is not a Nova bug.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2044515

Title:
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-0971807f-dc3f-4882-8a1f-c7c24b86aa0e)

Status in OpenStack Compute (nova):
  Invalid

Bug description:

We are running OpenStack on Ubuntu 22.04 with QEMU. We can't create an instance on the self-service network (we have created a flavor, attached the network, a security group, a keypair and a rule, but the instance is not created). Command used:

openstack server create --flavor m1.nano --image cirros --nic net-id=1fbb66da-7362-4ba9-851b-f9251b3e12e2 --security-group 424fd166-6252-4118-97d6-7062aad3c9eb --key-name mykey inbinternet

Nova API error log:

Unable to establish connection to https://controller/identity: HTTPSConnectionPool(host='controller', port=443): Max retries exceeded with url: /identity (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED'))

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2044515/+subscriptions
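Unlike the 403 case earlier in this digest, ECONNREFUSED means nothing is listening on https://controller:443 at all. Checking what is actually listening and what the catalog advertises usually locates the mismatch; generic commands:

```
# Is anything answering on the URL Nova is configured to use?
curl -ik https://controller/identity

# What identity URL is actually registered in the catalog?
openstack endpoint list --service identity

# Is keystone (typically behind Apache) up and listening?
ss -tlnp | grep -E ':(443|5000)'
```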
[Yahoo-eng-team] [Bug 2044721] Re: Returning 500 to user: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
As you can see here, this is a messaging timeout:

  f8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c44ae10b2f774b428763a1d11368cb16
  2023-11-27 06:34:51.506 19 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c44ae10b2f774b428763a1d11368cb16

So either you have an issue with oslo.messaging or you have a wrong configuration, but this is not a Nova bug.

** Changed in: nova
   Status: New => Invalid

https://bugs.launchpad.net/bugs/2044721

Title:
  Returning 500 to user: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. __call__ /var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936

Status in OpenStack Compute (nova):
  Invalid

Bug description:

We were upgrading our server hardware. Our compute server was corrupted (unable to boot into the Ubuntu OS), so I reflashed the same Ubuntu OS version to it, with the same IP and MAC address, and redeployed kolla-ansible with the same configuration. However, after that, the nova_compute and zun_compute services on the compute node seem to keep restarting by themselves (exit -> restart -> exit -> restart). I have no clue what the solution is.

Expected result
===============
All services should be working correctly.

Actual result
=============
zun_compute is down
nova_compute is down
Horizon shows the compute node as down

Environment
===========
kolla-ansible 2023.1
Ubuntu 22.04 on all 3 servers

2023-11-27 06:26:18.431 22 ERROR nova.api.openstack.wsgi [None req-0c9d29ab-f230-4b26-828f-b1bce144acc9 d54f447283c843df8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6fc08dc0c65849d98b55f762441ae230
2023-11-27 06:26:18.431 22 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6fc08dc0c65849d98b55f762441ae230
__call__ /var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
2023-11-27 06:27:18.420 21 ERROR nova.api.openstack.wsgi [None req-c61791f0-c689-4b37-bc73-183586fd45c1 d54f447283c843df8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID b3811b5050fb45238d0cce866b8db0d3
2023-11-27 06:27:18.420 21 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID b3811b5050fb45238d0cce866b8db0d3
__call__ /var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
2023-11-27 06:28:18.520 23 ERROR nova.api.openstack.wsgi [None req-25845c63-c886-4878-a976-d427f1fb311f - - - - - -] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2f8e06d636f4d96b89c5f5e6b685db0
2023-11-27 06:28:18.520 23 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2f8e06d636f4d96b89c5f5e6b685db0
__call__ /var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
2023-11-27 06:29:18.436 19 ERROR nova.api.openstack.wsgi [None req-9eb9d577-c192-4579-a8aa-00d7cabb793b d54f447283c843df8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 7523a9908d5341a7a667be6c0a1775ea
2023-11-27 06:29:18.436 19 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 7523a9908d5341a7a667be6c0a1775ea
__call__ /var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
2023-11-27 06:30:18.603 23 ERROR nova.api.openstack.wsgi [None req-25f72773-caee-4551-8ebb-92747f12e77f d54f447283c843df8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID ac1f456f77c94a96b6deea5d2c3186a5
2023-11-27 06:30:18.603 23 ERROR nova.api.openstack.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID ac1f456f77c94a96b6deea5d2c3186a5
__call__
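A MessagingTimeout like the ones above usually means the RPC request never reached a live nova-compute consumer. In a Kolla deployment you can check the broker and the consumers directly; the container name "rabbitmq" is an assumption based on Kolla defaults:

```
# Broker health and cluster membership
docker exec rabbitmq rabbitmqctl cluster_status

# Are there consumers on the compute service's RPC queues?
docker exec rabbitmq rabbitmqctl list_queues name consumers | grep ^compute

# Cross-check what Nova thinks of the compute service
openstack compute service list --service nova-compute
```

A compute queue with zero consumers, while the container keeps restarting, would match the reported exit/restart loop.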
[Yahoo-eng-team] [Bug 2041519] [NEW] Inventories of SR-IOV GPU VFs are impacted by allocations for other VFs
Public bug reported:

It is hard to summarize this problem in a bug report title, my bad. Long story short, the case arises if you start using NVIDIA SR-IOV next-gen GPUs like the A100, which create Virtual Functions on the host, each of them supporting the same GPU types but with the number of mediated devices that can be created on it equal to 1. If you're using other GPUs (like the V100) and you're not running NVIDIA's sriov-manage to expose the VFs, please disregard this bug; you should not be impacted.

So, say you have an A100 GPU card. Before configuring Nova, you have to run the aforementioned sriov-manage script, which will allocate 16 virtual functions for the GPU. Each of those PCI addresses will correspond to a Placement resource provider (if you configure Nova so) with a VGPU inventory with total=1. Example: https://paste.opendev.org/show/bVxrVLW3yOR3TPV2Lz3A/

Sysfs shows the exact same thing for the nvidia-472 type I configured:

[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Now, the problem arises when you exhaust the number of mediated devices you can create. In the case of nvidia-472, which corresponds to NVIDIA's GRID A100-20C, you can create up to 2 vGPUs, i.e. mediated devices. Accordingly, when Nova automatically creates the 2 mediated devices while booting instances (given that *no* available mediated devices were found yet), *all other* VFs that don't host those 2 mediated devices end up with an available_instances value equal to 0:

[stack@lenovo-sr655-01 nova]$ openstack server create --image cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm1 (skipped)
[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
[stack@lenovo-sr655-01 nova]$ openstack server create --image cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm2 (skipped)
[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Now, when we look at the inventories for all VFs, while it is normal to see 2 resource providers with total=1 (since we created an mdev, it's counted) and usage=1, it is not normal to see the *other VFs* with total=1 and usage=0.

[stack@lenovo-sr655-01 nova]$ for uuid in $(openstack resource provider list -f value -c uuid); do openstack resource provider inventory list $uuid -f value -c resource_class -c total -c used; done | grep VGPU
VGPU 1 1
VGPU 1 1
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0

I eventually dug into the code and found the culprit: https://github.com/openstack/nova/blob/9c9cd3d9b6d1d1e6f62012cd8a86fd588fb74dc2/nova/virt/libvirt/driver.py#L9110-L9111

Before this method is called, we correctly calculate the numbers we get from libvirt, and all the unused VFs have total=0, but because we enter this conditional, we skip updating them. (A rough sketch of that pattern follows.)
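To make the culprit easier to read, here is a sketch of the skipped-update pattern, with hypothetical names loosely modeled on the linked driver code (not the actual implementation):

    # totals per VF PCI address, recalculated from libvirt; unused VFs now
    # report 0 because other VFs consumed the GPU's whole mdev capacity
    for pci_address, total in vf_totals.items():
        if total == 0:
            # culprit: skipping here leaves the stale total=1 inventory of
            # this VF untouched in the provider tree reported to Placement
            continue
        provider_tree.update_inventory(
            pci_address, {'VGPU': {'total': total}})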
There are different ways to solve this problem:
- we stop automatically creating mediated devices and ask operators to pre-allocate all mediated devices before starting nova-compute, but that has a big operator impact (and they need to add some tooling);
- we blindly remove the RP from the provider tree and let the update_resource_providers() call in the compute manager try to update Placement with this new view (see the sketch below). In that very particular case, we're sure that none of the RPs that have total=0 have allocations against them, so it shouldn't fail, but this logic can be error-prone if we try to reproduce it elsewhere.

** Affects: nova Importance: Undecided Status: New ** Tags: vgpu ** Tags added: vgpu

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2041519

Title: Inventories of SR-IOV GPU VFs are impacted by allocations for other VFs

Status in OpenStack Compute (nova): New
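Returning to the second option above, a rough sketch assuming a ProviderTree-like API (similar to nova.compute.provider_tree.ProviderTree); names and structure are illustrative, not the final fix:

    for rp_uuid in list(provider_tree.get_provider_uuids()):
        inventory = provider_tree.data(rp_uuid).inventory
        vgpu = inventory.get('VGPU')
        if vgpu is not None and vgpu['total'] == 0:
            # Safe in this very particular case: a provider whose
            # recalculated total is 0 cannot have allocations against it,
            # so Placement should accept the deletion on the next update.
            provider_tree.remove(rp_uuid)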
[Yahoo-eng-team] [Bug 2036867] Re: refactor test: use project id as constant variable in all places
This is indeed not a bug report; please don't create bug reports for this kind of internal cleanup point.

** Changed in: nova Status: New => Invalid ** Changed in: nova Importance: Undecided => Wishlist

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2036867

Title: refactor test: use project id as constant variable in all places

Status in OpenStack Compute (nova): Invalid

Bug description: This is not a bug: the same PROJECT_ID constant is defined in many places, e.g.:
fixtures/nova.py:75:PROJECT_ID = '6f70656e737461636b20342065766572'
functional/api_samples_test_base.py:25:PROJECT_ID = "6f70656e737461636b20342065766572"
For the full list, grep the tests for 6f70656e737461636b20342065766572.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2036867/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
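The requested cleanup amounts to defining the constant once and importing it everywhere, along the lines of the following sketch (module path hypothetical):

    # hypothetical shared module, e.g. nova/tests/constants.py
    PROJECT_ID = '6f70656e737461636b20342065766572'

    # each test module would then import it instead of redefining it:
    # from nova.tests import constants
    # ... constants.PROJECT_ID ...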
[Yahoo-eng-team] [Bug 1983863] Re: Can't log within tpool.execute
** Changed in: nova Status: In Progress => Invalid ** Changed in: nova Status: Invalid => Fix Committed

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1983863

Title: Can't log within tpool.execute

Status in OpenStack Compute (nova): Fix Committed
Status in oslo.log: Fix Released

Bug description: There is a bug in eventlet where logging within a native thread can lead to a deadlock situation: https://github.com/eventlet/eventlet/issues/432

When they encounter this issue, some OpenStack projects using oslo.log, e.g. Cinder, resolve it by removing any logging within native threads. There is actually a better approach: the Swift team came up with a solution a long time ago, and it would be great if oslo.log could apply this workaround automatically: https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1983863/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
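For reference, the failure mode is logging from code dispatched to a native thread through eventlet's thread pool. A minimal sketch of the pattern that can deadlock with stock eventlet:

    import logging
    from eventlet import tpool

    LOG = logging.getLogger(__name__)

    def blocking_work():
        # Runs in a native thread. If a green thread was preempted while
        # holding the logging handler lock, this call can deadlock
        # (eventlet issue #432 linked above).
        LOG.info('doing blocking work')
        return 42

    result = tpool.execute(blocking_work)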
[Yahoo-eng-team] [Bug 2033752] Re: test_reboot_server_hard fails with AssertionError: time.struct_time() not greater than time.struct_time()
Probably due to the recent merge of https://review.opendev.org/c/openstack/nova/+/882284 : now, when rebooting, we call the Cinder API to check the BDMs, so the reboot could need more time.

** Also affects: nova Importance: Undecided Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2033752

Title: test_reboot_server_hard fails with AssertionError: time.struct_time() not greater than time.struct_time()

Status in neutron: New
Status in OpenStack Compute (nova): New
Status in tempest: New

Bug description: Seen many occurrences recently; it fails as below:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 259, in test_reboot_server_hard
    self._test_reboot_server('HARD')
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 127, in _test_reboot_server
    self.assertGreater(new_boot_time, boot_time,
  File "/usr/lib/python3.10/unittest/case.py", line 1244, in assertGreater
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib/python3.10/unittest/case.py", line 675, in fail
    raise self.failureException(msg)
AssertionError: time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) not greater than time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) : time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) > time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0)

Example logs:
https://1e11be38b60141dbb290-777f110ca49a5cd01022e1e8aeff1ed5.ssl.cf1.rackcdn.com/893401/5/check/neutron-ovn-tempest-ovs-release/f379752/testr_results.html
https://1b9f88b068db0ff45f98-b11b73e0c31560154dece88f25c72a10.ssl.cf2.rackcdn.com/893401/5/check/neutron-linuxbridge-tempest/0bf1039/testr_results.html
https://30b3c23edbff5d871c4c-595cfa47540877e41ce912cd21563e42.ssl.cf1.rackcdn.com/886988/10/check/neutron-ovs-tempest-multinode-full/e57a62a/testr_results.html
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0e5/886988/10/check/neutron-ovn-tempest-ipv6-only-ovs-release/0e538d1/testr_results.html

Opensearch:
https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22not%20greater%20than%20time.struct_time%22'),sort:!())

As per OpenSearch, it started to be seen just a few hours back.

To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/2033752/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
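One way to read the assertion failure: time.struct_time only has one-second granularity, so a hard reboot that completes within the same wall-clock second yields an equal, not greater, boot time. A small illustration (timestamps arbitrary):

    import time

    # two instants less than a second apart truncate to the same struct_time
    t1 = time.gmtime(1693553193.1)
    t2 = time.gmtime(1693553193.9)
    assert not (t2 > t1)  # equal after truncation, so assertGreater fails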
[Yahoo-eng-team] [Bug 2028851] Re: Console output was empty in test_get_console_output_server_id_in_shutoff_status
Seems to be a regression coming from the automatic rebase of https://github.com/openstack/tempest/commit/eea2c1cfac1e5d240cad4f8be68cff7d72f220a8

** Also affects: tempest Importance: Undecided Status: New ** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2028851

Title: Console output was empty in test_get_console_output_server_id_in_shutoff_status

Status in OpenStack Compute (nova): Invalid
Status in tempest: New

Bug description: test_get_console_output_server_id_in_shutoff_status https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L713 is failing consistently in the nova-lvm job starting on July 24, with 132 failures in the last 3 days. https://tinyurl.com/kvcc9289

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 728, in test_get_console_output_server_id_in_shutoff_status
    self.wait_for(self._get_output)
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 340, in wait_for
    condition()
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 213, in _get_output
    self.assertTrue(output, "Console output was empty.")
  File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: '' is not true : Console output was empty.

It's not clear why this has started failing; it may be a regression, or a latent race in the test that we are now hitting.

    def test_get_console_output_server_id_in_shutoff_status(self):
        """Test getting console output for a server in SHUTOFF status

        Should be able to GET the console output for a given server_id
        in SHUTOFF status.
        """
        # NOTE: SHUTOFF is irregular status. To avoid test instability,
        # one server is created only for this test without using
        # the server that was created in setUpClass.
        server = self.create_test_server(wait_until='ACTIVE')
        temp_server_id = server['id']

        self.client.stop_server(temp_server_id)
        waiters.wait_for_server_status(self.client, temp_server_id, 'SHUTOFF')
        self.wait_for(self._get_output)

The test does not wait for the VM to be sshable, so it's possible that we are shutting off the VM before it is fully booted and no output has been written to the console. This failure has happened on multiple providers, but only in the nova-lvm job. The console behavior is unrelated to the storage backend, but the lvm job, I believe, is using LVM on a loopback file, so the storage performance is likely slower than raw/qcow. So perhaps the boot is taking longer and no output is being written.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2028851/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
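If the slow-boot theory above is right, one mitigation (a hedged sketch, not necessarily the fix that landed) is to wait until the guest is actually reachable, and therefore has written console output, before stopping it; recent Tempest exposes this as wait_until='SSHABLE':

    # sketch inside the quoted test; surrounding names come from the test
    server = self.create_test_server(wait_until='SSHABLE')
    temp_server_id = server['id']
    self.client.stop_server(temp_server_id)
    waiters.wait_for_server_status(self.client, temp_server_id, 'SHUTOFF')
    self.wait_for(self._get_output)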
[Yahoo-eng-team] [Bug 2025813] Re: test_rebuild_volume_backed_server failing 100% on nova-lvm job
Changing the bug importance to High as the fix is merged in master: https://review.opendev.org/c/openstack/nova/+/887674 Keeping the stable branch tasks at Critical since the backports aren't merged yet.

** Changed in: nova Importance: Critical => High ** Changed in: nova Status: In Progress => Fix Released

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2025813

Title: test_rebuild_volume_backed_server failing 100% on nova-lvm job

Status in devstack-plugin-ceph: New
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) antelope series: In Progress
Status in OpenStack Compute (nova) yoga series: Triaged
Status in OpenStack Compute (nova) zed series: Triaged

Bug description: After the tempest patch was merged [1], the nova-lvm job started to fail with the following error in test_rebuild_volume_backed_server:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 868, in test_rebuild_volume_backed_server
    self.get_server_ip(server, validation_resources),
  File "/opt/stack/tempest/tempest/api/compute/base.py", line 519, in get_server_ip
    return compute.get_server_ip(
  File "/opt/stack/tempest/tempest/common/compute.py", line 76, in get_server_ip
    raise lib_exc.InvalidParam(invalid_param=msg)
tempest.lib.exceptions.InvalidParam: Invalid Parameter passed: When validation.connect_method equals floating, validation_resources cannot be None

As discussed on IRC with Sean [2], the SSH validation is now mandatory, but it is disabled in the job config [3].

[1] https://review.opendev.org/c/openstack/tempest/+/831018
[2] https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2023-07-04.log.html#t2023-07-04T15:33:38
[3] https://opendev.org/openstack/nova/src/commit/4b454febf73cdd7b5be0a2dad272c1d7685fac9e/.zuul.yaml#L266-L267

To manage notifications about this bug go to: https://bugs.launchpad.net/devstack-plugin-ceph/+bug/2025813/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2024160] Re: [trunk ports] subport doesn't reach status ACTIVE
** Also affects: nova Importance: Undecided Status: New ** Changed in: nova Status: New => Confirmed ** Changed in: nova Importance: Undecided => High ** Changed in: nova Status: Confirmed => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/2024160

Title: [trunk ports] subport doesn't reach status ACTIVE

Status in neutron: Confirmed
Status in OpenStack Compute (nova): Invalid

Bug description: Test test_live_migration_with_trunk has been failing for the last two days. https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/testr_results.html

It's a test about live migration, but it is important to notice that it fails before any live migration happens. The test creates a VM with a port and a subport:
- the test waits until the VM status is ACTIVE -> this passes
- the test waits until the subport status is ACTIVE -> this started failing two days ago because the port status is DOWN

There was only one neutron patch merged that day [1], but I checked that the test failed in some jobs even before that patch was merged. I compared some logs. Neutron logs when the test passes: [2] Neutron logs when the test fails: [3]

When it fails, I see this during the creation of the subport (and I don't see this event when it passes):

Jun 14 18:13:43.052982 np0034303809 neutron-server[77531]: DEBUG ovsdbapp.backend.ovs_idl.event [None req-929dd199-4247-46f5-9466-622c7d538547 None None] Matched DELETE: PortBindingUpdateVirtualPortsEvent(events=('update', 'delete'), table='Port_Binding', conditions=None, old_conditions=None), priority=20 to row=Port_Binding(parent_port=[], mac=['fa:16:3e:93:9d:5a 19.80.0.42'], chassis=[], ha_chassis_group=[], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, type=, tag=[], requested_chassis=[], tunnel_key=2, up=[False], logical_port=f8c707ec-ecd8-4f1e-99ba-6f8303b598b2, gateway_chassis=[], encap=[], external_ids={'name': 'tempest-subport-2029248863', 'neutron:cidrs': '19.80.0.42/24', 'neutron:device_id': '', 'neutron:device_owner': '', 'neutron:network_name': 'neutron-5fd9faa7-ec1c-4f42-ab87-6ce19edda245', 'neutron:port_capabilities': '', 'neutron:port_name': 'tempest-subport-2029248863', 'neutron:project_id': '6f92a9f8e16144148026725b25711d3a', 'neutron:revision_number': '1', 'neutron:security_group_ids': '5eab41ef-c5c1-425c-a931-f5b6b4b330ad', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, virtual_parent=[], nat_addresses=[], datapath=3c472399-d6ee-4b7c-aa97-6777f2bc2772) old= {{(pid=77531) matches /usr/local/lib/python3.10/dist-packages/ovsdbapp/backend/ovs_idl/event.py:43}}
...
Jun 14 18:13:49.597911 np0034303809 neutron-server[77531]: DEBUG neutron.plugins.ml2.plugin [None req-3588521e-7878-408d-b1f8-15db562c69f8 None None] Port f8c707ec-ecd8-4f1e-99ba-6f8303b598b2 cannot update to ACTIVE because it is not bound. {{(pid=77531) _port_provisioned /opt/stack/neutron/neutron/plugins/ml2/plugin.py:361}}

It seems the OVN version has changed between these jobs:
Passes [4]: 2023-06-14 10:01:46.358875 | controller | Preparing to unpack .../ovn-common_22.03.0-0ubuntu1_amd64.deb ...
Fails [5]: 2023-06-14 17:55:07.077377 | controller | Preparing to unpack .../ovn-common_22.03.2-0ubuntu0.22.04.1_amd64.deb ...
[1] https://review.opendev.org/c/openstack/neutron/+/883687 [2] https://96b562ba0d2478fe5bc1-d58fbc463536b3122b4367e996d5e5b0.ssl.cf1.rackcdn.com/831018/30/check/nova-live-migration/312c2ab/controller/logs/screen-q-svc.txt [3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/controller/logs/screen-q-svc.txt [4] https://96b562ba0d2478fe5bc1-d58fbc463536b3122b4367e996d5e5b0.ssl.cf1.rackcdn.com/831018/30/check/nova-live-migration/312c2ab/job-output.txt [5] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/job-output.txt To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/2024160/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2023018] [NEW] scaling governors are optional for some OS platforms
Public bug reported:

Some OS platforms don't use cpufreq, so operators should be able to simply offline their CPUs. For the moment, even if the CPU management strategy config option is set to 'cpu_state', we raise an exception anyway:

Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service Traceback (most recent call last):
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/filesystem.py", line 37, in read_sys
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service with open(os.path.join(SYS, path), mode='r') as data:
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service The above exception was the direct cause of the following exception:
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service Traceback (most recent call last):
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/usr/local/lib/python3.10/dist-packages/oslo_service/service.py", line 806, in run_service
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service service.start()
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/service.py", line 162, in start
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service self.manager.init_host(self.service_ref)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/compute/manager.py", line 1608, in init_host
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service self.driver.init_host(host=self.host)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 825, in init_host
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service libvirt_cpu.validate_all_dedicated_cpus()
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 143, in validate_all_dedicated_cpus
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service governors.add(pcpu.governor)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 63, in governor
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service return core.get_governor(self.ident)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/core.py", line 69, in get_governor
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service return filesystem.read_sys(
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service File "/opt/stack/nova/nova/filesystem.py", line 40, in read_sys
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service raise exception.FileNotFound(file_path=path) from exc
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service nova.exception.FileNotFound: File /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor could not be found.
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service

Let's just support the 'cpu_state' strategy in that case.

** Affects: nova Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: In Progress ** Tags: cpu libvirt ** Summary changed: - scaling governors are optional for some OS plateforms + scaling governors are optional for some OS platforms

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2023018

Title: scaling governors are optional for some OS platforms

Status in OpenStack Compute (nova): In Progress
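A minimal sketch of the direction described above (illustrative, not the merged change): treat a missing scaling_governor file as "no governor" so that platforms without cpufreq can still use the 'cpu_state' strategy:

    import os

    def get_governor(core_id, sysfs_root='/sys'):
        path = os.path.join(
            sysfs_root,
            'devices/system/cpu/cpu%d/cpufreq/scaling_governor' % core_id)
        try:
            with open(path) as f:
                return f.read().strip()
        except FileNotFoundError:
            # No cpufreq on this platform: skip governor validation and
            # fall back to plain CPU online/offline management.
            return None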
[Yahoo-eng-team] [Bug 2022955] [NEW] FileNotFound when offlining a core due to a privsep context missing
Public bug reported:

When we created the CPU power interface, we forgot to add a specific privsep decorator for the set_offline() method: https://review.opendev.org/c/openstack/nova/+/868236/5/nova/virt/libvirt/cpu/core.py#63

As a result, we get a FileNotFound caused by a permission error when restarting the nova-compute service:

Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service Traceback (most recent call last):
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/filesystem.py", line 56, in write_sys
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service with open(os.path.join(SYS, path), mode='w') as fd:
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service PermissionError: [Errno 13] Permission denied: '/sys/devices/system/cpu/cpu1/online'
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service The above exception was the direct cause of the following exception:
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service Traceback (most recent call last):
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/usr/local/lib/python3.10/dist-packages/oslo_service/service.py", line 806, in run_service
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service service.start()
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/service.py", line 162, in start
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service self.manager.init_host(self.service_ref)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/compute/manager.py", line 1608, in init_host
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service self.driver.init_host(host=self.host)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 831, in init_host
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service libvirt_cpu.power_down_all_dedicated_cpus()
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 128, in power_down_all_dedicated_cpus
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service pcpu.online = False
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 50, in online
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service core.set_offline(self.ident)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/virt/libvirt/cpu/core.py", line 64, in set_offline
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service filesystem.write_sys(os.path.join(gen_cpu_path(core), 'online'), data='0')
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service File "/opt/stack/nova/nova/filesystem.py", line 59, in write_sys
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service raise exception.FileNotFound(file_path=path) from exc
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service nova.exception.FileNotFound: File
/sys/devices/system/cpu/cpu1/online could not be found.
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service

** Affects: nova Importance: Undecided Assignee: Sylvain Bauza (sylvain-bauza) Status: In Progress ** Tags: cpu libvirt

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2022955

Title: FileNotFound when offlining a core due to a privsep context missing

Status in OpenStack Compute (nova): In Progress
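For illustration, the missing piece is an escalated-privileges entrypoint around the sysfs write. A rough sketch reusing nova's existing privsep context (the decorator placement is the point; details are illustrative):

    import nova.privsep
    from nova import filesystem

    @nova.privsep.sys_admin_pctxt.entrypoint
    def set_offline(core_id):
        # Runs inside the privsep daemon with the required capabilities,
        # so the open() for writing no longer fails with EACCES.
        filesystem.write_sys(
            'devices/system/cpu/cpu%d/online' % core_id, data='0')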
[Yahoo-eng-team] [Bug 2018318] Re: 'openstack server resize --flavor' should not migrate VMs to another AZ
I left a very large comment on Gerrit, but I'll add it here for better visibility. FWIW, I think the problem is legit and needs to be addressed. I'm gonna change the title and the subject to make it clearer, but I also think that the solution isn't simple at first and requires some design discussion, hence the Wishlist status. Now, the comment I wrote explaining my -1 (you can find it here: https://review.opendev.org/c/openstack/nova/+/864760/comment/b2b03637_f15d6dd2/ )
=

> Just because you say so? =)
> Can you provide a more technical explanation on why not? I mean, why would that be wrong? Or, what alternative would be better, and why?

Sorry, that's kind of an undocumented design consensus (or tribal knowledge if you prefer). We, as the Nova community, want to keep the RequestSpec.availability_zone record as an immutable object that is only set when creating the RequestSpec, so that we know whether the user wanted to pin the instance to a specific AZ or not.

> What is your proposal? We see the following two different alternatives so far. [...]

Maybe you haven't seen my proposal before, but I was talking about https://review.opendev.org/c/openstack/nova/+/469675/12/nova/compute/api.py#1173 which was merged. See again my comment https://review.opendev.org/c/openstack/nova/+/864760/comments/4a302ce3_9805e7c6

To be clear, let me explain the problem and what we need to fix: if a user creates an instance from an image and asks to create a volume from that image, then we need to modify the AZ for the related RequestSpec if and only if cross_az_attach=False.

Now, let's discuss the implementation:

1/ we know that volumes are created much later in the instance boot by the compute service, but we do pass the instance.az information to Cinder to tell it to create a volume within that AZ if cross_az_attach=False: https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/virt/block_device.py#L427 https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/virt/block_device.py#L53-L78

2/ unfortunately, instance.availability_zone is only trustworthy if the instance is pinned to an AZ

3/ we know that at the API level we're able to tell whether we will create a volume based on an image, since we have the BDMs and we do check them: https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1460 https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1866 https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1960-L1965C43

4/ accordingly, we are able to follow the same logic as in https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1396-L1397 by checking the BDMs and seeing whether we are going to create a volume. If so, we SHALL pin the AZ exactly like https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1264

Unfortunately, since the user didn't specify an AZ, Nova doesn't know which AZ to pin the instance to. Consequently, we have multiple options:

1/ we could return an exception to the user if they didn't pin the instance. That said, I really don't like this UX, since the user doesn't know whether cross_az_attach is False or not
2/ we could document the fact that cross_az_attach only works with pre-created volumes.
3/ we could pre-create the volume much earlier, at the API level, and get its AZ.
4/ we could augment the RequestSpec with a field saying 'pinned' (or something else) that the scheduler would honor on a move operation even if RequestSpec.az is None.

As you can see, all those options need to be properly discussed, so IMHO I'd prefer that you draft a spec so the Nova community can address those points and we can find an approved design solution. HTH.

** Changed in: nova Status: Invalid => Confirmed ** Changed in: nova Importance: Undecided => Wishlist ** Summary changed: - 'openstack server resize --flavor' should not migrate VMs to another AZ + cross_az_attach=False doesn't honor BDMs with source=image and dest=volume

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2018318

Title: cross_az_attach=False doesn't honor BDMs with source=image and dest=volume

Status in OpenStack Compute (nova): Confirmed

Bug description: The config flag cross_az_attach allows an instance to be pinned to the related volume AZ if the value of that config option is set to False. We fixed the case of a volume-backed instance with https://review.opendev.org/c/openstack/nova/+/469675/ if the volume was created before the instance, but we haven't yet resolved the case of a BFV instance created from an image (the BDM shortcut that allows a late creation of a volume by the
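To make point 3/ above concrete, a rough sketch of the API-level check (hypothetical helper; the real BDM handling lives in nova/compute/api.py):

    def _creates_volume_from_image(bdms):
        # True when at least one BDM asks Nova to build a new volume from
        # an image, the exact case cross_az_attach=False fails to pin today.
        return any(
            bdm.get('source_type') == 'image' and
            bdm.get('destination_type') == 'volume'
            for bdm in bdms)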
[Yahoo-eng-team] [Bug 2018398] Re: Wrong AZ gets showed when adding new compute node
While I understand your concern, I think you missed the intent of default_availability_zone. This config option is not intended for scheduling instances, but rather for the AZ that is shown by default for a service when none is set. In your environment, you could simply define any existing AZ as default_availability_zone; this would prevent 'nova' from showing up in the AZ list.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2018398

Title: Wrong AZ gets showed when adding new compute node

Status in OpenStack Compute (nova): Invalid

Bug description: On a deployment with multiple availability zones, when the operator adds a new compute host, the service gets registered as part of "default_availability_zone". This is undesirable behavior for users, as they see a new AZ appearing which may not be related to the deployment, during the time window before the host finally gets configured into its correct AZ.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2018398/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2018375] [NEW] routed networks prefilter exception due to subnets can have no segments
Public bug reported:

Since some subnets may have no related segment, the subnet's segment_id value can be absent, but unfortunately the routed_networks_filter prefilter doesn't handle that case.

2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server [req-ed1b01c5-01bd-493f-8b56-b4cb21e29f59 e416974adb7a44fd910a40b208d28e9f d7b8b3323ea64f35adeec903c340a19e - default default] Exception during message handling: KeyError: 'segment_id'
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return func(*args, **kwargs)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/scheduler/manager.py", line 140, in select_destinations
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server request_filter.process_reqspec(ctxt, spec_obj)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 387, in process_reqspec
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server filter(ctxt, request_spec)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 41, in wrapper
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server ran = fn(ctxt, request_spec)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 348, in routed_networks_filter
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server aggregates = utils.get_aggregates_for_routed_network(
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/scheduler/utils.py", line 1390, in get_aggregates_for_routed_network
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server segment_ids = network_api.get_segment_ids_for_network(
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/network/neutron.py", line 3610, in get_segment_ids_for_network
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return [subnet['segment_id'] for subnet in subnets
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/network/neutron.py", line 3611, in 
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server if subnet['segment_id'] is not None]
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server KeyError: 'segment_id'
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server
2023-05-02 22:38:15.178 11 DEBUG
nova.scheduler.manager [req-798de5ac-273e-40fd-abce-36e701488046 e416974adb7a44fd910a40b208d28e9f d7b8b3323ea64f35adeec903c340a19e - default default] Starting to schedule for instances: ['412ca82a-06a4-40d9-b12d-08c56a78c5a9'] select_destinations /usr/lib/python3.9/site-packages/nova/scheduler/manager.py:124

** Affects: nova Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: Confirmed ** Tags: neutron scheduler ** Changed in: nova Status: New => Confirmed ** Changed in: nova Importance: Undecided => Low ** Changed in: nova Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza) ** Tags added: neutron scheduler

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2018375

Title: routed networks prefilter exception due to subnets can have no segments

Status in OpenStack Compute (nova): Confirmed
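The one-line direction implied by the traceback (a hedged sketch, not necessarily the merged patch) is to tolerate subnets that carry no segment key at all:

    # in get_segment_ids_for_network(): use .get() so subnets without any
    # segment association no longer raise KeyError
    return [subnet['segment_id'] for subnet in subnets
            if subnet.get('segment_id') is not None]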
[Yahoo-eng-team] [Bug 2012843] Re: Instances are free to move others zone when AZ was not specified
This is expected behaviour, as you can read in https://docs.openstack.org/nova/latest/admin/availability-zones.html#resource-affinity

If an instance is not pinned to an AZ [1] and cross_az_attach is equal to True, then the instance can float across *all* availability zones. We only pin the instance to a specific AZ if cross_az_attach=False. See the new functional test that verifies this: https://review.opendev.org/c/openstack/nova/+/878948/1/nova/tests/functional/test_cross_az_attach.py

[1] By 'pinned', I mean that either the AZ parameter for the instance is set, or the 'default_schedule_zone' config option is not 'None'.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2012843

Title: Instances are free to move others zone when AZ was not specified

Status in OpenStack Compute (nova): Invalid

Bug description:

Description === Instances are free to move to other zones via migration or Masakari (host down) when no AZ is specified at launch and cross_az_attach=true.

Steps to reproduce == Launch an instance without choosing an AZ (Any Availability Zone on Horizon or the CLI).

Expected result === Instances should stay within their current AZ when migrated or moved by Masakari HA.

Actual result = Instances were moved to other AZs when migrated or moved by Masakari HA.

Environment === Xena, KVM, Open vSwitch, SAN, provider network.

Logs & Configs == cross_az_attach=true. Any Availability Zone when launching an instance on Horizon. Default AZ was not set in nova.conf.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2012843/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2011567] Re: Cycle theme page is empty
Closing this bug report, as we said at the PTG that we don't have any cycle themes for Bobcat.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2011567

Title: Cycle theme page is empty

Status in OpenStack Compute (nova): Invalid

Bug description: Here, https://specs.openstack.org/openstack/nova-specs/ , under nova project plans -> Priorities:
https://specs.openstack.org/openstack/nova-specs/priorities/ussuri-priorities.html
...
https://specs.openstack.org/openstack/nova-specs/priorities/2023.1-priorities.html
Since Ussuri, the cycle theme page has not been filled.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2011567/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2012049] Re:
Can you add more details about your problem? At least, I see a MessagingTimeout in the logs, so I'm pretty sure this isn't a Nova bug, rather a configuration issue.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2012049

Title:

Status in OpenStack Compute (nova): Invalid

Bug description:
2023-03-17 09:06:10.494 14925 INFO nova.api.openstack.wsgi [req-f9a44aff-0978-4fe8-be6a-7c9fb331f6f9 8cf563f1906a43fd9ffe0c3ef4cbb2cf 8326c19d61c146d29b229ededb704804 - default default] HTTP exception thrown: Instance wlk-ubuntu-20.04-3 could not be found.
2023-03-17 09:06:10.496 14925 INFO nova.osapi_compute.wsgi.server [req-f9a44aff-0978-4fe8-be6a-7c9fb331f6f9 8cf563f1906a43fd9ffe0c3ef4cbb2cf 8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET /v2.1/servers/wlk-ubuntu-20.04-3 HTTP/1.1" status: 404 len: 513 time: 1.0066791
2023-03-17 09:06:10.598 14925 INFO nova.osapi_compute.wsgi.server [req-afecaa68-03f2-4304-a72b-3053e2a41cee 8cf563f1906a43fd9ffe0c3ef4cbb2cf 8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET /v2.1/servers?name=wlk-ubuntu-20.04-3 HTTP/1.1" status: 200 len: 700 time: 0.0988359
2023-03-17 09:06:10.938 14925 INFO nova.osapi_compute.wsgi.server [req-742f9b23-db89-46ed-95bc-303beda80ce6 8cf563f1906a43fd9ffe0c3ef4cbb2cf 8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET /v2.1/servers/629397d2-1c46-4e22-844f-cc40cd1831bb HTTP/1.1" status: 200 len: 1989 time: 0.3370461
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi [req-09346754-8b4e-40c0-b824-acbaed9cec4c 8cf563f1906a43fd9ffe0c3ef4cbb2cf 8326c19d61c146d29b229ededb704804 - default default] Unexpected exception in API method: MessagingTimeout: Timed out waiting for a reply to message ID d7605c2e96d6495e802662b3fb267384
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 788, in wrapped
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 108, in wrapper
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/remote_consoles.py", line 52, in get_vnc_console
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi console_type)
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 196, in wrapped
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 186, in inner
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 3737, in get_vnc_console
2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi access_url=connect_info['access_url'])
2023-03-17 09:07:11.076 14925 ERROR
nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/consoleauth/rpcapi.py", line 93, in authorize_console 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return cctxt.call(ctxt, 'authorize_console', **msg_args) 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi retry=self.retry) 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi timeout=timeout, retry=retry) 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi retry=retry) 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi result = self._waiter.wait(msg_id, timeout) 2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi
[Yahoo-eng-team] [Bug 2012873] Re: [nova][DOC] stackalytics links are not updated in openstack wiki
This is a wiki page; you can just update it directly. Closing this bug report, as it doesn't need a Gerrit change.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2012873

Title: [nova][DOC] stackalytics links are not updated in openstack wiki

Status in OpenStack Compute (nova): Invalid

Bug description: Here, https://wiki.openstack.org/wiki/Nova/CoreTeam , the links below should be updated:
Last 30 Days: https://stackalytics.com/report/contribution/nova/30 to https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=30
Last 90 Days: https://stackalytics.com/report/contribution/nova/90 to https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=90
Last 180 Days: https://stackalytics.com/report/contribution/nova/180 to https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=180

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2012873/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2009263] Re: 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 7230.58 sec
It looks to me like the servicegroup API wasn't able to query the DB to find the compute service's state: for some reason, the conductor can't connect to the DB. Anyway, closing this one, as this is not a project development bug.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2009263

Title: 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 7230.58 sec

Status in OpenStack Compute (nova): Invalid

Bug description: The nova-conductor service's access to the DB service was delayed by 7230.58s. More information:

2023-03-02 18:39:53.726 22 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
2023-03-02 18:39:53.727 19 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
2023-03-02 18:39:53.729 23 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db Traceback (most recent call last):
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 812, in _checkout
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise exc.InvalidatePoolError()
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db sqlalchemy.exc.InvalidatePoolError: ()
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db During handling of the above exception, another exception occurred:
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db Traceback (most recent call last):
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2285, in _wrap_pool_connect
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db return fn()
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363, in connect
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db return _ConnectionFairy._checkout(self)
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 842, in _checkout
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db fairy._connection_record._checkin_failed(err)
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db exc_value, with_traceback=exc_tb,
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise exception
2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 838, in _checkout 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db fairy._connection_record.get_connection() 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 606, in get_connection 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db self.__connect() 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657, in __connect 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db pool.logger.debug("Error on connect(): %s", e) 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__ 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db exc_value, with_traceback=exc_tb, 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_ 2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise exception 2023-03-02 18:39:53.732 20
[Yahoo-eng-team] [Bug 2003803] Re: Unexpected API error
It looks like your environment is not able to call the Neutron API, hence the exception. Sorry, but this is not a Nova bug report, hence me closing it.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2003803

Title: Unexpected API error

Status in OpenStack Compute (nova): Invalid

Bug description: While running a Masakari test, this unexpected API error occurred; the error message says to open a bug:

2023-01-20-21:01:31 keystoneauth.session DEBUG RESP: [500] Connection: close Content-Length: 224 Content-Type: application/json; charset=UTF-8 Date: Fri, 20 Jan 2023 21:01:30 GMT OpenStack-API-Version: compute 2.72 Server: Apache/2.4.41 (Ubuntu) Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version X-OpenStack-Nova-API-Version: 2.72 x-compute-request-id: req-7538e84d-39fb-4146-98f8-43d446b58398 x-openstack-request-id: req-7538e84d-39fb-4146-98f8-43d446b58398
2023-01-20-21:01:31 keystoneauth.session DEBUG RESP BODY: {"computeFault": {"code": 500, "message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n"}}

Logs: https://oil-jenkins.canonical.com/artifacts/72b27e0d-2ef1-4b72-9d0c-b29407bfa746/generated/generated/openstack/juju-crashdump-openstack-2023-01-20-21.02.27.tar.gz

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2003803/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2011564] Re: launchpad link do not work in this page
Sorry Amit, you may have missed the point. Those URLs are actually fake; they are just examples explaining what you need to do when you create a Launchpad feature. For https://review.opendev.org/q/status:open+project:openstack/nova-specs+message:apiimpact that just means that we don't have any open changes with a commit message adding an APIImpact tag. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2011564 Title: launchpad link do not work in this page Status in OpenStack Compute (nova): Invalid Bug description: Description === launchpad links do not work from this/these pages: https://specs.openstack.org/openstack/nova-specs/specs/wallaby/template.html .. https://specs.openstack.org/openstack/nova-specs/specs/2023.1/template.html ex: https://blueprints.launchpad.net/nova/+spec/example https://blueprints.launchpad.net/nova/+spec/awesome-thing https://review.opendev.org/q/status:open+project:openstack/nova-specs+message:apiimpact To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2011564/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2007922] Re: Cleanup pending instances in "building" state
Well, I don't really know the root cause, or why map_instances() wasn't adding the cell UUID directly when it first created the record here. Now we have a transaction, as Mohamed said, so it shouldn't be a problem anymore. Closing this bug report now. ** Changed in: nova Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2007922 Title: Cleanup pending instances in "building" state Status in OpenStack Compute (nova): Invalid Bug description: Following up on the ML thread [1], it was recommended to create a bug report. After a network issue in a Victoria cluster (3 control nodes in HA mode, 26 compute nodes) some instance builds were interrupted. Some of them could be cleaned up with 'openstack server delete' but two of them cannot. They already have a mapping but cannot be removed (or "reset-state") by nova. Those are both amphora instances from octavia: control01:~ # openstack server list --project service -c ID -c Name -c Status -f value | grep BUILD 0453a7e5-e4f9-419b-ad71-d837a20ef6bb amphora-0ee32901-0c59-4752-8253-35b66da176ea BUILD dc8cdc3a-f6b2-469b-af6f-ba2aa130ea9b amphora-4990a47b-fe8a-431a-90ec-5ac2368a5251 BUILD control01:~ # openstack server delete amphora-0ee32901-0c59-4752-8253-35b66da176ea No server with a name or ID of 'amphora-0ee32901-0c59-4752-8253-35b66da176ea' exists. control01:~ # openstack server show 0453a7e5-e4f9-419b-ad71-d837a20ef6bb ERROR (CommandError): No server with a name or ID of '0453a7e5-e4f9-419b-ad71-d837a20ef6bb' exists. The database tables referring to the UUID 0453a7e5-e4f9-419b-ad71-d837a20ef6bb are these: nova_cell0/instance_id_mappings.ibd nova_cell0/instance_info_caches.ibd nova_cell0/instance_extra.ibd nova_cell0/instances.ibd nova_cell0/instance_system_metadata.ibd octavia/amphora.ibd nova_api/instance_mappings.ibd nova_api/request_specs.ibd I can provide both debug logs and database queries, just let me know what exactly is required. The storage back end is ceph (Pacific), we use neutron with Open vSwitch, the exact nova versions are: control01:~ # rpm -qa | grep nova openstack-nova-conductor-22.2.2~dev15-lp152.1.25.noarch openstack-nova-api-22.2.2~dev15-lp152.1.25.noarch openstack-nova-novncproxy-22.2.2~dev15-lp152.1.25.noarch python3-novaclient-17.2.0-lp152.3.2.noarch openstack-nova-scheduler-22.2.2~dev15-lp152.1.25.noarch openstack-nova-22.2.2~dev15-lp152.1.25.noarch python3-nova-22.2.2~dev15-lp152.1.25.noarch [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032308.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2007922/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 2006770] Re: server list with IP filter doesn't work as expected
Honestly, I don't know what to say here. When the query parameter was added, it was just a convenience for operators, to save them from having to query Neutron first to get the list of ports, but this was actually some kind of orchestration we try to avoid. Keeping in mind that an instance can be booted with a port that doesn't have L3 connectivity, I'm not super happy with fixing all of this when it's better to say 'please rather directly call Neutron to get the list of ports that match your IP and then ask Nova to give you the list of instances that have those ports bound to them' (see the sketch after this post). I'd rather deprecate this IP address query param and provide good api-ref documentation explaining the recommended way. As a side note, since IP substring filtering is a Neutron extension which is not provided by all clouds, we can't and shouldn't rely on it for getting answers. Putting the report to Opinion but we'll debate it in the next weeks. ** Changed in: nova Status: New => Opinion ** Changed in: nova Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2006770 Title: server list with IP filter doesn't work as expected Status in OpenStack Compute (nova): Opinion Bug description: If a project has two servers with 10.10.10.10 and 10.10.10.109 IPs, the "curl -s 'https://nova:443/v2.1/servers?ip=10.10.10.10'" request returns two servers in a response. This happens because the neutron API has the "ip-substring-filtering" extension turned on: $ curl -s "https://neutron/v2.0/extensions" -H "X-Auth-Token: ${OS_AUTH_TOKEN}" | jq -r '.extensions[]|select(.alias=="ip-substring-filtering")' { "name": "IP address substring filtering", "alias": "ip-substring-filtering", "description": "Provides IP address substring filtering when listing ports", "updated": "2017-11-28T09:00:00-00:00", "links": [] } And there is no possibility to filter IPs with an exact match like it's done with a "https://neutron/v2.0/ports?fixed_ips=ip_address%3D10.10.10.10" call. Another problem is that the ip/ip6 fields are marked as regexp in both SCHEMA and CLI: https://github.com/openstack/nova/blob/49aa40394a4857a06191b05ea3b15913f328a8d0/nova/api/openstack/compute/schemas/servers.py#L638-L639 (values which are not regexp compatible are rejected at an early stage) $ openstack server list --help | grep -- --ip [--ip ] [--ip6 ] [--name ] --ip --ip6 But they are not treated as regexps afterwards. Moreover, the https://github.com/openstack/nova/blob/a2964417822bd1a4a83fa5c27282d2be1e18868a/nova/compute/api.py#L3028-L3039 mapping doesn't work, because "fixed_ip" is never allowed in the "search_opts" map. Changing the "fixed_ip" key to an "ip" key (BTW, there is no "fixed_ip6" mapping; it should also be considered once someone decides to fix this issue) breaks substring filtering, because the filter finally becomes "'ip': '^10\\.10\\.10\\.10$'". Therefore if there is no "substring filtering" neutron extension, the regexp filter mappings must take this into account (or even be removed). And the final call: there should be a way for a user to define whether they want substring, exact match or regexp filtering.
See also: https://stackoverflow.com/questions/64549906/how-openstack-client-get-server-list-with-accurate-ip-address To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2006770/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
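As a concrete illustration of the two-step flow recommended in the comment above, here is a short openstacksdk sketch (the cloud name and IP are placeholders; this is a sketch, not an official replacement for the ?ip= filter):

import openstack

conn = openstack.connect(cloud="mycloud")  # hypothetical clouds.yaml entry
target = "10.10.10.10"

# Neutron's REST API also supports exact server-side filtering via
# GET /v2.0/ports?fixed_ips=ip_address=10.10.10.10; for clarity this
# sketch simply does the exact match on the client side.
for port in conn.network.ports():
    ips = {entry.get("ip_address") for entry in (port.fixed_ips or [])}
    if target in ips and (port.device_owner or "").startswith("compute:"):
        # The port's device_id is the Nova server bound to it.
        server = conn.compute.get_server(port.device_id)
        print(server.id, server.name)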
[Yahoo-eng-team] [Bug 2006467] Re: tempest ssh timeout due to udhcpc fails in the cirros guest
Okay, I did a bit of digging today for some other CI failure I saw on another change, and eventually I found this was related. So, lemme explain the issue here. First, I was looking at https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f9/868236/5/gate/nova-next/6f9f3d0/ and I was wondering why the SSH connection wasn't working. When I looked at the nova logs, I found that the instance was spawned at 18:18:56 : Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e tempest-AttachVolumeNegativeTest-1605485622 tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance. Then, Tempest tried to ssh into the instance at 18:18:59 : 2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication And eventually, 3mins32secs after that (18:22:31), it stopped : 2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client Then, I tried to look at the guest console, and I saw that udhcpc tried 3 times : 2023-02-14 18:22:39.129636 | controller | [ 12.638156] sr 0:0:0:0: Attached scsi generic sg0 type 5 [...] 2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3 2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover 2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover 2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover So, I was wondering how long the DHCP discovery took, and eventually I found that the cirros DHCP client actually waits for 1 min before requesting again. So, now I'm wondering why it takes so much time to get a DHCP address and why the 2nd DHCP call doesn't get the IP address. Adding the Neutron team to this bug report because maybe we have something going on on the DHCP side. ** Also affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2006467 Title: tempest ssh timeout due to udhcpc fails in the cirros guest Status in neutron: New Status in OpenStack Compute (nova): Confirmed Bug description: Tests trying to ssh into the guest fail intermittently with timeout as udhcpc fails in the guest: 2023-02-01 20:46:32.286979 | controller | Starting network: udhcpc: started, v1.29.3 2023-02-01 20:46:32.286987 | controller | udhcp 2023-02-01 20:46:32.286996 | controller | c: sending discover 2023-02-01 20:46:32.287004 | controller | udhcpc: sending discover 2023-02-01 20:46:32.287013 | controller | udhcpc: sending discover 2023-02-01 20:46:32.287022 | controller | Usage: /sbin/cirros-dhcpc 2023-02-01 20:46:32.287030 | controller | udhcpc: no lease, failing 2023-02-01 20:46:32.287039 | controller | FAIL Traceback (most recent call last): File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper return f(*func_args, **func_kwargs) File "/opt/stack/tempest/tempest/api/compute/admin/test_volumes_negative.py", line 128, in test_multiattach_rw_volume_update_failure server1 = self.create_test_server( File "/opt/stack/tempest/tempest/api/compute/base.py", line 272, in create_test_server body, servers = compute.create_test_server( File "/opt/stack/tempest/tempest/common/compute.py", line 334, in create_test_server with excutils.save_and_reraise_exception(): File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__ self.force_reraise() File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise raise self.value File "/opt/stack/tempest/tempest/common/compute.py", line 329, in create_test_server wait_for_ssh_or_ping( File "/opt/stack/tempest/tempest/common/compute.py", line 148, in wait_for_ssh_or_ping waiters.wait_for_ssh( File "/opt/stack/tempest/tempest/common/waiters.py", line 632, in wait_for_ssh raise lib_exc.TimeoutException() tempest.lib.exceptions.TimeoutException: Request timed out Details: None Example failure https://zuul.opendev.org/t/openstack/build/f1c6b7e54b28415c952de0be833731a9/logs Signature $ logsearch log --job-group nova-devstack --result FAILURE 'udhcpc: no lease, failing' --days 7 [snip] Builds with matching logs 6/138:
[Yahoo-eng-team] [Bug 2002951] Re: OOM kills python / mysqld in various nova devstack jobs
Moving the Nova status of the bug to Fix Released as https://review.opendev.org/c/openstack/tempest/+/871000 fixed the root cause of the failing nova jobs. ** Changed in: nova Status: Confirmed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/2002951 Title: OOM kills python / mysqld in various nova devstack jobs Status in Glance: New Status in OpenStack Compute (nova): Fix Released Status in tempest: Confirmed Bug description: The following tests exited without returning a status and likely segfaulted or crashed Python: * tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive[id-777e468f-17ca-4da4-b93d-b7dbf56c0494] And in the syslog: https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77/log/controller/logs/syslog.txt#3688 Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0 Example run: https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77 I see this happening in multiple jobs in the last 10 days: * nova-ceph-multistore 14x * nova-multi-cell 1x * nova-next 1x $ logsearch log --result FAILURE --project openstack/nova --branch master --file controller/logs/syslog.txt 'kernel: Out of memory: Killed process' --days 10 [..snip..] Searching logs: ece0cf2ce71c4a8790a0a36529dd0a8e:/home/gibi/.cache/logsearch/ece0cf2ce71c4a8790a0a36529dd0a8e/controller/logs/syslog.txt:3774:Jan 14 22:57:33 np0032733292 kernel: Out of memory: Killed process 115024 (python) total-vm:4981004kB, anon-rss:3904068kB, file-rss:5320kB, shmem-rss:0kB, UID:1002 pgtables:9376kB oom_score_adj:0 f5aa5edd4d354c2685fc1f3e13d0ef77:/home/gibi/.cache/logsearch/f5aa5edd4d354c2685fc1f3e13d0ef77/controller/logs/syslog.txt:3688:Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0 1447c6274e924e068578ca260c9ac2a6:/home/gibi/.cache/logsearch/1447c6274e924e068578ca260c9ac2a6/controller/logs/syslog.txt:3824:Jan 13 21:34:13 np0032729237 kernel: Out of memory: Killed process 114489 (python) total-vm:4975072kB, anon-rss:3954804kB, file-rss:5312kB, shmem-rss:0kB, UID:1002 pgtables:9400kB oom_score_adj:0 446a5a73b22d432295820e5b8083a2f9:/home/gibi/.cache/logsearch/446a5a73b22d432295820e5b8083a2f9/controller/logs/syslog.txt:5103:Jan 13 10:04:25 np0032720733 kernel: Out of memory: Killed process 48920 (mysqld) total-vm:5233384kB, anon-rss:300872kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2652kB oom_score_adj:0 fae1fbe258134dd8ba060cb743707247:/home/gibi/.cache/logsearch/fae1fbe258134dd8ba060cb743707247/controller/logs/syslog.txt:6686:Jan 13 09:44:04 np0032720410 kernel: Out of memory: Killed process 47404 (mysqld) total-vm:5208828kB, anon-rss:278080kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2572kB oom_score_adj:0 1bbcaa703b7d42c7a266fde3a6acca65:/home/gibi/.cache/logsearch/1bbcaa703b7d42c7a266fde3a6acca65/controller/logs/syslog.txt:3717:Jan 13 03:41:39 np0032719591 kernel: Out of memory: Killed process 114777 (python) total-vm:4954352kB, anon-rss:4001500kB, file-rss:5124kB, shmem-rss:0kB, UID:1002 pgtables:9416kB oom_score_adj:0
7d9ca42edc5e4bdeb17be8e8045c6468:/home/gibi/.cache/logsearch/7d9ca42edc5e4bdeb17be8e8045c6468/controller/logs/syslog.txt:3828:Jan 12 22:06:40 np0032716841 kernel: Out of memory: Killed process 114731 (python) total-vm:4964792kB, anon-rss:4055532kB, file-rss:5072kB, shmem-rss:0kB, UID:1002 pgtables:9212kB oom_score_adj:0 bcb7bc3b478586906c31c6558b13:/home/gibi/.cache/logsearch/bcb7bc3b478586906c31c6558b13/controller/logs/syslog.txt:3769:Jan 12 20:17:35 np0032714959 kernel: Out of memory: Killed process 114973 (python) total-vm:4971976kB, anon-rss:3855572kB, file-rss:5356kB, shmem-rss:0kB, UID:1002 pgtables:9696kB oom_score_adj:0 7572c2bf5e6547c0a1fc6b0f180a2e1f:/home/gibi/.cache/logsearch/7572c2bf5e6547c0a1fc6b0f180a2e1f/controller/logs/syslog.txt:3805:Jan 12 17:44:16 ubuntu-focal-ovh-gra1-0032713996 kernel: Out of memory: Killed process 114616 (python) total-vm:4974804kB, anon-rss:3949084kB, file-rss:5176kB, shmem-rss:0kB, UID:1002 pgtables:9604kB oom_score_adj:0 aa5cf699f8d04995b43d009e55a1accd:/home/gibi/.cache/logsearch/aa5cf699f8d04995b43d009e55a1accd/controller/logs/syslog.txt:3796:Jan 12 16:23:26 ubuntu-focal-inmotion-iad3-0032713625 kernel: Out of memory: Killed process 114640 (python) total-vm:4964156kB, anon- rss:4310768kB, file-rss:5340kB, shmem-rss:0kB, UID:1002 pgtables:9628kB oom_score_adj:0
[Yahoo-eng-team] [Bug 2004641] Re: ImageLocationsTest.test_replace_location fails intermittently
** Also affects: glance Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/2004641 Title: ImageLocationsTest.test_replace_location fails intermittently Status in Glance: New Status in OpenStack Compute (nova): Confirmed Status in tempest: New Bug description: Saw a new gate failure happening a couple of times : https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job- output.txt),type:phrase),query:(match_phrase:(filename:job- output.txt,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!()) Example of a failed run : 2023-02-02 22:20:18.197006 | controller | == 2023-02-02 22:20:18.197030 | controller | Failed 1 tests - output below: 2023-02-02 22:20:18.197050 | controller | == 2023-02-02 22:20:18.197071 | controller | 2023-02-02 22:20:18.197095 | controller | tempest.api.image.v2.test_images.ImageLocationsTest.test_replace_location[id-bf6e0009-c039-4884-b498-db074caadb10] 2023-02-02 22:20:18.197115 | controller | -- 2023-02-02 22:20:18.197134 | controller | 2023-02-02 22:20:18.197152 | controller | Captured traceback: 2023-02-02 22:20:18.197171 | controller | ~~~ 2023-02-02 22:20:18.197190 | controller | Traceback (most recent call last): 2023-02-02 22:20:18.197212 | controller | 2023-02-02 22:20:18.197234 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 875, in test_replace_location 2023-02-02 22:20:18.197254 | controller | image = self._check_set_multiple_locations() 2023-02-02 22:20:18.197273 | controller | 2023-02-02 22:20:18.197292 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 847, in _check_set_multiple_locations 2023-02-02 22:20:18.197311 | controller | image = self._check_set_location() 2023-02-02 22:20:18.197329 | controller | 2023-02-02 22:20:18.197351 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 820, in _check_set_location 2023-02-02 22:20:18.197372 | controller | self.client.update_image(image['id'], [ 2023-02-02 22:20:18.197391 | controller | 2023-02-02 22:20:18.197410 | controller | File "/opt/stack/tempest/tempest/lib/services/image/v2/images_client.py", line 40, in update_image 2023-02-02 22:20:18.197429 | controller | resp, body = self.patch('images/%s' % image_id, data, headers) 2023-02-02 22:20:18.197447 | controller | 2023-02-02 22:20:18.197465 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 346, in patch 2023-02-02 22:20:18.197490 | controller | return self.request('PATCH', url, extra_headers, headers, body) 2023-02-02 22:20:18.197513 | controller | 2023-02-02 22:20:18.197533 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request 2023-02-02 22:20:18.197552 | controller | self._error_checker(resp, resp_body) 2023-02-02 22:20:18.197571 | controller | 2023-02-02 22:20:18.197590 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 831, in _error_checker 2023-02-02 22:20:18.197612 | controller | raise exceptions.BadRequest(resp_body, resp=resp) 2023-02-02 22:20:18.197633 | controller | 
2023-02-02 22:20:18.197655 | controller | tempest.lib.exceptions.BadRequest: Bad request 2023-02-02 22:20:18.197674 | controller | Details: b'400 Bad Request\n\nThe Store URI was malformed.\n\n ' 2023-02-02 22:20:18.197692 | controller | 2023-02-02 22:20:18.197711 | controller | 2023-02-02 22:20:18.197729 | controller | Captured pythonlogging: 2023-02-02 22:20:18.197748 | controller | ~~~ 2023-02-02 22:20:18.197774 | controller | 2023-02-02 22:01:06,773 114933 INFO [tempest.lib.common.rest_client] Request (ImageLocationsTest:test_replace_location): 201 POST https://10.210.193.38/image/v2/images 1.036s 2023-02-02 22:20:18.197798 | controller | 2023-02-02 22:01:06,774 114933 DEBUG[tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''} 2023-02-02 22:20:18.198218 | controller | Body: {"container_format": "bare", "disk_format": "raw"} 2023-02-02 22:20:18.198250 | controller | Response -
[Yahoo-eng-team] [Bug 2004641] [NEW] ImageLocationsTest.test_replace_location fails intermittently
Public bug reported: Saw a new gate failure happening a couple of times : https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job- output.txt),type:phrase),query:(match_phrase:(filename:job- output.txt,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!()) Example of a failed run : 2023-02-02 22:20:18.197006 | controller | == 2023-02-02 22:20:18.197030 | controller | Failed 1 tests - output below: 2023-02-02 22:20:18.197050 | controller | == 2023-02-02 22:20:18.197071 | controller | 2023-02-02 22:20:18.197095 | controller | tempest.api.image.v2.test_images.ImageLocationsTest.test_replace_location[id-bf6e0009-c039-4884-b498-db074caadb10] 2023-02-02 22:20:18.197115 | controller | -- 2023-02-02 22:20:18.197134 | controller | 2023-02-02 22:20:18.197152 | controller | Captured traceback: 2023-02-02 22:20:18.197171 | controller | ~~~ 2023-02-02 22:20:18.197190 | controller | Traceback (most recent call last): 2023-02-02 22:20:18.197212 | controller | 2023-02-02 22:20:18.197234 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 875, in test_replace_location 2023-02-02 22:20:18.197254 | controller | image = self._check_set_multiple_locations() 2023-02-02 22:20:18.197273 | controller | 2023-02-02 22:20:18.197292 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 847, in _check_set_multiple_locations 2023-02-02 22:20:18.197311 | controller | image = self._check_set_location() 2023-02-02 22:20:18.197329 | controller | 2023-02-02 22:20:18.197351 | controller | File "/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 820, in _check_set_location 2023-02-02 22:20:18.197372 | controller | self.client.update_image(image['id'], [ 2023-02-02 22:20:18.197391 | controller | 2023-02-02 22:20:18.197410 | controller | File "/opt/stack/tempest/tempest/lib/services/image/v2/images_client.py", line 40, in update_image 2023-02-02 22:20:18.197429 | controller | resp, body = self.patch('images/%s' % image_id, data, headers) 2023-02-02 22:20:18.197447 | controller | 2023-02-02 22:20:18.197465 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 346, in patch 2023-02-02 22:20:18.197490 | controller | return self.request('PATCH', url, extra_headers, headers, body) 2023-02-02 22:20:18.197513 | controller | 2023-02-02 22:20:18.197533 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request 2023-02-02 22:20:18.197552 | controller | self._error_checker(resp, resp_body) 2023-02-02 22:20:18.197571 | controller | 2023-02-02 22:20:18.197590 | controller | File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 831, in _error_checker 2023-02-02 22:20:18.197612 | controller | raise exceptions.BadRequest(resp_body, resp=resp) 2023-02-02 22:20:18.197633 | controller | 2023-02-02 22:20:18.197655 | controller | tempest.lib.exceptions.BadRequest: Bad request 2023-02-02 22:20:18.197674 | controller | Details: b'400 Bad Request\n\nThe Store URI was malformed.\n\n ' 2023-02-02 22:20:18.197692 | controller | 2023-02-02 22:20:18.197711 | controller | 2023-02-02 22:20:18.197729 | controller | Captured pythonlogging: 2023-02-02 22:20:18.197748 | 
controller | ~~~ 2023-02-02 22:20:18.197774 | controller | 2023-02-02 22:01:06,773 114933 INFO [tempest.lib.common.rest_client] Request (ImageLocationsTest:test_replace_location): 201 POST https://10.210.193.38/image/v2/images 1.036s 2023-02-02 22:20:18.197798 | controller | 2023-02-02 22:01:06,774 114933 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''} 2023-02-02 22:20:18.198218 | controller | Body: {"container_format": "bare", "disk_format": "raw"} 2023-02-02 22:20:18.198250 | controller | Response - Headers: {'date': 'Thu, 02 Feb 2023 22:01:06 GMT', 'server': 'Apache/2.4.41 (Ubuntu)', 'content-length': '626', 'content-type': 'application/json', 'location': 'http://10.210.193.38:19292/v2/images/36bc7732-dfbd-4d63-871d-ff84b0be764e', 'openstack-image-import-methods': 'glance-direct,web-download,copy-image', 'openstack-image-store-ids': 'cheap,robust,web,os_glance_staging_store,os_glance_tasks_store', 'x-openstack-request-id': 'req-f0d0376e-9e9a-4e82-a528-643f1912004c', 'connection':
[Yahoo-eng-team] [Bug 1996188] Re: [OSSA-2023-002] Arbitrary file access through custom VMDK flat descriptor (CVE-2022-47951)
https://review.opendev.org/c/openstack/nova/+/871612 is now merged, putting the bug report to Fix Released. ** Changed in: nova Importance: Undecided => Critical ** Changed in: nova Status: New => Confirmed ** Changed in: nova Status: Confirmed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1996188 Title: [OSSA-2023-002] Arbitrary file access through custom VMDK flat descriptor (CVE-2022-47951) Status in Cinder: In Progress Status in Glance: Fix Released Status in OpenStack Compute (nova): Fix Released Status in OpenStack Security Advisory: Fix Released Bug description: The vulnerability managers received the following report from Sébastien Meriot with OVH via encrypted E-mail: Our Openstack team did discover what looks like a security issue in Nova this morning allowing a remote attacker to read any file on the system. After making a quick CVSS calculation, we got a CVSS of 5.8 (CVSS:3.0/AV:N/AC:H/PR:L/UI:R/S:C/C:H/I:N/A:N). Here are the details: By using a VMDK file, you can dump any file on the hypervisor. 1. Create an image: qemu-img create -f vmdk leak.vmdk 1M -o subformat=monolithicFlat 2. Edit the leak.vmdk and change the name this way: RW 2048 FLAT "leak-flat.vmdk" 0 --> RW 2048 FLAT "/etc/nova/nova.conf" 0 3. Upload the image: openstack image create --file leak.vmdk leak.vmdk 4. Start a new instance: openstack server create --image leak.vmdk --net demo --flavor nano leak-instance 5. The instance won't boot of course. You can create an image from this instance: openstack server image create --name leak-instance-image leak-instance 6. Download the image: openstack image save --file leak-instance-image leak-instance-image 7. You get access to the nova.conf file content and you can get access to the openstack admin creds. We are working on a fix and would be happy to share it with you if needed. We think it does affect Nova but it could affect Glance as well. We're not sure yet. [postscript per Arnaud Morin (amorin) in IRC] cinder seems also affected To manage notifications about this bug go to: https://bugs.launchpad.net/cinder/+bug/1996188/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
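For context on the class of fix involved, here is a hedged sketch (not the merged patch itself, which inspects image formats more thoroughly, e.g. via qemu-img) of the kind of guard needed: parse a monolithicFlat VMDK descriptor and reject extent lines whose data file is an absolute path or escapes the image directory.

import re

# Matches descriptor extent lines such as: RW 2048 FLAT "leak-flat.vmdk" 0
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+\d+\s+\w+\s+"([^"]+)"')

def check_vmdk_descriptor(descriptor_text):
    for line in descriptor_text.splitlines():
        m = EXTENT_RE.match(line.strip())
        if not m:
            continue
        extent = m.group(2)
        # References like "/etc/nova/nova.conf" or "../../secret" must
        # be rejected; only plain relative file names are acceptable.
        if extent.startswith("/") or ".." in extent:
            raise ValueError("unsafe VMDK extent reference: %s" % extent)

check_vmdk_descriptor('RW 2048 FLAT "leak-flat.vmdk" 0')         # passes
# check_vmdk_descriptor('RW 2048 FLAT "/etc/nova/nova.conf" 0')  # raises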
[Yahoo-eng-team] [Bug 2002951] Re: OOM kills python / mysqld in various nova devstack jobs
FWIW, I created another change that was running this test *earlier*, and it worked : https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-ceph-multistore/3626391/testr_results.html That being said, this test took more than 181 secs, so I created a new revision to find out how long it takes to create the cached image and how much memory this cached image uses : https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90 Still waiting for the results, but here I think we need to modify this test to maybe not cache this way if we can, or maybe to run it differently. ** Also affects: tempest Importance: Undecided Status: New ** Also affects: glance Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2002951 Title: OOM kills python / mysqld in various nova devstack jobs Status in Glance: New Status in OpenStack Compute (nova): Confirmed Status in tempest: New Bug description: The following tests exited without returning a status and likely segfaulted or crashed Python: * tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive[id-777e468f-17ca-4da4-b93d-b7dbf56c0494] And in the syslog: https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77/log/controller/logs/syslog.txt#3688 Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0 Example run: https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77 I see this happening in multiple jobs in the last 10 days: * nova-ceph-multistore 14x * nova-multi-cell 1x * nova-next 1x $ logsearch log --result FAILURE --project openstack/nova --branch master --file controller/logs/syslog.txt 'kernel: Out of memory: Killed process' --days 10 [..snip..]
Searching logs: ece0cf2ce71c4a8790a0a36529dd0a8e:/home/gibi/.cache/logsearch/ece0cf2ce71c4a8790a0a36529dd0a8e/controller/logs/syslog.txt:3774:Jan 14 22:57:33 np0032733292 kernel: Out of memory: Killed process 115024 (python) total-vm:4981004kB, anon-rss:3904068kB, file-rss:5320kB, shmem-rss:0kB, UID:1002 pgtables:9376kB oom_score_adj:0 f5aa5edd4d354c2685fc1f3e13d0ef77:/home/gibi/.cache/logsearch/f5aa5edd4d354c2685fc1f3e13d0ef77/controller/logs/syslog.txt:3688:Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0 1447c6274e924e068578ca260c9ac2a6:/home/gibi/.cache/logsearch/1447c6274e924e068578ca260c9ac2a6/controller/logs/syslog.txt:3824:Jan 13 21:34:13 np0032729237 kernel: Out of memory: Killed process 114489 (python) total-vm:4975072kB, anon-rss:3954804kB, file-rss:5312kB, shmem-rss:0kB, UID:1002 pgtables:9400kB oom_score_adj:0 446a5a73b22d432295820e5b8083a2f9:/home/gibi/.cache/logsearch/446a5a73b22d432295820e5b8083a2f9/controller/logs/syslog.txt:5103:Jan 13 10:04:25 np0032720733 kernel: Out of memory: Killed process 48920 (mysqld) total-vm:5233384kB, anon-rss:300872kB, file-rss:0kB, shmem- rss:0kB, UID:116 pgtables:2652kB oom_score_adj:0 fae1fbe258134dd8ba060cb743707247:/home/gibi/.cache/logsearch/fae1fbe258134dd8ba060cb743707247/controller/logs/syslog.txt:6686:Jan 13 09:44:04 np0032720410 kernel: Out of memory: Killed process 47404 (mysqld) total-vm:5208828kB, anon-rss:278080kB, file-rss:0kB, shmem- rss:0kB, UID:116 pgtables:2572kB oom_score_adj:0 1bbcaa703b7d42c7a266fde3a6acca65:/home/gibi/.cache/logsearch/1bbcaa703b7d42c7a266fde3a6acca65/controller/logs/syslog.txt:3717:Jan 13 03:41:39 np0032719591 kernel: Out of memory: Killed process 114777 (python) total-vm:4954352kB, anon-rss:4001500kB, file-rss:5124kB, shmem-rss:0kB, UID:1002 pgtables:9416kB oom_score_adj:0 7d9ca42edc5e4bdeb17be8e8045c6468:/home/gibi/.cache/logsearch/7d9ca42edc5e4bdeb17be8e8045c6468/controller/logs/syslog.txt:3828:Jan 12 22:06:40 np0032716841 kernel: Out of memory: Killed process 114731 (python) total-vm:4964792kB, anon-rss:4055532kB, file-rss:5072kB, shmem-rss:0kB, UID:1002 pgtables:9212kB oom_score_adj:0 bcb7bc3b478586906c31c6558b13:/home/gibi/.cache/logsearch/bcb7bc3b478586906c31c6558b13/controller/logs/syslog.txt:3769:Jan 12 20:17:35 np0032714959 kernel: Out of memory: Killed process 114973 (python) total-vm:4971976kB, anon-rss:3855572kB, file-rss:5356kB, shmem-rss:0kB, UID:1002 pgtables:9696kB oom_score_adj:0 7572c2bf5e6547c0a1fc6b0f180a2e1f:/home/gibi/.cache/logsearch/7572c2bf5e6547c0a1fc6b0f180a2e1f/controller/logs/syslog.txt:3805:Jan 12
[Yahoo-eng-team] [Bug 2002068] Re: Can not handle authentication request for 2 credentials
Looks like the nova-compute service is unable to talk to the libvirt API. Definitely a config issue, closing this bug. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2002068 Title: Can not handle authentication request for 2 credentials Status in OpenStack Compute (nova): Invalid Bug description: My Python environment: Python 3.8.10, running inside a venv. When I run "nova-compute" I get the error message below. Did I forget something? nova.exception.InternalError: Can not handle authentication request for 2 credentials >>> FULL 2023-01-06 11:26:33.694 388587 WARNING oslo_messaging.rpc.client [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually to instantiate client. Please use get_rpc_client to obtain an RPC client instance. 2023-01-06 11:26:33.695 388587 WARNING oslo_messaging.rpc.client [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually to instantiate client. Please use get_rpc_client to obtain an RPC client instance. 2023-01-06 11:26:33.695 388587 WARNING oslo_messaging.rpc.client [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually to instantiate client. Please use get_rpc_client to obtain an RPC client instance. 2023-01-06 11:26:33.696 388587 INFO nova.virt.driver [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Loading compute driver 'libvirt.LibvirtDriver' 2023-01-06 11:26:33.778 388587 INFO nova.compute.provider_config [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] No provider configs found in /etc/nova/provider_config/. If files are present, ensure the Nova process has access. 2023-01-06 11:26:33.799 388587 WARNING oslo_config.cfg [None req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Deprecated: Option "api_servers" from group "glance" is deprecated for removal ( Support for image service configuration via standard keystoneauth1 Adapter options was added in the 17.0.0 Queens release. The api_servers option was retained temporarily to allow consumers time to cut over to a real load balancing solution. ). Its value may be silently ignored in the future.
2023-01-06 11:26:33.815 388587 INFO nova.service [-] Starting compute node (version 26.1.0) 2023-01-06 11:26:33.835 388587 CRITICAL nova [-] Unhandled error: nova.exception.InternalError: Can not handle authentication request for 2 credentials 2023-01-06 11:26:33.835 388587 ERROR nova Traceback (most recent call last): 2023-01-06 11:26:33.835 388587 ERROR nova File "/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 338, in _connect_auth_cb 2023-01-06 11:26:33.835 388587 ERROR nova raise exception.InternalError( 2023-01-06 11:26:33.835 388587 ERROR nova nova.exception.InternalError: Can not handle authentication request for 2 credentials 2023-01-06 11:26:33.835 388587 ERROR nova 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host [-] Connection to libvirt failed: authentication failed: Failed to collect auth credentials: libvirt.libvirtError: authentication failed: Failed to collect auth credentials 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host Traceback (most recent call last): 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 588, in get_connection 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host conn = self._get_connection() 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 568, in _get_connection 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host self._queue_conn_event_handler( 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, in __exit__ 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host self.force_reraise() 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, in force_reraise 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host raise self.value 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 560, in _get_connection 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host self._wrapped_conn = self._get_new_connection() 2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host File "/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 504, in _get_new_connection 2023-01-06 11:26:33.840 388587 ERROR
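To narrow such a failure down outside of Nova, a minimal libvirt-python check helps; like nova-compute, the snippet below offers no interactive credentials, so it fails the same way if libvirtd is configured for SASL/password authentication that the service cannot satisfy:

import libvirt

try:
    # Same URI nova-compute typically uses; no credentials are offered.
    conn = libvirt.open("qemu:///system")
    print("connected:", conn.getHostname())
    conn.close()
except libvirt.libvirtError as exc:
    # Mirrors the "Failed to collect auth credentials" error from the log.
    print("libvirt auth/config problem:", exc)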
[Yahoo-eng-team] [Bug 1996214] [NEW] rfe make evacuate only defining the instance and not start it
Public bug reported: We agreed during the Nova 2023.1 Antelope PTG to modify the behaviour of evacuate so that it would *not* start the instance. This requires an API microversion. ** Affects: nova Importance: Wishlist Status: Triaged ** Tags: low-hanging-fruit rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1996214 Title: rfe make evacuate only defining the instance and not start it Status in OpenStack Compute (nova): Triaged Bug description: We agreed during the Nova 2023.1 Antelope PTG to modify the behaviour of evacuate so that it would *not* start the instance. This requires an API microversion. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1996214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
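For illustration, an evacuate request pinned to a microversion looks like this over the REST API (the URL, token, host and version value are placeholders; the exact microversion is whatever the implementing change ends up merging with):

import requests

resp = requests.post(
    "http://controller:8774/v2.1/servers/<server-uuid>/action",  # placeholder
    headers={
        "X-Auth-Token": "<token>",                    # placeholder
        "X-OpenStack-Nova-API-Version": "2.95",       # hypothetical version
    },
    json={"evacuate": {"host": "destination-host"}},  # placeholder host
)
print(resp.status_code, resp.text)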
[Yahoo-eng-team] [Bug 1996213] [NEW] [rfe] modify our usage of privsep in nova
Public bug reported: Nova compute services use the privsep library [1] for specific 'root' privilege usage for a command or a direct call to the system. Unfortunately, the current usage we make of this library does not really follow the recommendations: instead of using a sysadmin context that grants *all* privileged caps to any caller we have [2], we should rather define a per-call context with specific caps. [1] https://docs.openstack.org/oslo.privsep/latest/user/index.html [2] https://github.com/openstack/nova/blob/c97507dfcd57cce9d76670d3b0d48538900c00e9/nova/privsep/__init__.py#L21-L31 ** Affects: nova Importance: Wishlist Status: Triaged ** Tags: low-hanging-fruit rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1996213 Title: [rfe] modify our usage of privsep in nova Status in OpenStack Compute (nova): Triaged Bug description: Nova compute services use the privsep library [1] for specific 'root' privilege usage for a command or a direct call to the system. Unfortunately, the current usage we make of this library does not really follow the recommendations: instead of using a sysadmin context that grants *all* privileged caps to any caller we have [2], we should rather define a per-call context with specific caps. [1] https://docs.openstack.org/oslo.privsep/latest/user/index.html [2] https://github.com/openstack/nova/blob/c97507dfcd57cce9d76670d3b0d48538900c00e9/nova/privsep/__init__.py#L21-L31 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1996213/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
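A minimal sketch of the per-call contexts this RFE asks for (hypothetical names, not the current nova.privsep code): one context per capability set, so that, for example, a network helper never runs with CAP_SYS_ADMIN.

from oslo_concurrency import processutils
from oslo_privsep import capabilities
from oslo_privsep import priv_context

net_admin_pctxt = priv_context.PrivContext(
    'nova',
    cfg_section='nova_net_admin',
    pypath=__name__ + '.net_admin_pctxt',
    capabilities=[capabilities.CAP_NET_ADMIN],  # only what this call needs
)

@net_admin_pctxt.entrypoint
def set_link_up(ifname):
    # Runs in the privsep daemon with CAP_NET_ADMIN only.
    processutils.execute('ip', 'link', 'set', ifname, 'up')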
[Yahoo-eng-team] [Bug 1996210] [NEW] RFE use openstack sdk to interact with cinder
Public bug reported: This is a tracking rfe bug to enable the use of the OpenStack SDK when calling cinder. This is to allow Cinder to deprecate and eventually remove the cinder client in a future release by removing Nova's dependency on it. ** Affects: nova Importance: Wishlist Status: Triaged ** Tags: low-hanging-fruit rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1996210 Title: RFE use openstack sdk to interact with cinder Status in OpenStack Compute (nova): Triaged Bug description: This is a tracking rfe bug to enable the use of the OpenStack SDK when calling cinder. This is to allow Cinder to deprecate and eventually remove the cinder client in a future release by removing Nova's dependency on it. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1996210/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
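A short sketch of what the SDK-based calls could look like from the caller's side (the cloud name and values are placeholders; openstacksdk's block_storage proxy is the SDK counterpart of python-cinderclient):

import openstack

conn = openstack.connect(cloud="mycloud")  # hypothetical clouds.yaml entry

# Create a 1 GiB volume, wait for it, read it back and delete it.
vol = conn.block_storage.create_volume(size=1, name="demo")
conn.block_storage.wait_for_status(vol, status="available")
vol = conn.block_storage.get_volume(vol.id)
print(vol.id, vol.status)
conn.block_storage.delete_volume(vol)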
[Yahoo-eng-team] [Bug 1990809] Re: multinode setup, devstack scheduler fails to start after controller restart
This doesn't look like a Nova issue to me; maybe just a devstack issue or a configuration problem. Moving it to devstack, then. ** Also affects: devstack Importance: Undecided Status: New ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1990809 Title: multinode setup, devstack scheduler fails to start after controller restart Status in devstack: New Status in OpenStack Compute (nova): Invalid Bug description: In a multinode devstack setup, the nova scheduler fails to start after a reboot Steps to reproduce == 1 - deploy multinode devstack https://docs.openstack.org/devstack/latest/guides/multinode-lab.html 2 - Verify all compute nodes are listed and the setup is working as expected $ openstack compute service list create vm, assign floating IP and access VM 3 - Restart compute nodes, and controller node $ sudo init 6 4 - Once controller and all other nodes are rebooted, check whether all nova services are running $ openstack compute service list $ sudo systemctl status devstack@n-* Expected result === $ sudo systemctl status devstack@n-* All services should be running $ openstack compute service list openstack cmds should run without an issue, Actual result = nova-scheduler fails to start with this error: Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova self._init_plugins(extensions) Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 113, in _init_plugins Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova raise NoMatches('No %r driver found, looking for %r' % Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova stevedore.exception.NoMatches: No 'nova.scheduler.driver' driver found, looking for 'filter_scheduler' Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: INFO oslo_service.periodic_task [-] Skipping periodic task _discover_hosts_in_cells because its interval is negative Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: WARNING stevedore.named [-] Could not load filter_scheduler Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: CRITICAL nova [-] Unhandled error: stevedore.exception.NoMatches: No 'nova.scheduler.driver' driver found, looking for 'filter_scheduler' Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova Traceback (most recent call last): Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/usr/local/bin/nova-scheduler", line 10, in Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova sys.exit(main()) Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/opt/stack/nova/nova/cmd/scheduler.py", line 47, in main Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova server = service.Service.create(binary='nova-scheduler', Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/opt/stack/nova/nova/service.py", line 252, in create Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova service_obj = cls(host, binary, topic, manager, Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/opt/stack/nova/nova/service.py", line 116, in __init__ Sep 26 05:09:16 multinodesetupcontroller
nova-scheduler[11226]: ERROR nova self.manager = manager_class(host=self.host, *args, **kwargs) Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/opt/stack/nova/nova/scheduler/manager.py", line 60, in __init__ Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova self.driver = driver.DriverManager( Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 54, in __init__ Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova super(DriverManager, self).__init__( Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 89, in __init__ Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova self._init_plugins(extensions) Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 113, in _init_plugins Sep 26 05:09:16
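For reference, this is the lookup that fails (a sketch of the mechanism, not a fix): nova-scheduler resolves its driver through a setuptools entry point, and missing or stale entry-point metadata for the nova package raises stevedore.exception.NoMatches exactly as logged. In a devstack environment, re-running pip install -e /opt/stack/nova typically regenerates that metadata.

from stevedore import driver

# Raises stevedore.exception.NoMatches when the 'nova.scheduler.driver'
# entry-point group has no 'filter_scheduler' entry registered.
mgr = driver.DriverManager(
    namespace='nova.scheduler.driver',
    name='filter_scheduler',
    invoke_on_load=False,
)
print(mgr.driver)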
[Yahoo-eng-team] [Bug 1990121] Re: Nova 26 needs to depend on os-traits >= 2.9.0
Master patch : https://review.opendev.org/c/openstack/nova/+/858236 ** Also affects: nova/zed Importance: Critical Status: In Progress ** Changed in: nova/zed Importance: Critical => High ** Changed in: nova/zed Assignee: (unassigned) => Thomas Goirand (thomas-goirand) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1990121 Title: Nova 26 needs to depend on os-traits >= 2.9.0 Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) zed series: In Progress Bug description: Without the latest os-traits, we get unit test failures like below. == FAIL: nova.tests.unit.compute.test_pci_placement_translator.TestTranslator.test_trait_normalization_09 nova.tests.unit.compute.test_pci_placement_translator.TestTranslator.test_trait_normalization_09 -- testtools.testresult.real._StringException: pythonlogging:'': {{{ 2022-09-17 10:46:54,848 WARNING [oslo_policy.policy] JSON formatted policy_file support is deprecated since Victoria release. You need to use YAML format which will be default in future. You can use ``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted policy file to YAML-formatted in backward compatible way: https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html. 2022-09-17 10:46:54,849 WARNING [oslo_policy.policy] JSON formatted policy_file support is deprecated since Victoria release. You need to use YAML format which will be default in future. You can use ``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted policy file to YAML-formatted in backward compatible way: https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html. 2022-09-17 10:46:54,851 WARNING [oslo_policy.policy] Policy Rules ['os_compute_api:extensions', 'os_compute_api:os-floating-ip-pools', 'os_compute_api:os-quota-sets:defaults', 'os_compute_api:os-availability-zone:list', 'os_compute_api:limits', 'project_member_api', 'project_reader_api', 'project_member_or_admin', 'project_reader_or_admin', 'os_compute_api:limits:other_project', 'os_compute_api:os-lock-server:unlock:unlock_override', 'os_compute_api:servers:create:zero_disk_flavor', 'compute:servers:resize:cross_cell', 'os_compute_api:os-shelve:unshelve_to_host'] specified in policy files are the same as the defaults provided by the service. You can remove these rules from policy files which will make maintenance easier. You can detect these redundant rules by ``oslopolicy-list-redundant`` tool also. }}} Traceback (most recent call last): File "/usr/lib/python3/dist-packages/ddt.py", line 191, in wrapper return func(self, *args, **kwargs) File "/<>/nova/tests/unit/compute/test_pci_placement_translator.py", line 92, in test_trait_normalization ppt._get_traits_for_dev({"traits": trait_names}) File "/<>/nova/compute/pci_placement_translator.py", line 78, in _get_traits_for_dev os_traits.COMPUTE_MANAGED_PCI_DEVICE AttributeError: module 'os_traits' has no attribute 'COMPUTE_MANAGED_PCI_DEVICE' To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1990121/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
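A quick, illustrative sanity check for the dependency bump this bug tracks; it fails on any environment where the installed os-traits predates the trait used by the PCI-in-placement code:

from importlib.metadata import version

import os_traits
from packaging.version import Version

assert Version(version("os-traits")) >= Version("2.9.0"), "os-traits too old"
assert hasattr(os_traits, "COMPUTE_MANAGED_PCI_DEVICE")
print(os_traits.COMPUTE_MANAGED_PCI_DEVICE)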
[Yahoo-eng-team] [Bug 1896617] Re: [SRU] Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri
Putting the bug to Opinion/Wishlist as this sounds like half a Nova problem (since we set the chmod) and half a distro-specific configuration issue. I'm not against any modification, but we need to address this gap properly, ideally as a blueprint. ** Changed in: nova Status: Triaged => Opinion ** Changed in: nova Importance: Undecided => Wishlist -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896617 Title: [SRU] Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri Status in OpenStack Nova Compute Charm: Invalid Status in Ubuntu Cloud Archive: Fix Released Status in Ubuntu Cloud Archive ussuri series: Fix Released Status in Ubuntu Cloud Archive victoria series: Fix Released Status in OpenStack Compute (nova): Opinion Status in nova package in Ubuntu: Fix Released Status in nova source package in Focal: Fix Released Status in nova source package in Groovy: Fix Released Bug description: [Impact] tl;dr 1) creating the image from the existing VM fails if qcow2 image backend is used, but everything is fine if using rbd image backend in nova-compute. 2) openstack server image create --name fails with some unrelated error: $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7 == Details == Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists [0] are failing with the following exception: 49701867-bedc-4d7d-aa71-7383d877d90c Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server waiters.wait_for_image_status(client, image_id, wait_until) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status image = show_image(image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image resp, body = self.get("images/%s" % image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get return self.request('GET', url, extra_headers, headers) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request method, url, extra_headers, headers, body, chunked) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request self._error_checker(resp, resp_body) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker raise exceptions.NotFound(resp_body, resp=resp) tempest.lib.exceptions.NotFound: Object not found Details: {'code': 404, 'message': 'Image not found.'} During handling of the above exception, another exception occurred: Traceback (most recent call last): File
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image wait_until='ACTIVE') File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in create_image_from_server image_id=image_id) tempest.exceptions.SnapshotNotFoundException: Server snapshot image d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found. So far I was able to identify the following: 1) https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69 invokes a "create image from server" 2) It fails with the following error message in the nova-compute logs: https://pastebin.canonical.com/p/h6ZXdqjRRm/ The same occurs if the "openstack server image create --wait" will be executed; however, according to https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with- snapshot.html the VM has to be shut down before the image creation: "Shut down the source VM before you take the snapshot to ensure that all data is flushed to disk. If necessary, list the instances to view the instance name. Use the openstack server stop command to shut down the instance:" This step is definitely being skipped by the test (e.g it's trying
[Yahoo-eng-team] [Bug 1988311] Re: Concurrent evacuation of vms with pinned cpus to the same host fail randomly
Setting this to High as we need to bump our requirements on master to exclude older releases of oslo.concurrency. We also need to backport the patch into the stable oslo.concurrency releases for Yoga. ** Also affects: nova/yoga Importance: Undecided Status: New ** Changed in: nova/yoga Status: New => Confirmed ** Changed in: nova/yoga Importance: Undecided => High ** Changed in: nova Importance: Critical => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1988311 Title: Concurrent evacuation of vms with pinned cpus to the same host fail randomly Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) yoga series: Confirmed Status in oslo.concurrency: Fix Released Bug description: Reproduction: Boot two vms (each with one pinned cpu) on devstack0. Then evacuate them to devstack0a. devstack0a has two dedicated cpus, so both vms should fit. However, sometimes (for example 6 out of 10 times) the evacuation of one vm fails with this error message: 'CPU set to pin [0] must be a subset of free CPU set [1]'.

devstack0 - all-in-one host
devstack0a - compute-only host

# have two dedicated cpus for pinning on the evacuation target host
devstack0a:/etc/nova/nova-cpu.conf:
[compute]
cpu_dedicated_set = 0,1

# the dedicated cpus are properly tracked in placement
$ openstack resource provider list
+--------------------------------------+------------+------------+--------------------------------------+----------------------+
| uuid                                 | name       | generation | root_provider_uuid                   | parent_provider_uuid |
+--------------------------------------+------------+------------+--------------------------------------+----------------------+
| a0574d87-42ee-4e13-b05a-639dc62c1196 | devstack0a |          2 | a0574d87-42ee-4e13-b05a-639dc62c1196 | None                 |
| 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | devstack0  |          2 | 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | None                 |
+--------------------------------------+------------+------------+--------------------------------------+----------------------+

$ openstack resource provider inventory list a0574d87-42ee-4e13-b05a-639dc62c1196
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| MEMORY_MB      |              1.5 |        1 |     3923 |      512 |         1 |  3923 |    0 |
| DISK_GB        |              1.0 |        1 |       28 |        0 |         1 |    28 |    0 |
| PCPU           |              1.0 |        1 |        2 |        0 |         1 |     2 |    0 |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

# use vms with one pinned cpu
openstack flavor create cirros256-pinned --public --ram 256 --disk 1 --vcpus 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated

# boot two vms (each with one pinned cpu) on devstack0
n=2 ; for i in $( seq $n ) ; do openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private --availability-zone :devstack0 --wait vm$i ; done

# kill n-cpu on devstack0
devstack0 $ sudo systemctl stop devstack@n-cpu
# and force it down, so we can start evacuating
openstack compute service set devstack0 nova-compute --down

# evacuate both vms to devstack0a concurrently
for vm in $( openstack server list --host devstack0 -f value -c ID ) ; do openstack --os-compute-api-version 2.29 server evacuate --host devstack0a $vm & done

# follow up on how the evacuation is going, check if the bug occurred, see details a bit below
for i in $( seq $n ) ; do openstack server show vm$i -f value -c OS-EXT-SRV-ATTR:host -c status ; done

# clean up
devstack0 $ sudo systemctl start devstack@n-cpu
openstack compute service set devstack0 nova-compute --up
for i in $( seq $n ) ; do openstack server delete vm$i --wait ; done

This bug is not deterministic. For example, out of 10 tries (like above) I have seen 4 successes - when both vms successfully evacuated to (went to ACTIVE on) devstack0a. But in the other 6 cases only one vm evacuated successfully. The other vm went to ERROR state, with the error message: "CPU set to pin [0] must be a subset of free CPU set [1]". For example:

$ openstack server show vm2
...
| fault | {'code': 400, 'created': '2022-08-24T13:50:33Z', 'message': 'CPU set to pin [0] must be a subset of free CPU set [1]'} |
...

In n-cpu logs we see the following: aug 24
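For context on the oslo.concurrency fix referenced above: this is the kind of check-then-claim race that fair locks serialize, since with fair=True waiters acquire the lock in arrival order instead of racing after each release. A minimal standalone sketch of the pattern (illustrative names only, not Nova's actual code):

```python
from oslo_concurrency import lockutils

# With fair=True, greenthreads acquire the lock in request order, so
# two concurrent evacuations serialize their pinned-CPU claims instead
# of racing on a stale view of the free CPU set.
@lockutils.synchronized('compute-resources', fair=True)
def claim_pinned_cpus(requested_cpus, free_cpus):
    # Illustrative only: the check and the claim must both happen
    # while the lock is held.
    if not set(requested_cpus) <= set(free_cpus):
        raise ValueError('CPU set to pin %s must be a subset of '
                         'free CPU set %s' % (sorted(requested_cpus),
                                              sorted(free_cpus)))
    for cpu in requested_cpus:
        free_cpus.remove(cpu)
```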
[Yahoo-eng-team] [Bug 1981631] Re: Nova fails to reuse mdev vgpu devices
OK, I may have mistriaged this bug report, as this is specific to the Ampere architecture with SR-IOV support, so never mind comment #2. FWIW, this hardware support is very special, as you indeed need to enable VFs, as described in the nvidia docs: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#creating-sriov-vgpu-device-red-hat-el-kvm Indeed, 32 VFs would be configured *but* if you specify enabled_vgpu_types to the right nvidia-471 type for the PCI address, then the VGPU inventory for this PCI device will have a total of 4, not 32 as I tested earlier. Anyway, this whole Ampere support is fragile, as it is not fully supported upstream, so I'm about to set this bug to Opinion, since Ampere GPUs cannot be tested upstream. Please do further testing to identify whether something is missing in the current vGPU support we have in Nova that would break Ampere support, but please understand that upstream support is absolutely hardware-independent and must not be nvidia-specific. ** Tags added: vgpu ** Changed in: nova Status: Confirmed => Opinion -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1981631 Title: Nova fails to reuse mdev vgpu devices Status in OpenStack Compute (nova): Opinion Bug description: Description: Hello, we are experiencing a weird issue where Nova creates the mdev devices from virtual functions when none are created, but then will not reuse them once they are all created and the vgpu instances are removed. I believe part of this issue was the uuid issue from this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1701281 Manually applying the latest patch partially fixed the issue (placement stopped reporting no hosts available); now the error is on the hypervisor side saying 'no vgpu resources available'. If I manually remove the mdev device with commands like the following: echo "1" > /sys/bus/mdev/devices/150c155c-da0b-45a6-8bc1-a8016231b100/remove then I'm able to spin up an instance again. All mdev devices match in mdevctl list and virsh nodedev-list. Steps to reproduce: 1) freshly set up hypervisor with no mdev devices created yet 2) spin up vgpu instances until all mdevs are created that will fit on the physical gpu(s) 3) delete the vgpu instances 4) try and spin up new vgpu instances Expected Result: Instances spin up and reuse the mdev vgpu devices. Actual Result: Build error from Nova API: Error: Failed to perform requested operation on instance "colby_gpu_test23", the instance has an error status: Please try again later [Error: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance c18565f9-da37-42e9-97b9-fa33da5f1ad0.].
Error in hypervisor logs: nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: vGPU resource is not available

mdevctl output:
cdc98056-8597-4531-9e55-90ab44a71b4e 0000:21:00.7 nvidia-563 manual
298f1e4b-784d-42a9-b3e5-bdedd0eeb8e1 0000:21:01.2 nvidia-563 manual
2abee89e-8cb4-4727-ac2f-62888daab7b4 0000:21:02.4 nvidia-563 manual
32445186-57ca-43f4-b599-65a455fffe65 0000:21:04.2 nvidia-563 manual
0c4f5d07-2893-49a1-990e-4c74c827083b 0000:81:00.7 nvidia-563 manual
75d1b78c-b097-42a9-b736-4a8518b02a3d 0000:81:01.2 nvidia-563 manual
a54d33e0-9ddc-49bb-8908-b587c72616a9 0000:81:02.5 nvidia-563 manual
cd7a49a8-9306-41bb-b44e-00374b1e623a 0000:81:03.4 nvidia-563 manual

virsh nodedev-list --cap mdev:
mdev_0c4f5d07_2893_49a1_990e_4c74c827083b_0000_81_00_7
mdev_298f1e4b_784d_42a9_b3e5_bdedd0eeb8e1_0000_21_01_2
mdev_2abee89e_8cb4_4727_ac2f_62888daab7b4_0000_21_02_4
mdev_32445186_57ca_43f4_b599_65a455fffe65_0000_21_04_2
mdev_75d1b78c_b097_42a9_b736_4a8518b02a3d_0000_81_01_2
mdev_a54d33e0_9ddc_49bb_8908_b587c72616a9_0000_81_02_5
mdev_cd7a49a8_9306_41bb_b44e_00374b1e623a_0000_81_03_4
mdev_cdc98056_8597_4531_9e55_90ab44a71b4e_0000_21_00_7

nvidia-smi vgpu output:
Wed Jul 13 20:15:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.06                Driver Version: 510.73.06               |
|---------------------------------+------------------------------+-----------|
| GPU  Name                       | Bus-Id                       | GPU-Util  |
| vGPU ID     Name                | VM ID     VM Name            | vGPU-Util |
|=================================+==============================+===========|
|   0  NVIDIA A40                 | 00000000:21:00.0             | 0%        |
| 3251635106  NVIDIA A40-12Q      | 2786...   instance-00014520  | 0%        |
| 3251635117
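For context, the per-PCI-address type selection described in the triage comment is expressed in nova.conf roughly as follows (a sketch: the device addresses are placeholders, and nvidia-471 is simply the type named in the comment):

```
[devices]
enabled_vgpu_types = nvidia-471

[vgpu_nvidia-471]
device_addresses = 0000:21:00.4,0000:21:00.5
```

With SR-IOV cards like the A40, each listed address is a VF, which is consistent with the comment's observation that the resulting inventory reflects the real per-type capacity (4) rather than one unit per VF (32).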
[Yahoo-eng-team] [Bug 1940425] Re: test_live_migration_with_trunk tempest test fails due to port remains in down state
As we have proof that the issue is due to the os-vif 3.0.0 release, changing the Nova status to Invalid. ** Also affects: os-vif Importance: Undecided Status: New ** Changed in: nova Status: Confirmed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1940425 Title: test_live_migration_with_trunk tempest test fails due to port remains in down state Status in neutron: Confirmed Status in OpenStack Compute (nova): Invalid Status in os-vif: New Bug description: Example failure is in [1]: 2021-08-18 10:40:52,334 124842 DEBUG [tempest.lib.common.utils.test_utils] Call _is_port_status_active returns false in 60.00 seconds }}} Traceback (most recent call last): File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in wrapper return func(*func_args, **func_kwargs) File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper return f(*func_args, **func_kwargs) File "/opt/stack/tempest/tempest/api/compute/admin/test_live_migration.py", line 281, in test_live_migration_with_trunk self.assertTrue( File "/usr/lib/python3.8/unittest/case.py", line 765, in assertTrue raise self.failureException(msg) AssertionError: False is not true Please note that a similar bug was reported and fixed previously: https://bugs.launchpad.net/tempest/+bug/1924258 It seems that fix did not fully solve the issue. It is not super frequent; I saw 4 occurrences in the last 30 days [2]. [1] https://zuul.opendev.org/t/openstack/build/fdbda223dc10456db58f922b6435f680/logs [2] https://paste.opendev.org/show/808166/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1940425/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1968054] Re: oslo.messaging._drivers.impl_rabbit Connection failed: timed out
Unfortunately, this doesn't look a Nova issue : this is either an oslo.messaging bug or rather a configuration issue. Closing this bug for Nova. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1968054 Title: oslo.messaging._drivers.impl_rabbit Connection failed: timed out Status in OpenStack Compute (nova): Invalid Status in oslo.messaging: New Bug description: I am running Wallaby Release on Ubuntu 20.04 (Openstack-Ansible deployment tool) oslo.messaging=12.7.1 nova=23.1.1 since i upgrade to Wallaby i have started noticed following error message very frequently in nova-compute and solution is to restart nova-compute agent. Here is the full logs: https://paste.opendev.org/show/bft9znewTxyXHkvIcQO0/ 01 19:43:36 compute1.example.net nova-compute[1546242]: AssertionError: Apr 01 19:45:35 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:35.059 34090 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 110] Connection timed out Apr 01 19:45:40 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:40.063 34090 ERROR oslo.messaging._drivers.impl_rabbit [req-707abbfe-8ee0-4af7-900a-e43dc5dec597 - - - - -] [7d350e59-001f-4203-bd41-369650cd5c5c] AMQP server on 172.28.17.24:5671 is unreachable: . Trying again in 1 seconds.: socket.timeout Apr 01 19:45:40 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:40.079 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: timed out (retrying in 0 seconds): socket.timeout: timed out Apr 01 19:45:41 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:41.983 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 0 seconds): OSError: [Errno 113] EHOSTUNREACH Apr 01 19:45:42 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:42.367 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH Apr 01 19:45:42 compute1.example.net nova-compute[34090]: Traceback (most recent call last): Apr 01 19:45:42 compute1.example.net nova-compute[34090]: File "/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers Apr 01 19:45:42 compute1.example.net nova-compute[34090]: timer() Apr 01 19:45:42 compute1.example.net nova-compute[34090]: File "/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__ Apr 01 19:45:42 compute1.example.net nova-compute[34090]: cb(*args, **kw) Apr 01 19:45:42 compute1.example.net nova-compute[34090]: File "/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/semaphore.py", line 152, in _do_acquire Apr 01 19:45:42 compute1.example.net nova-compute[34090]: waiter.switch() Apr 01 19:45:42 compute1.example.net nova-compute[34090]: greenlet.error: cannot switch to a different thread Apr 01 19:45:49 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:49.388 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: timed out (retrying in 0 seconds): socket.timeout: timed out Apr 01 19:45:50 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:50.303 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] [08af61ee-e653-44b0-82bb-155a2a8b7ef3] AMQP server on 172.28.17.24:5671 is unreachable: [Errno 
113] No route to host. Trying again in 1 seconds.: OSError: [Errno 113] No route to host Apr 01 19:45:51 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:51.199 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 0 seconds): OSError: [Errno 113] EHOSTUNREACH Apr 01 19:45:51 compute1.example.net nova-compute[34090]: 2022-04-01 19:45:51.583 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] [08af61ee-e653-44b0-82bb-155a2a8b7ef3] AMQP server on 172.28.17.24:5671 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 1 seconds.: OSError: [Errno 113] EHOSTUNREACH Apr 01 19:45:51 compute1.example.net nova-compute[34090]: Traceback (most recent call last): Apr 01 19:45:51 compute1.example.net nova-compute[34090]: File "/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers Apr 01 19:45:51 compute1.example.net nova-compute[34090]: timer() Apr 01 19:45:51 compute1.example.net nova-compute[34090]: File "/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__ Apr 01 19:45:51 compute1.example.net nova-compute[34090]: cb(*args, **kw) Apr 01 19:45:51 compute1.example.net nova-compute[34090]: File
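As a side note, the "greenlet.error: cannot switch to a different thread" entries above are the pattern commonly seen when the RabbitMQ heartbeat thread interacts badly with eventlet-monkeypatched services. One knob operators often experiment with in that situation (an assumption worth testing here, not a confirmed fix for this report) is:

```
[oslo_messaging_rabbit]
heartbeat_in_pthread = false
```

If the timeouts persist with that setting, the repeated EHOSTUNREACH errors point at plain network reachability problems between the compute host and 172.28.17.24:5671.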
[Yahoo-eng-team] [Bug 1968555] Re: evacuate after network issue will cause vm running on two host
If you see some compute flapping due to a network issue, you can force it to be down: https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-detail#update-forced-down Once the compute is down (either because it's forced down or via the service group API), you can indeed evacuate the instance, and then you would have two copies of it: the original one on the old host, and the new one on the new host. That said, given the original host is down, you should restart the compute service once the host is back up, right? If so, we then detect the evacuated instances and delete them: https://github.com/openstack/nova/blob/a1f006d799d2294234d381395a9ae9c22a2d80b9/nova/compute/manager.py#L1531 ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1968555 Title: evacuate after network issue will cause vm running on two host Status in OpenStack Compute (nova): Invalid Bug description: Environment === openstack queens + libvirt 4.5.0 + qemu 2.12 running on centos7, with ceph rbd storage Description === If the management network of the compute host is abnormal, nova-compute may be reported as down even though openstack-nova-compute.service is still running on that host. If you now evacuate a vm on that host, the evacuation will succeed, but the vm will be running on both the old host and the new host even after the management network of the old host recovers, which may cause vm errors. Steps to reproduce == 1. Manually turn down the management network port of the compute host, like ifconfig eth0 down 2. After the nova-compute service of that host shows as down in openstack compute service list, evacuate one vm on that host: nova evacuate 3. After the evacuation succeeds, you can find the vm running on two hosts. 4. Manually turn up the management network port of the old compute host, like ifconfig eth0 up; you can find the vm still running on this host, and it can't be destroyed automatically unless you restart the openstack-nova-compute.service on that host. Expected result === Maybe we can add a periodic task to auto-destroy this vm? To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1968555/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
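The forced-down flow mentioned above looks like this from the CLI (host and server names are placeholders); fencing the broken host first is what avoids the two-copies situation described in this report:

```
# mark the flapping compute service as forced down (API microversion 2.11+)
openstack compute service set --down failed-host nova-compute
# evacuate the instance to a healthy host
openstack --os-compute-api-version 2.29 server evacuate --host target-host myvm
# after repairing and fencing the old host, bring the service back up
openstack compute service set --up failed-host nova-compute
```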
[Yahoo-eng-team] [Bug 1969054] Re: when enabled enforce_new_defaults, create server failed
Marking the bug as WONTFIX as we fixed the root cause in the Yoga release. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1969054 Title: when enabled enforce_new_defaults, create server failed Status in OpenStack Compute (nova): Won't Fix Bug description: Description === When enforce_new_defaults is enabled in nova.conf, a system-scope admin fails to create a server. An error occurs in the neutron log (controller node): 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova Traceback (most recent call last): 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/neutron/notifiers/nova.py", line 266, in send_events 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova batched_events) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/v2/server_external_events.py", line 39, in create 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return_raw=True) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/base.py", line 363, in _create 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova resp, body = self.api.client.post(url, body=body) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return self.request(url, 'POST', **kwargs) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/client.py", line 78, in request 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova raise exceptions.from_response(resp, body, url, method) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova novaclient.exceptions.Forbidden: Policy doesn't allow os_compute_api:os-server-external-events:create to be performed. (HTTP 403) (Request-ID: req-928afad8-32b9-4208-8e5e-e2bc9061a56a) Steps to reproduce == 1) enable enforce_new_defaults in nova.conf and restart nova 2) empty policy.yaml: >/etc/nova/policy.yaml 3) use admin (system scope) to create a server 4) server creation fails 5) disable enforce_new_defaults; admin can create the server successfully. Expected result === The admin user creates the server successfully. Actual result = The server status is stuck in "BUILD"; after 5 minutes, it becomes "error". An error occurs in the neutron log (controller node):
2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova Traceback (most recent call last): 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/neutron/notifiers/nova.py", line 266, in send_events 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova batched_events) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/v2/server_external_events.py", line 39, in create 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return_raw=True) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/base.py", line 363, in _create 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova resp, body = self.api.client.post(url, body=body) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return self.request(url, 'POST', **kwargs) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova File "/usr/lib/python3.6/site-packages/novaclient/client.py", line 78, in request 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova raise exceptions.from_response(resp, body, url, method) 2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova novaclient.exceptions.Forbidden: Policy doesn't allow os_compute_api:os-server-external-events:create to be performed. (HTTP 403) (Request-ID: req-928afad8-32b9-420 8-8e5e-e2bc9061a56a) Environment === OS release centos8.2 openstack victoria nova 22.2.2 neutron 17.2 keystone 18.0 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1969054/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
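For operators stuck on affected releases, one possible stopgap (an assumption sketched here, not the root-cause fix that landed in Yoga) is to override the failing policy in /etc/nova/policy.yaml so the neutron service user can keep sending external events:

```
# /etc/nova/policy.yaml - illustrative workaround only
"os_compute_api:os-server-external-events:create": "rule:context_is_admin"
```

The policy name comes straight from the traceback above; rule:context_is_admin is nova's legacy admin rule.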
[Yahoo-eng-team] [Bug 1965441] Re: Error: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible
You're asking to shrink the disk from 160GB to 100GB, and that's something we don't support. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1965441 Title: Error: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible Status in OpenStack Compute (nova): Invalid Bug description: (HTTP 500) (Request-ID: req-819ea4bc-7115-4645-a640-7e3a9bb9595a) Resize from m1.xlarge to m1.medium-xdisk on VIO

Flavor Name: m1.xlarge
Flavor ID: 5
RAM: 16GB
VCPUs: 8 VCPU
Disk: 160GB

Flavor Details:
Name: m1.medium-xdisk
VCPUs: 2
Root Disk: 100 GB
Ephemeral Disk: 0 GB
Total Disk: 100 GB
RAM: 4,096 MB

Ubuntu18.04LTS-pristine To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1965441/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1962726] Re: ssh-rsa key is no longer allowed by recent openssh
We discussed this during the previous Nova meeting and agreed that this is a genuine issue, but we need to deprecate the generation API (while continuing to accept imported public keys). As this means a new API microversion, we need a spec for it, so we'll discuss this during the next PTG. Closing the bug. ** Changed in: nova Importance: Undecided => Wishlist ** Changed in: nova Status: New => Opinion -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1962726 Title: ssh-rsa key is no longer allowed by recent openssh Status in OpenStack Compute (nova): Opinion Bug description: Description === Currently, calling the create keypair API without actual key content returns a key generated server-side, which is formatted as ssh-rsa. However, ssh-rsa is no longer supported by default since openssh 8.8 https://www.openssh.com/txt/release-8.8 ``` This release disables RSA signatures using the SHA-1 hash algorithm by default. This change has been made as the SHA-1 hash algorithm is cryptographically broken, and it is possible to create chosen-prefix hash collisions for https://bugs.launchpad.net/nova/+bug/1962726/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
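Until that deprecation lands, the import path mentioned above already sidesteps server-side ssh-rsa generation entirely; for example (file and keypair names are placeholders):

```
ssh-keygen -t ed25519 -f mykey
openstack keypair create --public-key mykey.pub mykey
```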
[Yahoo-eng-team] [Bug 1962381] Re: Nova Instance Creation Fails with Error: USB is disabled for this domain
Looks to me like a config issue, not a project bug. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1962381 Title: Nova Instance Creation Fails with Error: USB is disabled for this domain Status in OpenStack Compute (nova): Invalid Bug description: My instances fail to create, and I get the following error in /var/log/nova-compute.log on the compute nodes and /var/log/nova-conductor.log on the controller node about USB being disabled for a domain but devices being present in the domain.xml. 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [req-b7f8ef2c-f89a-4380-b573-bba4b99aa296 d20aa0616f264b39a2b72422d2d5d947 53a12573b5e14406bf85e864dc0acd68 - default default] [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] Failed to build and run instance: libvirt.libvirtError: unsupported configuration: USB is disabled for this domain, but USB devices are present in the domain XML 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] Traceback (most recent call last): 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.driver.spawn(context, instance, image_meta, 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3766, in spawn 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self._create_guest_with_network( 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6758, in _create_guest_with_network 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self._cleanup_failed_start( 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.force_reraise() 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] six.reraise(self.type_, self.value, self.tb) 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] raise value 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6727, in _create_guest_with_network 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] guest = self._create_guest( 2022-02-27 02:31:42.000 1806 ERROR
nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6655, in _create_guest 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] guest = libvirt_guest.Guest.create(xml, self._host) 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 144, in create 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] LOG.error('Error defining a guest with XML: %s', 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.force_reraise() 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance:
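For what it's worth, this family of "USB is disabled for this domain" failures frequently traces back to the USB tablet pointer device being added to a domain whose machine type has USB turned off. A setting worth checking in nova.conf (an assumption about this setup, not a confirmed diagnosis from these logs) is:

```
[DEFAULT]
pointer_model = ps2mouse
```

which keeps nova from requesting a USB tablet input device for the guest.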
[Yahoo-eng-team] [Bug 1963553] Re: Openstack Fails to Launch Instances "/usr/bin/qemu-system-arm' does not support virt type 'kvm; "
This doesn't sound a Nova project-specific bug, rather a config issue for a specific OS/arch. AFAIK, you need to use CentOS AArch64 images for the RPi. Anyway, closing the bug. ** Changed in: nova Status: New => Incomplete ** Changed in: nova Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1963553 Title: Openstack Fails to Launch Instances "/usr/bin/qemu-system-arm' does not support virt type 'kvm; " Status in OpenStack Compute (nova): Invalid Bug description: Environment - Openstack Victoria on Ubuntu 20.0.4 on Raspberry Pi 4B, 1 controller, 2 compute, 1 storage nodes. Been troubleshooting an Raspberry Pi 4B Openstack setup. I have all my openstack components running and verified but when trying to launch an instance it errors out (seems to be from libvirt): Emulator '/usr/bin/qemu-system-arm' does not support virt type 'kvm'\n", '\nDuring handling of the above exception, another exception occurred Per these instructions (https://docs.openstack.org/nova/victoria/configuration/config.html) as a fix I tried I've tried the following options in nova.conf in all variations with no difference in outcome (Restarted Libvirt and Nova- Computer service after each change as well) cpu_mode = (default) cpu_mode = host-passthrough virt_type = kvm virt_type = qemu I'm at a loss for what to do as to my knowledge the only way information is passed through to libvirt is through nova.conf, and appreciate any assistance. Controller Node /var/log/nova-conductor.log 2022-03-03 16:39:13.462 1987 ERROR nova.scheduler.utils [req-de8dd1d0-3ff5-4495-ad8b-8af235b4d8c4 d20aa0616f264b39a2b72422d2d5d947 - - default default] [instance: c453aedc-08d5-4b5c-95c9-ddda1eab4514] Error from last host: compute2 (node compute2): ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance\n self.driver.spawn(context, instance, image_meta,\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3766, in spawn\n self._create_guest_with_network(\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6758, in _create_guest_with_network\n self.cleanup_failed_start(\n', ' File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in exit\n self.force_reraise()\n', ' File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type, self.value, self.tb)\n', ' File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n raise value\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6727, in _create_guest_with_network\n guest = self._create_guest(\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6655, in _create_guest\n guest = libvirt_guest.Guest.create(xml, self.host)\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 144, in create\n LOG.error('Error defining a guest with XML: %s',\n', ' File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in exit\n self.force_reraise()\n', ' File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type, self.value, self.tb)\n', ' File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n raise value\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 141, in create\n guest 
= host.write_instance_config(xml)\n', ' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/host.py", line 1144, in write_instance_config\n domain = self.get_connection().defineXML(xml)\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit\n result = proxy_call(self._autowrap, f, *args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call\n rv = execute(f, *args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute\n six.reraise(c, e, tb)\n', ' File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n raise value\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker\n rv = meth(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/libvirt.py", line 4047, in defineXML\n if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)\n', "libvirt.libvirtError: unsupported configuration: Emulator '/usr/bin/qemu-system-arm' does not support virt type 'kvm'\n", '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2268, in _do_build_and_run_instance\n
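Since qemu-system-arm is the 32-bit emulator, one thing worth verifying on an AArch64 Raspberry Pi deployment (a suggestion following the triage comment, not something confirmed from these logs) is that the image is tagged with the right architecture so the libvirt driver selects qemu-system-aarch64 instead:

```
openstack image set --property hw_architecture=aarch64 <image>
```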
[Yahoo-eng-team] [Bug 1964097] Re: Questions about the command "nova list & openstack server list"
No, it's just calling the API DB; listing servers does not go through the message queue. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1964097 Title: Questions about the command "nova list & openstack server list" Status in OpenStack Compute (nova): Invalid Bug description: When I use the command "nova list" to list all instances in the system, does this operation go through the message queue? Thank you all! To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1964097/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1959742] Re: Cant launch Instance (Nova, https://cloud.lab.fiware.org/project/instances/)
You need to give us more logs so we can understand what the issue is. It looks to me like it's not a bug, but rather a configuration issue. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1959742 Title: Cant launch Instance (Nova, https://cloud.lab.fiware.org/project/instances/) Status in OpenStack Compute (nova): Invalid Bug description: Whenever I try to create an instance, I get: Error: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-273ba8b2-95fa-4b66-a958-317bf4f59a50) Error: Unable to launch instance named "learning-1" To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1959742/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1959682] Re: String concatenation TypeError in resize flavor helper
** Also affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1959682 Title: String concatenation TypeError in resize flavor helper Status in OpenStack Compute (nova): Invalid Status in tempest: In Progress Bug description: In cae966812, for certain resize tests, we started adding a numeric ID to the new flavor name to avoid collisions. This was incorrectly done as a string + int concatenation, which is raising a `TypeError: can only concatenate str (not "int") to str`. Example of this happening in nova-next job: https://zuul.opendev.org/t/openstack/build/7f750faf22ec48219ddd072cfe6e02e1/logs To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1959682/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1956432] Re: Old allocation of VM is not deleted after evacuating
This is expected behaviour. As we assume that we can only evacuate when a nova-compute service is down, there is no way for the nova-compute service to ask Placement to remove those allocations. Only when nova-compute is back up can those allocations be deleted. We also provide some nova-manage commands for deleting orphaned allocations in case of a non-recoverable compute service. See https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-audit ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1956432 Title: Old allocation of VM is not deleted after evacuating Status in OpenStack Compute (nova): Won't Fix Bug description: I found that the old instance allocation in placement is not deleted after executing an evacuation, which leads to wrong resource info for the old compute node.

MariaDB [placement]> select * from allocations where consumer_id='4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9';
+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+------+
| created_at          | updated_at | id    | resource_provider_id | consumer_id                          | resource_class_id | used |
+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+------+
| 2022-01-05 08:23:19 | NULL       | 18315 |                   11 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 2 |    1 |
| 2022-01-05 08:23:19 | NULL       | 18318 |                   11 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 1 |  512 |
| 2022-01-05 08:23:19 | NULL       | 18321 |                   11 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 0 |    1 |
| 2022-01-05 08:23:19 | NULL       | 18324 |                   33 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 0 |    1 |
| 2022-01-05 08:23:19 | NULL       | 18327 |                   33 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 1 |  512 |
| 2022-01-05 08:23:19 | NULL       | 18330 |                   33 | 4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 |                 2 |    1 |
+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+------+

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1956432/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
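The nova-manage command referenced above can both report and clean up such leftover allocations; a typical run (the provider UUID is a placeholder) looks like:

```
# report orphaned allocations for one resource provider
nova-manage placement audit --verbose --resource_provider <rp_uuid>
# delete every orphaned allocation found
nova-manage placement audit --delete
```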
[Yahoo-eng-team] [Bug 1956983] Re: Consider making upgrade check for old computes a failure
I'm not sure I'd classify it as a bug. It's probably a good thought, though, so I'm marking it Invalid/Wishlist, but I'm open to thoughts. To answer your question, now that we have a hard stop blocking nova services from restarting if they are old enough, this sounds like a legitimate blueprint to address. ** Changed in: nova Importance: Undecided => Wishlist ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1956983 Title: Consider making upgrade check for old computes a failure Status in OpenStack Compute (nova): Invalid Bug description: Currently, the upgrade check for older than N-1 computes only produces a warning. For example:

Check: Older than N-1 computes
Result: Warning
Details: Current Nova version does not support computes older than Victoria but the minimum compute service level in your system is 30 and the oldest supported service level is 52.

If this is overlooked, Nova services will fail to start after upgrade. With Nova API down, the old services cannot be removed without database edits. Is there a specific reason to keep this check as a warning rather than a failure? To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1956983/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
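For reference, the check quoted in this report comes from the pre-upgrade tool, whose exit code already distinguishes the two behaviours being discussed:

```
nova-status upgrade check
# exit 0: all checks passed
# exit 1: at least one check emitted a warning (the current behaviour here)
# exit 2: at least one check failed (what this report proposes for old computes)
```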
[Yahoo-eng-team] [Bug 1953734] Re: POST /os-security-groups returns HTTP 500 on invalid input
As we document in our API docs, this /os-security-groups API resource is now deprecated [1] since API microversion 2.36 [2] which is shipped with the Newton release [1] https://docs.openstack.org/api-ref/compute/#create-security-group [2] https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#microversion Accordingly, we can't fix this bug in our project, even within the existing stable branches. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1953734 Title: POST /os-security-groups returns HTTP 500 on invalid input Status in OpenStack Compute (nova): Won't Fix Bug description: Nova does not validate the input on os-security-groups API resource. curl -X POST 'http://10.1.0.21/compute/v2.1/os-security-groups' -d '{"security_group": "nostrud commodo tempor", "name": "eiusmod veniam", "description": "non esse occaecat"}' -H "Content-Type: application/json; charset=UTF-8" -H "Accept: application/json" -H "X-Auth-Token: ${token}" {"computeFault": {"code": 500, "message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n"}} Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin admin] Unexpected exception in API method: Attribut> Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi Traceback (most recent call last): Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/wsgi.py", line 658, in wrapped Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi return f(*args, **kwargs) Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi File "/opt/stack/nova/nova/api/openstack/compute/security_groups.py", line 219, in create Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi group_name = security_group.get('name', None) Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi AttributeError: 'str' object has no attribute 'get' Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR nova.api.openstack.wsgi Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: INFO nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin admin] HTTP exception thrown: Unexpected API Error. > Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: DEBUG nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin admin] Returning 500 to user: Unexpected API Error.> Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: {{(pid=3644655) __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:936}} reproducible on recent master with simple devstack setup To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1953734/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
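Since the proxy API is deprecated, the supported path with proper input validation is the network API; the equivalent request via the CLI (the name and description are the placeholders from the report) would be:

```
openstack security group create --description "non esse occaecat" eiusmod-veniam
```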
[Yahoo-eng-team] [Bug 1951983] Re: nova contains a regular expression that is vulnerable to ReDoS (Regular Expression Denial of Service).
If I understand correctly which module has this issue, this is about hacking.py. @dw1s, you say this is from before SHA1 8f250f50446ca2d7aa84609d5144088aa4cded78, but I can't find it in the nova repo. Either way, this hacking.py module isn't run by our services and is just used by our PEP8 jobs, so I don't see any problem here. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1951983 Title: nova contains a regular expression that is vulnerable to ReDoS (Regular Expression Denial of Service). Status in OpenStack Compute (nova): Won't Fix Status in OpenStack Security Advisory: Won't Fix Bug description: # Summary nova contains a regular expression that is vulnerable to ReDoS (Regular Expression Denial of Service). # Description ReDoS, or Regular Expression Denial of Service, is a vulnerability affecting inefficient regular expressions which can perform extremely badly when run on a crafted input string. # Proof of Concept To see that the regular expression is vulnerable, copy-paste it into a separate file and run the code as shown below.

```python
import re

log_remove_context = re.compile(
    r"(.)*LOG\.(.*)\(.*(context=[_a-zA-Z0-9].*)+.*\)")
log_remove_context.match('LOG.' + '(' * 3456)
```

# Impact This issue may lead to a denial of service. # References - https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1951983/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
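To see the pathological backtracking concretely, one can time the report's pattern at growing input sizes (a standalone sketch building on the PoC above; exact timings will vary by machine):

```python
import re
import time

log_remove_context = re.compile(
    r"(.)*LOG\.(.*)\(.*(context=[_a-zA-Z0-9].*)+.*\)")

# Inputs that almost match force the engine to retry many split points
# between the overlapping .* groups, so the runtime grows super-linearly
# with the number of parentheses.
for n in (1000, 2000, 4000, 8000):
    payload = 'LOG.' + '(' * n
    start = time.monotonic()
    log_remove_context.match(payload)
    print(n, round(time.monotonic() - start, 3), 'seconds')
```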
[Yahoo-eng-team] [Bug 1951720] Re: Virtual interface creation failed
Yeah, I'll put it to the Neutron team to ask them to look at this bug. In case they say it's a Nova bug, please modify the nova status to "New" again. Thanks. ** Also affects: neutron Importance: Undecided Status: New ** Changed in: nova Status: New => Opinion -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1951720 Title: Virtual interface creation failed Status in neutron: New Status in OpenStack Compute (nova): Opinion Bug description: Hi, I have testig stack of openstack wallaby (deployed with kolla-ansible with kolla source images) and found probably weird bug in nova/neutron. I have testing heat template which was starting about 6 instances with bunch of network interfaces with security group - nothing special. Testing openstack ENV is clean, fresh installed, tempest passing. So what is going on ? 1. Sometimes heat stack is created successfully without error 2. Sometimes heat stack is created sucessfully - BUT with errors in nova-compute - so retry mechanism works Errors in nova-compute : 7d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Took 0.10 seconds to destroy the instance on the hypervisor. 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [req-5b21ebad-ffb7-46b0-8c37-fd665d01013e 64fe2842ff8c6302c0450bee25600a10e54f2b9793e9c8776f956c993a7a7ee8 0960461696f64f82ba108f8397bf508c - e01e19b257d241eaae5bb137aedc6fcc e01e19b2 57d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last): 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6930, in _create_guest_with_network 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]guest = self._create_guest( 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3.9/contextlib.py", line 124, in __exit__ 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]next(self.gen) 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 479, in wait_for_instance_event 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]actual_event = event.wait() 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]result = hub.switch() 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]return self.greenlet.switch() 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] eventlet.timeout.Timeout: 300 seconds 2021-11-03 
14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] During handling of the above exception, another exception occurred: 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last): 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2366, in _build_and_run_instance 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]self.driver.spawn(context, instance, image_meta, 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3885, in spawn 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]self._create_guest_with_network( 2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance:
[Yahoo-eng-team] [Bug 1947753] Re: Evacuated instances are not removed from the source
OK, let me get this right. You say that when you want to evacuate an instance, you don't really know whether the original service is running correctly, right? That's basically why Nova verifies that the host is not operational and has somehow 'failed'. Some of the time, you're right, Nova thinks the compute service isn't faulty and then you can't evacuate. Other times, Nova thinks the compute service *is* faulty and then you can evacuate. If you do so, then indeed you could have problems *if* the host is actually running. That's why, in general, we recommend that operators "fence" the original faulty host detected by Nova before evacuating. Either way, if the service continues to run, it verifies the evacuation status periodically and deletes the evacuated instances. So, maybe you're getting a race when you evacuate while a compute fault is transient, and then you see a problem. If so, I'd recommend, as I said, 'fencing' the host before evacuating instances... or waiting a little before evacuating the instances if the issue is transient. Maybe that's something related to the healthchecks we want to work on: with a better view of the status of a faulty compute service, you wouldn't issue evacuations unless you're sure it went down. Setting the bug report to Opinion, but I'm more than happy to discuss with you, Belmiro, on #openstack-nova if you wish. ** Changed in: nova Status: New => Opinion ** Changed in: nova Importance: Undecided => Wishlist ** Tags added: evacuate -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1947753 Title: Evacuated instances are not removed from the source Status in OpenStack Compute (nova): Opinion Bug description: Instance "evacuation" is a great feature and we are trying to take advantage of it. But it has some limitations, depending on how "broken" the node is. Let me give some context... In the scenario where the compute node loses connectivity (broken switch port, loose network cable, ...) or nova-compute is stuck (filesystem issue), evacuating instances can have some unexpected consequences and lead to data corruption in the application (for example in a DB application). If a compute node loses connectivity (or an entire set of compute nodes), nova-compute and the instances are "not available". If the node runs critical applications (let's suppose a MySQL DB), the cloud operator could be tempted to "evacuate" the instance to recover the critical application for the user. At this point the cloud operator may not yet know the cause of the compute node issue, and maybe it won't be possible to shut the node down (management network affected?, ...), or they may simply not want to interfere with the work of the repair team. The repair team fixes the issue (it can take a few minutes or hours...) and nova-compute and the instances are available again. The problem is that nova-compute doesn't destroy the evacuated instances on the source. ``` 2021-10-19 11:17:51.519 3050 WARNING nova.compute.resource_tracker [req-0ed10e35-2715-466a-918b-69eb1fc770e8 - - - - -] Instance fc3be091-56d3-4c69-8adb-2fdb8b0a35d2 has been moved to another host foo.cern.ch(foo.cern.ch). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 1875}}. ``` At this point we have 2 instances sharing the same IP and possibly writing into the same volume. Only when nova-compute is restarted (I guess that was always the assumption... 
the compute node was really broken) the evacuated instances in the affected node are removed. ``` 2021-10-19 15:39:49.257 21189 INFO nova.compute.manager [req-ded45b0c-20ab-4587-9533-8c613d977f79 - - - - -] Destroying instance as it has been evacuated from this host but still exists in the hypervisor 2021-10-19 15:39:52.949 21189 INFO nova.virt.libvirt.driver [ ] Instance destroyed successfully. ``` I would expect that nova-compute will constantly check for the evacuated instances and then removed them. Otherwise, this requires a lot of coordination between different support teams. Should this be moved to a periodic task? https://github.com/openstack/nova/blob/e14eef0719eceef35e7e96b3e3d242ec79a80969/nova/compute/manager.py#L1440 I'm running Stein, but looking into the code, we have the same behaviour in master. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1947753/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1947824] Re: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503)
*** This bug is a duplicate of bug 1947825 *** https://bugs.launchpad.net/bugs/1947825 ** This bug has been marked a duplicate of bug 1947825 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503)

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1947824

Title: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503) Status in OpenStack Compute (nova): New

Bug description: I am trying to install the Rocky version of OpenStack, and while configuring the Glance service and executing the command to upload an image, we are facing the following error: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503) Also, the keystone service status is empty; it's not displaying anything.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1947824/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1947825] Re: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503)
Looks like a configuration issue with Keystone. Not really a Nova bug: the nova-api service is telling you that the Keystone API service is not available.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1947825

Title: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503) Status in OpenStack Compute (nova): Invalid

Bug description: I am trying to install the Rocky version of OpenStack, and while configuring the Glance service and executing the command to upload an image, we are facing the following error: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503) Also, the keystone service status is empty; it's not displaying anything.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1947825/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
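For anyone hitting the same 503 during a Rocky-style install, a few generic checks (hostnames are illustrative, not from the report) usually narrow down whether Keystone itself is up before anything gets filed against Nova:

```
# A healthy Keystone endpoint answers with a JSON version document.
curl -i http://controller:5000/v3

# Exercise the same auth path that the glance/nova clients use.
openstack token issue

# Keystone typically runs under Apache on Rocky-era installs.
systemctl status apache2   # 'httpd' on RHEL/CentOS
```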
[Yahoo-eng-team] [Bug 1948393] Re: useless configuration options in 'nova.conf'
This looks like a glanceclient issue, no? ** Also affects: glance Importance: Undecided Status: New ** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1948393

Title: useless configuration options in 'nova.conf' Status in Glance: New Status in OpenStack Compute (nova): Invalid

Bug description: I was trying to add retries for glance operations via nova. I came across the options below defined in the conf: [glance] .. #connect_retries = #connect_retry_delay = #status_code_retries = #status_code_retry_delay = === I tried to set a value for `connect_retries` and tried to reproduce a connection error for the snapshot upload. Somehow the `connect_retries` value is not getting picked up. I also tried to search for these options in the code (nova/nova/conf/glance.py), but could not find them. Let me know if this is a known issue; I could not find any duplicate bug for this. Nova release version: Train

To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1948393/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
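To illustrate what the reporter was attempting: the [glance] retry options are keystoneauth session/adapter options, and the gap reported here is that on Train they appear commented in the sample config without being consumed by Nova. On a release that does register them, setting them might look roughly like this; the values and service names are illustrative assumptions, not guidance from the report:

```
# Hypothetical illustration only: set keystoneauth retry options in the
# [glance] section and restart. On Train these options were not wired
# up, which is exactly what this report observed.
crudini --set /etc/nova/nova.conf glance connect_retries 3
crudini --set /etc/nova/nova.conf glance connect_retry_delay 1
systemctl restart nova-api nova-compute   # service names vary by distro
```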
[Yahoo-eng-team] [Bug 1948637] Re: nova should support deleting or adding tags when a server's status is error
This is not a bug, but rather a feature request. If the instance is in ERROR, why would you want to modify its tags?

** Changed in: nova Importance: Undecided => Wishlist ** Changed in: nova Status: New => Opinion

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1948637

Title: nova should support deleting or adding tags when a server's status is error Status in OpenStack Compute (nova): Opinion

Bug description: When adding or deleting a tag on a VM in error state, it reports: b"Cannot 'update tag' instance b7c8cc1c-8c26-4767-bd11-7faf99cee2df while it is in vm_state error (HTTP 409) b"Cannot 'delete tag' instance b7c8cc1c-8c26-4767-bd11-7faf99cee2df while it is in vm_state error (HTTP 409) Tags and names are both user-defined data; the name of a virtual machine can be modified while it is in an error state, so shouldn't tags support similar operations?

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1948637/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1946298] Re: live-migration fails when option rom size is different in guest's memory
We discussed this bug report during the latest Asian-friendly Nova meeting [1] and we agreed that this report is really asking Nova to support iPXE, which Nova doesn't expose for the moment, even though libvirt does. Please provide a blueprint explaining your needs and then we will discuss it. [1] https://meetings.opendev.org/meetings/nova_extra/2021/nova_extra.2021-10-07-08.04.log.html#l-59

** Changed in: nova Status: New => Invalid ** Changed in: nova Importance: Undecided => Wishlist

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1946298

Title: live-migration fails when option rom size is different in guest's memory Status in OpenStack Compute (nova): Invalid

Bug description: Description === This problem is found when doing live migration across nova versions, especially when the iPXE versions that libvirt depends on differ. If an instance has an interface attached, the option ROM is loaded into the guest's ROM. When doing a live migration, QEMU will check the ROM size and try to resize the resizable memory region. However, an option ROM is not resizable. Once the destination node finds that the option ROM size changed when loaded into memory, an exception occurs and stops the migration process.

Steps to reproduce == A simple way to reproduce: * Prepare two nova-compute nodes, which can be the same version * Create an instance on node A, and attach an interface to it * Check which iPXE ROM is loaded into memory by its model type. For example, if an interface is defined with ``, then `/usr/lib/ipxe/qemu/efi-virtio.rom` is loaded into the ROM on an Ubuntu x86 system. * Change the ROM's virtual size on the destination node B. Simply `echo "hello" > /usr/lib/ipxe/qemu/efi-virtio.rom` The virtual size is the maximum length when the ROM is loaded into the guest's memory, which is a power of 2. We can use the following command to get the ROM's virtual size: `virsh qemu-monitor-command --hmp 'info ramblock'` * Do the live migration: `nova live-migration --block-migrate cirros1 cmp02`

Expected result === Normally, if the ROM's virtual size is unchanged, the migration will succeed. Actual result = After executing the steps above, the live migration fails with an error.
Environment === Nova version: $ dpkg -l | grep nova ii nova-common 2:21.2.1-0ubuntu1 all ii nova-compute 2:21.2.1-0ubuntu1 all ii nova-compute-kvm 2:21.2.1-0ubuntu1 all ii nova-compute-libvirt 2:21.2.1-0ubuntu1 all ii python3-nova 2:21.2.1-0ubuntu1 all ii python3-novaclient 2:17.0.0-0ubuntu1 all Hypervisor type: libvirt $ dpkg -l | grep libvirt ii libvirt-clients 6.0.0-0ubuntu8.13 amd64 ii libvirt-daemon 6.0.0-0ubuntu8.13 amd64 ii libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.13 amd64 ii libvirt-daemon-driver-storage-rbd 6.0.0-0ubuntu8.13 amd64 ii libvirt-daemon-system 6.0.0-0ubuntu8.13 amd64 ii libvirt-daemon-system-systemd 6.0.0-0ubuntu8.13 amd64 ii libvirt0:amd64 6.0.0-0ubuntu8.13 amd64 ii nova-compute-libvirt 2:21.2.1-0ubuntu1 all ii python3-libvirt 6.1.0-1 amd64 Networking type: Neutron with OpenVSwitch $ dpkg -l | grep neutron ii neutron-common 2:16.4.0-0ubuntu3 all ii neutron-openvswitch-agent 2:16.4.0-0ubuntu3 all ii python3-neutron 2:16.4.0-0ubuntu3 all ii python3-neutron-lib 2.3.0-0ubuntu1 all ii python3-neutronclient 1:7.1.1-0ubuntu1 all

Logs & Configs == ```text 2021-09-22 10:10:31.451 35235 ERROR nova.virt.libvirt.driver [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Migration operation has aborted 2021-09-22 10:10:31.644 35235 INFO nova.compute.manager [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Swapping old allocation on dict_keys(['61b9a486-f53e-4b70-b54c-0db29f8ff978']) held by migration f5308871-0e91-48b0-8a68-a7d66239b3bd for instance 2021-09-22 10:10:31.671 35235 ERROR nova.virt.libvirt.driver [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2021-09-22T02:10:31.450377Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x1000 in != 0x8: Invalid argument 2021-09-22T02:10:31.450414Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
[Yahoo-eng-team] [Bug 1945401] Re: scheduler can not filter node by storage backend
You need to use aggregates if you want to use different storage backends per compute. For what it's worth, if you really want the scheduler to have a way to verify storage backends, that would be a new feature, not a bug.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1945401

Title: scheduler can not filter node by storage backend Status in OpenStack Compute (nova): Invalid

Bug description: If my aggregate has a Ceph-backend node cmp01 and an FC-SAN-backend node cmp02, and I create a Ceph-backed VM01 on cmp01 and then migrate it, the migration will fail if the scheduler selects cmp02 or if I set the target node to cmp02. --Traceback-- oslo_messaging.rpc.client.RemoteError: Remote error: ClientException Unable to create attachment for volume (Invalid input received: Connector doesn't have required information: wwpns). (HTTP 500) I think nova needs to pre-check the target node I set, or filter compute nodes by available storage backend when selecting a destination.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1945401/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
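The aggregate-based approach suggested in the triage note can be sketched as follows, assuming the AggregateInstanceExtraSpecsFilter is enabled in the scheduler. The aggregate, property and flavor names are illustrative; the host names match the report:

```
# Group hosts by their storage backend.
openstack aggregate create ceph-hosts
openstack aggregate set --property storage=ceph ceph-hosts
openstack aggregate add host ceph-hosts cmp01

openstack aggregate create fcsan-hosts
openstack aggregate set --property storage=fcsan fcsan-hosts
openstack aggregate add host fcsan-hosts cmp02

# Tie a flavor to one backend so the scheduler only picks matching hosts
# (requires AggregateInstanceExtraSpecsFilter in enabled_filters).
openstack flavor set \
    --property aggregate_instance_extra_specs:storage=ceph m1.ceph
```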
[Yahoo-eng-team] [Bug 1945538] Re: Database permission configured properly but I have the Access deny
This looks to me like an issue unrelated to the Nova repository; rather, it looks like an issue with the Ubuntu packages.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1945538

Title: Database permission configured properly but I have the Access deny Status in OpenStack Compute (nova): Invalid

Bug description: Description === I installed OpenStack using the [OpenStack Installation Guide][OpenStack Installation Guide]; all commands and configuration are on my [Github][My-Github]. At this point: **one controller** with **two compute** nodes.

*Nova configuration on the controller:* user@controller001:~$ sudo !! sudo grep -v '^\s*$\|^\s*\#' /etc/nova/nova.conf [DEFAULT] log_dir = /var/log/nova lock_path = /var/lock/nova state_path = /var/lib/nova transport_url = rabbit://openstack:openstack@controller001:5672/ my_ip = 192.168.56.50 [api] auth_strategy = keystone [api_database] connection = mysql+pymysql://nova:openstack@controller001/nova_api [barbican] [cache] [cinder] [compute] [conductor] [console] [consoleauth] [cors] [cyborg] [database] connection = mysql+pymysql://nova:openstack@controller001/nova [devices] [ephemeral_storage_encryption] [filter_scheduler] [glance] api_servers = http://controller001:9292 [guestfs] [healthcheck] [hyperv] [image_cache] [ironic] [key_manager] [keystone] [keystone_authtoken] www_authenticate_uri = http://controller001:5000/ auth_url = http://controller001:5000/ memcached_servers = controller001:11211 auth_type = password project_domain_name = Default user_domain_name = Default project_name = service username = nova password = openstack [libvirt] [metrics] [mks] [neutron] [notifications] [oslo_concurrency] lock_path = /var/lib/nova/tmp [oslo_messaging_amqp] [oslo_messaging_kafka] [oslo_messaging_notifications] [oslo_messaging_rabbit] [oslo_middleware] [oslo_policy] [pci] [placement] region_name = RegionOne project_domain_name = Default project_name = service auth_type = password user_domain_name = Default auth_url = http://controller001:5000/v3 username = placement password = openstack [powervm] [privsep] [profiler] [quota] [rdp] [remote_debug] [scheduler] discover_hosts_in_cells_interval = 300 [serial_console] [service_user] [spice] [upgrade_levels] [vault] [vendordata_dynamic_auth] [vmware] [vnc] enabled = true server_listen = $my_ip server_proxyclient_address = $my_ip [workarounds] [wsgi] [zvm] [cells] enable = False [os_region_name] openstack =

*Nova configuration on a compute node:* user@compute001:~$ sudo grep -v '^\s*$\|^\s*\#' /etc/nova/nova.conf [DEFAULT] log_dir = /var/log/nova lock_path = /var/lock/nova state_path = /var/lib/nova transport_url = rabbit://openstack:openstack@controller001 my_ip = 172.16.56.51 [api] auth_strategy = keystone [api_database] connection = sqlite:////var/lib/nova/nova_api.sqlite [barbican] [cache] [cinder] [compute] [conductor] [console] [consoleauth] [cors] [cyborg] [database] connection = sqlite:////var/lib/nova/nova.sqlite [devices] [ephemeral_storage_encryption] [filter_scheduler] [glance] api_servers = http://controller001:9292 [guestfs] [healthcheck] [hyperv] [image_cache] [ironic] [key_manager] [keystone] [keystone_authtoken] www_authenticate_uri = http://controller001:5000/ auth_url = http://controller001:5000/ memcached_servers = controller001:11211 auth_type = password project_domain_name = Default user_domain_name = Default project_name = service username = nova password = openstack [libvirt] [metrics] [mks] [neutron] [notifications] [oslo_concurrency] lock_path = /var/lib/nova/tmp [oslo_messaging_amqp] [oslo_messaging_kafka] [oslo_messaging_notifications] [oslo_messaging_rabbit] [oslo_middleware] [oslo_policy] [pci] [placement] region_name = RegionOne project_domain_name = Default project_name = service auth_type = password user_domain_name = Default auth_url = http://controller001:5000/v3 username = placement password = openstack [powervm] [privsep] [profiler] [quota] [rdp] [remote_debug] [scheduler] discover_hosts_in_cells_interval = 300 [serial_console] [service_user] [spice] [upgrade_levels] [vault] [vendordata_dynamic_auth] [vmware] [vnc] enabled = true server_listen = 0.0.0.0 server_proxyclient_address = $my_ip novncproxy_base_url = http://controller001:6080/vnc_auto.html [workarounds] [wsgi] [zvm] [cells] enable = False [os_region_name] openstack =

In the section below I show that the database permissions are configured correctly
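A quick way to test what the report title claims (database permissions configured, but access denied) is to try the exact credentials from the pasted config against the controller directly. This is a generic check, not part of the original report; the host, user and password are taken from the connection strings above:

```
# Use the nova credentials from [api_database]/[database] directly.
# A permissions problem reproduces here independently of Nova itself.
mysql -h controller001 -u nova -popenstack \
    -e "SHOW DATABASES LIKE 'nova%';"
```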
[Yahoo-eng-team] [Bug 1944111] Re: Missing __init__.py in nova/db/api
** Changed in: nova/yoga Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1944111 Title: Missing __init__.py in nova/db/api Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) xena series: Fix Released Status in OpenStack Compute (nova) yoga series: Fix Released Bug description: Looks like nova/db/api is missing an __init__.py, which breaks *at least* my Debian packaging. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1944111/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1938765] [NEW] nova-lvm job constantly fails on Glance image upload
Public bug reported: The nova-lvm job seems to fail for every change [1], which prevents us from merging any change touching nova/virt/libvirt [2]. All failures seem to relate to the same Tempest tests (6 of them) failing with the same problem: a Glance image upload issue, as the Glance API returns an HTTP 502.

Aug 01 03:50:12.036077 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR os_brick.initiator.linuxscsi [None req-8fa95fb0-15e8-4bf7-8314-0d2dac1c9b9c tempest-VolumesAdminNegativeTest-243853426 tempest-VolumesAdminNegativeTest-243853426-project] multipathd is not running: exit code None: oslo_concurrency.processutils.ProcessExecutionError: [Errno 2] No such file or directory
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server [None req-76a536ab-394f-4347-8dec-30842c1ec1d2 tempest-ListImageFiltersTestJSON-1499681503 tempest-ListImageFiltersTestJSON-1499681503-project] Exception during message handling: glanceclient.exc.HTTPBadGateway: HTTP 502 Bad Gateway: Bad Gateway: The proxy server received an invalid: response from an upstream server.: Apache/2.4.41 (Ubuntu) Server at 158.69.72.121 Port 80
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2916, in snapshot
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server metadata['location'] = root_disk.direct_snapshot(
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 452, in direct_snapshot
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server raise NotImplementedError(_('direct_snapshot() is not implemented'))
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server NotImplementedError: direct_snapshot() is not implemented
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 71, in wrapped
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server _emit_versioned_exception_notification(
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server self.force_reraise()
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: ERROR oslo_messaging.rpc.server raise self.value
Aug 01 03:51:41.348545
[Yahoo-eng-team] [Bug 1918340] Re: Fault Injection #1 - improve unit test effectiveness
Fixing unit tests or tech-debt concerns doesn't really need a bug report. That's also why we have Gerrit: for discussing whether a debt fix is good or not. So, instead of discussing here about what to do, please upload a new change fixing what you want and ask us on #openstack-nova to review it; we'll do so.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1918340

Title: Fault Injection #1 - improve unit test effectiveness Status in OpenStack Compute (nova): Invalid

Bug description: Description === I have performed fault injection in openstack nova by changing the code of compute/api.py (inserting a representative/probable bug) and then ran the unit, functional and integration tests, and discovered that some of the bugs inserted were not detected by the test suite. The reference WIDS (Wrong string in initial data) is a type of fault where the string used in a variable initialization is set to an incorrect value.

Steps to reproduce == At line 102, change the original code AGGREGATE_ACTION_UPDATE_META = 'UpdateMeta' to the incorrect code AGGREGATE_ACTION_UPDATE_META = 'NHZWTCGB'. Then execute the unit tests. Expected result === The unit tests should detect the fault. Actual result === The fault was not detected by the unit tests. Environment === The code tested is on the stable/ussuri branch.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1918340/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1900006] Re: Asking for different vGPU types is racey
Victoria backport candidate: https://review.opendev.org/c/openstack/nova/+/784907 ** Also affects: nova/victoria Importance: Undecided Status: New ** Changed in: nova/victoria Status: New => Confirmed ** Changed in: nova/victoria Importance: Undecided => Medium

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1900006

Title: Asking for different vGPU types is racey Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) victoria series: Confirmed

Bug description: When testing virtual GPUs on Victoria, I wanted to have different types: [devices] enabled_vgpu_types = nvidia-320,nvidia-321 [vgpu_nvidia-320] device_addresses = 0000:04:02.1,0000:04:02.2 [vgpu_nvidia-321] device_addresses = 0000:04:02.3 Unfortunately, I saw that only the first type was used. When restarting the nova-compute service, we got the log: WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 'nvidia-320' will be used. It's due to the fact that we call _get_supported_vgpu_types() first when creating the libvirt implementation [1], while we only register the new CONF options in init_host() [2], which is called afterwards. [1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418 [2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405 A simple fix would just be to make sure we have dynamic options within _get_supported_vgpu_types()

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1900006/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1920977] Re: Error 504 when disabling a nova-compute service recently down
You shouldn't disable the host by calling the host API; rather, either wait for the periodic verification (indeed, around 60 secs) or call the force-down API. https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-detail#update-forced-down

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1920977

Title: Error 504 when disabling a nova-compute service recently down Status in OpenStack Compute (nova): Invalid

Bug description: Description === When a host fails and the nova-compute service stops working, it takes some time for the nova control plane to detect it and mark the service as "down" (I believe up to 60 seconds by default?). During this time where nova-compute is dead but not yet marked as "down" in nova, if an operator tries to set the compute service as 'disabled', the command hangs for quite some time before returning an error. Showing the status of compute services immediately after this error indicates that the service was actually updated and marked as disabled. If the host is already seen as "down" in nova-api when trying to update the status, the command ends successfully.

Steps to reproduce == - On a working and enabled nova-compute host, stop the nova-compute service - Before the host is reported as down in nova-api, run: $ openstack compute service set --disable nova-compute

Expected result === - nova-compute service is marked as disabled in nova-api - command returns with a success - a nova-api log says something like "The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs or when the service is restarted."

Actual result = - nova-compute service is marked as disabled in nova-api - command hangs for some time before returning an error: ``` Failed to set service status to disabled Compute service nova-compute of host failed to set. ```

Logs & Configs == When nova-api still thinks nova-compute is up and the command fails, nova-api shows a stack trace with the following error: ``` An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute node resource providers managed by host . The trait will be synchronized automatically by the compute service when the update_available_resource periodic task runs.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID ``` When nova-api already knows the service is down, there is only an info log: ``` Compute service on host is down. The COMPUTE_STATUS_DISABLED trait will be synchronized when the service is restarted. ```

Environment === Encountered on ussuri Impact == I would say disabling nova-compute may be one of the first actions an operator will try when a host is failing. This behavior also has a bad impact when using Masakari, as the first action taken by default is to disable the nova-compute service (see https://docs.openstack.org/masakari/latest/configuration/recovery_workflow_custom_task.html). As a result, the recovery process in masakari ends up in error (even if a retry mechanism saves the day).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1920977/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
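The force-down call the triager points to looks like this in practice; it takes effect immediately instead of waiting for the ~60-second liveness check. The host name is illustrative:

```
# Mark the compute service as forced-down (needs microversion >= 2.11).
openstack --os-compute-api-version 2.11 compute service set \
    --down compute01 nova-compute

# Clear the flag once the host is repaired.
openstack --os-compute-api-version 2.11 compute service set \
    --up compute01 nova-compute
```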
[Yahoo-eng-team] [Bug 1922264] Re: On a compute node with 3 GPUs and 2 vgpu groups, nova fails to load second group
*** This bug is a duplicate of bug 1900006 *** https://bugs.launchpad.net/bugs/1900006 Marking this bug report as a duplicate, so we can directly backport the change down to stable/victoria. ** This bug has been marked a duplicate of bug 1900006 Asking for different vGPU types is racey

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1922264

Title: On a compute node with 3 GPUs and 2 vgpu groups, nova fails to load second group Status in OpenStack Compute (nova): Confirmed

Bug description: Description === We have multiple compute nodes with multiple NVIDIA GPU cards (RTX8000/RTX6000). Nodes with a mix of RTX8000 and RTX6000 cards have 2 gpu groups configured in nova.conf, but nova-compute only creates resource providers for the first gpu group.

Steps to reproduce == For example, on a node with 2 RTX8000 and 1 RTX6000: $ lspci | grep -i nvidia 21:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1) 81:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1) e2:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1)

$ nvidia-smi Thu Apr 1 17:22:53 2021 (NVIDIA-SMI 460.32.04, Driver Version: 460.32.04, CUDA Version: N/A) GPU 0: Quadro RTX 8000, bus 0000:21:00.0, 285MiB / 46079MiB, 0% util, Default compute mode GPU 1: Quadro RTX 8000, bus 0000:81:00.0, 285MiB / 46079MiB, 0% util, Default compute mode GPU 2: Quadro RTX 6000, bus 0000:E2:00.0, 150MiB / 23039MiB, 0% util, Default compute mode

Extract from nova.conf: ... [devices] enabled_vgpu_types = nvidia-428, nvidia-387 [vgpu_nvidia-428] device_addresses = 0000:21:00.0,0000:81:00.0 [vgpu_nvidia-387] device_addresses = 0000:e2:00.0

When nova-compute starts, the log shows: 2021-04-01 17:15:25.454 7 WARNING nova.virt.libvirt.driver [req-bebc8637-d231-435c-a6cc-4613e14e2f76 - - - - -] The vGPU type 'nvidia-428' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-428]' group or '[vgpu_nvidia-428] device_addresses' option was defined. Only the first type 'nvidia-428' will be used.

And a listing of resource providers on this node shows that only the nvidia-428 GPUs were used: $ openstack resource provider list --os-placement-api-version 1.14 --in-tree f5d35bdc-b4b7-4764-a9d0-41f67fd95385 uuid | name | generation | root_provider_uuid | parent_provider_uuid f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | cloud-lyse-cmp-02 | 32 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | None 21a4a16e-8d33-4a23-a924-b00f8c31f0d0 | cloud-lyse-cmp-02_pci_0000_81_00_0 | 4 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 76e1ee94-fbf2-410e-9711-fba71c709388 | cloud-lyse-cmp-02_pci_0000_21_00_0 | 2 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385

In nova.conf, if I swap nvidia-428 & nvidia-387 in
[Yahoo-eng-team] [Bug 1921804] Re: leftover bdm when rabbitmq unstable
While I understand your concern, the nova community has a consensus that nova services shouldn't verify the status of RabbitMQ; they should expect the MQ to be working. Given there are workarounds for removing the attachment in case you hit this failure, I'll move this bug to Won't Fix. Unfortunately, this consensus isn't captured in https://docs.openstack.org/nova/latest/contributor/project-scope.html but I'll propose a patch to write it down clearly.

** Changed in: nova Status: New => Won't Fix ** Tags added: volumes ** Tags added: oslo

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1921804

Title: leftover bdm when rabbitmq unstable Status in OpenStack Compute (nova): Won't Fix

Bug description: Description === When RabbitMQ is unstable, there is a chance that the method https://github.com/openstack/nova/blob/7a1222a8654684262a8e589d91e67f2b9a9da336/nova/compute/api.py#L4741 will time out but the BDM is successfully created. In such cases, the volume will be shown in `server show` but cannot be detached, and the volume status is `available`.

Steps to reproduce == There might be no way to safely reproduce this failure, because when RabbitMQ is unstable, many other services will also show unusual behavior.

Expected result === We should be able to remove such an attachment from the API without manually fixing the DB...

```console root@mgt02:~# openstack server show 4e5c3c7d-6b4c-4841-9e6e-9a3374036a3e
| Field | Value |
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | cn-north-3a |
| OS-EXT-SRV-ATTR:host | compute01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute01 |
| OS-EXT-SRV-ATTR:instance_name | instance-0000ce4c |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2021-03-29T09:06:38.00 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | newsql-net=192.168.1.217; service_mgt=100.114.3.41 |
| config_drive | True |
| created | 2021-03-29T09:05:19Z |
| flavor | newsql_2C8G40G_general (51db3192-cece-4b9a-9969-7916b4543beb) |
| hostId | cf1f3937a3286677b3020d817541ac33d7c8f1ca74be49b26f128093 |
| id | 4e5c3c7d-6b4c-4841-9e6e-9a3374036a3e |
| image | newsql-bini2.0.0alpha-ubuntu18.04-x64-20210112-pub (4531e3bf-0433-40c6-816b-6763f9d02c7a) |
| key_name | None |
| name | NewSQL-1abc5b28-b9e6-45cd-893d-5bb3a7732a43-3 |
| progress | 0
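The workarounds alluded to in the triage note are roughly the following, using the Cinder attachments API (microversion 3.27 or later). The UUIDs are placeholders, and this assumes the stale attachment record exists on the Cinder side even though the Nova-side call was lost:

```
# Find the stale attachment record for the volume.
cinder --os-volume-api-version 3.27 attachment-list \
    --volume-id <volume-uuid>

# Remove it so the volume can be detached or reused.
cinder --os-volume-api-version 3.27 attachment-delete <attachment-uuid>
```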
[Yahoo-eng-team] [Bug 1921381] Re: iSCSI: Flushing issues when multipath config has changed
** Also affects: nova Importance: Undecided Status: New ** Changed in: nova Status: New => Confirmed ** Changed in: nova Importance: Undecided => Critical ** Changed in: nova Assignee: (unassigned) => Lee Yarwood (lyarwood) ** Tags added: wallaby-rc-potential

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1921381

Title: iSCSI: Flushing issues when multipath config has changed Status in OpenStack Compute (nova): Confirmed Status in os-brick: In Progress Status in os-brick wallaby series: New Status in os-brick xena series: In Progress

Bug description: OS-Brick disconnect_volume code assumes that the use_multipath parameter used to instantiate the connector has the same value as the connector that was used on the original connect_volume call. Unfortunately this is not necessarily true, because Nova can attach a volume, then its multipath configuration can be enabled or disabled, and then a detach can be issued. This leads to a series of serious issues, such as: - Not flushing the single path on disconnect_volume (possible data loss) and leaving it as a leftover device on the host when Nova calls terminate-connection on Cinder. - Not flushing the multipath device (possible data loss) and leaving it as a leftover device, similarly to the other case.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1921381/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1463631] Re: 60_nova/resources.sh:106:ping_check_public fails intermittently
Putting it as Invalid as we can't really help here, but in case I'm wrong, please set it back to New.

** Changed in: nova Status: New => Invalid

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1463631

Title: 60_nova/resources.sh:106:ping_check_public fails intermittently Status in grenade: Confirmed Status in neutron: New Status in OpenStack Compute (nova): Invalid

Bug description: http://logs.openstack.org/12/186112/17/gate/gate-grenade-dsvm/4da364e/logs/grenade.sh.txt.gz#_2015-06-09_22_42_15_929

2015-06-09 22:42:13.960 | --- 172.24.5.1 ping statistics --- 2015-06-09 22:42:13.960 | 1 packets transmitted, 0 received, 100% packet loss, time 0ms 2015-06-09 22:42:13.960 | 2015-06-09 22:42:15.929 | + [[ True = \T\r\u\e ]] 2015-06-09 22:42:15.929 | + die 67 '[Fail] Couldn'\''t ping server' 2015-06-09 22:42:15.929 | + local exitcode=0 2015-06-09 22:42:15.929 | [Call Trace] 2015-06-09 22:42:15.929 | /opt/stack/new/grenade/projects/60_nova/resources.sh:134:verify 2015-06-09 22:42:15.929 | /opt/stack/new/grenade/projects/60_nova/resources.sh:101:verify_noapi 2015-06-09 22:42:15.929 | /opt/stack/new/grenade/projects/60_nova/resources.sh:106:ping_check_public 2015-06-09 22:42:15.929 | /opt/stack/new/grenade/functions:67:die 2015-06-09 22:42:15.931 | [ERROR] /opt/stack/new/grenade/functions:67 [Fail] Couldn't ping server 2015-06-09 22:42:16.933 | 1 die /opt/stack/old/devstack/functions-common 2015-06-09 22:42:16.933 | 67 ping_check_public /opt/stack/new/grenade/functions 2015-06-09 22:42:16.933 | 106 verify_noapi /opt/stack/new/grenade/projects/60_nova/resources.sh 2015-06-09 22:42:16.933 | 101 verify /opt/stack/new/grenade/projects/60_nova/resources.sh 2015-06-09 22:42:16.933 | 134 main /opt/stack/new/grenade/projects/60_nova/resources.sh 2015-06-09 22:42:16.933 | Exit code: 1 2015-06-09 22:42:16.961 | World dumping... see /opt/stack/old/worlddump-2015-06-09-224216.txt for details 2015-06-09 22:42:26.139 | [Call Trace] 2015-06-09 22:42:26.139 | ./grenade.sh:250:resources 2015-06-09 22:42:26.139 | /opt/stack/new/grenade/inc/plugin:82:die 2015-06-09 22:42:26.141 | [ERROR] /opt/stack/new/grenade/inc/plugin:82 Failed to run /opt/stack/new/grenade/projects/60_nova/resources.sh verify

I wonder if there is a race in setting up security groups. http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiW0ZhaWxdIENvdWxkbid0IHBpbmcgc2VydmVyXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNS0wNS0yN1QwMDozMDoxNiswMDowMCIsInRvIjoiMjAxNS0wNi0xMFQwMDozMDoxNiswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDMzODk2MjUwNTAyfQ== This hits in both the nova-network and neutron grenade jobs.

To manage notifications about this bug go to: https://bugs.launchpad.net/grenade/+bug/1463631/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1902925] [NEW] Upgrades to compute RPC API 5.12 are broken
Public bug reported: In change https://review.opendev.org/#/c/715326/ we added a new argument named 'accel_uuids' to the rebuild_instance() RPC method. In the same change, in order to manage different versions of computes, we allowed this argument to not be passed if the destination RPC service is not able to speak 5.12. That being said, as we forgot to make the accel_uuids argument nullable, we then accordingly cast a call to the compute manager without this attribute while it expects it, which leads to a TypeError on the server side. FWIW, this can happen with any RPC pin, even with the compute='auto' default value, as 'auto' will automatically pin a version that both the source and destination can support.

** Affects: nova Importance: Critical Assignee: Sylvain Bauza (sylvain-bauza) Status: Confirmed ** Affects: nova/victoria Importance: Critical Assignee: Sylvain Bauza (sylvain-bauza) Status: Confirmed ** Tags: compute upgrade ** Changed in: nova Status: New => Confirmed ** Changed in: nova Importance: Undecided => High ** Changed in: nova Importance: High => Critical ** Also affects: nova/victoria Importance: Undecided Status: New ** Changed in: nova/victoria Importance: Undecided => Critical ** Changed in: nova Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza) ** Changed in: nova/victoria Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza) ** Changed in: nova/victoria Status: New => Confirmed ** Tags added: compute upgrade

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1902925

Title: Upgrades to compute RPC API 5.12 are broken Status in OpenStack Compute (nova): Confirmed Status in OpenStack Compute (nova) victoria series: Confirmed

Bug description: In change https://review.opendev.org/#/c/715326/ we added a new argument named 'accel_uuids' to the rebuild_instance() RPC method. In the same change, in order to manage different versions of computes, we allowed this argument to not be passed if the destination RPC service is not able to speak 5.12. That being said, as we forgot to make the accel_uuids argument nullable, we then accordingly cast a call to the compute manager without this attribute while it expects it, which leads to a TypeError on the server side. FWIW, this can happen with any RPC pin, even with the compute='auto' default value, as 'auto' will automatically pin a version that both the source and destination can support.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1902925/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
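As a possible interim check while the fix is pending (my own suggestion, not guidance from the report): the breakage appears whenever the negotiated compute RPC version is below 5.12, so once every compute is upgraded, verifying that nothing still pins the interface avoids the broken code path. Illustrative commands only:

```
# Inspect and, once all computes speak 5.12, remove any explicit pin
# so 'auto' negotiation settles on 5.12.
crudini --get /etc/nova/nova.conf upgrade_levels compute
crudini --del /etc/nova/nova.conf upgrade_levels compute
systemctl restart nova-compute   # service name varies by distro
```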
[Yahoo-eng-team] [Bug 1900006] [NEW] Asking for different vGPU types is racey
Public bug reported: When testing virtual GPUs on Victoria, I wanted to have different types: [devices] enabled_vgpu_types = nvidia-320,nvidia-321 [vgpu_nvidia-320] device_addresses = 0000:04:02.1,0000:04:02.2 [vgpu_nvidia-321] device_addresses = 0000:04:02.3 Unfortunately, I saw that only the first type was used. When restarting the nova-compute service, we got the log: WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 'nvidia-320' will be used. It's due to the fact that we call _get_supported_vgpu_types() first when creating the libvirt implementation [1], while we only register the new CONF options in init_host() [2], which is called afterwards. [1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418 [2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405 A simple fix would just be to make sure we have dynamic options within _get_supported_vgpu_types()

** Affects: nova Importance: Medium Assignee: Sylvain Bauza (sylvain-bauza) Status: Confirmed ** Tags: libvirt vgpu

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1900006

Title: Asking for different vGPU types is racey Status in OpenStack Compute (nova): Confirmed

Bug description: When testing virtual GPUs on Victoria, I wanted to have different types: [devices] enabled_vgpu_types = nvidia-320,nvidia-321 [vgpu_nvidia-320] device_addresses = 0000:04:02.1,0000:04:02.2 [vgpu_nvidia-321] device_addresses = 0000:04:02.3 Unfortunately, I saw that only the first type was used. When restarting the nova-compute service, we got the log: WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 'nvidia-320' will be used. It's due to the fact that we call _get_supported_vgpu_types() first when creating the libvirt implementation [1], while we only register the new CONF options in init_host() [2], which is called afterwards. [1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418 [2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405 A simple fix would just be to make sure we have dynamic options within _get_supported_vgpu_types()

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1900006/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute
** Changed in: nova/ussuri Status: New => Confirmed ** Also affects: nova/victoria Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896741 Title: Intel mediated device info doesn't provide a name attribute Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) train series: Confirmed Status in OpenStack Compute (nova) ussuri series: Confirmed Status in OpenStack Compute (nova) victoria series: In Progress Bug description: When testing some Xeon server for virtual GPU support, I saw that Nova provides an exception as the i915 driver doesn't provide a name for mdev types : Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager Traceback (most recent call last): Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9824, in _update_available_resource_for_node Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 360, in inner Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return f(*args, **kwargs) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 981, in _update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update(context, cn, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1233, in _update Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return attempt.get(self._wrap_exception) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2]) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise Sep 23 06:00:19 
mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager raise value Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1169, in _update_to_placement Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7857, in update_provider_tree Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager provider_tree, nodename, allocations=allocations) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.ma
[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute
** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/ussuri Importance: Undecided Status: New ** Changed in: nova/train Status: New => Confirmed ** Changed in: nova/ussuri Importance: Undecided => Low ** Changed in: nova/train Importance: Undecided => Low ** Changed in: nova/train Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza) ** Changed in: nova/ussuri Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896741 Title: Intel mediated device info doesn't provide a name attribute Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) train series: Confirmed Status in OpenStack Compute (nova) ussuri series: New Bug description: When testing some Xeon server for virtual GPU support, I saw that Nova provides an exception as the i915 driver doesn't provide a name for mdev types : Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager Traceback (most recent call last): Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9824, in _update_available_resource_for_node Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 360, in inner Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return f(*args, **kwargs) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 981, in _update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update(context, cn, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1233, in _update Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return attempt.get(self._wrap_exception) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager 
six.reraise(self.value[0], self.value[1], self.value[2]) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager raise value Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1169, in _update_to_placement Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7857, in up
[Yahoo-eng-team] [Bug 1896741] [NEW] Intel mediated device info doesn't provide a name attribute
.py", line 6984, in _count_mdev_capable_devices Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager types=enabled_vgpu_types) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7268, in _get_mdev_capable_devices Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager device = self._get_mdev_capabilities_for_dev(name, types) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7253, in _get_mdev_capabilities_for_dev Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager 'name': cap['name'], Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager KeyError: 'name'

For example: [root@mymachine ~]# ll /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_8/ total 0 -r--r--r--. 1 root root 4096 Sep 22 14:18 available_instances --w-------. 1 root root 4096 Sep 23 06:01 create -r--r--r--. 1 root root 4096 Sep 23 05:43 description -r--r--r--. 1 root root 4096 Sep 22 14:18 device_api drwxr-xr-x. 2 root root 0 Sep 23 06:01 devices

When looking at the kernel driver API documentation https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-device.html it says that the "name" attribute is optional: "name This attribute should show human readable name. This is optional attribute." The fix should be easy; we don't use this attribute in Nova.

** Affects: nova Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: Triaged ** Tags: libvirt vgpu

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896741 Title: Intel mediated device info doesn't provide a name attribute Status in OpenStack Compute (nova): Triaged
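The fix is essentially a one-line change. As a rough illustration (hypothetical code, not the merged Nova patch; the dict shape is inferred from the traceback above), treating 'name' as the optional attribute the kernel documents would look like this:

    # Hypothetical sketch, not the merged Nova patch: treat 'name' as the
    # optional attribute the VFIO mediated-device kernel API says it is.
    def _get_mdev_capabilities_for_dev(cap):
        return {
            'type': cap['type'],
            # dict.get() returns None instead of raising KeyError when the
            # i915 driver omits the optional 'name' attribute.
            'name': cap.get('name'),
        }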
[Yahoo-eng-team] [Bug 1887377] Re: nova does not load-balance assignment of resources on a host based on availability of PCI devices, hugepages or pCPUs.
While I totally understand the use case, I think this is a new feature for performance reasons and not a bug. Closing it as Wishlist, but of course you can work on it if you wish ;) ** Changed in: nova Importance: Undecided => Wishlist ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887377 Title: nova does not load-balance assignment of resources on a host based on availability of PCI devices, hugepages or pCPUs Status in OpenStack Compute (nova): Invalid Bug description:

Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a very long time. Since their introduction, the advice has always been to create flavors that mimic your typical hardware topology, i.e. if all your compute hosts have 2 NUMA nodes then you should create flavors that request 2 NUMA nodes. For a long time operators have ignored this advice and continued to create single NUMA node flavors, citing that, after 5+ years of hardware vendors working with VNF vendors to make their products NUMA-aware, VNFs often still do not optimize properly for a multi-NUMA environment. As a result many operators still deploy single-NUMA VMs, although that is becoming less common over time.

When you deploy a VM with a single NUMA node today, we more or less iterate over the host NUMA nodes in order and assign the VM to the first NUMA node where it fits. On a host without any PCI devices whitelisted for OpenStack management, this behavior results in NUMA nodes being filled linearly from NUMA 0 to NUMA n. That means if a host had 100G of hugepages on both NUMA nodes 0 and 1 and you scheduled 101 1G single-NUMA VMs to the host, 100 VMs would spawn on NUMA node 0 and 1 VM would spawn on NUMA node 1. The first 100 VMs would all contend for CPU resources on the first NUMA node while the last VM had all of the second NUMA node to itself.

The correct behavior would be for Nova to round-robin assign the VMs, attempting to keep the resource availability balanced. This will maximise performance for individual VMs while pessimising the scheduling of large VMs on a host. To this end, a new NUMA balancing config option (unset, pack or spread) should be added, and we should sort NUMA nodes in descending (spread) or ascending (pack) order based on pMEM, pCPUs, mempages and PCI devices, in that sequence. In a future release, when NUMA is in Placement, this sorting will need to be done in a weigher that sorts the allocation candidates based on the same pack/spread criteria.

I am filing this as a bug, not a feature, as this will have a significant impact on existing deployments that either expected https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/reserve-numa-with-pci.html to implement this logic already, or do not follow our existing guidance on creating flavors that align to the host topology. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1887377/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
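For illustration, the pack/spread ordering proposed in the report could be sketched as follows; the function, resource keys, and strategy values are hypothetical, not Nova code or config:

    # Hypothetical sketch of the proposed NUMA balancing option. Each node
    # is a dict of free resources; the keys mirror the pMEM/pCPU/mempages/
    # PCI ordering suggested in the report.
    def order_host_numa_nodes(nodes, strategy="spread"):
        def free_resources(node):
            return (node["free_mempages"],
                    node["free_pcpus"],
                    node["free_pci_devices"])
        # "spread" tries the emptiest node first, round-robin style;
        # "pack" keeps filling the fullest node, which is closest to
        # today's linear fill from NUMA 0 upwards.
        return sorted(nodes, key=free_resources,
                      reverse=(strategy == "spread"))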
[Yahoo-eng-team] [Bug 1893904] Re: Placement is not updated if a VGPU is re-created on a new GPU upon host reboot
This was a known issue that should have been fixed by https://review.opendev.org/#/c/715489/ which was merged during the Ussuri timeframe. To be clear, since mdevs disappear when you reboot, Nova now tries to find the already provided GPU by looking at the guest XML. Closing this bug as the master branch no longer has the bug, but please reopen it in case you can reproduce the problem with master. ** Changed in: nova Status: New => Won't Fix ** Changed in: nova Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1893904 Title: Placement is not updated if a VGPU is re-created on a new GPU upon host reboot Status in OpenStack Compute (nova): Won't Fix Bug description: First of all, I'm not really sure which project to "blame" for this bug, but here's the problem: When you reboot a compute-node with Nvidia GRID and guests running with a VGPU attached, the guests will often have their VGPU re-created on a different GPU than before the reboot. This is not updated in placement, causing the placement API to provide false information about which resource provider is actually a valid allocation candidate for a new VGPU.

Steps to reproduce:
1. Create a new instance with a VGPU attached, and take note of which GPU the VGPU is created on (with nvidia-smi vgpu)
2. Reboot the compute-node
3. Start the instance, and observe that its VGPU now lives on a different GPU
4. Check "openstack allocation candidate list --resource VGPU=1" and correlate the resource provider id with "openstack resource provider list" to see that placement will now list the allocated GPU as free, while the initial GPU (from before the reboot) is still marked as used

This will obviously only be an issue on compute-nodes with multiple physical GPUs. Examples: https://paste.ubuntu.com/p/PZ6qgKtnRb/ This will eventually cause scheduling of new VGPU instances to fail, because they will try to use a device that in reality is already used (but marked as available in placement). Expected results: Either the GRID driver and libvirt should ensure that an instance keeps the same GPU for its VGPU through reboots (effectively making this.. not a nova bug) OR nova-compute should notify placement of the change and update the allocations. Versions: This was first observed in Stein, but the issue is also present in Train.

# rpm -qa | grep nova
python2-nova-20.3.0-1.el7.noarch
python2-novaclient-15.1.1-1.el7.noarch
openstack-nova-compute-20.3.0-1.el7.noarch
openstack-nova-common-20.3.0-1.el7.noarch

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1893904/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
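For reference, the fix mentioned in the comment can work because the mdev a guest uses is recorded in its domain XML. A minimal sketch of recovering those UUIDs (assuming libvirt's documented <hostdev type='mdev'> layout; the helper name is made up):

    # Minimal sketch: recover the mdev UUIDs a guest references from its
    # libvirt domain XML, since the mdevs themselves vanish on host reboot.
    import xml.etree.ElementTree as ET

    def mdev_uuids_from_domain_xml(domain_xml):
        root = ET.fromstring(domain_xml)
        return [addr.get("uuid")
                for hostdev in root.findall(".//hostdev[@type='mdev']")
                for addr in hostdev.findall("./source/address")]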
[Yahoo-eng-team] [Bug 1895092] Re: Error when trying block migration
This sounds like a configuration issue:

2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi [req-2c75be17-d5c4-4acd-a302-326388068067 170fdf1f861847fa995f2f0646ec4143 85dd9df42f4e47b3b0fc5848ab947b62 - default default] Unexpected exception in API method: MigrationError_Remote: Migration error: Unable to establish connection to http://controller01:5000/v3/auth/tokens: ('Connection aborted.', BadStatusLine('No status line received - the server has closed the connection',))

When trying to get a token for the migration, Keystone closed the connection abruptly. Please inspect the Keystone logs; either way, this is unrelated to Nova itself. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1895092 Title: Error when trying block migration Status in OpenStack Compute (nova): Invalid Bug description: When I try live migration to a different host using block migration with the following command, an API error occurs.

$ openstack server migrate --block-migration 3bf28a9d-0545-4d30-8892-6e2af655db4a --live compute40

--- error message ---
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-2c75be17-d5c4-4acd-a302-326388068067)

I attached the log of nova-api and information about the OpenStack environment as follows.

### /var/log/nova/nova-api.log
2020-09-10 14:27:47.614 3545 INFO nova.osapi_compute.wsgi.server [req-1028bb0a-a95a-41b1-9ab9-2ebf7b19039f 170fdf1f861847fa995f2f0646ec4143 85dd9df42f4e47b3b0fc5848ab947b62 - default default] 10.81.0.2 "GET /v2.1/servers/3bf28a9d-0545-4d30-8892-6e2af655db4a HTTP/1.1" status: 200 len: 2312 time: 0.7340441
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi [req-2c75be17-d5c4-4acd-a302-326388068067 170fdf1f861847fa995f2f0646ec4143 85dd9df42f4e47b3b0fc5848ab947b62 - default default] Unexpected exception in API method: MigrationError_Remote: Migration error: Unable to establish connection to http://controller01:5000/v3/auth/tokens: ('Connection aborted.', BadStatusLine('No status line received - the server has closed the connection',))
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, in wrapper
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, in wrapper
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, in wrapper
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/migrate_server.py", line 111, in _migrate_live
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi async_)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 206, in inner
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 214, in _wrapped
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return fn(self, context, instance, *args, **kwargs)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 154, in inner
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 4550, in live_migrate
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi request_spec=request_spec, async_=async_)
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/dist-packages/nova/conductor/api.py", line 112, in live_migrate_instance
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi
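When debugging this class of failure, it can help to hit the token endpoint directly and see whether Keystone answers at all (illustrative commands; substitute your own auth URL and credentials):

    # Does the endpoint answer at all?
    $ curl -v http://controller01:5000/v3
    # Can a token actually be issued? (uses your existing OS_* environment)
    $ openstack token issue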
[Yahoo-eng-team] [Bug 1838309] Re: Live migration might fail when run after revert of previous live migration
Now that the minimum versions for Ussuri are libvirt 4.0.0 and QEMU 2.1, I think we can close this one, unless libvirt 4.0.0 with QEMU 2.5 has the same issues. Please open this one again if you see this. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1838309 Title: Live migration might fail when run after revert of previous live migration Status in OpenStack Compute (nova): Won't Fix Bug description: When migrating an instance between two computes on Queens, running two different QEMU versions, the first live migration failed and was rolled back (traceback follows just in case; unrelated to this issue):

2019-07-26 14:39:44.469 1576 ERROR nova.virt.libvirt.driver [req-26f3a831-8e4f-43a2-83ce-e60645264147 0aa8a4a6ed7d4733871ef79fa0302d43 31ee6aa6bff7498fba21b9807697ec32 - default default] [instance: b0681d51-2924-44be-a8b7-36db0d86b92f] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2019-07-26 14:39:43.479+: Domain id=16 is tainted: shell-scripts
2019-07-26T14:39:43.630545Z qemu-system-x86_64: -drive file=rbd:cinder/volume-df3d0060-451c-4b22-8d15-2c579fb47681:id=cinder:auth_supported=cephx\;none:mon_host=192.168.16.14\:6789\;192.168.16.15\:6789\;192.168.16.16\:6789,file.password-secret=virtio-disk2-secret0,format=raw,if=none,id=drive-virtio-disk2,serial=df3d0060-451c-4b22-8d15-2c579fb47681,cache=writeback,discard=unmap: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-07-26T14:39:44.075108Z qemu-system-x86_64: VQ 2 size 0x80 < last_avail_idx 0xedda - used_idx 0xeddd
2019-07-26T14:39:44.075130Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2019-07-26T14:39:44.075134Z qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:07.0/virtio-balloon'
2019-07-26T14:39:44.075582Z qemu-system-x86_64: load of migration failed: Operation not permitted: libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-07-26 14:39:43.479+: Domain id=16 is tainted: shell-scripts

Then, after revert, live migration was retried, and now it failed because of the following problem:

{u'message': u'Requested operation is not valid: cannot undefine transient domain', u'code': 500, u'details': u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 202, in decorated_function\nreturn function(self, context, *args, **kwargs)\n File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6438, in _post_live_migration\ndestroy_vifs=destroy_vifs)\n File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1100, in cleanup\nself._undefine_domain(instance)\n File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1012, in _undefine_domain\ninstance=instance)\n File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__\nself.force_reraise()\n File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\nsix.reraise(self.type_, self.value, self.tb)\n File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 999, in _undefine_domain\nguest.delete_configuration(support_uefi)\n File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 271, in delete_configuration\nself._domain.undefine()\n File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit\n result = proxy_call(self._autowrap, f, *args, **kwargs)\n File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call\n rv = execute(f, *args, **kwargs)\n File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute\n six.reraise(c, e, tb)\n File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker\n rv = meth(*args, **kwargs)\n File "/usr/lib/python2.7/dist-packages/libvirt.py", line 2701, in undefine\nif ret == -1: raise libvirtError (\'virDomainUndefine() failed\', dom=self)\n', u'created': u'2019-07-29T14:39:41Z'}

It seems to happen because the domain was already undefined once on the first try to live migrate, and after that it cannot be undefined a second time. We might need to check if the domain is persistent before undefining it in the case of live migrations. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1838309/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
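The check the reporter suggests maps to a small guard with the libvirt Python bindings; a hedged sketch (hypothetical helper, not the actual Nova code path):

    # Hypothetical guard, not Nova's actual code: a transient domain has no
    # persistent configuration, so calling undefine() on it raises the
    # "cannot undefine transient domain" error quoted above.
    def delete_configuration_if_persistent(domain):
        if domain.isPersistent():
            domain.undefine()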
[Yahoo-eng-team] [Bug 1877281] [NEW] vGPU multiple instance creation test is racey
Public bug reported: Zuul can sometimes fail on:

2020-05-05 09:07:46.656481 | ubuntu-bionic | ==
2020-05-05 09:07:46.656502 | ubuntu-bionic | Failed 1 tests - output below:
2020-05-05 09:07:46.656518 | ubuntu-bionic | ==
2020-05-05 09:07:46.656533 | ubuntu-bionic |
2020-05-05 09:07:46.656548 | ubuntu-bionic | nova.tests.functional.libvirt.test_vgpu.VGPUTests.test_multiple_instance_create
2020-05-05 09:07:46.656563 | ubuntu-bionic | ---
2020-05-05 09:07:46.656577 | ubuntu-bionic |
2020-05-05 09:07:46.656594 | ubuntu-bionic | Captured traceback:
2020-05-05 09:07:46.656609 | ubuntu-bionic | ~~~
2020-05-05 09:07:46.656625 | ubuntu-bionic | Traceback (most recent call last):
2020-05-05 09:07:46.656651 | ubuntu-bionic |
2020-05-05 09:07:46.656669 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py", line 248, in test_multiple_instance_create
2020-05-05 09:07:46.656686 | ubuntu-bionic | self.assert_vgpu_usage_for_compute(self.compute1, expected=2)
2020-05-05 09:07:46.656701 | ubuntu-bionic |
2020-05-05 09:07:46.656716 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py", line 178, in assert_vgpu_usage_for_compute
2020-05-05 09:07:46.656732 | ubuntu-bionic | self.assertEqual(expected, len(mdevs))
2020-05-05 09:07:46.656784 | ubuntu-bionic |
2020-05-05 09:07:46.656803 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 415, in assertEqual
2020-05-05 09:07:46.656818 | ubuntu-bionic | self.assertThat(observed, matcher, message)
2020-05-05 09:07:46.656834 | ubuntu-bionic |
2020-05-05 09:07:46.656848 | ubuntu-bionic | File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py", line 502, in assertThat
2020-05-05 09:07:46.656863 | ubuntu-bionic | raise mismatch_error
2020-05-05 09:07:46.656878 | ubuntu-bionic |
2020-05-05 09:07:46.656892 | ubuntu-bionic | testtools.matchers._impl.MismatchError: 2 != 1
2020-05-05 09:07:46.656907 | ubuntu-bionic |

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22testtools.matchers._impl.MismatchError%3A%202%20!%3D%201%5C%22%20AND%20build_name%3A%5C%22nova-tox-functional-py36%5C%22

8 occurrences over 7 days. ** Affects: nova Importance: High Assignee: Sylvain Bauza (sylvain-bauza) Status: In Progress ** Tags: gate-failure vgpu ** Tags added: vgpu -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1877281 Title: vGPU multiple instance creation test is racey Status in OpenStack Compute (nova): In Progress
[Yahoo-eng-team] [Bug 1780225] Re: Libvirt error when using --max > 1 with vGPU
In Stein, we merged the ability to have multiple Resource Providers, each of them being a pGPU. In Ussuri, we accepted to have a specific vGPU type per pGPU. I have now tested the above behaviour with https://review.opendev.org/723858 and it works, unless you ask for a specific total capacity. I'll close this bug, which was only about libvirt vGPUs; please look at https://bugs.launchpad.net/nova/+bug/1874664 for the related issue. ** Changed in: nova Status: Confirmed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1780225 Title: Libvirt error when using --max > 1 with vGPU Status in OpenStack Compute (nova): Fix Released Bug description:

Description
===========
Using devstack Rocky with an NVIDIA Tesla M10 + GRID driver on RHEL 7.5. Profile used in nova: nvidia-35 (num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16). I can launch instances one by one without any issue. I cannot use a --max parameter greater than 1.

Expected result
===============
Be able to use the --max parameter with vGPU.

Steps to reproduce
==================

[root@host2 ~]# openstack server list
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
| ID                                   | Name      | Status | Networks                                                            | Image  | Flavor |
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 2 instance
+-------------------------------------+-----------------------------------------------+
| Field                               | Value                                         |
+-------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                        |
| OS-EXT-AZ:availability_zone         |                                               |
| OS-EXT-SRV-ATTR:host                | None                                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
| OS-EXT-SRV-ATTR:instance_name       |                                               |
| OS-EXT-STS:power_state              | NOSTATE                                       |
| OS-EXT-STS:task_state               | scheduling                                    |
| OS-EXT-STS:vm_state                 | building                                      |
| OS-SRV-USG:launched_at              | None                                          |
| OS-SRV-USG:terminated_at            | None                                          |
| accessIPv4                          |                                               |
| accessIPv6                          |                                               |
| addresses                           |                                               |
| adminPass                           | iNiFmD6kNszw                                  |
| config_drive                        |                                               |
| created                             | 2018-07-05T09:19:25Z                          |
| flavor                              | vgpu (vgpu1)                                  |
| hostId                              |                                               |
| id                                  | 5a8691a8-a18c-4c71-8541-be00f224fd82          |
| image                               | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
| key_name                            | myself                                        |
| name                                | instance-1                                    |
| progress                            | 0                                             |
| project_id                          | fdea2c781db74ae593c5e9501e9290cc              |
| properties                          |                                               |
| security_groups                     | name='default'                                |
| status                              | BUILD                                         |
| updated                             | 2018-07-05T09:19:25Z                          |
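For context, the Stein/Ussuri work mentioned in the comment is driven from nova.conf on the compute node; a sketch of the per-pGPU layout (the type names and PCI addresses are examples only):

    [devices]
    enabled_vgpu_types = nvidia-35,nvidia-36

    [vgpu_nvidia-35]
    device_addresses = 0000:84:00.0

    [vgpu_nvidia-36]
    device_addresses = 0000:85:00.0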
[Yahoo-eng-team] [Bug 1874664] Re: Booting more than one instance fails with accelerators in its flavor
Given we are after RC1 (which means that we only accept regression bugfixes for RC2 and later versions), I think we should just document the current caveat in https://docs.openstack.org/api-guide/compute/accelerator-support.html and try to backport the bugfix to a later Ussuri release (say 21.0.1). ** Also affects: nova/ussuri Importance: Undecided Status: New ** Changed in: nova/ussuri Importance: Undecided => Medium -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1874664 Title: Booting more than one instance fails with accelerators in its flavor Status in OpenStack Compute (nova): Confirmed Status in OpenStack Compute (nova) ussuri series: New Bug description: When booting more than one instance with accelerators, where the accelerators are on one compute node, there are two problems, as below:

One problem is that, because we always take the first item (alloc_reqs[0]) in alloc_reqs, when we iterate to the second instance it will throw a conflict exception when putting the allocations.

The other is that, because we always take the first item in alloc_reqs_by_rp_uuid.get(selected_host.uuid), the selected_alloc_req is always the same, which causes the values in selections_to_return to be identical. This is not right for subsequent operations.

More details: https://etherpad.opendev.org/p/filter_scheduler_issue_with_accelerators To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1874664/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
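The description boils down to always consuming the same allocation candidate. A hedged sketch of the distinction (hypothetical helper, not the merged scheduler fix):

    import random

    # Taking candidates[0] for every instance of a multi-create request
    # reuses one allocation candidate, producing the conflict on PUT and
    # the identical selections described above. Choosing among the host's
    # candidates (or consuming each candidate only once) avoids that.
    def pick_alloc_req(alloc_reqs_by_rp_uuid, selected_host_uuid):
        candidates = alloc_reqs_by_rp_uuid[selected_host_uuid]
        return random.choice(candidates)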
[Yahoo-eng-team] [Bug 1636825] Re: Instances for which rebuild failed get deleted from source host
This is indeed fixed upstream, as you can see in the source code here: https://github.com/openstack/nova/blob/2cddf595a8cdedbdb844e800d853ea143817b545/nova/compute/manager.py#L721-L738 We only delete instances if the evacuation was either done, or just precreated. If the migration wasn't successful, then we don't delete the instance. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1636825 Title: Instances for which rebuild failed get deleted from source host Status in OpenStack Compute (nova): Invalid Bug description:

Description
===========
In the current implementation we have the method '_destroy_evacuated_instances' in compute.manager which deletes any instances from the source host after they have been evacuated. This method is called as part of host initialization (init_host) and checks the migration records for VMs which were evacuated. There is a possibility that, if a VM fails as part of the rebuild operation on the destination host after a migration record has been created, then when the source host is brought back up it may end up deleting the VM from the source as well. To fix this we should check the 'host' attribute in the instances table before deleting the VM, and delete the VM from the source only if the host has been updated in the db after the rebuild.

Steps to reproduce
==================
* Deploy a VM.
* Bring down the host where the VM was deployed.
* Evacuate the instance to another host where the rebuild operation may fail (insufficient resources or storage issue).
* This will result in the VM not being present on the destination host.
* Check that a migration record of type 'evacuation' is present in the db.
* Bring the source host up.

Expected result
===============
The VM should be present on the source host.

Actual result
=============
The VM gets deleted as part of evacuated instance cleanup on start-up of the compute service on the source host.

Environment
===========
1. Exact version of OpenStack you are running: Newton
2. Which hypervisor did you use? PowerVM
3. Which networking type did you use? Neutron with Open vSwitch

Logs & Configs
==============
In the logs the following message is seen on startup:

2016-10-24 09:32:11.131 3169 INFO nova.compute.manager [req-6611fe85-0515-4cb4-b1c0-3f34f196a0c7 - - - - -] [instance: ed9ca4b9-8938-4d7b-9eec-1dd6ca7bc8c8] Deleting instance as it has been evacuated from this host

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1636825/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
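Paraphrasing the linked manager.py logic as a sketch (hypothetical names; the statuses follow the comment above, not a verbatim copy of the upstream code):

    # Sketch of the upstream behavior described above: the source copy is
    # removed only when the evacuation migration completed and the instance
    # now lives on another host; a failed rebuild leaves the migration in
    # another state, so the local disks are kept.
    def should_cleanup_evacuated(instance, migration, this_host):
        return migration.status == 'done' and instance.host != this_host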
[Yahoo-eng-team] [Bug 1738297] Re: Nova Destroys Local Disks for Instance with Broken iSCSI Connection to Cinder Volume Upon Resume from Suspend
I'm happy the main root cause (destroying the source disks) is fixed. To be clear, you can configure Nova to resume guest state on compute service restart with the flag https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot Closing the bug. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1738297 Title: Nova Destroys Local Disks for Instance with Broken iSCSI Connection to Cinder Volume Upon Resume from Suspend Status in OpenStack Compute (nova): Won't Fix Bug description:

Background: Libvirt + KVM cloud running Newton (but the relevant code appears the same on master). Earlier this week we had some issues with a Cinder storage server (it uses LVM+iSCSI). The tgt service was consuming 100% CPU (after running for several months) and compute nodes lost the iSCSI connection. I had to restart tgt, the cinder-volume service, and a number of compute hosts + instances.

Today, a user tried resuming their instance, which was suspended before the aforementioned trouble. (Note: this instance has root and ephemeral disks stored locally, and a third disk on shared Cinder storage.) It appears (per the below-linked logs) that the iSCSI connection from the compute host to the Cinder storage server was broken/missing, and because of this, Nova apparently "cleaned up" the instance, including *destroying its disk files*. The instance is now in an error state.

nova-compute.log: http://paste.openstack.org/show/628991/
/var/log/syslog: http://paste.openstack.org/show/628992/

We're still running Newton but the code appears the same on master. Based on the log messages ("Deleting instance files" and "Deletion of /var/lib/nova/instances/68058b22-e17f-42f7-80ff-aeb06cbc82cb_del complete"), it appears that we ended up in this function, `delete_instance_files`: https://github.com/openstack/nova/blob/stable/newton/nova/virt/libvirt/driver.py#L7745-L7801 A trace wasn't logged for this, but I'm guessing we got here from the `cleanup` function: https://github.com/openstack/nova/blob/a0e4f627f0be48db65c23f4f180d4bc6dd68cc83/nova/virt/libvirt/driver.py#L933-L1032 One of `cleanup`'s arguments is `destroy_disks=True`, so I'm guessing this was run with defaults or not overridden. (Someone, please correct me if the available data suggest otherwise!)

Nobody requested a Delete action, so this appears to be Nova deciding to destroy an instance's local disks after encountering an otherwise-unhandled exception related to the iSCSI device being unavailable. I will try to reproduce and update the bug if successful.

For us, losing an instance's data is a Problem -- our users (scientists) often store unique data on instances that are configured by hand. If an instance cannot be resumed, I would much rather Nova leave the instance's disks intact for investigation / data recovery, instead of throwing everything out. For deployments whose instances may contain important data, could this behavior be made configurable? Perhaps "destroy_disks_on_failed_resume = False" in nova.conf? Thank you! Chris Martin

(P.S. This is actually a Cinder question, but someone here may know: is there something that can/should be done to re-initialize iSCSI connections between compute nodes and a Cinder storage server after a recovered failure of the iSCSI target service on the storage server?)
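The flag referenced in the comment lives in nova.conf on the compute node; a minimal example:

    [DEFAULT]
    resume_guests_state_on_host_boot = true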
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1738297/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp