[Yahoo-eng-team] [Bug 2053061] Re: Unexpected API Error.

2024-03-19 Thread Sylvain Bauza
As the exception says, your certificates are invalid. Please double-check
your configs.
This isn't a Nova bug but rather a configuration problem, so I'm closing the report.
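
For reference, a quick way to check which certificate the Glance endpoint is
serving and whether it verifies against the CA bundle Nova is supposed to use
(a diagnostic sketch; file paths are examples):

```
# Show the certificate chain presented on the Glance endpoint
openssl s_client -connect controller:9292 -showcerts </dev/null

# Verify the saved server certificate against the CA bundle Nova should trust
openssl verify -CAfile /etc/ssl/certs/my-ca.pem /tmp/glance-server-cert.pem

# nova.conf can point keystoneauth at that bundle, e.g.:
# [glance]
# cafile = /etc/ssl/certs/my-ca.pem
```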

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2053061

Title:
  Unexpected API Error.

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  2024-02-13 20:01:27.785 2124 ERROR nova.api.openstack.wsgi 
keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to 
https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: 
HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with 
url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by 
SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify 
failed (_ssl.c:897)'),))
  2024-02-13 20:01:27.785 2124 ERROR nova.api.openstack.wsgi 
  2024-02-13 20:01:27.794 2124 INFO nova.api.openstack.wsgi 
[req-7cd9c353-06b6-45c4-b294-5994b63094c0 a694ed41a63240a982b3110fde0248be 
6e523b33312a4ac79590df56f886ceb2 - default default] HTTP exception thrown: 
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and 
attach the Nova API log if possible.
  
  2024-02-13 20:01:27.795 2124 INFO nova.osapi_compute.wsgi.server 
[req-7cd9c353-06b6-45c4-b294-5994b63094c0 a694ed41a63240a982b3110fde0248be 
6e523b33312a4ac79590df56f886ceb2 - default default] 127.0.0.1 "POST 
/v2.1/6e523b33312a4ac79590df56f886ceb2/servers HTTP/1.1" status: 500 len: 651 
time: 0.0623181
  [root@controller nova(keystone)]# cat nova-api.log | grep SSLError
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi 
[req-18342d99-d0c7-436b-a6c8-1c6fa8966ae9 a694ed41a63240a982b3110fde0248be 
6e523b33312a4ac79590df56f886ceb2 - default default] Unexpected exception in API 
method: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting 
to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: 
HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with 
url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by 
SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify 
failed (_ssl.c:897)'),))
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi ssl.SSLError: 
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi 
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='controller', 
port=9292): Max retries exceeded with url: 
/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, 
'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi raise 
SSLError(e, request=request)
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi 
requests.exceptions.SSLError: HTTPSConnectionPool(host='controller', 
port=9292): Max retries exceeded with url: 
/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, 
'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi raise 
exceptions.SSLError(msg)
  2024-02-13 19:29:52.080 24184 ERROR nova.api.openstack.wsgi 
keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to 
https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: 
HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with 
url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by 
SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify 
failed (_ssl.c:897)'),))
  
  2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi 
[req-9c284a0d-7109-4c9b-a74b-c653e030a6b6 a694ed41a63240a982b3110fde0248be 
6e523b33312a4ac79590df56f886ceb2 - default default] Unexpected exception in API 
method: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting 
to https://controller:9292/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c: 
HTTPSConnectionPool(host='controller', port=9292): Max retries exceeded with 
url: /v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by 
SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify 
failed (_ssl.c:897)'),))
  2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi ssl.SSLError: 
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
  2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi 
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='controller', 
port=9292): Max retries exceeded with url: 
/v2/images/7b393dfb-e35d-4915-9da0-f532fecc349c (Caused by SSLError(SSLError(1, 
'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
  2024-02-13 19:32:24.553 24184 ERROR nova.api.openstack.wsgi 

[Yahoo-eng-team] [Bug 2054329] Re: orphan allocations cause orphan resource providers and prevents compute service deletion

2024-03-19 Thread Sylvain Bauza
This is a known issue that we recently fixed by ensuring that you can't
change the hostname silently:
https://specs.openstack.org/openstack/nova-specs/specs/2023.1/implemented/stable-compute-uuid.html

That series won't be backported to Zed, so I'd recommend upgrading
to Antelope. In the meantime, you can do some janitorial cleanup of the
orphaned resources with the 'nova-manage placement audit' command, which will
tell you which placement resources are zombies.
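
A minimal sketch of that cleanup, assuming admin access on a controller node
(review the output before deleting anything):

```
# List allocations that are not tied to an existing instance or migration
nova-manage placement audit --verbose

# Once reviewed, remove the orphaned allocations
nova-manage placement audit --delete
```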


** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2054329

Title:
  orphan allocations cause orphan resource providers and prevents
  compute service deletion

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Description
  ===
  It can happen that there are orphan allocations against a resource provider,
e.g. when something went wrong during a migration.

  During the deletion of a nova-compute-service, the nova-api tries to delete
the resource-provider in placement as well.
  When the resource provider still has allocations against it, the deletion of
the resource-provider will fail, but the deletion of the nova-compute-service
will be successful.
  This causes orphan resource-providers.

  This is based on the try-catch around the deletion of the resource-provider:
  
https://opendev.org/openstack/nova/src/commit/6e510eb62e00c34e98a5245a6de2dd2955ffb57a/nova/api/openstack/compute/services.py#L321

  If a new nova-compute-service with the same hostname gets created, it will
not create a new resource provider, as there is already one with the correct
hostname.
  This causes a mismatch between the ID of the nova-compute-service and the ID
of the resource-provider.

  If you now try to delete the new nova-compute-service, it will raise a
ValueError due to this mismatch.
  This also happens for all other requests to placement where the
resource provider is referenced via its UUID instead of its name.
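
  A quick way to spot this mismatch (a diagnostic sketch; the compute service
UUID is only shown with microversion 2.53 or later, and the resource provider
listing needs the osc-placement plugin):

  ```
  # Compute service records, including their UUIDs
  openstack --os-compute-api-version 2.53 compute service list --service nova-compute

  # Resource providers known to Placement
  openstack resource provider list
  ```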

  Steps to reproduce
  ==
  1. Generate orphaned allocations on a resource provider
  Can be done by generating a random allocation:
  ```
  openstack resource provider allocation set <consumer-uuid> \
    --allocation="rp=<rp-uuid>,VCPU=2" --project-id <project-id> \
    --user-id <user-id>
  ```
  2. Delete the nova-compute-service via the nova-api
  3. Restart the nova-compute service, so a new nova-compute-service is created
  4. You will start to see errors in the logs of placement/nova-api about
not finding the resource provider with the old UUID
  5. Delete the nova-compute-service via the nova-api; this will generate a 500
error and the nova-compute-service is not deleted.

  Expected result
  ===
  No errors in the logs about not finding a resource-provider based on its
ID.
  The deletion of the recreated nova-compute-service should be successful.

  Actual result
  =
  We see errors in the log about not finding the resource provider:
  ```
  An error occurred while updating COMPUTE_STATUS_DISABLED trait on compute 
node resource provider d5d7cf1c-51ea-4139-9fc3-6007ba58441e. The trait will be 
synchronized when the update_available_resource periodic task runs. Error: 
Failed to get traits for resource provider with UUID 
d5d7cf1c-51ea-4139-9fc3-6007ba58441e
  ```
  We are not able to delete the newly created nova-compute-service, due to a 
ValueError as it is not able to find the resource-provider based on the 
nova-compute-service UUID.

  Environment
  ===
  We are running OpenStack Zed, but based on the code the issue should still be
present on the master branch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2054329/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2055419] Re: network autoallocation fails for non-admin user

2024-03-19 Thread Sylvain Bauza
Seems to me a neutron issue, moving the bug report to the proper
project.

** Also affects: neutron
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2055419

Title:
  network autoallocation fails for non-admin user

Status in neutron:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  Automatic allocation of network topologies
(https://docs.openstack.org/neutron/latest/admin/config-auto-allocation.html)
causes an unexpected API error when requested by a user without the admin role.

  Tempest test affected:

  
tempest.api.compute.admin.test_auto_allocate_network.AutoAllocateNetworkTest.test_server_multi_create_auto_allocate

  is failing.

  Steps to reproduce
  ==

  * request server creation with network autoallocation as user without
  admin role:

  $ openstack --os-compute-api-version 2.37 server create --flavor <flavor>
    --image <image> --nic auto vm1
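
  As a related check (a diagnostic sketch, run as the same non-admin user), you
can ask Neutron directly whether the auto-allocation requirements are
satisfied:

  $ openstack network auto allocated topology create --check-resources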

  Expected result
  ===
  A Forbidden response (if I understand the documentation correctly), or
creation of the network and router (if it is allowed).

  Actual result
  =
  Unexpected API Error.

   ERROR nova.api.openstack.wsgi [None req-   - - 
default default] Unexpected exception in API method: 
neutronclient.common.exceptions.NotFound: The resource could not be found.
  Neutron server returns request_ids: ['req-']
   ERROR nova.api.openstack.wsgi Traceback (most recent call last):
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py", 
line 658, in wrapped
   ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py",
 line 110, in wrapper
   ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py",
 line 110, in wrapper
   ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/validation/__init__.py",
 line 110, in wrapper
   ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   [Previous line repeated 11 more times]
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/compute/servers.py",
 line 786, in create
   ERROR nova.api.openstack.wsgi instances, resv_id = 
self.compute_api.create(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 
2207, in create
   ERROR nova.api.openstack.wsgi return self._create_instance(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 
1683, in _create_instance
   ERROR nova.api.openstack.wsgi ) = self._validate_and_build_base_options(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 
1081, in _validate_and_build_base_options
   ERROR nova.api.openstack.wsgi max_network_count = 
self._check_requested_networks(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/api.py", line 
543, in _check_requested_networks
   ERROR nova.api.openstack.wsgi return 
self.network_api.validate_networks(context, requested_networks,
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", 
line 2648, in validate_networks
   ERROR nova.api.openstack.wsgi ports_needed_per_instance = 
self._ports_needed_per_instance(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", 
line 2509, in _ports_needed_per_instance
   ERROR nova.api.openstack.wsgi if not 
self._can_auto_allocate_network(context, neutron):
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", 
line 2438, in _can_auto_allocate_network
   ERROR nova.api.openstack.wsgi 
neutron.validate_auto_allocated_topology_requirements(
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper
   ERROR nova.api.openstack.wsgi ret = obj(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   File 
"/var/lib/kolla/venv/lib/python3.10/site-packages/debtcollector/renames.py", 
line 41, in decorator
   ERROR nova.api.openstack.wsgi return wrapped(*args, **kwargs)
   ERROR nova.api.openstack.wsgi   File 

[Yahoo-eng-team] [Bug 2058248] Re: Bugs in python files

2024-03-19 Thread Sylvain Bauza
The exception comes from OSC, moving the bug report to that project.

** Also affects: python-openstackclient
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2058248

Title:
  Bugs in python files

Status in OpenStack Compute (nova):
  Invalid
Status in python-openstackclient:
  New

Bug description:
  Description
  ===
  Python error that pops up during instance creation

  Steps to reproduce
  ==
  I use the demo script to launch an instance:

  $ . demo-openrc
  $ openstack server --debug create --flavor m1.nano --image cirros --nic 
net-id=698c77d5-49cb-47f2-8e26-766b2be3783d --security-group default --key-name 
mykey selfservice-instance

  Expected result
  ===
  An instance is created and shows a status like
https://docs.openstack.org/install-guide/launch-instance-selfservice.html

  Actual result
  =
  An instance is created, but a Python error pops up:
  Resource.get() takes 1 positional argument but 2 were given
  Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cliff/app.py", line 410, in 
run_subcommand
  result = cmd.run(parsed_args)
File "/usr/lib/python3/dist-packages/osc_lib/command/command.py", line 39, 
in run
  return super(Command, self).run(parsed_args)
File "/usr/lib/python3/dist-packages/cliff/display.py", line 117, in run
  column_names, data = self.take_action(parsed_args)
File "/usr/lib/python3/dist-packages/openstackclient/compute/v2/server.py", 
line 1964, in take_action
  details = _prep_server_detail(compute_client, image_client, server)
File "/usr/lib/python3/dist-packages/openstackclient/compute/v2/server.py", 
line 147, in _prep_server_detail
  server = utils.find_resource(compute_client.servers, info['id'])
File "/usr/lib/python3/dist-packages/osc_lib/utils/__init__.py", line 271, 
in find_resource
  if (resource.get('id') == name_or_id or
  TypeError: Resource.get() takes 1 positional argument but 2 were given

  Environment
  ===
  1. Openstack version is 2023.2
  2. Hypervisor is Libvirt + KVM, storage type is LVM
  3. Networking type is Neutron with OpenVSwitch

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2058248/+subscriptions




[Yahoo-eng-team] [Bug 2052915] Re: "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode" failing in 2023.1 and Zed

2024-03-12 Thread Sylvain Bauza
As discussed on the nova meeting, nova-grenade-multinode is no longer
failing, so I'll close this bug report only for Nova.

** Changed in: nova
   Status: Confirmed => Invalid

** Changed in: nova
   Status: Invalid => Won't Fix

** Changed in: nova
   Importance: Critical => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2052915

Title:
  "neutron-ovs-grenade-multinode" and "neutron-ovn-grenade-multinode"
  failing in 2023.1 and Zed

Status in neutron:
  Triaged
Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  The issue seems to be in the neutron-lib version installed:
  2024-02-07 16:19:35.155231 | compute1 | ERROR: neutron 21.2.1.dev38 has 
requirement neutron-lib>=3.1.0, but you'll have neutron-lib 2.20.2 which is 
incompatible.

  That leads to an error when starting the Neutron API (an API definition is 
not found) [1]:
  Feb 07 16:13:54.385467 np0036680724 neutron-server[67288]: ERROR neutron 
ImportError: cannot import name 'port_mac_address_override' from 
'neutron_lib.api.definitions' 
(/usr/local/lib/python3.8/dist-packages/neutron_lib/api/definitions/__init__.py)
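
  A quick way to confirm which neutron-lib is actually installed on the failing
node (a diagnostic sketch):

  $ pip show neutron-lib | grep ^Version
  $ pip check   # reports the neutron-lib>=3.1.0 conflict if it is still present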

  Setting priority to Critical because it affects the CI.

  
[1]https://9faad8159db8d6994977-b587eccfce0a645f527dfcbc49e54bb4.ssl.cf2.rackcdn.com/891397/4/check/neutron-
  ovs-grenade-multinode/ba47cef/controller/logs/screen-q-svc.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2052915/+subscriptions




[Yahoo-eng-team] [Bug 2054404] Re: Self Signed Certs Cause Metadata cert errors seemingly

2024-03-12 Thread Sylvain Bauza
This doesn't look like a Nova bug to me, maybe a Kolla one. Moving this report,
then.

** Also affects: kolla
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2054404

Title:
  Self Signed Certs Cause Metadata cert errors seemingly

Status in kolla:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  ==> /var/log/kolla/nova/nova-metadata-error.log <==
  2024-02-18 00:58:15.029954 AH01909: 
tunninet-server-noel.ny5.lan.tunninet.com:8775:0 server certificate does NOT 
include an ID which matches the server name
  2024-02-18 00:58:16.360069 AH01909: 
tunninet-server-noel.ny5.lan.tunninet.com:8775:0 server certificate does NOT 
include an ID which matches the server name

  I have no cert issues elsewhere, just this. What could cause it? Elsewhere
  the certificate usually has an IP and the FQDN as SANs.

  How can I troubleshoot the root cause?
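
  One way to check what the certificate served on the metadata port actually
contains (a diagnostic sketch; host and port taken from the log above):

  $ openssl s_client -connect tunninet-server-noel.ny5.lan.tunninet.com:8775 \
      -showcerts </dev/null 2>/dev/null \
      | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

  If the FQDN (or IP) is missing from the SANs, the AH01909 warning above is
expected.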

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla/+bug/2054404/+subscriptions




[Yahoo-eng-team] [Bug 2054409] Re: (HTTP 500) after upgrade

2024-03-12 Thread Sylvain Bauza
Are you sure you also upgraded Neutron? Apparently the Neutron
metadata agent tries to call the Neutron server over RPC, but the server
doesn't support the RPC version used by the agent.

This doesn't look like a Nova problem to me, so I'll close this bug report,
but please reopen it if you find a problem in Nova.
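
If that is the case, a quick comparison of what is actually deployed on each
side can help (a diagnostic sketch):

```
# Confirm the Neutron agents are alive and which hosts/binaries they run on
openstack network agent list

# Compare the installed neutron package version on the server and agent hosts
pip show neutron | grep ^Version
```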

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2054409

Title:
   (HTTP
  500) after upgrade

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  After upgrading from Antelope to Bobcat, I'm unable to manage instances.

  Steps to reproduce
  ==

  Following the upgrade guide
  https://docs.openstack.org/nova/2023.2/admin/upgrades.html

  * Pre-existing installation, Antelope 2023.1
  * Upgraded the controller node to Bobcat 2023.2
  * Ran 'nova-manage api_db sync' and 'nova-manage db sync'
  * Upgraded other services; 'nova-status upgrade check' reports
  everything as successful.
  * 'nova service-list' does not report orphaned records
  * Ran 'nova-manage db online_data_migrations'

  Expected result
  ===
  Upgrade successful, able to use the OpenStack instance

  Actual result
  =
  The last step does not complete successfully. Even after running
  it multiple times, there is still a pending operation:

  # nova-manage db online_data_migrations
  Modules with known eventlet monkey patching issues were imported prior to 
eventlet monkey patching: urllib3. This warning can usually be ignored if the 
caller is only importing and not executing nova code.
  Running batches of 50 until complete
  1 rows matched query populate_instance_compute_id, 0 migrated
  +--------------------------------------+--------------+-----------+
  |              Migration               | Total Needed | Completed |
  +--------------------------------------+--------------+-----------+
  |     fill_virtual_interface_list      |      0       |     0     |
  |         migrate_empty_ratio          |      0       |     0     |
  |   migrate_quota_classes_to_api_db    |      0       |     0     |
  |    migrate_quota_limits_to_api_db    |      0       |     0     |
  |      migration_migrate_to_uuid       |      0       |     0     |
  |          populate_dev_uuids          |      0       |     0     |
  |     populate_instance_compute_id     |      1       |     0     |
  | populate_missing_availability_zones  |      0       |     0     |
  |      populate_queued_for_delete      |      0       |     0     |
  |           populate_user_id           |      0       |     0     |
  |            populate_uuids            |      0       |     0     |
  +--------------------------------------+--------------+-----------+

  Listing instances on the dashboard now fails with the error:
  Error: Unable to retrieve instances. Details
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/
  and attach the Nova API log if possible.  (HTTP 500) (Request-ID: 
req-2fa94a10-3168-41a5-8b0f-6d7499465ff1)

  Environment
  ===
  1. Exact version of OpenStack you are running? Bobcat 2023.2

  2. Which hypervisor did you use? Compute nodes are running
  qemu-kvm 8.0.0 and libvirt 9.5.0

  2. Which storage type did you use? Cinder has two backends, NFS and
  dcache

  3. Which networking type did you use? Neutron with OpenVSwitch

  Logs & Configs
  ==

  /var/log/nova/nova-api.log

  Trying to list instances:
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi [None 
req-2fa94a10-3168-41a5-8b0f-6d7499465ff1 458ee6e3adf142048041c8d24fabeb85 
c824412ae5904653a037e893827aa693 - - default default] Unexpected exception in 
API method: neutronclient.common.exceptions.InternalServerError: Request 
Failed: internal server error while processing your request.
  Neutron server returns request_ids: 
['req-8894501a-070a-49be-8e46-e9bac7f1afb0']
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3.9/site-packages/nova/api/openstack/wsgi.py", line 658, in 
wrapped
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3.9/site-packages/nova/api/validation/__init__.py", line 192, 
in wrapper
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3.9/site-packages/nova/api/validation/__init__.py", line 192, 
in wrapper
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2024-02-20 08:57:14.744 66460 ERROR nova.api.openstack.wsgi   File 

[Yahoo-eng-team] [Bug 2054502] Re: shutdowning rabbitmq causes nova-compute.service down

2024-03-12 Thread Sylvain Bauza
This isn't a Nova bug, maybe some oslo.messaging problem. But anyway, since
the nova-compute service will be reported as down, the servicegroup API won't
offer it to the scheduler, so this shouldn't be a problem.
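
For what it's worth, the scheduler's view can be confirmed from the API side
(a sketch, assuming admin credentials):

```
# The State column reflects the servicegroup heartbeat; a compute shown as
# "down" here will not be picked by the scheduler
openstack compute service list --service nova-compute
```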


** Also affects: oslo.messaging
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2054502

Title:
  shutdowning rabbitmq causes nova-compute.service down

Status in OpenStack Compute (nova):
  Invalid
Status in oslo.messaging:
  New

Bug description:
  Description
  ===
  We have an OpenStack with a RabbitMQ cluster of 3 nodes, and with dozens of 
nova-compute nodes.
  When we shut down 1 of the 3 RabbitMQ nodes, Nagios alerted that
nova-compute.service was down on 2 nova-compute nodes.

  Upon checking, we found that nova-compute.service is running.

  nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; 
vendor preset: enabled)
   Active: active (running) since Fri 2024-02-16 00:42:47 UTC; 4 days ago
 Main PID: 10130 (nova-compute)
Tasks: 32 (limit: 463517)
   Memory: 248.2M
  CPU: 55min 5.217s
   CGroup: /system.slice/nova-compute.service
   ├─10130 /usr/bin/python3 /usr/bin/nova-compute 
--config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf 
--log-file=/var/log/nova/nova-compute.log
   ├─11527 /usr/bin/python3 /bin/privsep-helper --config-file 
/etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf --privsep_context 
vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpc0sosqey/privsep.sock
   └─11702 /usr/bin/python3 /bin/privsep-helper --config-file 
/etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf --privsep_context 
nova.privsep.sys_admin_pctxt --privsep_sock_path /tmp/tmp2ik7rchu/privsep.sock

  Feb 16 00:42:53 node002 sudo[11540]: pam_unix(sudo:session): session opened 
for user root(uid=0) by (uid=64060)
  Feb 16 00:42:54 node002 sudo[11540]: pam_unix(sudo:session): session closed 
for user root
  Feb 20 04:55:31 node002 nova-compute[10130]: Traceback (most recent call 
last):
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
  Feb 20 04:55:31 node002 nova-compute[10130]: timer()
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
  Feb 20 04:55:31 node002 nova-compute[10130]: cb(*args, **kw)
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
  Feb 20 04:55:31 node002 nova-compute[10130]: waiter.switch()
  Feb 20 04:55:31 node002 nova-compute[10130]: greenlet.error: cannot switch to 
a different thread

  I guess it's possible that when a RabbitMQ node is shut down, nova-compute
experiences contention or state inconsistencies while processing connection
recovery.
  Restarting nova-compute.service resolves the problem.

  Logs & Configs
  ==
  The nova-compute.log:

  2024-02-20 04:55:28.675 10130 ERROR oslo.messaging._drivers.impl_rabbit [-] 
[0aefd459-297a-48e8-8b15-15c763531431] AMQP server on 10.10.10.59:5672 is 
unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: 
ConnectionResetError: [Errno 104] Connection reset by peer
  2024-02-20 04:55:29.677 10130 ERROR oslo.messaging._drivers.impl_rabbit [-] 
[0aefd459-297a-48e8-8b15-15c763531431] AMQP server on 10.10.10.59:5672 is 
unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: 
ConnectionRefusedError: [Errno 111] ECONNREFUSED
  2024-02-20 04:55:30.682 10130 INFO oslo.messaging._drivers.impl_rabbit [-] 
[0aefd459-297a-48e8-8b15-15c763531431] Reconnected to AMQP server on 
10.10.10.52:5672 via [amqp] client with port 35346.
  2024-02-20 04:55:31.361 10130 INFO oslo.messaging._drivers.impl_rabbit [-] A 
recoverable connection/channel error occurred, trying to reconnect: [Errno 104] 
Connection reset by peer
  Then 'systemctl status nova-compute' shows:
  Feb 20 04:55:31 node002 nova-compute[10130]: Traceback (most recent call 
last):
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
  Feb 20 04:55:31 node002 nova-compute[10130]: timer()
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
  Feb 20 04:55:31 node002 nova-compute[10130]: cb(*args, **kw)
  Feb 20 04:55:31 node002 nova-compute[10130]:   File 
"/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
  Feb 20 04:55:31 node002 nova-compute[10130]: waiter.switch()
  Feb 20 04:55:31 node002 

[Yahoo-eng-team] [Bug 2055004] Re: unable to create vm - Could not find versioned identity endpoints

2024-03-12 Thread Sylvain Bauza
As you can see in the exception, this is not a Nova bug. Keystone just
tells you 'sorry, it's forbidden', so you may have some wrong
configuration for it. As I said, since this is not a Nova bug, I'm
closing this report.
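
A quick way to see what that auth_url actually returns (a diagnostic sketch;
the URL is the one shown in the log below):

```
# Version discovery should answer with HTTP 200, not 403
curl -i http://openstackcs/identity
curl -i http://openstackcs/identity/v3
```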

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2055004

Title:
  unable to create vm - Could not find versioned identity endpoints

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Hi,

  Description
  ===

  After installing OpenStack I've been trying to create a VM with the
  command below, but I'm getting this error:

  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) 
(Request-ID: req-4e85e38c-f6b8-43ce-ae29-4b96b41da25e)

  Command: openstack server create --flavor x1.test --image cirros my-
  instance

  I've double-checked the nova conf and so far everything looks OK.

  Logs
  =

  logs in var/log/nova/nova-api.log

  2024-02-26 01:04:14.853 23820 INFO nova.osapi_compute.wsgi.server [None 
req-fc792d37-360b-4cbc-8259-e3ad051c2816 01ac288623ee4fcf844338f25b8edb5e 
831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] 192.168.16.20 "GET 
/v2.1/flavors HTTP/1.1" status: 200 len: 1288 time: 0.0286248
  2024-02-26 01:04:14.868 23820 INFO nova.osapi_compute.wsgi.server [None 
req-58facd70-3a50-4a22-8155-b83d18336ecc 01ac288623ee4fcf844338f25b8edb5e 
831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] 192.168.16.20 "GET 
/v2.1/flavors/1 HTTP/1.1" status: 200 len: 753 time: 0.0087411
  2024-02-26 01:04:14.907 23820 WARNING oslo_config.cfg [None 
req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 
831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Deprecated: Option 
"api_servers" from group "glance" is deprecated for removal (
  Support for image service configuration via standard keystoneauth1 Adapter
  options was added in the 17.0.0 Queens release. The api_servers option was
  retained temporarily to allow consumers time to cut over to a real load
  balancing solution.
  ).  Its value may be silently ignored in the future.
  2024-02-26 01:04:15.366 23820 WARNING keystoneauth.identity.generic.base 
[None req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 
831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Failed to discover 
available identity versions when contacting http://openstackcs/identity. 
Attempting to parse version from URL.: keystoneauth1.exceptions.http.Forbidden: 
Forbidden (HTTP 403)
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi [None 
req-56b6dfe0-c3f7-4d41-88e0-71fef6781b15 01ac288623ee4fcf844338f25b8edb5e 
831bbb9ad88e4fd6a8536715ffc0a4c3 - - default default] Unexpected exception in 
API method: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find 
versioned identity endpoints when attempting to authenticate. Please check that 
your auth_url is correct. Forbidden (HTTP 403)
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/identity/generic/base.py", line 
136, in _do_create_plugin
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi disc = 
self.get_discovery(session,
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/identity/base.py", line 608, in 
get_discovery
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi return 
discover.get_discovery(session=session, url=url,
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 1460, in 
get_discovery
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi disc = 
Discover(session, url, authenticated=authenticated)
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 540, in 
__init__
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi self._data = 
get_version_data(session, url,
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/discover.py", line 107, in 
get_version_data
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi resp = 
session.get(url, headers=headers, authenticated=authenticated)
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python3/dist-packages/keystoneauth1/session.py", line 1141, in get
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi return 
self.request(url, 'GET', **kwargs)
  2024-02-26 01:04:15.368 23820 ERROR nova.api.openstack.wsgi   File 

[Yahoo-eng-team] [Bug 2055700] Re: server rebuild with reimage-boot-volume and is_volume_backed fails with BuildAbortException

2024-03-12 Thread Sylvain Bauza
Fabian, do you want some kind of root cause analysis? If so, I'd prefer that you
ping us in the nova channel rather than creating a bug report.
Once you know why you get this exception, you can reopen this bug report if
you want to explain the problem, but for the moment I'll close
it.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2055700

Title:
  server rebuild with reimage-boot-volume and is_volume_backed fails
  with BuildAbortException

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===

  More specifically the following tempest test in master fails with:
  
tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server
  Even with patch for https://review.opendev.org/c/openstack/nova/+/910627

  Technically though, it should be unrelated to the driver
  implementation as...

  
  The `ComputeManager._rebuild_default_impl` first calls destroy on the VM in
both branches:
  - 
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3695-L3701
  And in the case of a volume backed VM with `reimage_boot_volume=True` calls 
`ComputeManager._rebuild_volume_backed_instance` here
  - 
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3710-L3715
  The function tries to detach the volume from the destroyed instance and at 
least in the VMware driver raises an `InstanceNotFound`, which I'd argue would 
be expected.
  - 
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3596-L3607

  
  Steps to reproduce
  ==
  * Install Devstack from master
  * Run tempest test 
`tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server`

  Or as a bash script:
  ```
  IMAGE=$(openstack image list -c ID -f value)
  ID1=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 
rebuild-1 -c id -f value)
  ID2=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 
rebuild-2 -c id -f value)
  # Wait for servers to be ready

  # Works
  openstack server rebuild --os-compute-api-version 2.93  --image $IMAGE $ID1

  # Fails
  openstack server rebuild --os-compute-api-version 2.93 --reimage-boot-volume 
--image $IMAGE $ID1

  ```
  Expected result
  ===
  The test succeeds.

  Actual result
  =

  
  Environment
  ===
  1. Patch proposed in https://review.opendev.org/c/openstack/nova/+/909474
+  Patch proposed in https://review.opendev.org/c/openstack/nova/+/910627

  2. Which hypervisor did you use? What's the version of that?

  vmwareapi (VSphere 7.0.3 & ESXi 7.0.3)

  2. Which storage type did you use?

  vmdk on NFS 4.1

  3. Which networking type did you use?

  networking-nsx-t (https://github.com/sapcc/networking-nsx-t)

  Logs & Configs
  ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2055700/+subscriptions




[Yahoo-eng-team] [Bug 2056149] Re: Inconsistent volume naming when create instance (from volume)

2024-03-12 Thread Sylvain Bauza
This looks to me more like a feature request than a bug.
Which problem do you have?

** Changed in: nova
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2056149

Title:
  Inconsistent volume naming when create instance (from volume)

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description:
  When creating an instance from a volume, there are inconsistent behaviours and
a usability issue for users. We are using Yoga and confirmed it with older
versions as well. It is likely present in newer versions too.

  Cases:
  - using the CLI and the --boot-from-volume flag:
Naming: The instance gets created, and the volume does not get any name, it 
is just blank ""
Problems: By default, on instance deletion the volume does not get
deleted. If a user wants to reuse the root volume, finding the right volume is
just impossible.
Suggestion: it would be nice to name the volume "volume-$INSTANCEUUID" to
have a direct correspondence between a VM and its root volume.

  - using Horizon and selecting the CREATE VOLUME button:
Naming: This time the volume gets a name, which is equal to the volume UUID.
Problems: The behaviour is different from the CLI, and users (and admins)
get confused.
Suggestion: As above, name the root volume "volume-$INSTANCEUUID" when booting
an instance from volume.

  Overall it would be nice to have an option to template the volume naming
  when creating from a volume, something like:
  boot_from_volume_naming_template: volume-%UUID

  The blank naming behaviour of case 1 should be fixed as a bug <---

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2056149/+subscriptions




[Yahoo-eng-team] [Bug 2056756] Re: A source_type=blank instance was unexpectedly scheduled to the ironic node

2024-03-12 Thread Sylvain Bauza
Ironic nodes are seen exactly like libvirt nova-compute nodes. If you want
to avoid them, you need to use aggregates.
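
A hedged sketch of that aggregate-based isolation (names are examples, and the
AggregateInstanceExtraSpecsFilter must be enabled in the scheduler for the
flavor property to take effect):

```
# Group the libvirt compute hosts into a tagged aggregate
openstack aggregate create --property virt=kvm kvm-hosts
openstack aggregate add host kvm-hosts compute-01

# Tie the VM flavors to that aggregate so they never land on ironic nodes
openstack flavor set --property aggregate_instance_extra_specs:virt=kvm m1.small
```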


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2056756

Title:
  A source_type=blank instance was unexpectedly scheduled to the ironic
  node

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===

  I execute the following command to boot an instance with a
  source_type=blank root volume. My OpenStack env has many nodes,
  including nova and ironic nodes; the instance was unexpectedly scheduled to
  an ironic node. I checked the remaining resources and found that they are
  exceeded.

  Could anyone give me some advice to avoid it? Thanks a lot.

  nova boot --flavor 10 --block-device
  source=blank,dest=volume,size=1,bootindex=0,volume_type=hdd --nic net-
  name=share_net test

  Steps to reproduce
  ==

  Execute the command to boot the instance:
  nova boot --flavor 10 --block-device 
source=blank,dest=volume,size=1,bootindex=0,volume_type=hdd --nic 
net-name=share_net test

  Expected result
  ===
  No node is scheduled, and the instance status will be ERROR.

  Actual result
  =
  The instance was unexpectedly scheduled to ironic node.

  Environment
  ===
  Wallaby

  
  Logs & Configs
  ==

  1.nova-scheduler:

  2024-03-11 20:00:03.632 17 INFO nova.scheduler.manager 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Starting to schedule for 
instances: ['953d0c4c-b53e-4739-8444-80ac7442f612']^[[00m
  2024-03-11 20:00:03.788 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Starting with 14 
host(s)^[[00m
  2024-03-11 20:00:03.789 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
AvailabilityZoneFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.789 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter ComputeFilter 
returned 14 host(s)^[[00m
  2024-03-11 20:00:03.790 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
ComputeCapabilitiesFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.791 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
ImagePropertiesFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.791 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
ServerGroupAntiAffinityFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.791 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
ServerGroupAffinityFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.819 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter NUMATopologyFilter 
returned 14 host(s)^[[00m
  2024-03-11 20:00:03.820 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter 
AggregateVolumeTypeFilter returned 14 host(s)^[[00m
  2024-03-11 20:00:03.822 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter SriovPciFilter 
returned 14 host(s)^[[00m
  2024-03-11 20:00:03.823 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter GPUFilter returned 
14 host(s)^[[00m
  2024-03-11 20:00:03.823 17 INFO nova.filters 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filter VGPUFilter returned 
14 host(s)^[[00m
  2024-03-11 20:00:03.824 17 INFO nova.scheduler.filter_scheduler 
[req-4a81e1fd-9acd-43cf-9f53-f5ddb7155acb f2f7c29c86034f0a85e60549601fe5b5 
b00eb18beb7647dba928b26485606784 - default default] Filtered 
[(ironic.compute.domain.tld.2, fb729bc5-8d29-47b8-8d0f-cbeb47ba57a8) ram: 
4096MB disk: 30720MB io_ops: 0 instances: 0, (ironic.compute.domain.tld.2, 
19bbf021-76b4-4222-a633-4539a2c70225) ram: 4096MB disk: 30720MB io_ops: 0 
instances: 0, 

[Yahoo-eng-team] [Bug 2056195] Re: Return 409 at neutron-client conflict

2024-03-11 Thread Sylvain Bauza
This appears to me to be a configuration issue, as said in the exception:
 Error Cannot apply both stateful and stateless security groups on the same 
port at the same time while attempting the operation.,
 Neutron server returns request_ids: 
['req-1007ffaa-3501-4566-9ad9-c540931138f0']

I don't think this is a bug in Nova, so I'm closing the bug accordingly, but
feel free to reopen it if you can prove the contrary.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2056195

Title:
  Return 409 at neutron-client conflict

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  When attaching a stateless and a stateful security group to a VM, Nova returns
a 500 error, but it's a user issue and a 409 Conflict error should be returned.

  Steps to reproduce
  ==

  1. create network
  2. create VM "test-vm" attached to the network
  3. optionally create a stateful security group, but the default group should already do
  4. openstack security group create --stateless stateless-group
  5. openstack server add security group test-vm stateless-group

  Expected result
  ===
  Nova forwards the 409 error from Neutron with the error description from 
Neutron.

  Actual result
  =
  Nova returns: 
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-c6bbaf50-99b7-4108-98f0-808dfee84933)
   

  Environment
  ===

  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

  # nova-api --version
  26.2.2 (Zed)

  
  3. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

  Neutron with OVN

  
  Logs & Configs
  ==
  Stacktrace:

  Traceback (most recent call last):,
File "/usr/local/lib/python3.10/site-packages/nova/api/openstack/wsgi.py", 
line 658, in wrapped,
  return f(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/nova/api/openstack/compute/security_groups.py",
 line 437, in _addSecurityGroup,
  return security_group_api.add_to_instance(context, instance,,
File 
"/usr/local/lib/python3.10/site-packages/nova/network/security_group_api.py", 
line 653, in add_to_instance,
  raise e,
File 
"/usr/local/lib/python3.10/site-packages/nova/network/security_group_api.py", 
line 648, in add_to_instance,
  neutron.update_port(port['id'], {'port': updated_port}),
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
828, in update_port,
  return self._update_resource(self.port_path % (port), body=body,,
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
2548, in _update_resource,
  return self.put(path, **kwargs),
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
365, in put,
  return self.retry_request("PUT", action, body=body,,
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
333, in retry_request,
  return self.do_request(method, action, body=body,,
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
297, in do_request,
  self._handle_fault_response(status_code, replybody, resp),
File "/usr/local/lib/python3.10/site-packages/nova/network/neutron.py", 
line 196, in wrapper,
  ret = obj(*args, **kwargs),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
272, in _handle_fault_response,
  exception_handler_v20(status_code, error_body),
File 
"/usr/local/lib/python3.10/site-packages/neutronclient/v2_0/client.py", line 
90, in exception_handler_v20,
  raise client_exc(message=error_message,, 
neutronclient.common.exceptions.Conflict: 
Error Cannot apply both stateful and stateless security groups on the 
same port at the same time while attempting the operation., 
Neutron server returns request_ids: 
['req-1007ffaa-3501-4566-9ad9-c540931138f0']

To manage notifications about this bug go to:

[Yahoo-eng-team] [Bug 1943934] Re: report extra gpu device when config one enabled_vgpu_types

2024-01-19 Thread Sylvain Bauza
Fixed by https://review.opendev.org/c/openstack/nova/+/899406/2

** Changed in: nova
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1943934

Title:
  report extra gpu device when config one enabled_vgpu_types

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  If there are two GPU devices virtualized on the host, and only one of them is
  configured via enabled_vgpu_types and device_addresses, Nova will still report
  both GPU devices to Placement. We should only report the configured
  device_addresses to Placement.
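
  For context, with a configuration along these lines (the type name and PCI
address are examples), only the listed device should be reported:

  [devices]
  enabled_vgpu_types = nvidia-35

  [vgpu_nvidia-35]
  device_addresses = 0000:84:00.0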

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1943934/+subscriptions




[Yahoo-eng-team] [Bug 2049121] Re: Boot one VM with two GPU(in same numa)by pci passthrough cannot have GPUDirect P2P capability

2024-01-15 Thread Sylvain Bauza
Please reopen the bug report by changing the status back to new if you
think it's related to Nova.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2049121

Title:
  Boot one VM with two GPU(in same numa)by pci passthrough cannot have
  GPUDirect P2P capability

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Hi,
  I have two GPU cards, both of which are connected to the same NUMA CPU socket,
as shown in the link below:
  https://paste.opendev.org/show/b7Qi8qCnbLVxO2W0JdQw/

  I can boot one nova instance successfully with the two GPU cards by
  PCI Passthrough way.

  but in the booted instance, running the deviceQuery tool returns the below
message:
  Peer access from NVIDIA RTX 6000(GPU0) -> NVIDIA RTX 6000(GPU1): NO
  Peer access from NVIDIA RTX 6000(GPU1) -> NVIDIA RTX 6000(GPU0): NO

  The expected return should be as below:
  Peer access from NVIDIA RTX 6000(GPU0) -> NVIDIA RTX 6000(GPU1): YES
  Peer access from NVIDIA RTX 6000(GPU1) -> NVIDIA RTX 6000(GPU0): YES

  so that the memory can be shared between the two GPUs.

  I'm running the OpenStack Xena release on an Intel Xeon Gold 5220R CPU.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2049121/+subscriptions




[Yahoo-eng-team] [Bug 2044515] Re: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

2023-11-28 Thread Sylvain Bauza
As you said, this is due to a connection error:
https://controller/identity does not look to be reachable from your environment.
Are you sure this URL is right?

Anyway, not a nova bug.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2044515

Title:
  Unexpected API Error. Please report this at
  http://bugs.launchpad.net/nova/ and attach the Nova API log if
  possible.  (HTTP 500)
  (Request-ID: req-0971807f-dc3f-4882-8a1f-c7c24b86aa0e)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  We are running OpenStack on Ubuntu 22.04 with QEMU. We can't create an instance on the
  self-service network (we have created the flavor, attached the network,
  security group, keypair, and rule, but the instance is not created). Command
  used: "openstack server create --flavor m1.nano --image cirros --nic
  net-id=1fbb66da-7362-4ba9-851b-f9251b3e12e2 --security-group
  424fd166-6252-4118-97d6-7062aad3c9eb --key-name mykey
  inbinternet".


   nova-api error log
  **

  Unable to establish connection to https://controller/identity:
  HTTPSConnectionPool(host='controller', port=443): Max retries exceeded
  with url: /identity (Caused by
  NewConnectionError(': Failed to establish a new connection: [Errno 111]
  ECONNREFUSED'))

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2044515/+subscriptions




[Yahoo-eng-team] [Bug 2044721] Re: Returning 500 to user: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

2023-11-28 Thread Sylvain Bauza
As you can see here, this is a messaging timeout : 
f8c11b038ef198fc0 d865adb13a8d4c15b7a4c07f040efdc5 - - default default] 
Unexpected exception in API method: oslo_messaging.exceptions.MessagingTimeout: 
Timed out waiting for a reply to message ID c44ae10b2f774b428763a1d11368cb16
2023-11-27 06:34:51.506 19 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID c44ae10b2f774b428763a1d11368cb16

So, either you have an issue with oslo.messaging or you have a wrong
configuration, but this is not a nova bug.
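
A couple of quick checks on the messaging side (a diagnostic sketch, run on a
RabbitMQ node and on the controller):

```
# Confirm the RabbitMQ cluster is healthy and all nodes have rejoined
rabbitmqctl cluster_status

# Confirm the compute services are reporting in over RPC
openstack compute service list
```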


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2044721

Title:
  Returning 500 to user: Unexpected API Error. Please report
  this at http://bugs.launchpad.net/nova/ and attach the Nova API log if
  possible. 
  __call__ /var/lib/kolla/venv/lib/python3.10/site-
  packages/nova/api/openstack/wsgi.py:936

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  We were upgrading our server hardware. Our compute server was
  corrupted (unable to boot the Ubuntu OS), so I reflashed the same Ubuntu
  OS version onto it with the same IP address/MAC address, and
  redeployed kolla-ansible with the same configuration. However,
  after that the nova_compute and zun_compute services on the compute node keep
  restarting by themselves (exit ---> restart ---> exit ---> restart). I
  have no clue what the solution is.

  
  expected result
  =
  all services should be working correctly

  
  actual result
  =
  zun_compute is down
  nova_compute is down
  horizon compute node is down

  
  environment
  ===
  kolla-ansible V2023.1
  ubuntu version 22.04 for all 3 servers

  
  
  2023-11-27 06:26:18.431 22 ERROR nova.api.openstack.wsgi [None 
req-0c9d29ab-f230-4b26-828f-b1bce144acc9 d54f447283c843df8c11b038ef198fc0 
d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in 
API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a 
reply to message ID 6fc08dc0c65849d98b55f762441ae230
  2023-11-27 06:26:18.431 22 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID 6fc08dc0c65849d98b55f762441ae230
  
   __call__ 
/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
  2023-11-27 06:27:18.420 21 ERROR nova.api.openstack.wsgi [None 
req-c61791f0-c689-4b37-bc73-183586fd45c1 d54f447283c843df8c11b038ef198fc0 
d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in 
API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a 
reply to message ID b3811b5050fb45238d0cce866b8db0d3
  2023-11-27 06:27:18.420 21 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID b3811b5050fb45238d0cce866b8db0d3
  
   __call__ 
/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
  2023-11-27 06:28:18.520 23 ERROR nova.api.openstack.wsgi [None 
req-25845c63-c886-4878-a976-d427f1fb311f - - - - - -] Unexpected exception in 
API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a 
reply to message ID a2f8e06d636f4d96b89c5f5e6b685db0
  2023-11-27 06:28:18.520 23 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID a2f8e06d636f4d96b89c5f5e6b685db0
  
   __call__ 
/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
  2023-11-27 06:29:18.436 19 ERROR nova.api.openstack.wsgi [None 
req-9eb9d577-c192-4579-a8aa-00d7cabb793b d54f447283c843df8c11b038ef198fc0 
d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in 
API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a 
reply to message ID 7523a9908d5341a7a667be6c0a1775ea
  2023-11-27 06:29:18.436 19 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID 7523a9908d5341a7a667be6c0a1775ea
  
   __call__ 
/var/lib/kolla/venv/lib/python3.10/site-packages/nova/api/openstack/wsgi.py:936
  2023-11-27 06:30:18.603 23 ERROR nova.api.openstack.wsgi [None 
req-25f72773-caee-4551-8ebb-92747f12e77f d54f447283c843df8c11b038ef198fc0 
d865adb13a8d4c15b7a4c07f040efdc5 - - default default] Unexpected exception in 
API method: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a 
reply to message ID ac1f456f77c94a96b6deea5d2c3186a5
  2023-11-27 06:30:18.603 23 ERROR nova.api.openstack.wsgi 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID ac1f456f77c94a96b6deea5d2c3186a5
  
   __call__ 

[Yahoo-eng-team] [Bug 2041519] [NEW] Inventories of SR-IOV GPU VFs are impacted by allocations for other VFs

2023-10-27 Thread Sylvain Bauza
Public bug reported:

This is hard to summarize the problem in a bug report title, my bad.

Long story short, the problem arises if you start using nVidia SR-IOV next-gen
GPUs like the A100, which create Virtual Functions on the host, each of them
supporting the same GPU types but with a number of available mediated devices
that can be created equal to 1.
If you're using other GPUs (like the V100) and you're not running nvidia's
sriov-manage to expose the VFs, please disregard this bug; you shall not be
impacted.

So, say you have an A100 GPU card. Before configuring Nova, you have to
run the aforementioned sriov-manage script, which will allocate 16
virtual functions for the GPU. Each of those PCI addresses will
correspond to a Placement resource provider (if you configure Nova so)
with a VGPU inventory with total=1.

Example :
https://paste.opendev.org/show/bVxrVLW3yOR3TPV2Lz3A/

Sysfs shows the exact same thing on the nvidia-472 type I configured for :
[stack@lenovo-sr655-01 ~]$ cat 
/sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


Now, the problem arises when you exhaust the number of mediated devices
you can create.
In the case of nvidia-472, which corresponds to nvidia's GRID A100-20C, you can
create up to 2 VGPUs, i.e. mediated devices.

Accordingly, when Nova automatically creates those 2 mediated devices while
booting instances, and if *no* mediated devices were available beforehand,
then *all other* VFs that don't host those 2 mediated devices end up with an
available_instances value of 0:

[stack@lenovo-sr655-01 nova]$ openstack server create --image 
cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm1
(skipped)
[stack@lenovo-sr655-01 ~]$ cat 
/sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
[stack@lenovo-sr655-01 nova]$ openstack server create --image 
cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm2
(skipped)
[stack@lenovo-sr655-01 ~]$ cat 
/sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0


Now, when we look at the inventories for all VFs, we see that while it's normal
for 2 resource providers to have their total at 1 (since we created a mdev,
it's counted) and their usage at 1, it's not normal to see *other
VFs* having a total of 1 and a usage of 0.

[stack@lenovo-sr655-01 nova]$ for uuid in $(openstack resource provider list -f 
value -c uuid); do openstack resource provider inventory list $uuid -f value -c 
resource_class -c total -c used; done | grep VGPU
VGPU 1 1
VGPU 1 1
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0


I eventually went down into the code and found the culprit :

https://github.com/openstack/nova/blob/9c9cd3d9b6d1d1e6f62012cd8a86fd588fb74dc2/nova/virt/libvirt/driver.py#L9110-L9111

Before this method is called, we correctly calculate the numbers that we
get from libvirt, and all the unused VFs have their total=0, but since
we enter this conditional, we skip updating them.


There are different ways to solve this problem:
 - we stop automatically creating mediated devices and ask operators to
pre-allocate all mediated devices before starting nova-compute, but the
operator impact is big (and they need to add some tooling)
 - we blindly remove the RP from the PlacementTree and let the
update_resource_providers() call in the compute manager try to update Placement
with this new view. In that very particular case, we're sure that none of the
RPs that have total=0 have allocations against them, so it shouldn't fail, but
this logic can be error-prone if we try to reproduce it elsewhere (a rough
sketch of this option follows below).
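
For illustration only, a rough sketch of what the second option could look like
(hypothetical helper names, not the actual Nova patch):

    def sync_vf_inventories(provider_tree, vf_totals):
        # vf_totals maps each VF resource provider name to the VGPU total
        # currently reported by libvirt (0 or 1 for these A100 VFs).
        for rp_name, total in vf_totals.items():
            if total == 0:
                # Safe in this very particular case: a VF reporting 0 has no
                # mediated device of its own, so no allocations can exist
                # against its resource provider.
                provider_tree.remove(rp_name)
            else:
                provider_tree.update_inventory(
                    rp_name, {'VGPU': {'total': total, 'max_unit': 1}})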

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: vgpu

** Tags added: vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2041519

Title:
  Inventories of SR-IOV GPU VFs are impacted by allocations for other
  VFs

Status in OpenStack Compute (nova):
  New

Bug description:
  This is hard to summarize the problem in a bug report title, my bad.

  Long story short, the case arrives if you start using nVidia SR-IOV next-gen 
GPUs like A100 which create Virtual Functions on the host, each of them 
supporting the same GPU types but with a specific amount of available mediated 
devices to be created equal to 1.
  If you're using other GPUs (like V100) and you're not running nvidia's 
sriov-manage to expose the VFs, please nevermind this bug, you shall not be 
impacted.

  So, say you have a A100 GPU card, before configuring Nova, you have to
  run the aforementioned sriov-manage script which will allocate 16
  virtual functions for the GPU. Each of those PCI adddresses will
  correspond to a Placement resource provider 

[Yahoo-eng-team] [Bug 2036867] Re: refactor test: use project id as constant variable in all places

2023-09-26 Thread Sylvain Bauza
This is indeed not a bug report; please don't create bug reports for
this kind of internal cleanup item.

** Changed in: nova
   Status: New => Invalid

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2036867

Title:
  refactor test: use project id as constant variable in all places

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  This is not a bug: the same PROJECT_ID constant is defined in many places.

  ex:
  fixtures/nova.py:75:PROJECT_ID = '6f70656e737461636b20342065766572'
  functional/api_samples_test_base.py:25:PROJECT_ID = 
"6f70656e737461636b20342065766572"

  
  For the full list, grep inside the tests for 6f70656e737461636b20342065766572.
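
  A minimal sketch of the suggested cleanup (the module path below is
  hypothetical; the point is a single definition imported everywhere):

      # nova/tests/constants.py (hypothetical location)
      PROJECT_ID = '6f70656e737461636b20342065766572'

      # in a test module, instead of redefining the literal:
      # from nova.tests import constants
      # project_id = constants.PROJECT_ID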

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2036867/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1983863] Re: Can't log within tpool.execute

2023-09-13 Thread Sylvain Bauza
** Changed in: nova
   Status: In Progress => Invalid

** Changed in: nova
   Status: Invalid => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1983863

Title:
  Can't log within tpool.execute

Status in OpenStack Compute (nova):
  Fix Committed
Status in oslo.log:
  Fix Released

Bug description:
  There is a bug in eventlet where logging within a native thread can
  lead to a deadlock situation:
  https://github.com/eventlet/eventlet/issues/432

  When they encounter this issue, some projects in OpenStack using
  oslo.log, e.g. Cinder, resolve it by removing any logging within
  native threads.

  There is actually a better approach. The Swift team came up with a
  solution a long time ago, and it would be great if oslo.log could use
  this workaround automatically:
  
https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6
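
  For reference, a minimal reproducer sketch of the pattern that can deadlock
  (whether it actually hangs depends on the eventlet version in use):

      import eventlet
      eventlet.monkey_patch()

      import logging
      from eventlet import tpool

      logging.basicConfig(level=logging.INFO)
      LOG = logging.getLogger(__name__)

      def work():
          # Runs in a native thread; with an affected eventlet, the logging
          # lock can deadlock against greenthreads logging at the same time.
          LOG.info("logging from a native thread")
          return 42

      print(tpool.execute(work))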

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1983863/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2033752] Re: test_reboot_server_hard fails with AssertionError: time.struct_time() not greater than time.struct_time()

2023-09-01 Thread Sylvain Bauza
Probably due to the recent merge of
https://review.opendev.org/c/openstack/nova/+/882284

Now, when rebooting, we call the Cinder API to check the BDMs, so the
reboot could need more time.


** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2033752

Title:
  test_reboot_server_hard fails with  AssertionError: time.struct_time()
  not greater than time.struct_time()

Status in neutron:
  New
Status in OpenStack Compute (nova):
  New
Status in tempest:
  New

Bug description:
  Seen many occurrences recently, fails as below:-

  Traceback (most recent call last):
File 
"/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 
259, in test_reboot_server_hard
  self._test_reboot_server('HARD')
File 
"/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 
127, in _test_reboot_server
  self.assertGreater(new_boot_time, boot_time,
File "/usr/lib/python3.10/unittest/case.py", line 1244, in assertGreater
  self.fail(self._formatMessage(msg, standardMsg))
File "/usr/lib/python3.10/unittest/case.py", line 675, in fail
  raise self.failureException(msg)
  AssertionError: time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, 
tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) not 
greater than time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, 
tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) : 
time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, 
tm_sec=33, tm_wday=4, tm_yday=244, tm_isdst=0) > time.struct_time(tm_year=2023, 
tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=33, tm_wday=4, tm_yday=244, 
tm_isdst=0)

  Example logs:-
  
https://1e11be38b60141dbb290-777f110ca49a5cd01022e1e8aeff1ed5.ssl.cf1.rackcdn.com/893401/5/check/neutron-ovn-tempest-ovs-release/f379752/testr_results.html
  
https://1b9f88b068db0ff45f98-b11b73e0c31560154dece88f25c72a10.ssl.cf2.rackcdn.com/893401/5/check/neutron-linuxbridge-tempest/0bf1039/testr_results.html
  
https://30b3c23edbff5d871c4c-595cfa47540877e41ce912cd21563e42.ssl.cf1.rackcdn.com/886988/10/check/neutron-ovs-tempest-multinode-full/e57a62a/testr_results.html
  
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0e5/886988/10/check/neutron-ovn-tempest-ipv6-only-ovs-release/0e538d1/testr_results.html

  Opensearch:-
  
https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22not%20greater%20than%20time.struct_time%22'),sort:!())

  As per opensearch it's started to be seen just few hours back.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2033752/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2028851] Re: Console output was empty in test_get_console_output_server_id_in_shutoff_status

2023-07-27 Thread Sylvain Bauza
Seems to be a regression coming from the automatic rebase of
https://github.com/openstack/tempest/commit/eea2c1cfac1e5d240cad4f8be68cff7d72f220a8

** Also affects: tempest
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2028851

Title:
   Console output was empty in
  test_get_console_output_server_id_in_shutoff_status

Status in OpenStack Compute (nova):
  Invalid
Status in tempest:
  New

Bug description:
  test_get_console_output_server_id_in_shutoff_status

  
https://github.com/openstack/tempest/blob/04cb0adc822ffea6c7bfccce8fa08b03739894b7/tempest/api/compute/servers/test_server_actions.py#L713

  is failing consistently in the nova-lvm job starting on July 24 with
  132 failures in the last 3 days. https://tinyurl.com/kvcc9289

  
  Traceback (most recent call last):
File 
"/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 
728, in test_get_console_output_server_id_in_shutoff_status
  self.wait_for(self._get_output)
File "/opt/stack/tempest/tempest/api/compute/base.py", line 340, in wait_for
  condition()
File 
"/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 
213, in _get_output
  self.assertTrue(output, "Console output was empty.")
File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
  raise self.failureException(msg)
  AssertionError: '' is not true : Console output was empty.

  it's not clear why this has started failing. It may be a regression or
  a latent race in the test that we are now hitting.

  def test_get_console_output_server_id_in_shutoff_status(self):
  """Test getting console output for a server in SHUTOFF status

  Should be able to GET the console output for a given server_id
  in SHUTOFF status.
  """

  # NOTE: SHUTOFF is irregular status. To avoid test instability,
  #   one server is created only for this test without using
  #   the server that was created in setUpClass.
  server = self.create_test_server(wait_until='ACTIVE')
  temp_server_id = server['id']

  self.client.stop_server(temp_server_id)
  waiters.wait_for_server_status(self.client, temp_server_id, 'SHUTOFF')
  self.wait_for(self._get_output)

  the test does not wait for the VM to be sshable, so it's possible that
  we are shutting off the VM before it is fully booted and no output has
  been written to the console.

  this failure has happened on multiple providers but only in the nova-lvm job.
  the console behavior is unrelated to the storage backend, but the lvm job, I
  believe, is using lvm on a loopback file, so the storage performance is
  likely slower than raw/qcow.

  so perhaps the boot is taking longer and no output is being written.
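
  a sketch of the kind of change that would avoid the race (assuming the
  tempest in use supports waiting until the guest is reachable over SSH via
  wait_until='SSHABLE'; otherwise an explicit wait for console output before
  stopping the server would be needed):

      # wait until the guest is fully booted before stopping it
      server = self.create_test_server(validatable=True, wait_until='SSHABLE')
      self.client.stop_server(server['id'])
      waiters.wait_for_server_status(self.client, server['id'], 'SHUTOFF')
      self.wait_for(self._get_output)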

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2028851/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2025813] Re: test_rebuild_volume_backed_server failing 100% on nova-lvm job

2023-07-11 Thread Sylvain Bauza
Changing the bug importance to High as the fix is merged in master
https://review.opendev.org/c/openstack/nova/+/887674

Keeping the stable branches status to Critical since the backports
aren't merged yet.

** Changed in: nova
   Importance: Critical => High

** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2025813

Title:
  test_rebuild_volume_backed_server failing 100% on nova-lvm job

Status in devstack-plugin-ceph:
  New
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) antelope series:
  In Progress
Status in OpenStack Compute (nova) yoga series:
  Triaged
Status in OpenStack Compute (nova) zed series:
  Triaged

Bug description:
  After the tempest patch was merged [1] nova-lvm job started to fail
  with the following error in test_rebuild_volume_backed_server:

  
  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in 
wrapper
  return f(*func_args, **func_kwargs)
File 
"/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 
868, in test_rebuild_volume_backed_server
  self.get_server_ip(server, validation_resources),
File "/opt/stack/tempest/tempest/api/compute/base.py", line 519, in 
get_server_ip
  return compute.get_server_ip(
File "/opt/stack/tempest/tempest/common/compute.py", line 76, in 
get_server_ip
  raise lib_exc.InvalidParam(invalid_param=msg)
  tempest.lib.exceptions.InvalidParam: Invalid Parameter passed: When 
validation.connect_method equals floating, validation_resources cannot be None

  As discussed on IRC with Sean [2], the SSH validation is now mandatory,
  while it is disabled in the job config [3].

  [1] https://review.opendev.org/c/openstack/tempest/+/831018
  [2] 
https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2023-07-04.log.html#t2023-07-04T15:33:38
  [3] 
https://opendev.org/openstack/nova/src/commit/4b454febf73cdd7b5be0a2dad272c1d7685fac9e/.zuul.yaml#L266-L267

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack-plugin-ceph/+bug/2025813/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2024160] Re: [trunk ports] subport doesn't reach status ACTIVE

2023-06-20 Thread Sylvain Bauza
** Also affects: nova
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2024160

Title:
  [trunk ports] subport doesn't reach status ACTIVE

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Test test_live_migration_with_trunk has been failing for the last two days.
  
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/testr_results.html

  It's a test about live-migration, but it is important to notice that
  it fails before any live migration happens.

  The test creates a VM with a port and a subport.
  The test waits until the VM status is ACTIVE -> this passes
  The test waits until the subport status is ACTIVE -> this started failing two 
days ago because the port status is DOWN

  There was only one neutron patch merged that day[1], but I checked the
  test failed during some jobs even before that patch was merged.

  
  I compared some logs.
  Neutron logs when the test passes: [2]
  Neutron logs when the test fails: [3]

  When it fails, I see this during the creation of the subport (and I don't see 
this event when it passes):
  Jun 14 18:13:43.052982 np0034303809 neutron-server[77531]: DEBUG 
ovsdbapp.backend.ovs_idl.event [None req-929dd199-4247-46f5-9466-622c7d538547 
None None] Matched DELETE: PortBindingUpdateVirtualPortsEvent(events=('update', 
'delete'), table='Port_Binding', conditions=None, old_conditions=None), 
priority=20 to row=Port_Binding(parent_port=[], mac=['fa:16:3e:93:9d:5a 
19.80.0.42'], chassis=[], ha_chassis_group=[], options={'mcast_flood_reports': 
'true', 'requested-chassis': ''}, type=, tag=[], requested_chassis=[], 
tunnel_key=2, up=[False], logical_port=f8c707ec-ecd8-4f1e-99ba-6f8303b598b2, 
gateway_chassis=[], encap=[], external_ids={'name': 
'tempest-subport-2029248863', 'neutron:cidrs': '19.80.0.42/24', 
'neutron:device_id': '', 'neutron:device_owner': '', 'neutron:network_name': 
'neutron-5fd9faa7-ec1c-4f42-ab87-6ce19edda245', 'neutron:port_capabilities': 
'', 'neutron:port_name': 'tempest-subport-2029248863', 'neutron:project_id': 
'6f92a9f8e16144148026725b25711d3a', 'neutron:revision_n
 umber': '1', 'neutron:security_group_ids': 
'5eab41ef-c5c1-425c-a931-f5b6b4b330ad', 'neutron:subnet_pool_addr_scope4': '', 
'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, 
virtual_parent=[], nat_addresses=[], 
datapath=3c472399-d6ee-4b7c-aa97-6777f2bc2772) old= {{(pid=77531) matches 
/usr/local/lib/python3.10/dist-packages/ovsdbapp/backend/ovs_idl/event.py:43}}
  ...
  Jun 14 18:13:49.597911 np0034303809 neutron-server[77531]: DEBUG 
neutron.plugins.ml2.plugin [None req-3588521e-7878-408d-b1f8-15db562c69f8 None 
None] Port f8c707ec-ecd8-4f1e-99ba-6f8303b598b2 cannot update to ACTIVE because 
it is not bound. {{(pid=77531) _port_provisioned 
/opt/stack/neutron/neutron/plugins/ml2/plugin.py:361}}


  It seems the ovn version has changed between these jobs:
  Passes [4]:
  2023-06-14 10:01:46.358875 | controller | Preparing to unpack 
.../ovn-common_22.03.0-0ubuntu1_amd64.deb ...

  
  Fails [5]:
  2023-06-14 17:55:07.077377 | controller | Preparing to unpack 
.../ovn-common_22.03.2-0ubuntu0.22.04.1_amd64.deb ...





  [1] https://review.opendev.org/c/openstack/neutron/+/883687
  [2] 
https://96b562ba0d2478fe5bc1-d58fbc463536b3122b4367e996d5e5b0.ssl.cf1.rackcdn.com/831018/30/check/nova-live-migration/312c2ab/controller/logs/screen-q-svc.txt
  [3] 
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/controller/logs/screen-q-svc.txt
  [4] 
https://96b562ba0d2478fe5bc1-d58fbc463536b3122b4367e996d5e5b0.ssl.cf1.rackcdn.com/831018/30/check/nova-live-migration/312c2ab/job-output.txt
  [5] 
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3f3/831018/30/check/nova-live-migration/3f3b065/job-output.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2024160/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2023018] [NEW] scaling governors are optional for some OS platforms

2023-06-06 Thread Sylvain Bauza
Public bug reported:

Some OS platforms don't use cpufreq, so operators should be able to just
offline their CPUs.

For the moment, even if the CPU management strategy config option is set to
'cpu_state', we raise an exception in that case.

Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
Traceback (most recent call last):
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 37, in read_sys
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
with open(os.path.join(SYS, path), mode='r') as data:
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
FileNotFoundError: [Errno 2] No such file or directory: 
'/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service The 
above exception was the direct cause of the following exception:
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
Traceback (most recent call last):
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/usr/local/lib/python3.10/dist-packages/oslo_service/service.py", line 
806, in run_service
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
service.start()
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/service.py", line 162, in start
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
self.manager.init_host(self.service_ref)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/compute/manager.py", line 1608, in init_host
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
self.driver.init_host(host=self.host)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 825, in init_host
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
libvirt_cpu.validate_all_dedicated_cpus()
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 143, in 
validate_all_dedicated_cpus
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
governors.add(pcpu.governor)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 63, in governor
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
return core.get_governor(self.ident)
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/core.py", line 69, in get_governor
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
return filesystem.read_sys(
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 40, in read_sys
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
raise exception.FileNotFound(file_path=path) from exc
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
nova.exception.FileNotFound: File 
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor could not be found.
Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 

Let's just support the cpu_state strategy in that case.
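
A sketch of the direction this implies (not the merged patch): in
nova/virt/libvirt/cpu/core.py, where filesystem and gen_cpu_path already exist,
treat a missing cpufreq file as "no governor" instead of failing startup:

    def get_governor(core):
        try:
            return filesystem.read_sys(
                os.path.join(gen_cpu_path(core), 'cpufreq/scaling_governor'))
        except exception.FileNotFound:
            # Platform without cpufreq: only online/offline state is managed.
            return None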

** Affects: nova
 Importance: Low
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: In Progress


** Tags: cpu libvirt

** Summary changed:

- scaling governors are optional for some OS plateforms
+ scaling governors are optional for some OS platforms

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2023018

Title:
  scaling governors are optional for some OS platforms

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Some OS platforms don't use cpufreq, so operators should be able to
  just offline their CPUs.

  For the moment, even if the config option CPU management strategy is
  'cpu_state', we return an exception if so.

  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service 
Traceback (most recent call last):
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 37, in read_sys
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   
  with open(os.path.join(SYS, pat

[Yahoo-eng-team] [Bug 2022955] [NEW] FileNotFound when offlining a core due to a privsep context missing

2023-06-05 Thread Sylvain Bauza
Public bug reported:

When we created the CPU power interface, we forgot to add a specific privsep 
decorator for the set_offline() method :
https://review.opendev.org/c/openstack/nova/+/868236/5/nova/virt/libvirt/cpu/core.py#63
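
A sketch of the missing piece (assuming nova.privsep.sys_admin_pctxt is the
appropriate privsep context; the actual fix may differ): writing to
/sys/devices/system/cpu/cpuN/online needs elevated privileges, so the writer
must run through privsep:

    import nova.privsep

    @nova.privsep.sys_admin_pctxt.entrypoint
    def set_offline(core):
        filesystem.write_sys(os.path.join(gen_cpu_path(core), 'online'),
                             data='0')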

As a result, we have a FileNotFound due to a permission error when
restarting the nova-compute service :

Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
Traceback (most recent call last):
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 56, in write_sys
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
with open(os.path.join(SYS, path), mode='w') as fd:
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
PermissionError: [Errno 13] Permission denied: 
'/sys/devices/system/cpu/cpu1/online'
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service The 
above exception was the direct cause of the following exception:
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
Traceback (most recent call last):
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/usr/local/lib/python3.10/dist-packages/oslo_service/service.py", line 
806, in run_service
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
service.start()
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/service.py", line 162, in start
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
self.manager.init_host(self.service_ref)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/compute/manager.py", line 1608, in init_host
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
self.driver.init_host(host=self.host)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 831, in init_host
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
libvirt_cpu.power_down_all_dedicated_cpus()
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 128, in 
power_down_all_dedicated_cpus
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
pcpu.online = False
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 50, in online
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
core.set_offline(self.ident)
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/virt/libvirt/cpu/core.py", line 64, in set_offline
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
filesystem.write_sys(os.path.join(gen_cpu_path(core), 'online'), data='0')
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 59, in write_sys
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
raise exception.FileNotFound(file_path=path) from exc
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
nova.exception.FileNotFound: File /sys/devices/system/cpu/cpu1/online could not 
be found.
Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service

** Affects: nova
 Importance: Undecided
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: In Progress


** Tags: cpu libvirt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2022955

Title:
  FileNotFound when offlining a core due to a privsep context missing

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When we created the CPU power interface, we forgot to add a specific privsep 
decorator for the set_offline() method :
  
https://review.opendev.org/c/openstack/nova/+/868236/5/nova/virt/libvirt/cpu/core.py#63

  As a result, we have a FileNotFound due to a permission error when
  restarting the nova-compute service :

  Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service 
Traceback (most recent call last):
  Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
File "/opt/stack/nova/nova/filesystem.py", line 56, in write_sys
  Jun 05 16:18:49 sbauza-dev2 nova-compute[76374]: ERROR oslo_service.service   
  

[Yahoo-eng-team] [Bug 2018318] Re: 'openstack server resize --flavor' should not migrate VMs to another AZ

2023-05-12 Thread Sylvain Bauza
I left a very large comment on Gerrit but I'll add it here for better
visibility.

FWIW, I think the problem is legit and needs to be addressed. I'm gonna
change the title and the subject to make it clearer but I also think
that the solution isn't simple at first and requires some design
discussion, hence the Wishlist status.

Now, the comment I wrote explaining my -1 (you can find it here
https://review.opendev.org/c/openstack/nova/+/864760/comment/b2b03637_f15d6dd2/
)

=
> Just because you say so? =)

> Can you provide a more technical explanation on why not? I mean, why
would that be wrong? Or, what alternative would be better, and why?

Sorry, that's kind of an undocumented design consensus (or tribal knowledge,
if you prefer).
We, as the Nova community, want to keep the RequestSpec.availability_zone
record as an immutable object that is only set when creating the RequestSpec,
so that we know whether the user wanted to pin the instance to a specific AZ or
not.

> What is your proposal? We see the following two different alternatives
so far. [...]

Maybe you haven't seen my proposal before, but I was talking about
https://review.opendev.org/c/openstack/nova/+/469675/12/nova/compute/api.py#1173
which was merged.
See again my comment
https://review.opendev.org/c/openstack/nova/+/864760/comments/4a302ce3_9805e7c6
To be clear, let me explain the problem and what we need to fix: if a user
creates an instance from an image and asks for a volume to be created from that
image, then we need to modify the AZ for the related request if and only if
cross_az_attach=False

Now, let's discuss the implementation:
1/ we know that volumes are created much later in the instance boot by the
compute service, but we do pass instance.az to Cinder to tell it to create a
volume within that AZ if cross_az_attach=False:
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/virt/block_device.py#L427
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/virt/block_device.py#L53-L78

2/ unfortunately, instance.availability_zone is only trustworthy if the
instance is pinned to an AZ

3/ we know that at the API level, we're able to know whether we will create a 
volume based on an image since we have the BDMs and we do check them :
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1460
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1866
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1960-L1965C43

4/ Accordingly, we are able to follow the same logic as in
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1396-L1397
by checking the BDMs and seeing whether we are going to create a volume. If so,
we SHALL pin the AZ exactly like
https://github.com/openstack/nova/blob/b3fdd7ccf01bafb68e37a457f703b79119dbfa86/nova/compute/api.py#L1264
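
A rough sketch of that check, with hypothetical helper names (not a proposed
patch, just the shape of the logic in points 3/ and 4/):

    def needs_az_pinning(bdms):
        # True when this boot will create a new volume from an image while
        # cross_az_attach is disabled, i.e. the case described above.
        creates_volume = any(
            bdm.source_type == 'image' and bdm.destination_type == 'volume'
            for bdm in bdms)
        return creates_volume and not CONF.cinder.cross_az_attach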

Unfortunately, since the user didn't specify an AZ, Nova doesn't know
which AZ to pin the instance to. Consequently, we have multiple options:

1/ we could return an error to the user if they didn't pin the instance. That
said, I really don't like this UX since the user doesn't know whether
cross_az_attach is False or not
2/ we could document the fact that cross_az_attach only works with pre-created
volumes.
3/ we could pre-create the volume much earlier at the API level and get its AZ.
4/ we could augment the RequestSpec with a field saying 'pinned' or
something else that the scheduler would honor on a move operation even if
RequestSpec.az is None

As you see, all those options need to be properly discussed, so IMHO
I'd prefer you to draft a spec so the nova community can address those
points and find an approved design solution.

HTH.

** Changed in: nova
   Status: Invalid => Confirmed

** Changed in: nova
   Importance: Undecided => Wishlist

** Summary changed:

- 'openstack server resize --flavor' should not migrate VMs to another AZ
+ cross_az_attach=False doesn't honor BDMs with source=image and dest=volume

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2018318

Title:
  cross_az_attach=False doesn't honor BDMs with source=image and
  dest=volume

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  The config flag cross_az_attach allows an instance to be pinned to the
  related volume AZ if the value of that config option is set to False.

  
  We fixed the case of a volume-backed instance by 
https://review.opendev.org/c/openstack/nova/+/469675/ if the volume was created 
before the instance but we haven't yet resolved the case of an BFV-instance 
created from an image (the BDM shortcut that allows a late creation of a volume 
by the 

[Yahoo-eng-team] [Bug 2018398] Re: Wrong AZ gets showed when adding new compute node

2023-05-03 Thread Sylvain Bauza
While I understand your concern, I think you missed the intent of
default_availability_zone.

This config option is not intended for scheduling instances, but rather
for displaying a default AZ for hosts that don't belong to any.

In your environment, you could just define any existing AZ
as default_availability_zone; this would prevent 'nova' from showing up in
the AZ list.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2018398

Title:
  Wrong AZ gets showed when adding new compute node

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  On a deployment with multiple availability zones, when the operator adds
  a new compute host, the service gets registered as part of
  “default_availability_zone”.

  This is undesirable behavior for users, as they see a new AZ
  appearing which may not be related to the deployment during the time window
  before the host finally gets configured into its correct AZ.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2018398/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2018375] [NEW] routed networks prefilter exception due to subnets can have no segments

2023-05-03 Thread Sylvain Bauza
Public bug reported:

Since some subnets may not have a related segment, the
subnet.segment_uuid value can be None, but unfortunately the
routed_networks_filter prefilter doesn't support that.


2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server 
[req-ed1b01c5-01bd-493f-8b56-b4cb21e29f59 e416974adb7a44fd910a40b208d28e9f
d7b8b3323ea64f35adeec903c340a19e - default default] Exception during message 
handling: KeyError: 'segment_id'
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server Traceback (most 
recent call last):
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in
_process_incoming
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, 
in
dispatch
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, 
in
_do_dispatch
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 241, in 
inner
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return 
func(*args, **kwargs)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/scheduler/manager.py", line 140, in
select_destinations
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server 
request_filter.process_reqspec(ctxt, spec_obj)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 387, 
in
process_reqspec
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server filter(ctxt, 
request_spec)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 41, in
wrapper
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server ran = fn(ctxt, 
request_spec)
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/scheduler/request_filter.py", line 348, 
in
routed_networks_filter
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server aggregates = 
utils.get_aggregates_for_routed_network(
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/scheduler/utils.py", line 1390, in
get_aggregates_for_routed_network
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server segment_ids = 
network_api.get_segment_ids_for_network(
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/network/neutron.py", line 3610, in
get_segment_ids_for_network
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server return 
[subnet['segment_id'] for subnet in subnets
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python3.9/site-packages/nova/network/neutron.py", line 3611, in 

2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server if 
subnet['segment_id'] is not None]
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server KeyError: 
'segment_id'
2023-05-02 22:38:10.382 12 ERROR oslo_messaging.rpc.server
2023-05-02 22:38:15.178 11 DEBUG nova.scheduler.manager 
[req-798de5ac-273e-40fd-abce-36e701488046 e416974adb7a44fd910a40b208d28e9f
d7b8b3323ea64f35adeec903c340a19e - default default] Starting to schedule for 
instances: ['412ca82a-06a4-40d9-b12d-08c56a78c5a9'] select_destinations
/usr/lib/python3.9/site-packages/nova/scheduler/manager.py:124
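
A sketch of the kind of defensive change this calls for (not necessarily the
merged fix): tolerate subnets that don't carry a 'segment_id' key at all:

    def get_segment_ids_for_network(subnets):
        return [subnet['segment_id'] for subnet in subnets
                if subnet.get('segment_id') is not None]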

** Affects: nova
 Importance: Low
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: Confirmed


** Tags: neutron scheduler

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => Low

** Changed in: nova
 Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza)

** Tags added: neutron scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2018375

Title:
  routed networks prefilter exception due to subnets can have no
  segments

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Since some subnets can not have some related segments, the
  subnet.segment_uuid value can be None but unfortunately, the
  routed_networks_filter prefilter doesn't support it.

  
  2023-05-02 22:3

[Yahoo-eng-team] [Bug 2012843] Re: Instances are free to move others zone when AZ was not specified

2023-04-11 Thread Sylvain Bauza
This is expected behaviour, as you can read in
https://docs.openstack.org/nova/latest/admin/availability-zones.html#resource-affinity

If an instance is not pinned to an AZ [1] and cross_az_attach is equal to
True, then the instance can float between *all* Availability Zones.
We only pin the instance to a specific AZ if cross_az_attach=False. See the new
functional test that verifies this:
https://review.opendev.org/c/openstack/nova/+/878948/1/nova/tests/functional/test_cross_az_attach.py


[1] By 'pinned', I mean that either the AZ parameter for the instance is set,
or the 'default_schedule_az' config option is not 'None'.


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2012843

Title:
  Instances are free to move others zone when AZ was not specified

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===

  Instances are free to move to other zones through migration or Masakari
  (host down) when no AZ is specified at launch and
  cross_az_attach=true.

  Steps to reproduce
  ==

  Launch an instance without choosing an AZ (Any Availability Zone on Horizon
  or via the CLI).

  
  Expected result
  ===

  Instances should stay within their current AZ when migrated or handled by
  Masakari HA.

  Actual result
  =

  Instances were moved to other AZs when migrated or recovered by Masakari HA.

  Environment
  ===
  Xena
  KVM
  Openswitch
  SAN
  Provider network

  Logs & Configs
  ==
  cross_az_attach=true.

  Any Availability Zone selected when launching the instance on Horizon

  Default AZ was not set in nova.conf

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2012843/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2011567] Re: Cycle theme page is empty

2023-04-11 Thread Sylvain Bauza
Closing this bug report as we said in the PTG that we don't have any
cycle themes for Bobcat.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2011567

Title:
  Cycle theme page is empty

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  here: https://specs.openstack.org/openstack/nova-specs/

  Under nova project plans -> Priorities

  
https://specs.openstack.org/openstack/nova-specs/priorities/ussuri-priorities.html
  ...
  
https://specs.openstack.org/openstack/nova-specs/priorities/2023.1-priorities.html

  
  Since Ussuri, the cycle theme page is not filled.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2011567/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2012049] Re:

2023-04-11 Thread Sylvain Bauza
Can you add more details about your problem?

At least, I see a MessagingTimeout in the logs, so I'm pretty sure
this isn't a Nova bug but rather a configuration issue.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2012049

Title:
  

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  2023-03-17 09:06:10.494 14925 INFO nova.api.openstack.wsgi 
[req-f9a44aff-0978-4fe8-be6a-7c9fb331f6f9 8cf563f1906a43fd9ffe0c3ef4cbb2cf 
8326c19d61c146d29b229ededb704804 - default default] HTTP exception thrown: 
Instance wlk-ubuntu-20.04-3 could not be found.
  2023-03-17 09:06:10.496 14925 INFO nova.osapi_compute.wsgi.server 
[req-f9a44aff-0978-4fe8-be6a-7c9fb331f6f9 8cf563f1906a43fd9ffe0c3ef4cbb2cf 
8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET 
/v2.1/servers/wlk-ubuntu-20.04-3 HTTP/1.1" status: 404 len: 513 time: 1.0066791
  2023-03-17 09:06:10.598 14925 INFO nova.osapi_compute.wsgi.server 
[req-afecaa68-03f2-4304-a72b-3053e2a41cee 8cf563f1906a43fd9ffe0c3ef4cbb2cf 
8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET 
/v2.1/servers?name=wlk-ubuntu-20.04-3 HTTP/1.1" status: 200 len: 700 time: 
0.0988359
  2023-03-17 09:06:10.938 14925 INFO nova.osapi_compute.wsgi.server 
[req-742f9b23-db89-46ed-95bc-303beda80ce6 8cf563f1906a43fd9ffe0c3ef4cbb2cf 
8326c19d61c146d29b229ededb704804 - default default] 192.168.8.213 "GET 
/v2.1/servers/629397d2-1c46-4e22-844f-cc40cd1831bb HTTP/1.1" status: 200 len: 
1989 time: 0.3370461
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi 
[req-09346754-8b4e-40c0-b824-acbaed9cec4c 8cf563f1906a43fd9ffe0c3ef4cbb2cf 
8326c19d61c146d29b229ededb704804 - default default] Unexpected exception in API 
method: MessagingTimeout: Timed out waiting for a reply to message ID 
d7605c2e96d6495e802662b3fb267384
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 788, in 
wrapped
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 108, 
in wrapper
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/remote_consoles.py",
 line 52, in get_vnc_console
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi console_type)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 196, in wrapped
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return 
function(self, context, instance, *args, **kwargs)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 186, in inner
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return 
f(self, context, instance, *args, **kw)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 3737, in 
get_vnc_console
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi 
access_url=connect_info['access_url'])
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/consoleauth/rpcapi.py", line 93, in 
authorize_console
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi return 
cctxt.call(ctxt, 'authorize_console', **msg_args)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in 
call
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi 
retry=self.retry)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in 
_send
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi 
timeout=timeout, retry=retry)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
559, in send
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi retry=retry)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
548, in _send
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi result = 
self._waiter.wait(msg_id, timeout)
  2023-03-17 09:07:11.076 14925 ERROR nova.api.openstack.wsgi   

[Yahoo-eng-team] [Bug 2012873] Re: [nova][DOC] stackalytics links are not updated in openstack wiki

2023-04-11 Thread Sylvain Bauza
This is a wiki page; you can just update it directly.
Closing this bug report as it doesn't need a Gerrit change.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2012873

Title:
  [nova][DOC] stackalytics links are not updated in openstack wiki

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  here https://wiki.openstack.org/wiki/Nova/CoreTeam

  below links should be updated
  Last 30 Days:
  https://stackalytics.com/report/contribution/nova/30
  to
  
https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=30

  Last 90 days
  https://stackalytics.com/report/contribution/nova/90
  to
  
https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=90

  Last 180 Days
  https://stackalytics.com/report/contribution/nova/180
  to
  
https://www.stackalytics.io/report/contribution?module=nova-group_type=openstack=180

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2012873/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2009263] Re: 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 7230.58 sec

2023-03-17 Thread Sylvain Bauza
It looks to me like the servicegroup API wasn't able to query the DB
to find the compute state value. For some reason, the conductor can't
connect to the DB.

Anyway, closing this one as this is not a project development bug.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2009263

Title:
  'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
  interval by 7230.58 sec

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  The nova-conductor service's access to the DB service was delayed by 7230.58s

  More information:
  2023-03-02 18:39:53.726 22 INFO nova.servicegroup.drivers.db [-] Recovered 
from being unable to report status.
  2023-03-02 18:39:53.727 19 INFO nova.servicegroup.drivers.db [-] Recovered 
from being unable to report status.
  2023-03-02 18:39:53.729 23 INFO nova.servicegroup.drivers.db [-] Recovered 
from being unable to report status.
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db [-] Unexpected 
error while reporting service status: oslo_db.exception.DBConnectionError: 
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during 
query')
  (Background on this error at: http://sqlalche.me/e/e3q8)
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db Traceback (most 
recent call last):
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 812, in _checkout
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise 
exc.InvalidatePoolError()
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db 
sqlalchemy.exc.InvalidatePoolError: ()
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db During handling 
of the above exception, another exception occurred:
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db Traceback (most 
recent call last):
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 2285, in _wrap_pool_connect
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db return fn()
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 363, in connect
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db return 
_ConnectionFairy._checkout(self)
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 842, in _checkout
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db 
fairy._connection_record._checkin_failed(err)
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py",
 line 69, in __exit__
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db exc_value, 
with_traceback=exc_tb,
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", 
line 178, in raise_
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise 
exception
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 838, in _checkout
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db 
fairy._connection_record.get_connection()
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 606, in get_connection
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db 
self.__connect()
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", 
line 657, in __connect
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db 
pool.logger.debug("Error on connect(): %s", e)
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py",
 line 69, in __exit__
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db exc_value, 
with_traceback=exc_tb,
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db   File 
"/var/lib/kolla/venv/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", 
line 178, in raise_
  2023-03-02 18:39:53.732 20 ERROR nova.servicegroup.drivers.db raise 
exception
  2023-03-02 18:39:53.732 20 

[Yahoo-eng-team] [Bug 2003803] Re: Unexpected API error

2023-03-17 Thread Sylvain Bauza
It looks like your environment is not able to call the Neutron API, hence
the exception.

Sorry, but this is not a Nova bug, so I'm closing the report.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2003803

Title:
  Unexpected API error

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  While running a masakari test, this unexpected API error occurred; the
  message says to open a bug

  
  2023-01-20-21:01:31 keystoneauth.session DEBUG RESP: [500] Connection: close 
Content-Length: 224 Content-Type: application/json; charset=UTF-8 Date: Fri, 20 
Jan 2023 21:01:30 GMT OpenStack-API-Version: compute 2.72 Server: Apache/2.4.41 
(Ubuntu) Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version 
X-OpenStack-Nova-API-Version: 2.72 x-compute-request-id: 
req-7538e84d-39fb-4146-98f8-43d446b58398 x-openstack-request-id: 
req-7538e84d-39fb-4146-98f8-43d446b58398
  2023-01-20-21:01:31 keystoneauth.session DEBUG RESP BODY: {"computeFault": 
{"code": 500, "message": "Unexpected API Error. Please report this at 
http://bugs.launchpad.net/nova/ and attach the Nova API log if 
possible.\n"}}

  
  Logs
  
https://oil-jenkins.canonical.com/artifacts/72b27e0d-2ef1-4b72-9d0c-b29407bfa746/generated/generated/openstack/juju-crashdump-openstack-2023-01-20-21.02.27.tar.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2003803/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2011564] Re: launchpad link do not work in this page

2023-03-14 Thread Sylvain Bauza
Sorry Amit, you may have missed the point. Those URLs are actually fake;
they are just examples explaining what you need to do when you
create a Launchpad feature.

For https://review.opendev.org/q/status:open+project:openstack/nova-
specs+message:apiimpact that just means we don't have any open
changes whose commit message adds an APIImpact tag.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2011564

Title:
  launchpad  link do not work in this page

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  launchpad links do not work from this/these pages:
  https://specs.openstack.org/openstack/nova-specs/specs/wallaby/template.html
  ..
  https://specs.openstack.org/openstack/nova-specs/specs/2023.1/template.html

  
  ex: 
  https://blueprints.launchpad.net/nova/+spec/example
  https://blueprints.launchpad.net/nova/+spec/awesome-thing
  
https://review.opendev.org/q/status:open+project:openstack/nova-specs+message:apiimpact

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2011564/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2007922] Re: Cleanup pending instances in "building" state

2023-02-22 Thread Sylvain Bauza
Well, I don't really know the root cause or why
map_instances() wasn't adding the cell UUID directly when it first created
the record. Now we have a transaction, as Mohamed said, so it
shouldn't be a problem.
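
For anyone hitting the same symptom, a quick way to check whether a stuck
instance still has (or is missing) its cell mapping is to look at
nova_api.instance_mappings directly. A minimal sketch using pymysql; the DB
host and credentials are placeholders, and the UUID is the stuck instance
from the report below:

    import pymysql

    # Placeholders: adjust the DB host/credentials to your deployment.
    conn = pymysql.connect(host="controller", user="nova", password="secret",
                           database="nova_api")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT instance_uuid, cell_id FROM instance_mappings"
            " WHERE instance_uuid = %s",
            ("0453a7e5-e4f9-419b-ad71-d837a20ef6bb",),
        )
        # A NULL cell_id would point at the mapping issue discussed above.
        print(cur.fetchone())
    conn.close()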

Closing this bug report now.


** Changed in: nova
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2007922

Title:
  Cleanup pending instances in "building" state

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Following up on the ML thread [1], it was recommended to create a bug report.
  After a network issue in a Victoria cluster (3 control nodes in HA mode, 26
compute nodes), some instance builds were interrupted. Some of them could be
cleaned up with 'openstack server delete', but two of them cannot. They already
have a mapping but cannot be removed (or "reset-state") by nova. Both are
amphora instances from octavia:

  control01:~ # openstack server list --project service -c ID -c Name -c Status 
-f value | grep BUILD
  0453a7e5-e4f9-419b-ad71-d837a20ef6bb 
amphora-0ee32901-0c59-4752-8253-35b66da176ea BUILD
  dc8cdc3a-f6b2-469b-af6f-ba2aa130ea9b 
amphora-4990a47b-fe8a-431a-90ec-5ac2368a5251 BUILD

  control01:~ # openstack server delete 
amphora-0ee32901-0c59-4752-8253-35b66da176ea
  No server with a name or ID of
  'amphora-0ee32901-0c59-4752-8253-35b66da176ea' exists.

  control01:~ # openstack server show 0453a7e5-e4f9-419b-ad71-d837a20ef6bb
  ERROR (CommandError): No server with a name or ID of
  '0453a7e5-e4f9-419b-ad71-d837a20ef6bb' exists.

  The database tables referring to the UUID
  0453a7e5-e4f9-419b-ad71-d837a20ef6bb are these:

  nova_cell0/instance_id_mappings.ibd
  nova_cell0/instance_info_caches.ibd
  nova_cell0/instance_extra.ibd
  nova_cell0/instances.ibd
  nova_cell0/instance_system_metadata.ibd
  octavia/amphora.ibd
  nova_api/instance_mappings.ibd
  nova_api/request_specs.ibd

  I can provide both debug logs and database queries, just let me know
  what exactly is required.

  The storage back end is ceph (Pacific), we use neutron with
  OpenVSwitch, the exact nova versions are:

  control01:~ # rpm -qa | grep nova
  openstack-nova-conductor-22.2.2~dev15-lp152.1.25.noarch
  openstack-nova-api-22.2.2~dev15-lp152.1.25.noarch
  openstack-nova-novncproxy-22.2.2~dev15-lp152.1.25.noarch
  python3-novaclient-17.2.0-lp152.3.2.noarch
  openstack-nova-scheduler-22.2.2~dev15-lp152.1.25.noarch
  openstack-nova-22.2.2~dev15-lp152.1.25.noarch
  python3-nova-22.2.2~dev15-lp152.1.25.noarch

  [1] https://lists.openstack.org/pipermail/openstack-
  discuss/2023-February/032308.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2007922/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2006770] Re: server list with IP filter doesn't work as expected

2023-02-20 Thread Sylvain Bauza
Honestly, I don't know what to say here. When the query parameter was
added, it was just a convenience for operators, to save them from querying
Neutron first to get the list of ports, but this was actually some kind of
orchestration we try to avoid.

Keeping in mind that an instance can be booted with a port that doesn't
have L3 connectivity, I'm not super happy with fixing all of this when
it's better to say 'please call Neutron directly to get the list
of ports that match your IP, and then ask Nova to give you the list of
instances that have those ports bound to them'.
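
For illustration, a rough sketch of that two-step lookup with python-requests,
using the API paths quoted elsewhere in this report; the endpoint hostnames
and the token environment variable are assumptions, adjust to your cloud:

    import os
    import requests

    NEUTRON = "https://neutron"      # placeholder endpoint
    NOVA = "https://nova:443"        # placeholder endpoint
    HEADERS = {"X-Auth-Token": os.environ["OS_AUTH_TOKEN"]}

    # 1) Ask Neutron for ports carrying the exact fixed IP.
    ports = requests.get(
        NEUTRON + "/v2.0/ports",
        params={"fixed_ips": "ip_address=10.10.10.10"},
        headers=HEADERS,
    ).json()["ports"]

    # 2) A port bound to an instance carries the server UUID in device_id;
    #    ask Nova for each of those servers.
    for port in ports:
        if port["device_owner"].startswith("compute:") and port["device_id"]:
            server = requests.get(
                NOVA + "/v2.1/servers/" + port["device_id"],
                headers=HEADERS,
            ).json()["server"]
            print(server["id"], server["name"])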

I'd rather deprecate this IP address query param and provide good api-ref
documentation explaining the recommended way.
As a side note, since IP substring filtering is a Neutron extension which is not
provided by all clouds, we can't and shouldn't rely on it for getting answers.

Putting the report to Opinion, but we'll debate it in the next weeks.

** Changed in: nova
   Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2006770

Title:
  server list with IP filter doesn't work as expected

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  If a project has two servers with 10.10.10.10 and 10.10.10.109 IPs,
  the "curl -s 'https://nova:443/v2.1/servers?ip=10.10.10.10'" request
  returns two servers in a response.
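
  Just to illustrate the overlap: with substring or unanchored-regex matching,
  the shorter address also matches inside the longer one. A tiny Python demo,
  purely illustrative:

      import re

      addresses = ["10.10.10.10", "10.10.10.109"]
      # Unanchored: "10.10.10.10" also matches inside "10.10.10.109".
      print([a for a in addresses if re.search(r"10\.10\.10\.10", a)])
      # -> ['10.10.10.10', '10.10.10.109']
      # Exact (anchored) matching is what would return only one server.
      print([a for a in addresses if re.fullmatch(r"10\.10\.10\.10", a)])
      # -> ['10.10.10.10']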

  This happens because neutron API has an "ip-substring-filtering"
  extension turned on:

  $ curl -s "https://neutron/v2.0/extensions; -H "X-Auth-Token: 
${OS_AUTH_TOKEN}" | jq -r 
'.extensions[]|select(.alias=="ip-substring-filtering")'
  {
"name": "IP address substring filtering",
"alias": "ip-substring-filtering",
"description": "Provides IP address substring filtering when listing ports",
"updated": "2017-11-28T09:00:00-00:00",
"links": []
  }

  And there is no way to filter IPs with an exact match, as is done with a
  "https://neutron/v2.0/ports?fixed_ips=ip_address%3D10.10.10.10" call.

  

  Another problem is that ip/ip6 fields are marked as regexp in both
  SCHEMA and CLI:

  
https://github.com/openstack/nova/blob/49aa40394a4857a06191b05ea3b15913f328a8d0/nova/api/openstack/compute/schemas/servers.py#L638-L639
  (values which are not regexp-compatible are rejected at an early
  stage)

  $ openstack server list --help | grep -- --ip
   [--ip ]
   [--ip6 ] [--name ]
--ip 
--ip6 

  But they are not treated as regexps afterwards. Moreover, the
  
https://github.com/openstack/nova/blob/a2964417822bd1a4a83fa5c27282d2be1e18868a/nova/compute/api.py#L3028-L3039
  mapping doesn't work, because "fixed_ip" is never allowed in the
  "search_opts" map.

  Changing "fixed_ip" key to an "ip" key (BTW, there is no "fixed_ip6"
  mapping, it also should be considered once someone decide to fix this
  issue) breaks substring filtering, because the filter finally becomes
  "'ip': '^10\\.10\\.10\\.10$'".

  Therefore if there is no "substring filtering" neutron extension, the
  regexp filter mappings must consider this (or even be removed).

  And the final point: there should be a way for a user to specify whether
  they want to use substring, exact match, or regexp filtering.

  See also: https://stackoverflow.com/questions/64549906/how-openstack-
  client-get-server-list-with-accurate-ip-address

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2006770/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2006467] Re: tempest ssh timeout due to udhcpc fails in the cirros guest

2023-02-15 Thread Sylvain Bauza
Okay, I did a bit of digging today for some other CI failure I saw on
another change and eventually, I found this was related.

So, lemme explain the issue here. First, I was looking at
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f9/868236/5/gate/nova-
next/6f9f3d0/ and I was wondering why the SSH connection wasn't working.

When I looked at the nova logs, I found that the instance was spawned at 
18:18:56 :
Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO 
nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e 
tempest-AttachVolumeNegativeTest-1605485622 
tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 
6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance.

Then, Tempest tried to ssh the instance at 18:18:59 :
2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO
 [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 
'cirros' with public key authentication

And eventually, 2mins32sec after that (18:22:31), it stopped :
2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR   
 [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to 
cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client

Then, I tried to look at the guest console, and I saw that udhcpc tried 3 times 
:
2023-02-14 18:22:39.129636 | controller | [   12.638156] sr 0:0:0:0: Attached 
scsi generic sg0 type 5
[...]
2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, 
v1.29.3
2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover
2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover


So, I was wondering how long the DHCP discovery took and eventually I
found that the cirros dhcp client actually waits for 1 min before requesting again.

So, now I'm wondering why it takes so much time to get a DHCP address
and why the 2nd DHCP call doesn't get the IP address.

Adding Neutron team to this bug report because maybe we have something
about our DHCP controller.



** Also affects: neutron
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2006467

Title:
  tempest ssh timeout due to udhcpc fails in the cirros guest

Status in neutron:
  New
Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Tests trying to ssh into the guest fails intermittently with timeout
  as udhcpc fails in the guest:

  2023-02-01 20:46:32.286979 | controller | Starting network: udhcpc:
  started, v1.29.3

  2023-02-01 20:46:32.286987 | controller | udhcp

  2023-02-01 20:46:32.286996 | controller | c: sending discover

  2023-02-01 20:46:32.287004 | controller | udhcpc: sending discover

  2023-02-01 20:46:32.287013 | controller | udhcpc: sending discover

  2023-02-01 20:46:32.287022 | controller | Usage: /sbin/cirros-dhcpc
  

  2023-02-01 20:46:32.287030 | controller | udhcpc: no lease, failing

  2023-02-01 20:46:32.287039 | controller | FAIL

  
  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in 
wrapper
  return f(*func_args, **func_kwargs)
File 
"/opt/stack/tempest/tempest/api/compute/admin/test_volumes_negative.py", line 
128, in test_multiattach_rw_volume_update_failure
  server1 = self.create_test_server(
File "/opt/stack/tempest/tempest/api/compute/base.py", line 272, in 
create_test_server
  body, servers = compute.create_test_server(
File "/opt/stack/tempest/tempest/common/compute.py", line 334, in 
create_test_server
  with excutils.save_and_reraise_exception():
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py",
 line 227, in __exit__
  self.force_reraise()
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py",
 line 200, in force_reraise
  raise self.value
File "/opt/stack/tempest/tempest/common/compute.py", line 329, in 
create_test_server
  wait_for_ssh_or_ping(
File "/opt/stack/tempest/tempest/common/compute.py", line 148, in 
wait_for_ssh_or_ping
  waiters.wait_for_ssh(
File "/opt/stack/tempest/tempest/common/waiters.py", line 632, in 
wait_for_ssh
  raise lib_exc.TimeoutException()
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: None

  Example failure
  
https://zuul.opendev.org/t/openstack/build/f1c6b7e54b28415c952de0be833731a9/logs

  Signature
  $ logsearch log --job-group nova-devstack  --result FAILURE 'udhcpc: no 
lease, failing' --days 7
  [snip]
  Builds with matching logs 6/138:
  

[Yahoo-eng-team] [Bug 2002951] Re: OOM kills python / mysqld in various nova devstack jobs

2023-02-15 Thread Sylvain Bauza
Moving the Nova status of the bug to Fix Released as
https://review.opendev.org/c/openstack/tempest/+/871000 fixed the root
cause for the failing nova jobs.

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2002951

Title:
  OOM kills python / mysqld in various nova devstack jobs

Status in Glance:
  New
Status in OpenStack Compute (nova):
  Fix Released
Status in tempest:
  Confirmed

Bug description:
  The following tests exited without returning a status
  and likely segfaulted or crashed Python:

  *
  
tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive[id-777e468f-17ca-4da4-b93d-b7dbf56c0494]

  
  And in the syslog: 
https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77/log/controller/logs/syslog.txt#3688

  Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process
  114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-
  rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0

  Example run:
  https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77

  I see this happening in multiple jobs in the last 10 days:
  * nova-ceph-multistore 14x
  * nova-multi-cell 1x
  * nova-next 1x

  $ logsearch log --result FAILURE --project openstack/nova --branch master 
--file controller/logs/syslog.txt 'kernel: Out of memory: Killed process' 
--days 10
  [..snip..]
  Searching logs:
  
ece0cf2ce71c4a8790a0a36529dd0a8e:/home/gibi/.cache/logsearch/ece0cf2ce71c4a8790a0a36529dd0a8e/controller/logs/syslog.txt:3774:Jan
 14 22:57:33 np0032733292 kernel: Out of memory: Killed process 115024 (python) 
total-vm:4981004kB, anon-rss:3904068kB, file-rss:5320kB, shmem-rss:0kB, 
UID:1002 pgtables:9376kB oom_score_adj:0

  
f5aa5edd4d354c2685fc1f3e13d0ef77:/home/gibi/.cache/logsearch/f5aa5edd4d354c2685fc1f3e13d0ef77/controller/logs/syslog.txt:3688:Jan
  13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509
  (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB,
  shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0

  
1447c6274e924e068578ca260c9ac2a6:/home/gibi/.cache/logsearch/1447c6274e924e068578ca260c9ac2a6/controller/logs/syslog.txt:3824:Jan
  13 21:34:13 np0032729237 kernel: Out of memory: Killed process 114489
  (python) total-vm:4975072kB, anon-rss:3954804kB, file-rss:5312kB,
  shmem-rss:0kB, UID:1002 pgtables:9400kB oom_score_adj:0

  
446a5a73b22d432295820e5b8083a2f9:/home/gibi/.cache/logsearch/446a5a73b22d432295820e5b8083a2f9/controller/logs/syslog.txt:5103:Jan
  13 10:04:25 np0032720733 kernel: Out of memory: Killed process 48920
  (mysqld) total-vm:5233384kB, anon-rss:300872kB, file-rss:0kB, shmem-
  rss:0kB, UID:116 pgtables:2652kB oom_score_adj:0

  
fae1fbe258134dd8ba060cb743707247:/home/gibi/.cache/logsearch/fae1fbe258134dd8ba060cb743707247/controller/logs/syslog.txt:6686:Jan
  13 09:44:04 np0032720410 kernel: Out of memory: Killed process 47404
  (mysqld) total-vm:5208828kB, anon-rss:278080kB, file-rss:0kB, shmem-
  rss:0kB, UID:116 pgtables:2572kB oom_score_adj:0

  
1bbcaa703b7d42c7a266fde3a6acca65:/home/gibi/.cache/logsearch/1bbcaa703b7d42c7a266fde3a6acca65/controller/logs/syslog.txt:3717:Jan
  13 03:41:39 np0032719591 kernel: Out of memory: Killed process 114777
  (python) total-vm:4954352kB, anon-rss:4001500kB, file-rss:5124kB,
  shmem-rss:0kB, UID:1002 pgtables:9416kB oom_score_adj:0

  
7d9ca42edc5e4bdeb17be8e8045c6468:/home/gibi/.cache/logsearch/7d9ca42edc5e4bdeb17be8e8045c6468/controller/logs/syslog.txt:3828:Jan
  12 22:06:40 np0032716841 kernel: Out of memory: Killed process 114731
  (python) total-vm:4964792kB, anon-rss:4055532kB, file-rss:5072kB,
  shmem-rss:0kB, UID:1002 pgtables:9212kB oom_score_adj:0

  
bcb7bc3b478586906c31c6558b13:/home/gibi/.cache/logsearch/bcb7bc3b478586906c31c6558b13/controller/logs/syslog.txt:3769:Jan
  12 20:17:35 np0032714959 kernel: Out of memory: Killed process 114973
  (python) total-vm:4971976kB, anon-rss:3855572kB, file-rss:5356kB,
  shmem-rss:0kB, UID:1002 pgtables:9696kB oom_score_adj:0

  
7572c2bf5e6547c0a1fc6b0f180a2e1f:/home/gibi/.cache/logsearch/7572c2bf5e6547c0a1fc6b0f180a2e1f/controller/logs/syslog.txt:3805:Jan
  12 17:44:16 ubuntu-focal-ovh-gra1-0032713996 kernel: Out of memory:
  Killed process 114616 (python) total-vm:4974804kB, anon-rss:3949084kB,
  file-rss:5176kB, shmem-rss:0kB, UID:1002 pgtables:9604kB
  oom_score_adj:0

  
aa5cf699f8d04995b43d009e55a1accd:/home/gibi/.cache/logsearch/aa5cf699f8d04995b43d009e55a1accd/controller/logs/syslog.txt:3796:Jan
  12 16:23:26 ubuntu-focal-inmotion-iad3-0032713625 kernel: Out of
  memory: Killed process 114640 (python) total-vm:4964156kB, anon-
  rss:4310768kB, file-rss:5340kB, shmem-rss:0kB, UID:1002
  pgtables:9628kB oom_score_adj:0

  

[Yahoo-eng-team] [Bug 2004641] Re: ImageLocationsTest.test_replace_location fails intermittently

2023-02-07 Thread Sylvain Bauza
** Also affects: glance
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2004641

Title:
  ImageLocationsTest.test_replace_location fails intermittently

Status in Glance:
  New
Status in OpenStack Compute (nova):
  Confirmed
Status in tempest:
  New

Bug description:
  Saw a new gate failure happening a couple of times :

  
https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-
  output.txt),type:phrase),query:(match_phrase:(filename:job-
  
output.txt,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!())

  
  Example of a failed run :
  2023-02-02 22:20:18.197006 | controller | ==
  2023-02-02 22:20:18.197030 | controller | Failed 1 tests - output below:
  2023-02-02 22:20:18.197050 | controller | ==
  2023-02-02 22:20:18.197071 | controller |
  2023-02-02 22:20:18.197095 | controller | 
tempest.api.image.v2.test_images.ImageLocationsTest.test_replace_location[id-bf6e0009-c039-4884-b498-db074caadb10]
  2023-02-02 22:20:18.197115 | controller | 
--
  2023-02-02 22:20:18.197134 | controller |
  2023-02-02 22:20:18.197152 | controller | Captured traceback:
  2023-02-02 22:20:18.197171 | controller | ~~~
  2023-02-02 22:20:18.197190 | controller | Traceback (most recent call 
last):
  2023-02-02 22:20:18.197212 | controller |
  2023-02-02 22:20:18.197234 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 875, in 
test_replace_location
  2023-02-02 22:20:18.197254 | controller | image = 
self._check_set_multiple_locations()
  2023-02-02 22:20:18.197273 | controller |
  2023-02-02 22:20:18.197292 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 847, in 
_check_set_multiple_locations
  2023-02-02 22:20:18.197311 | controller | image = 
self._check_set_location()
  2023-02-02 22:20:18.197329 | controller |
  2023-02-02 22:20:18.197351 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 820, in 
_check_set_location
  2023-02-02 22:20:18.197372 | controller | 
self.client.update_image(image['id'], [
  2023-02-02 22:20:18.197391 | controller |
  2023-02-02 22:20:18.197410 | controller |   File 
"/opt/stack/tempest/tempest/lib/services/image/v2/images_client.py", line 40, 
in update_image
  2023-02-02 22:20:18.197429 | controller | resp, body = 
self.patch('images/%s' % image_id, data, headers)
  2023-02-02 22:20:18.197447 | controller |
  2023-02-02 22:20:18.197465 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 346, in patch
  2023-02-02 22:20:18.197490 | controller | return self.request('PATCH', 
url, extra_headers, headers, body)
  2023-02-02 22:20:18.197513 | controller |
  2023-02-02 22:20:18.197533 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request
  2023-02-02 22:20:18.197552 | controller | self._error_checker(resp, 
resp_body)
  2023-02-02 22:20:18.197571 | controller |
  2023-02-02 22:20:18.197590 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 831, in 
_error_checker
  2023-02-02 22:20:18.197612 | controller | raise 
exceptions.BadRequest(resp_body, resp=resp)
  2023-02-02 22:20:18.197633 | controller |
  2023-02-02 22:20:18.197655 | controller | 
tempest.lib.exceptions.BadRequest: Bad request
  2023-02-02 22:20:18.197674 | controller | Details: b'400 Bad Request\n\nThe 
Store URI was malformed.\n\n   '
  2023-02-02 22:20:18.197692 | controller |
  2023-02-02 22:20:18.197711 | controller |
  2023-02-02 22:20:18.197729 | controller | Captured pythonlogging:
  2023-02-02 22:20:18.197748 | controller | ~~~
  2023-02-02 22:20:18.197774 | controller | 2023-02-02 22:01:06,773 114933 
INFO [tempest.lib.common.rest_client] Request 
(ImageLocationsTest:test_replace_location): 201 POST 
https://10.210.193.38/image/v2/images 1.036s
  2023-02-02 22:20:18.197798 | controller | 2023-02-02 22:01:06,774 114933 
DEBUG[tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 
'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''}
  2023-02-02 22:20:18.198218 | controller | Body: {"container_format": 
"bare", "disk_format": "raw"}
  2023-02-02 22:20:18.198250 | controller | Response - 

[Yahoo-eng-team] [Bug 2004641] [NEW] ImageLocationsTest.test_replace_location fails intermittently

2023-02-03 Thread Sylvain Bauza
Public bug reported:

Saw a new gate failure happening a couple of times :

https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-
output.txt),type:phrase),query:(match_phrase:(filename:job-
output.txt,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!())


Example of a failed run :
2023-02-02 22:20:18.197006 | controller | ==
2023-02-02 22:20:18.197030 | controller | Failed 1 tests - output below:
2023-02-02 22:20:18.197050 | controller | ==
2023-02-02 22:20:18.197071 | controller |
2023-02-02 22:20:18.197095 | controller | 
tempest.api.image.v2.test_images.ImageLocationsTest.test_replace_location[id-bf6e0009-c039-4884-b498-db074caadb10]
2023-02-02 22:20:18.197115 | controller | 
--
2023-02-02 22:20:18.197134 | controller |
2023-02-02 22:20:18.197152 | controller | Captured traceback:
2023-02-02 22:20:18.197171 | controller | ~~~
2023-02-02 22:20:18.197190 | controller | Traceback (most recent call last):
2023-02-02 22:20:18.197212 | controller |
2023-02-02 22:20:18.197234 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 875, in 
test_replace_location
2023-02-02 22:20:18.197254 | controller | image = 
self._check_set_multiple_locations()
2023-02-02 22:20:18.197273 | controller |
2023-02-02 22:20:18.197292 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 847, in 
_check_set_multiple_locations
2023-02-02 22:20:18.197311 | controller | image = self._check_set_location()
2023-02-02 22:20:18.197329 | controller |
2023-02-02 22:20:18.197351 | controller |   File 
"/opt/stack/tempest/tempest/api/image/v2/test_images.py", line 820, in 
_check_set_location
2023-02-02 22:20:18.197372 | controller | 
self.client.update_image(image['id'], [
2023-02-02 22:20:18.197391 | controller |
2023-02-02 22:20:18.197410 | controller |   File 
"/opt/stack/tempest/tempest/lib/services/image/v2/images_client.py", line 40, 
in update_image
2023-02-02 22:20:18.197429 | controller | resp, body = 
self.patch('images/%s' % image_id, data, headers)
2023-02-02 22:20:18.197447 | controller |
2023-02-02 22:20:18.197465 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 346, in patch
2023-02-02 22:20:18.197490 | controller | return self.request('PATCH', url, 
extra_headers, headers, body)
2023-02-02 22:20:18.197513 | controller |
2023-02-02 22:20:18.197533 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 720, in request
2023-02-02 22:20:18.197552 | controller | self._error_checker(resp, 
resp_body)
2023-02-02 22:20:18.197571 | controller |
2023-02-02 22:20:18.197590 | controller |   File 
"/opt/stack/tempest/tempest/lib/common/rest_client.py", line 831, in 
_error_checker
2023-02-02 22:20:18.197612 | controller | raise 
exceptions.BadRequest(resp_body, resp=resp)
2023-02-02 22:20:18.197633 | controller |
2023-02-02 22:20:18.197655 | controller | 
tempest.lib.exceptions.BadRequest: Bad request
2023-02-02 22:20:18.197674 | controller | Details: b'400 Bad Request\n\nThe 
Store URI was malformed.\n\n   '
2023-02-02 22:20:18.197692 | controller |
2023-02-02 22:20:18.197711 | controller |
2023-02-02 22:20:18.197729 | controller | Captured pythonlogging:
2023-02-02 22:20:18.197748 | controller | ~~~
2023-02-02 22:20:18.197774 | controller | 2023-02-02 22:01:06,773 114933 
INFO [tempest.lib.common.rest_client] Request 
(ImageLocationsTest:test_replace_location): 201 POST 
https://10.210.193.38/image/v2/images 1.036s
2023-02-02 22:20:18.197798 | controller | 2023-02-02 22:01:06,774 114933 DEBUG  
  [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 
'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''}
2023-02-02 22:20:18.198218 | controller | Body: {"container_format": 
"bare", "disk_format": "raw"}
2023-02-02 22:20:18.198250 | controller | Response - Headers: {'date': 
'Thu, 02 Feb 2023 22:01:06 GMT', 'server': 'Apache/2.4.41 (Ubuntu)', 
'content-length': '626', 'content-type': 'application/json', 'location': 
'http://10.210.193.38:19292/v2/images/36bc7732-dfbd-4d63-871d-ff84b0be764e', 
'openstack-image-import-methods': 'glance-direct,web-download,copy-image', 
'openstack-image-store-ids': 
'cheap,robust,web,os_glance_staging_store,os_glance_tasks_store', 
'x-openstack-request-id': 'req-f0d0376e-9e9a-4e82-a528-643f1912004c', 
'connection': 

[Yahoo-eng-team] [Bug 1996188] Re: [OSSA-2023-002] Arbitrary file access through custom VMDK flat descriptor (CVE-2022-47951)

2023-01-27 Thread Sylvain Bauza
https://review.opendev.org/c/openstack/nova/+/871612 is now merged,
putting the bug report to Fix Released.

** Changed in: nova
   Importance: Undecided => Critical

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1996188

Title:
  [OSSA-2023-002] Arbitrary file access through custom VMDK flat
  descriptor (CVE-2022-47951)

Status in Cinder:
  In Progress
Status in Glance:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Security Advisory:
  Fix Released

Bug description:
  The vulnerability managers received the following report from
  Sébastien Meriot with OVH via encrypted E-mail:

  Our Openstack team did discover what looks like a security issue in Nova this 
morning allowing a remote attacker to read any file on the system.
  After making a quick CVSS calculation, we got a CVSS of 5.8 
(CVSS:3.0/AV:N/AC:H/PR:L/UI:R/S:C/C:H/I:N/A:N).

  Here is the details :
  By using a VMDK file, you can dump any file on the hypervisor.
  1. Create an image: qemu-img create -f vmdk leak.vmdk 1M -o 
subformat=monolithicFlat
  2. Edit the leak.vmdk and change the name this way: RW 2048 FLAT 
"leak-flat.vmdk" 0 --> RW 2048 FLAT "/etc/nova/nova.conf" 0
  3. Upload the image: openstack image create --file leak.vmdk leak.vmdk
  4. Start a new instance: openstack server create --image leak.vmdk --net demo 
--flavor nano leak-instance
  5. The instance won't boot of course. You can create an image from this 
instance: openstack server image create --name leak-instance-image leak-instance
  6. Download the image: openstack image save --file leak-instance-image 
leak-instance-image
  7. You get access to the nova.conf file content and you can get access to the 
openstack admin creds.
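
  As an illustration only (not the fix that was merged): the dangerous part is
  the FLAT extent line edited in step 2, and a naive scan of a monolithicFlat
  descriptor for extent targets that escape the image could look like the
  hypothetical helper below, whose knowledge of the format is limited to the
  line shown above:

      import re

      # Matches extent lines of the form: RW 2048 FLAT "target" 0
      EXTENT_RE = re.compile(r'^\s*\w+\s+\d+\s+FLAT\s+"([^"]+)"', re.MULTILINE)

      def suspicious_extents(descriptor_text):
          """Return FLAT extent targets that look like absolute or parent paths."""
          targets = [m.group(1) for m in EXTENT_RE.finditer(descriptor_text)]
          return [t for t in targets if t.startswith("/") or ".." in t]

      print(suspicious_extents('RW 2048 FLAT "/etc/nova/nova.conf" 0\n'))
      # -> ['/etc/nova/nova.conf']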

  We are working on a fix and would be happy to share it with you if needed.
  We think it does affect Nova but it could affect Glance as well. We're not 
sure yet.

  [postscript per Arnaud Morin (amorin) in IRC]

  cinder seems also affected

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1996188/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2002951] Re: OOM kills python / mysqld in various nova devstack jobs

2023-01-18 Thread Sylvain Bauza
FWIW, I created another change that was running this test *earlier*, and
it worked :

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-
ceph-multistore/3626391/testr_results.html

That being said, this test took more than 181 secs, so I created a new
revision to find out how long it takes to create the cached image and how
much memory this cached image uses :

https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90

Still waiting for the results, but I think we need to modify this test
to maybe not cache this way if we can, or maybe to run it differently.


** Also affects: tempest
   Importance: Undecided
   Status: New

** Also affects: glance
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2002951

Title:
  OOM kills python / mysqld in various nova devstack jobs

Status in Glance:
  New
Status in OpenStack Compute (nova):
  Confirmed
Status in tempest:
  New

Bug description:
  The following tests exited without returning a status
  and likely segfaulted or crashed Python:

  *
  
tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive[id-777e468f-17ca-4da4-b93d-b7dbf56c0494]

  
  And in the syslog: 
https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77/log/controller/logs/syslog.txt#3688

  Jan 13 22:31:13 np0032729364 kernel: Out of memory: Killed process
  114509 (python) total-vm:4966188kB, anon-rss:3914748kB, file-
  rss:5080kB, shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0

  Example run:
  https://zuul.opendev.org/t/openstack/build/f5aa5edd4d354c2685fc1f3e13d0ef77

  I see this happening in multiple jobs in the last 10 days:
  * nova-ceph-multistore 14x
  * nova-multi-cell 1x
  * nova-next 1x

  $ logsearch log --result FAILURE --project openstack/nova --branch master 
--file controller/logs/syslog.txt 'kernel: Out of memory: Killed process' 
--days 10
  [..snip..]
  Searching logs:
  
ece0cf2ce71c4a8790a0a36529dd0a8e:/home/gibi/.cache/logsearch/ece0cf2ce71c4a8790a0a36529dd0a8e/controller/logs/syslog.txt:3774:Jan
 14 22:57:33 np0032733292 kernel: Out of memory: Killed process 115024 (python) 
total-vm:4981004kB, anon-rss:3904068kB, file-rss:5320kB, shmem-rss:0kB, 
UID:1002 pgtables:9376kB oom_score_adj:0

  
f5aa5edd4d354c2685fc1f3e13d0ef77:/home/gibi/.cache/logsearch/f5aa5edd4d354c2685fc1f3e13d0ef77/controller/logs/syslog.txt:3688:Jan
  13 22:31:13 np0032729364 kernel: Out of memory: Killed process 114509
  (python) total-vm:4966188kB, anon-rss:3914748kB, file-rss:5080kB,
  shmem-rss:0kB, UID:1002 pgtables:9764kB oom_score_adj:0

  
1447c6274e924e068578ca260c9ac2a6:/home/gibi/.cache/logsearch/1447c6274e924e068578ca260c9ac2a6/controller/logs/syslog.txt:3824:Jan
  13 21:34:13 np0032729237 kernel: Out of memory: Killed process 114489
  (python) total-vm:4975072kB, anon-rss:3954804kB, file-rss:5312kB,
  shmem-rss:0kB, UID:1002 pgtables:9400kB oom_score_adj:0

  
446a5a73b22d432295820e5b8083a2f9:/home/gibi/.cache/logsearch/446a5a73b22d432295820e5b8083a2f9/controller/logs/syslog.txt:5103:Jan
  13 10:04:25 np0032720733 kernel: Out of memory: Killed process 48920
  (mysqld) total-vm:5233384kB, anon-rss:300872kB, file-rss:0kB, shmem-
  rss:0kB, UID:116 pgtables:2652kB oom_score_adj:0

  
fae1fbe258134dd8ba060cb743707247:/home/gibi/.cache/logsearch/fae1fbe258134dd8ba060cb743707247/controller/logs/syslog.txt:6686:Jan
  13 09:44:04 np0032720410 kernel: Out of memory: Killed process 47404
  (mysqld) total-vm:5208828kB, anon-rss:278080kB, file-rss:0kB, shmem-
  rss:0kB, UID:116 pgtables:2572kB oom_score_adj:0

  
1bbcaa703b7d42c7a266fde3a6acca65:/home/gibi/.cache/logsearch/1bbcaa703b7d42c7a266fde3a6acca65/controller/logs/syslog.txt:3717:Jan
  13 03:41:39 np0032719591 kernel: Out of memory: Killed process 114777
  (python) total-vm:4954352kB, anon-rss:4001500kB, file-rss:5124kB,
  shmem-rss:0kB, UID:1002 pgtables:9416kB oom_score_adj:0

  
7d9ca42edc5e4bdeb17be8e8045c6468:/home/gibi/.cache/logsearch/7d9ca42edc5e4bdeb17be8e8045c6468/controller/logs/syslog.txt:3828:Jan
  12 22:06:40 np0032716841 kernel: Out of memory: Killed process 114731
  (python) total-vm:4964792kB, anon-rss:4055532kB, file-rss:5072kB,
  shmem-rss:0kB, UID:1002 pgtables:9212kB oom_score_adj:0

  
bcb7bc3b478586906c31c6558b13:/home/gibi/.cache/logsearch/bcb7bc3b478586906c31c6558b13/controller/logs/syslog.txt:3769:Jan
  12 20:17:35 np0032714959 kernel: Out of memory: Killed process 114973
  (python) total-vm:4971976kB, anon-rss:3855572kB, file-rss:5356kB,
  shmem-rss:0kB, UID:1002 pgtables:9696kB oom_score_adj:0

  
7572c2bf5e6547c0a1fc6b0f180a2e1f:/home/gibi/.cache/logsearch/7572c2bf5e6547c0a1fc6b0f180a2e1f/controller/logs/syslog.txt:3805:Jan
  12 

[Yahoo-eng-team] [Bug 2002068] Re: Can not handle authentication request for 2 credentials

2023-01-17 Thread Sylvain Bauza
Looks like the nova-compute service is unable to talk to the libvirt API.
Definitely a config issue, closing this bug.
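
A quick way to confirm the problem is on the libvirt side rather than in Nova
is to open the same connection nova-compute uses, via the libvirt Python
bindings. A sketch; 'qemu:///system' is the usual default URI and may differ
in your deployment:

    import libvirt

    try:
        conn = libvirt.open('qemu:///system')
        print('libvirt OK:', conn.getHostname())
        conn.close()
    except libvirt.libvirtError as exc:
        # An authentication failure here means libvirtd (socket/SASL auth)
        # needs fixing, not Nova.
        print('libvirt connection failed:', exc)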

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2002068

Title:
  Can not handle authentication request for 2 credentials

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  My Python environment: Python 3.8.10, running in a venv.

  When I run "nova-compute" I get the error message below. Am I
  forgetting something?

  nova.exception.InternalError: Can not handle authentication request
  for 2 credentials

  
  >>> FULL

  2023-01-06 11:26:33.694 388587 WARNING oslo_messaging.rpc.client [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually 
to instantiate client. Please use get_rpc_client to obtain an RPC client 
instance.
  2023-01-06 11:26:33.695 388587 WARNING oslo_messaging.rpc.client [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually 
to instantiate client. Please use get_rpc_client to obtain an RPC client 
instance.
  2023-01-06 11:26:33.695 388587 WARNING oslo_messaging.rpc.client [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Using RPCClient manually 
to instantiate client. Please use get_rpc_client to obtain an RPC client 
instance.
  2023-01-06 11:26:33.696 388587 INFO nova.virt.driver [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Loading compute driver 
'libvirt.LibvirtDriver'
  2023-01-06 11:26:33.778 388587 INFO nova.compute.provider_config [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] No provider configs found 
in /etc/nova/provider_config/. If files are present, ensure the Nova process 
has access.
  2023-01-06 11:26:33.799 388587 WARNING oslo_config.cfg [None 
req-6e6d4628-a393-4d04-8958-dbfcfda36c25 - - - - - -] Deprecated: Option 
"api_servers" from group "glance" is deprecated for removal (
  Support for image service configuration via standard keystoneauth1 Adapter
  options was added in the 17.0.0 Queens release. The api_servers option was
  retained temporarily to allow consumers time to cut over to a real load
  balancing solution.
  ).  Its value may be silently ignored in the future.
  2023-01-06 11:26:33.815 388587 INFO nova.service [-] Starting compute node 
(version 26.1.0)
  2023-01-06 11:26:33.835 388587 CRITICAL nova [-] Unhandled error: 
nova.exception.InternalError: Can not handle authentication request for 2 
credentials
  2023-01-06 11:26:33.835 388587 ERROR nova Traceback (most recent call last):
  2023-01-06 11:26:33.835 388587 ERROR nova   File 
"/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 
338, in _connect_auth_cb
  2023-01-06 11:26:33.835 388587 ERROR nova raise exception.InternalError(
  2023-01-06 11:26:33.835 388587 ERROR nova nova.exception.InternalError: Can 
not handle authentication request for 2 credentials
  2023-01-06 11:26:33.835 388587 ERROR nova
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host [-] Connection to 
libvirt failed: authentication failed: Failed to collect auth credentials: 
libvirt.libvirtError: authentication failed: Failed to collect auth credentials
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host Traceback (most 
recent call last):
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 
588, in get_connection
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host conn = 
self._get_connection()
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 
568, in _get_connection
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host 
self._queue_conn_event_handler(
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 227, 
in __exit__
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host 
self.force_reraise()
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 200, 
in force_reraise
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host raise 
self.value
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 
560, in _get_connection
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host 
self._wrapped_conn = self._get_new_connection()
  2023-01-06 11:26:33.840 388587 ERROR nova.virt.libvirt.host   File 
"/opt/nova/venv/lib/python3.8/site-packages/nova/virt/libvirt/host.py", line 
504, in _get_new_connection
  2023-01-06 11:26:33.840 388587 ERROR 

[Yahoo-eng-team] [Bug 1996214] [NEW] rfe make evacuate only defining the instance and not start it

2022-11-10 Thread Sylvain Bauza
Public bug reported:

We agreed during the Nova 2023.1 Antelope PTG to modify the behaviour of
evacuate so that it would *not* start the instance. This requires an API
microversion.

** Affects: nova
 Importance: Wishlist
 Status: Triaged


** Tags: low-hanging-fruit rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1996214

Title:
  rfe make evacuate only defining the instance and not start it

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  We agreed during the Nova 2023.1 Antelope PTG to modify the behaviour
  of evacuate so that it would *not* start the instance. This requires an
  API microversion.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1996214/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1996213] [NEW] [rfe] modify our usage of privsep in nova

2022-11-10 Thread Sylvain Bauza
Public bug reported:

Nova compute services use the privsep library [1] for specific 'root'
privilege usage for a command or a direct call to the system.

Unfortunately, the current usage we make of this library is not really
good practice: instead of using a sysadmin context that grants *all*
privileged caps to any caller [2], we should rather define per-call
contexts with specific caps.

[1] https://docs.openstack.org/oslo.privsep/latest/user/index.html
[2] 
https://github.com/openstack/nova/blob/c97507dfcd57cce9d76670d3b0d48538900c00e9/nova/privsep/__init__.py#L21-L31
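
As a rough sketch of the direction this suggests (assuming oslo.privsep's
PrivContext/entrypoint API; the context name and capability choice below are
illustrative, not the final design, and the code is meant to live inside the
nova.privsep package):

    from oslo_privsep import capabilities
    from oslo_privsep import priv_context

    # A narrow context that only grants what this call actually needs,
    # instead of routing everything through the broad sys_admin context.
    dac_admin_pctxt = priv_context.PrivContext(
        'nova',
        cfg_section='nova_dac_admin',
        pypath=__name__ + '.dac_admin_pctxt',
        capabilities=[capabilities.CAP_DAC_OVERRIDE,
                      capabilities.CAP_DAC_READ_SEARCH],
    )

    @dac_admin_pctxt.entrypoint
    def read_root_owned_file(path):
        with open(path, 'rb') as f:
            return f.read()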

** Affects: nova
 Importance: Wishlist
 Status: Triaged


** Tags: low-hanging-fruit rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1996213

Title:
  [rfe] modify our usage of privsep in nova

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Nova compute services use the privsep library [1] for specific 'root'
  privilege usage for a command or a direct call to the system.

  Unfortunately, the current usage we make of this library is not really
  good practice: instead of using a sysadmin context that grants *all*
  privileged caps to any caller [2], we should rather define per-call
  contexts with specific caps.

  [1] https://docs.openstack.org/oslo.privsep/latest/user/index.html
  [2] 
https://github.com/openstack/nova/blob/c97507dfcd57cce9d76670d3b0d48538900c00e9/nova/privsep/__init__.py#L21-L31

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1996213/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1996210] [NEW] RFE use openstack sdk to interact with cinder

2022-11-10 Thread Sylvain Bauza
Public bug reported:

This is a tracking rfe bug to enable the use of the OpenStack SDK when
calling cinder.

This is to allow cinder to deprecate and eventually remove the
cinder client in a future release by removing nova dependency on it.
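
A minimal sketch of what a call through the SDK could look like (assuming
openstacksdk's block_storage proxy; the cloud name and volume parameters are
placeholders):

    import openstack

    conn = openstack.connect(cloud='mycloud')  # placeholder clouds.yaml entry

    # Instead of python-cinderclient, use the SDK's block_storage proxy.
    volume = conn.block_storage.create_volume(size=1, name='sdk-demo')
    volume = conn.block_storage.wait_for_status(volume, status='available')
    print(volume.id, volume.status)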

** Affects: nova
 Importance: Wishlist
 Status: Triaged


** Tags: low-hanging-fruit rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1996210

Title:
   RFE use openstack sdk to interact with cinder

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This is a tracking rfe bug to enable the use of the OpenStack SDK when
  calling cinder.

  This is to allow cinder to deprecate and eventually remove the
  cinder client in a future release by removing nova dependency on it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1996210/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1990809] Re: multinode setup, devstack scheduler fails to start after controller restart

2022-09-27 Thread Sylvain Bauza
This doesn't look like a Nova issue to me; maybe just a devstack issue or a
configuration problem. Moving it then to devstack.
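
A quick check that can help separate a packaging/venv problem from a Nova one
is to ask stevedore, in the same Python environment devstack runs the services
from, what it can see for the namespace the scheduler loads its driver from; a
sketch:

    from stevedore import ExtensionManager

    # An empty list here would explain the NoMatches error below:
    # the 'filter_scheduler' entry point is simply not registered.
    mgr = ExtensionManager(namespace='nova.scheduler.driver',
                           invoke_on_load=False)
    print(mgr.names())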

** Also affects: devstack
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1990809

Title:
  multinode setup, devstack scheduler fails to start after controller
  restart

Status in devstack:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  In a multinode devstack setup, the nova scheduler fails to start after a reboot


  Steps to reproduce
  ==

  1 - deploy multinode devstack
  https://docs.openstack.org/devstack/latest/guides/multinode-lab.html

  2 - Verify all compute nodes are listed and setup is working as expected
  $ openstack compute service list
  
  create vm, assign floating IP and access VM

  3 - Restart compute nodes, and controller node
  $ sudo init 6

  4 - Once controller and all other nodes are rebooted, check whether all nova 
services are running
  $ openstack compute service list

  $ sudo systemctl status devstack@n-*


  Expected result
  ===
  $ sudo systemctl status devstack@n-*

  All services should be running

  
  $ openstack compute service list

  openstack commands should run without an issue.


  Actual result
  =
  nova-scheduler fails to start with the error:
  
  Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova 
self._init_plugins(extensions)
  Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova   
File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 113, in 
_init_plugins
  Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova 
raise NoMatches('No %r driver found, looking for %r' %
  Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova 
stevedore.exception.NoMatches: No 'nova.scheduler.driver' driver found, looking 
for 'filter_scheduler'
  Sep 26 04:59:14 multinodesetupcontroller nova-scheduler[926]: ERROR nova 
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: INFO 
oslo_service.periodic_task [-] Skipping periodic task _discover_hosts_in_cells 
because its interval is negative
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: WARNING 
stevedore.named [-] Could not load filter_scheduler
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: CRITICAL nova 
[-] Unhandled error: stevedore.exception.NoMatches: No 'nova.scheduler.driver' 
driver found, looking for 'filter_scheduler'
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova 
Traceback (most recent call last):
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/usr/local/bin/nova-scheduler", line 10, in 
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 sys.exit(main())
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/opt/stack/nova/nova/cmd/scheduler.py", line 47, in main
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 server = service.Service.create(binary='nova-scheduler',
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/opt/stack/nova/nova/service.py", line 252, in create
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 service_obj = cls(host, binary, topic, manager,
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/opt/stack/nova/nova/service.py", line 116, in __init__
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 self.manager = manager_class(host=self.host, *args, **kwargs)
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/opt/stack/nova/nova/scheduler/manager.py", line 60, in __init__
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 self.driver = driver.DriverManager(
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 54, in 
__init__
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 super(DriverManager, self).__init__(
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/usr/local/lib/python3.8/dist-packages/stevedore/named.py", line 89, in 
__init__
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova
 self._init_plugins(extensions)
  Sep 26 05:09:16 multinodesetupcontroller nova-scheduler[11226]: ERROR nova   
File "/usr/local/lib/python3.8/dist-packages/stevedore/driver.py", line 113, in 
_init_plugins
  Sep 26 05:09:16 

[Yahoo-eng-team] [Bug 1990121] Re: Nova 26 needs to depend on os-traits >= 2.9.0

2022-09-19 Thread Sylvain Bauza
Master patch : https://review.opendev.org/c/openstack/nova/+/858236
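
For context, the fix boils down to raising the minimum os-traits version in
Nova's requirements; a rough sketch of the resulting line (the exact change
is in the review linked above):

  os-traits>=2.9.0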

** Also affects: nova/zed
   Importance: Critical
   Status: In Progress

** Changed in: nova/zed
   Importance: Critical => High

** Changed in: nova/zed
 Assignee: (unassigned) => Thomas Goirand (thomas-goirand)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1990121

Title:
  Nova 26 needs to depend on os-traits >= 2.9.0

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) zed series:
  In Progress

Bug description:
  Without the latest os-traits, we get unit test failures like below.

  ==
  FAIL: 
nova.tests.unit.compute.test_pci_placement_translator.TestTranslator.test_trait_normalization_09
  
nova.tests.unit.compute.test_pci_placement_translator.TestTranslator.test_trait_normalization_09
  --
  testtools.testresult.real._StringException: pythonlogging:'': {{{
  2022-09-17 10:46:54,848 WARNING [oslo_policy.policy] JSON formatted 
policy_file support is deprecated since Victoria release. You need to use YAML 
format which will be default in future. You can use 
``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted 
policy file to YAML-formatted in backward compatible way: 
https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html.
  2022-09-17 10:46:54,849 WARNING [oslo_policy.policy] JSON formatted 
policy_file support is deprecated since Victoria release. You need to use YAML 
format which will be default in future. You can use 
``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted 
policy file to YAML-formatted in backward compatible way: 
https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html.
  2022-09-17 10:46:54,851 WARNING [oslo_policy.policy] Policy Rules 
['os_compute_api:extensions', 'os_compute_api:os-floating-ip-pools', 
'os_compute_api:os-quota-sets:defaults', 
'os_compute_api:os-availability-zone:list', 'os_compute_api:limits', 
'project_member_api', 'project_reader_api', 'project_member_or_admin', 
'project_reader_or_admin', 'os_compute_api:limits:other_project', 
'os_compute_api:os-lock-server:unlock:unlock_override', 
'os_compute_api:servers:create:zero_disk_flavor', 
'compute:servers:resize:cross_cell', 
'os_compute_api:os-shelve:unshelve_to_host'] specified in policy files are the 
same as the defaults provided by the service. You can remove these rules from 
policy files which will make maintenance easier. You can detect these redundant 
rules by ``oslopolicy-list-redundant`` tool also.
  }}}

  Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ddt.py", line 191, in wrapper
  return func(self, *args, **kwargs)
File 
"/<>/nova/tests/unit/compute/test_pci_placement_translator.py", 
line 92, in test_trait_normalization
  ppt._get_traits_for_dev({"traits": trait_names})
File "/<>/nova/compute/pci_placement_translator.py", line 78, 
in _get_traits_for_dev
  os_traits.COMPUTE_MANAGED_PCI_DEVICE
  AttributeError: module 'os_traits' has no attribute 
'COMPUTE_MANAGED_PCI_DEVICE'

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1990121/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1896617] Re: [SRU] Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

2022-09-13 Thread Sylvain Bauza
Setting the bug to Opinion/Wishlist as this sounds like half a Nova problem
(since we set the chmod) and half a distro-specific configuration issue.

I'm not against any modification, but ideally we need to address this gap
properly, as a blueprint.

** Changed in: nova
   Status: Triaged => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896617

Title:
  [SRU] Creation of image (or live snapshot) from the existing VM fails
  if libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack Nova Compute Charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Opinion
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released

Bug description:
  [Impact]

  tl;dr

  1) creating the image from the existing VM fails if qcow2 image backend is 
used, but everything is fine if using rbd image backend in nova-compute.
  2) openstack server image create --name   fails with some unrelated error:

  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID 
f4693860-cd8d-4088-91b9-56b2f173ffc7

  == Details ==

  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists
  [0] are failing with the following exception:

  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 369, in create_image_from_server
  waiters.wait_for_image_status(client, image_id, wait_until)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py",
 line 161, in wait_for_image_status
  image = show_image(image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py",
 line 74, in show_image
  resp, body = self.get("images/%s" % image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 298, in get
  return self.request('GET', url, extra_headers, headers)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py",
 line 48, in request
  method, url, extra_headers, headers, body, chunked)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 687, in request
  self._error_checker(resp, resp_body)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 793, in _error_checker
  raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py",
 line 69, in test_create_delete_image
  wait_until='ACTIVE')
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 384, in create_image_from_server
  image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image 
d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.

  So far I was able to identify the following:

  1) 
https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69
 invokes a "create image from server"
  2) It fails with the following error message in the nova-compute logs: 
https://pastebin.canonical.com/p/h6ZXdqjRRm/

  The same occurs if "openstack server image create --wait" is executed;
  however, according to
  https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with-
  snapshot.html the VM has to be shut down before the image creation:

  "Shut down the source VM before you take the snapshot to ensure that
  all data is flushed to disk. If necessary, list the instances to view
  the instance name. Use the openstack server stop command to shut down
  the instance:"

  This step is definitely being skipped by the test (e.g it's trying 

[Yahoo-eng-team] [Bug 1988311] Re: Concurrent evacuation of vms with pinned cpus to the same host fail randomly

2022-09-13 Thread Sylvain Bauza
Setting to High as we need to bump our requirements on master to prevent
older releases of oslo.concurrency.

Also, need to backport the patch into stable releases of
oslo.concurrency for Yoga.

** Also affects: nova/yoga
   Importance: Undecided
   Status: New

** Changed in: nova/yoga
   Status: New => Confirmed

** Changed in: nova/yoga
   Importance: Undecided => High

** Changed in: nova
   Importance: Critical => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1988311

Title:
  Concurrent evacuation of vms with pinned cpus to the same host fail
  randomly

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) yoga series:
  Confirmed
Status in oslo.concurrency:
  Fix Released

Bug description:
  Reproduction:

  Boot two vms (each with one pinned cpu) on devstack0.
  Then evacuate them to devtack0a.
  devstack0a has two dedicated cpus, so both vms should fit.
  However sometimes (for example 6 out of 10 times) the evacuation of one vm 
fails with this error message: 'CPU set to pin [0] must be a subset of free CPU 
set [1]'.

  devstack0 - all-in-one host
  devstack0a - compute-only host

  # have two dedicated cpus for pinning on the evacuation target host
  devstack0a:/etc/nova/nova-cpu.conf:
  [compute]
  cpu_dedicated_set = 0,1

  # the dedicated cpus are properly tracked in placement
  $ openstack resource provider list
  
+--+++--+--+
  | uuid | name   | generation | 
root_provider_uuid   | parent_provider_uuid |
  
+--+++--+--+
  | a0574d87-42ee-4e13-b05a-639dc62c1196 | devstack0a |  2 | 
a0574d87-42ee-4e13-b05a-639dc62c1196 | None |
  | 2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | devstack0  |  2 | 
2e6fac42-d6e3-4366-a864-d5eb2bdc2241 | None |
  
+--+++--+--+
  $ openstack resource provider inventory list 
a0574d87-42ee-4e13-b05a-639dc62c1196
  
++--+--+--+--+---+---+--+
  | resource_class | allocation_ratio | min_unit | max_unit | reserved | 
step_size | total | used |
  
++--+--+--+--+---+---+--+
  | MEMORY_MB  |  1.5 |1 | 3923 |  512 |
 1 |  3923 |0 |
  | DISK_GB|  1.0 |1 |   28 |0 |
 1 |28 |0 |
  | PCPU   |  1.0 |1 |2 |0 |
 1 | 2 |0 |
  
++--+--+--+--+---+---+--+

  # use vms with one pinned cpu
  openstack flavor create cirros256-pinned --public --ram 256 --disk 1 --vcpus 
1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated

  # boot two vms (each with one pinned cpu) on devstack0
  n=2 ; for i in $( seq $n ) ; do openstack server create --flavor 
cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private 
--availability-zone :devstack0 --wait vm$i ; done

  # kill n-cpu on devstack0
  devstack0 $ sudo systemctl stop devstack@n-cpu
  # and force it down, so we can start evacuating
  openstack compute service set devstack0 nova-compute --down

  # evacuate both vms to devstack0a concurrently
  for vm in $( openstack server list --host devstack0 -f value -c ID ) ; do 
openstack --os-compute-api-version 2.29 server evacuate --host devstack0a $vm & 
done

  # follow up on how the evacuation is going, check if the bug occurred, see details a bit below
  for i in $( seq $n ) ; do openstack server show vm$i -f value -c 
OS-EXT-SRV-ATTR:host -c status ; done

  # clean up
  devstack0 $ sudo systemctl start devstack@n-cpu
  openstack compute service set devstack0 nova-compute --up
  for i in $( seq $n ) ; do openstack server delete vm$i --wait ; done

  This bug is not deterministic. For example out of 10 tries (like
  above) I have seen 4 successes - when both vms successfully evacuated
  to (went to ACTIVE on) devstack0a.

  But in the other 6 cases only one vm evacuated successfully. The other
  vm went to ERROR state, with the error message: "CPU set to pin [0]
  must be a subset of free CPU set [1]". For example:

  $ openstack server show vm2
  ...
  | fault   | {'code': 400, 'created': 
'2022-08-24T13:50:33Z', 'message': 'CPU set to pin [0] must be a subset of free 
CPU set [1]'} |
  ...

  In n-cpu logs we see the following:

  aug 24 

[Yahoo-eng-team] [Bug 1981631] Re: Nova fails to reuse mdev vgpu devices

2022-09-12 Thread Sylvain Bauza
OK, I maybe mistriaged this bug report, as this is specific to the
Ampere architecture with SR-IOV support, so nevermind comment #2.

FWIW, this hardware support is very special as you indeed need to enable VFs, 
as described in nvidia docs : 
https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#creating-sriov-vgpu-device-red-hat-el-kvm

Indeed, 32 VFs would be configured *but* if you specify
enabled_vgpu_types to the right nvidia-471 type for the PCI address,
then the VGPU inventory for this PCI device will have a total of 4, not
32 as I tested earlier.

Anyway, this whole Ampere support is very fragile upstream as this is
not fully supported upstream, so I'm about to set this bug to Opinion,
as Ampere GPUs won't be able to be tested upstream.

Please do further testing to identify whether something is missing in the
current vGPU support we have in Nova that would break Ampere support, but
please understand that upstream support is absolutely hardware-independent
and must not be nvidia-specific.

** Tags added: vgpu

** Changed in: nova
   Status: Confirmed => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1981631

Title:
  Nova fails to reuse mdev vgpu devices

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description:
  
  Hello, we are experiencing a weird issue where Nova creates the mdev devices
  from virtual functions when none exist yet, but then will not reuse them once
  they have all been created and the vGPU instances are removed.

  
  I believe part of this issue was the uuid issue from this bug:
  https://bugzilla.redhat.com/show_bug.cgi?id=1701281

  Manually applying the latest patch partially fixed the issue
  (placement stopped reporting no hosts available), now the error is on
  the hypervisor side saying 'no vgpu resources available'.

  If I manually remove the mdev device by with commands like the following:
  echo "1" > /sys/bus/mdev/devices/150c155c-da0b-45a6-8bc1-a8016231b100/remove

  then I'm able to spin up an instance again.

  all mdev devices match in mdevctl list and virsh nodedev-list

  Steps to reproduce:
  
  1) freshly setup hypervisor with no mdev devices created yet
  2) spin up vgpu instances until all mdevs are created that will fit on 
physical gpu(s)
  3) delete vgpu instances
  4) try and spin up new vgpu instances

  Expected Result:
  =
  Instances spin up and reuse the mdev vGPU devices

  Actual Result:
  =
  Build error from Nova API:
  Error: Failed to perform requested operation on instance "colby_gpu_test23", 
the instance has an error status: Please try again later [Error: Exceeded 
maximum number of retries. Exhausted all hosts available for retrying build 
failures for instance c18565f9-da37-42e9-97b9-fa33da5f1ad0.].

  Error in hypervisor logs:
  nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: 
vGPU resource is not available

  mdevctl output:
  cdc98056-8597-4531-9e55-90ab44a71b4e :21:00.7 nvidia-563 manual
  298f1e4b-784d-42a9-b3e5-bdedd0eeb8e1 :21:01.2 nvidia-563 manual
  2abee89e-8cb4-4727-ac2f-62888daab7b4 :21:02.4 nvidia-563 manual
  32445186-57ca-43f4-b599-65a455fffe65 :21:04.2 nvidia-563 manual
  0c4f5d07-2893-49a1-990e-4c74c827083b :81:00.7 nvidia-563 manual
  75d1b78c-b097-42a9-b736-4a8518b02a3d :81:01.2 nvidia-563 manual
  a54d33e0-9ddc-49bb-8908-b587c72616a9 :81:02.5 nvidia-563 manual
  cd7a49a8-9306-41bb-b44e-00374b1e623a :81:03.4 nvidia-563 manual

  virsh nodedev-list -cap mdev:
  mdev_0c4f5d07_2893_49a1_990e_4c74c827083b__81_00_7
  mdev_298f1e4b_784d_42a9_b3e5_bdedd0eeb8e1__21_01_2
  mdev_2abee89e_8cb4_4727_ac2f_62888daab7b4__21_02_4
  mdev_32445186_57ca_43f4_b599_65a455fffe65__21_04_2
  mdev_75d1b78c_b097_42a9_b736_4a8518b02a3d__81_01_2
  mdev_a54d33e0_9ddc_49bb_8908_b587c72616a9__81_02_5
  mdev_cd7a49a8_9306_41bb_b44e_00374b1e623a__81_03_4
  mdev_cdc98056_8597_4531_9e55_90ab44a71b4e__21_00_7

  nvidia-smi vgpu output:
  Wed Jul 13 20:15:16 2022   
  
+-+
  | NVIDIA-SMI 510.73.06  Driver Version: 510.73.06 
|
  
|-+--++
  | GPU  Name   | Bus-Id   | GPU-Util   
|
  |  vGPU ID Name   | VM ID VM Name| vGPU-Util  
|
  
|=+==+|
  |   0  NVIDIA A40 | :21:00.0 |   0%   
|
  |  3251635106  NVIDIA A40-12Q | 2786...  instance-00014520   |  0%
|
  |  3251635117  

[Yahoo-eng-team] [Bug 1940425] Re: test_live_migration_with_trunk tempest test fails due to port remains in down state

2022-07-26 Thread Sylvain Bauza
As we have proof that the issue is due to the os-vif 3.0.0 release,
changing the Nova status to Invalid.

** Also affects: os-vif
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1940425

Title:
  test_live_migration_with_trunk tempest test fails due to port remains
  in down state

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Invalid
Status in os-vif:
  New

Bug description:
  Example failure is in [1]:

  2021-08-18 10:40:52,334 124842 DEBUG[tempest.lib.common.utils.test_utils] 
Call _is_port_status_active returns false in 60.00 seconds
  }}}

  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in 
wrapper
  return func(*func_args, **func_kwargs)
File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in 
wrapper
  return f(*func_args, **func_kwargs)
File "/opt/stack/tempest/tempest/api/compute/admin/test_live_migration.py", 
line 281, in test_live_migration_with_trunk
  self.assertTrue(
File "/usr/lib/python3.8/unittest/case.py", line 765, in assertTrue
  raise self.failureException(msg)
  AssertionError: False is not true

  Please note that a similar bug was reported and fixed previously
  https://bugs.launchpad.net/tempest/+bug/1924258 It seems that fix did
  not fully solved the issue.

  It is not super frequent I saw 4 occasions in the last 30 days [2].

  [1] 
https://zuul.opendev.org/t/openstack/build/fdbda223dc10456db58f922b6435f680/logs
  [2] https://paste.opendev.org/show/808166/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1940425/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1968054] Re: oslo.messaging._drivers.impl_rabbit Connection failed: timed out

2022-04-19 Thread Sylvain Bauza
Unfortunately, this doesn't look like a Nova issue: this is either an
oslo.messaging bug or, rather, a configuration issue.
Closing this bug for Nova.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968054

Title:
  oslo.messaging._drivers.impl_rabbit Connection failed: timed out

Status in OpenStack Compute (nova):
  Invalid
Status in oslo.messaging:
  New

Bug description:
  I am running Wallaby Release on Ubuntu 20.04 (Openstack-Ansible
  deployment tool)

  oslo.messaging=12.7.1
  nova=23.1.1

  Since I upgraded to Wallaby I have started noticing the following error
  message very frequently in nova-compute, and the solution is to restart the
  nova-compute agent.

  Here is the full logs:
  https://paste.opendev.org/show/bft9znewTxyXHkvIcQO0/

  
  01 19:43:36 compute1.example.net nova-compute[1546242]: AssertionError:
  Apr 01 19:45:35 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:35.059 34090 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable 
connection/channel error occurred, trying to reconnect: [Errno 110] Connection 
timed out
  Apr 01 19:45:40 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:40.063 34090 ERROR oslo.messaging._drivers.impl_rabbit 
[req-707abbfe-8ee0-4af7-900a-e43dc5dec597 - - - - -] 
[7d350e59-001f-4203-bd41-369650cd5c5c] AMQP server on 172.28.17.24:5671 is 
unreachable: . Trying again in 1 seconds.: socket.timeout
  Apr 01 19:45:40 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:40.079 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection 
failed: timed out (retrying in 0 seconds): socket.timeout: timed out
  Apr 01 19:45:41 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:41.983 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection 
failed: [Errno 113] EHOSTUNREACH (retrying in 0 seconds): OSError: [Errno 113] 
EHOSTUNREACH
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:42.367 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection 
failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 
113] EHOSTUNREACH
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: Traceback (most 
recent call last):
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]:   File 
"/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/hub.py",
 line 476, in fire_timers
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: timer()
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]:   File 
"/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/timer.py",
 line 59, in __call__
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: cb(*args, **kw)
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]:   File 
"/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/semaphore.py",
 line 152, in _do_acquire
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: waiter.switch()
  Apr 01 19:45:42 compute1.example.net nova-compute[34090]: greenlet.error: 
cannot switch to a different thread
  Apr 01 19:45:49 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:49.388 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection 
failed: timed out (retrying in 0 seconds): socket.timeout: timed out
  Apr 01 19:45:50 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:50.303 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] 
[08af61ee-e653-44b0-82bb-155a2a8b7ef3] AMQP server on 172.28.17.24:5671 is 
unreachable: [Errno 113] No route to host. Trying again in 1 seconds.: OSError: 
[Errno 113] No route to host
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:51.199 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection 
failed: [Errno 113] EHOSTUNREACH (retrying in 0 seconds): OSError: [Errno 113] 
EHOSTUNREACH
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]: 2022-04-01 
19:45:51.583 34090 ERROR oslo.messaging._drivers.impl_rabbit [-] 
[08af61ee-e653-44b0-82bb-155a2a8b7ef3] AMQP server on 172.28.17.24:5671 is 
unreachable: [Errno 113] EHOSTUNREACH. Trying again in 1 seconds.: OSError: 
[Errno 113] EHOSTUNREACH
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]: Traceback (most 
recent call last):
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]:   File 
"/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/hub.py",
 line 476, in fire_timers
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]: timer()
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]:   File 
"/openstack/venvs/nova-23.1.1/lib/python3.8/site-packages/eventlet/hubs/timer.py",
 line 59, in __call__
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]: cb(*args, **kw)
  Apr 01 19:45:51 compute1.example.net nova-compute[34090]:   File 

[Yahoo-eng-team] [Bug 1968555] Re: evacuate after network issue will cause vm running on two host

2022-04-19 Thread Sylvain Bauza
If you see some compute flapping due to a network issue, you can force it to
be down:
https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-detail#update-forced-down

Once the compute is down (either because it's forced down or marked down by
the service group API), you can indeed evacuate the instance, and then you
would have two different instances: one on the original host, and the
other one on the new host.

That said, given the original host is down, you should restart the compute
service once the host is back up, right? If so, we then verify the evacuated
instances and delete them:
https://github.com/openstack/nova/blob/a1f006d799d2294234d381395a9ae9c22a2d80b9/nova/compute/manager.py#L1531
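
As a rough sketch of that flow (host and server names are placeholders, and
the commands assume a reasonably recent python-openstackclient):

  # mark the unreachable compute as forced down so evacuation is allowed
  $ openstack compute service set <host> nova-compute --down
  # evacuate the instance to another host
  $ openstack --os-compute-api-version 2.29 server evacuate --host <new-host> <server>
  # once the broken host is reachable again, restart nova-compute there
  # (on startup it destroys the local guests that were evacuated away),
  # then clear the forced-down flag
  $ openstack compute service set <host> nova-compute --up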


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968555

Title:
  evacuate after network issue will cause vm running on two host

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Environment
  ===
  openstack queen + libvirt 4.5.0 + qemu 2.12 running on centos7, with ceph rbd 
storage

  Description
  ===
  If the management network of the compute host has problems, nova-compute may
  be reported as down while openstack-nova-compute.service is still running on
  that host. If you now evacuate a vm on that host, the evacuation will succeed,
  but the vm will be running on both the old host and the new host even after
  the management network of the old host recovers, which may cause vm errors.

  Steps to reproduce
  ==
  1. Manually turn down the management network port of the compute host, like 
ifconfig eth0 down
  2. After the nova-compute service of that host shows as down in openstack
  compute service list, evacuate one vm on that host:
  nova evacuate 
  3. After evacuate succeed, you can find the vm running on two host.
  4. Manually turn up the management network port of the old compute host, like
  ifconfig eth0 up; you can find the vm still running on this host, and it will
  not be automatically destroyed unless you restart openstack-nova-compute.service
  on that host.

  Expected result
  ===
  Maybe we can add a periodic task to auto destroy this vm?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1968555/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1969054] Re: when enabled enforce_new_defaults, create server failed

2022-04-19 Thread Sylvain Bauza
Marking the bug as WONTFIX as we fixed the root cause in the Yoga
release.

** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1969054

Title:
  when enabled enforce_new_defaults,create server failed

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Description
  ===
  When enforce_new_defaults is enabled in nova.conf, a system-scope admin fails
  to create a server. An error occurs in the neutron log (controller node).

  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova Traceback (most 
recent call last):
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/neutron/notifiers/nova.py", line 266, in 
send_events
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
batched_events)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/v2/server_external_events.py", 
line 39, in create
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
return_raw=True)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/base.py", line 363, in _create
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova resp, body = 
self.api.client.post(url, body=body)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return 
self.request(url, 'POST', **kwargs)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/client.py", line 78, in request
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova raise 
exceptions.from_response(resp, body, url, method)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
novaclient.exceptions.Forbidden: Policy doesn't allow 
os_compute_api:os-server-external-events:create to be performed. (HTTP 403) 
(Request-ID: req-928afad8-32b9-420
  8-8e5e-e2bc9061a56a) 

  Steps to reproduce
  ==
  1. Enable enforce_new_defaults in nova.conf and restart nova
  2. Empty policy.yaml:  >/etc/nova/policy.yaml
  3. Use admin (system scope) to create a server
  4. Server creation fails
  5. With enforce_new_defaults disabled, admin can create the server successfully.

  Expected result
  ===
  The admin user creates the server successfully.

  Actual result
  =
  The status of the server is stuck in "BUILD"; after 5 minutes, it becomes "error".

  It occure an error in neutron log(controller node).

  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova Traceback (most 
recent call last):
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/neutron/notifiers/nova.py", line 266, in 
send_events
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
batched_events)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/v2/server_external_events.py", 
line 39, in create
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
return_raw=True)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/base.py", line 363, in _create
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova resp, body = 
self.api.client.post(url, body=body)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova return 
self.request(url, 'POST', **kwargs)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova   File 
"/usr/lib/python3.6/site-packages/novaclient/client.py", line 78, in request
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova raise 
exceptions.from_response(resp, body, url, method)
  2022-04-14 09:36:29.743 876530 ERROR neutron.notifiers.nova 
novaclient.exceptions.Forbidden: Policy doesn't allow 
os_compute_api:os-server-external-events:create to be performed. (HTTP 403) 
(Request-ID: req-928afad8-32b9-420
  8-8e5e-e2bc9061a56a) 

  Environment
  ===
  OS release centos8.2
  openstack victoria
  nova 22.2.2
  neutron 17.2
  keystone 18.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1969054/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1965441] Re: Error: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible

2022-03-22 Thread Sylvain Bauza
You're asking to shrink the disk from 160GB to 100GB and that's something
we don't support.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1965441

Title:
  Error: Unexpected API Error. Please report this at
  http://bugs.launchpad.net/nova/ and attach the Nova API log if
  possible

Status in OpenStack Compute (nova):
  Invalid

Bug description:
   (HTTP 500)
  (Request-ID: req-819ea4bc-7115-4645-a640-7e3a9bb9595a)

  Resize from m1.xlarge to m1.medium-xdisk on VIO

  Flavor Name
  m1.xlarge
  Flavor ID
  5
  RAM
  16GB
  VCPUs
  8 VCPU
  Disk
  160GB

  Flavor Details
  Name  
  m1.medium-xdisk
  VCPUs 2
  Root Disk 100 GB
  Ephemeral Disk0 GB
  Total Disk100 GB
  RAM   4,096 MB

  
  Ubuntu18.04LTS-pristine

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1965441/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1962726] Re: ssh-rsa key is no longer allowed by recent openssh

2022-03-08 Thread Sylvain Bauza
We discussed this during the previous Nova meeting and we agreed that this
is a valid issue, but we need to deprecate the generation API (and continue
to accept importing public keys).
As this means a new API microversion, we need a spec for it so we'll
discuss this during the next PTG.

Closing the bug.
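
For what it's worth, importing a locally generated key already avoids the
server-side ssh-rsa generation; a minimal sketch (the key path and keypair
name are just examples):

  $ ssh-keygen -t ed25519 -f ~/.ssh/nova_key -N ''
  $ openstack keypair create --public-key ~/.ssh/nova_key.pub mykey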

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1962726

Title:
  ssh-rsa key is no longer allowed by recent openssh

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description
  ===
  Currently, calling the create key-pair API without actual key content returns
  a key generated server-side, which is formatted as ssh-rsa.

  However ssh-rsa is no longer supported by default since openssh 8.8

  https://www.openssh.com/txt/release-8.8

  ```
  This release disables RSA signatures using the SHA-1 hash algorithm
  by default. This change has been made as the SHA-1 hash algorithm is
  cryptographically broken, and it is possible to create chosen-prefix
  hash collisions for 

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1962726/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1962381] Re: Nova Instance Creation Fails with Error: USB is diabled for this domain

2022-03-08 Thread Sylvain Bauza
Looks to me a config issue, not a project bug.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1962381

Title:
  Nova Instance Creation Fails with Error: USB is diabled for this
  domain

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  My instances fail to create and I get the following error in
  /var/log/nova-compute.log on the compute nodes and /var/log/nova-
  conductor.log on the controller node about USB being disabled for a
  domain but devices being present in the domain.xml.

  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager
  [req-b7f8ef2c-f89a-4380-b573-bba4b99aa296
  d20aa0616f264b39a2b72422d2d5d947 53a12573b5e14406bf85e864dc0acd68 -
  default default] [instance: bd456534-9ccd-458b-b0d1-f73bd0f85d2a]
  Failed to build and run instance: libvirt.libvirtError: unsupported
  configuration: USB is disabled for this domain, but USB devices are
  present in the domain XML

  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] Traceback (most recent call last):
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2442, in 
_build_and_run_instance
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.driver.spawn(context, instance, 
image_meta,
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3766, in 
spawn
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self._create_guest_with_network(
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6758, in 
_create_guest_with_network
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self._cleanup_failed_start(
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.force_reraise()
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] six.reraise(self.type_, self.value, 
self.tb)
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/six.py", line 703, in reraise
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] raise value
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6727, in 
_create_guest_with_network
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] guest = self._create_guest(
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6655, in 
_create_guest
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] guest = 
libvirt_guest.Guest.create(xml, self._host)
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 144, in create
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] LOG.error('Error defining a guest 
with XML: %s',
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a] self.force_reraise()
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 
bd456534-9ccd-458b-b0d1-f73bd0f85d2a]   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  2022-02-27 02:31:42.000 1806 ERROR nova.compute.manager [instance: 

[Yahoo-eng-team] [Bug 1963553] Re: Openstack Fails to Launch Instances "/usr/bin/qemu-system-arm' does not support virt type 'kvm; "

2022-03-08 Thread Sylvain Bauza
This doesn't sound a Nova project-specific bug, rather a config issue
for a specific OS/arch.

AFAIK, you need to use CentOS AArch64 images for the RPi.
Anyway, closing the bug.

** Changed in: nova
   Status: New => Incomplete

** Changed in: nova
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1963553

Title:
  Openstack Fails to Launch Instances "/usr/bin/qemu-system-arm' does
  not support virt type 'kvm; "

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Environment - OpenStack Victoria on Ubuntu 20.04 on Raspberry Pi 4B,
  1 controller, 2 compute, 1 storage node.

  Been troubleshooting a Raspberry Pi 4B OpenStack setup. I have all my
  OpenStack components running and verified, but when trying to launch an
  instance it errors out (the error seems to come from libvirt):

   Emulator '/usr/bin/qemu-system-arm' does not support virt type
  'kvm'\n", '\nDuring handling of the above exception, another exception
  occurred

  Per these instructions
  (https://docs.openstack.org/nova/victoria/configuration/config.html),
  as a fix I've tried the following options in nova.conf in all
  variations with no difference in outcome (restarted the libvirt and
  nova-compute services after each change as well):

  cpu_mode = (default)

  cpu_mode = host-passthrough

  virt_type = kvm

  virt_type = qemu

  I'm at a loss for what to do since, to my knowledge, the only way
  information is passed through to libvirt is through nova.conf, and I
  appreciate any assistance.

  Controller Node /var/log/nova-conductor.log

  2022-03-03 16:39:13.462 1987 ERROR nova.scheduler.utils 
[req-de8dd1d0-3ff5-4495-ad8b-8af235b4d8c4 d20aa0616f264b39a2b72422d2d5d947 - - 
default default] [instance: c453aedc-08d5-4b5c-95c9-ddda1eab4514] Error from 
last host: compute2 (node compute2): ['Traceback (most recent call last):\n', ' 
File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2442, in 
_build_and_run_instance\n self.driver.spawn(context, instance, image_meta,\n', 
' File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3766, 
in spawn\n self._create_guest_with_network(\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6758, in 
_create_guest_with_network\n
  self.cleanup_failed_start(\n', ' File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in exit\n 
self.force_reraise()\n', ' File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise\n six.reraise(self.type, self.value, self.tb)\n', ' File 
"/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n
  raise value\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6727, in 
_create_guest_with_network\n guest = self._create_guest(\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6655, in 
_create_guest\n guest = libvirt_guest.Guest.create(xml, self.host)\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 144, in 
create\n LOG.error('Error defining a guest with XML: %s',\n', ' File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in exit\n 
self.force_reraise()\n', ' File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise\n six.reraise(self.type, self.value, self.tb)\n', ' File 
"/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n
  raise value\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 141, in 
create\n guest = host.write_instance_config(xml)\n', ' File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/host.py", line 1144, in 
write_instance_config\n domain = self.get_connection().defineXML(xml)\n', ' 
File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit\n 
result = proxy_call(self._autowrap, f, *args, **kwargs)\n', ' File 
"/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call\n 
rv = execute(f, *args, **kwargs)\n', ' File 
"/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute\n 
six.reraise(c, e, tb)\n', ' File "/usr/lib/python3/dist-packages/six.py", line 
703, in reraise\n
  raise value\n', ' File 
"/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker\n rv = 
meth(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/libvirt.py", 
line 4047, in defineXML\n if ret is None:raise 
libvirtError('virDomainDefineXML() failed', conn=self)\n', 
"libvirt.libvirtError: unsupported configuration: Emulator 
'/usr/bin/qemu-system-arm' does not support virt type 'kvm'\n", '\nDuring 
handling of the above exception, another exception occurred:\n\n', 'Traceback 
(most recent call last):\n', ' File 
"/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2268, in 
_do_build_and_run_instance\n
  

[Yahoo-eng-team] [Bug 1964097] Re: Questions about the command "nova list & openstack server list"

2022-03-08 Thread Sylvain Bauza
No, it's just calling the API DB.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1964097

Title:
  Questions about the command "nova list & openstack server list"

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  When I use the command "nova list" to list all instances in the system, does 
this operation go through the message queue?
  Thank you all!

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1964097/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1959742] Re: Cant launch Instance (Nova, https://cloud.lab.fiware.org/project/instances/)

2022-02-08 Thread Sylvain Bauza
You need to give us more logs in order to understand what the issue is.

Looks to me it's not a bug, rather a configuration issue.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1959742

Title:
  Cant launch Instance (Nova,
  https://cloud.lab.fiware.org/project/instances/)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Whenever I try to create an instance, I get

  Error: Unexpected API Error. Please report this at
  http://bugs.launchpad.net/nova/ and attach the Nova API log if
  possible. 
  (HTTP 500) (Request-ID: req-273ba8b2-95fa-4b66-a958-317bf4f59a50)

  Error: Unable to launch instance named "learning-1"

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1959742/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1959682] Re: String concatenation TypeError in resize flavor helper

2022-02-01 Thread Sylvain Bauza
** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1959682

Title:
  String concatenation TypeError in resize flavor helper

Status in OpenStack Compute (nova):
  Invalid
Status in tempest:
  In Progress

Bug description:
  In cae966812, for certain resize tests, we started adding a numeric ID
  to the new flavor name to avoid collisions. This was incorrectly done
  as a string + int concatenation, which is raising a `TypeError: can
  only concatenate str (not "int") to str`.

  Example of this happening in nova-next job:
  
https://zuul.opendev.org/t/openstack/build/7f750faf22ec48219ddd072cfe6e02e1/logs
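
  A minimal illustration of the failure mode and the obvious fix (the helper
  name is hypothetical, not the actual tempest code):

  # "flavor" + 123 raises: TypeError: can only concatenate str (not "int") to str
  def new_flavor_name(base_name, numeric_id):
      return base_name + str(numeric_id)   # or: "%s%d" % (base_name, numeric_id)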

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1959682/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1956432] Re: Old allocation of VM is not deleted after evacuating

2022-01-18 Thread Sylvain Bauza
This is expected behaviour. As we assume that we can only evacuate when
a nova-compute service is down, there is no way for the nova-compute
service to ask Placement to remove those allocations.

That's only when nova-compute is back up that we can delete those
allocations. We also provide some nova-manage commands for deleting
orphaned allocations in case of a non-recoverable compute service. See
https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-
audit

** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1956432

Title:
  Old allocation of VM is not deleted after evacuating

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  I found that the old instance allocation in placement is not deleted after
  executing evacuate; this leads to wrong resource info for the old compute node.

  
  -

  MariaDB [placement]> select * from  allocations where 
consumer_id='4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9';
  
+-++---+--+--+---+--+
  | created_at  | updated_at | id| resource_provider_id | 
consumer_id  | resource_class_id | used |
  
+-++---+--+--+---+--+
  | 2022-01-05 08:23:19 | NULL   | 18315 |   11 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 2 |1 |
  | 2022-01-05 08:23:19 | NULL   | 18318 |   11 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 1 |  512 |
  | 2022-01-05 08:23:19 | NULL   | 18321 |   11 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 0 |1 |
  | 2022-01-05 08:23:19 | NULL   | 18324 |   33 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 0 |1 |
  | 2022-01-05 08:23:19 | NULL   | 18327 |   33 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 1 |  512 |
  | 2022-01-05 08:23:19 | NULL   | 18330 |   33 | 
4c6c29e7-a1f0-4dac-a3ef-a98b5598abe9 | 2 |1 |
  
+-++---+--+--+---+--+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1956432/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1956983] Re: Consider making upgrade check for old computes a failure

2022-01-18 Thread Sylvain Bauza
I'm not sure I'd classify it as a bug. It's probably a good thought though,
so marking it Invalid/Wishlist, but I'm open to thoughts.

To answer your question, now that we have a hard stop blocking nova
services from restarting if they are old enough, this sounds like a
legitimate blueprint to address.

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1956983

Title:
  Consider making upgrade check for old computes a failure

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Currently, the upgrade check for older than N-1 computes only produces a 
warning.
  For example:

  Check: Older than N-1 computes
  Result: Warning
  Details: Current Nova version does not support computes older than
Victoria but the minimum compute service level in your
system is 30 and the oldest supported service level is 52.

  If this is overlooked, Nova services will fail to start after upgrade.
  With Nova API down, the old services cannot be removed without
  database edits.

  Is there a specific reason to keep this check as a warning rather than
  a failure?
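
  For reference, the check output quoted above comes from the standard
  pre-upgrade check command, which can be re-run at any time:

  $ nova-status upgrade check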

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1956983/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1953734] Re: POST /os-security-groups returns HTTP 500 on invalid input

2021-12-09 Thread Sylvain Bauza
As we document in our API docs, this /os-security-groups API resource is
now deprecated [1] since API microversion 2.36 [2] which is shipped with
the Newton release

[1] https://docs.openstack.org/api-ref/compute/#create-security-group
[2] 
https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#microversion

Accordingly, we can't fix this bug in our project, even within the
existing stable branches.

** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1953734

Title:
  POST /os-security-groups returns HTTP 500 on invalid input

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Nova does not validate the input on the os-security-groups API resource.

  curl -X POST 'http://10.1.0.21/compute/v2.1/os-security-groups' -d 
'{"security_group": "nostrud commodo tempor", "name": "eiusmod veniam", 
"description": "non esse occaecat"}' -H "Content-Type: application/json; 
charset=UTF-8" -H "Accept: application/json" -H "X-Auth-Token: ${token}" 
  {"computeFault": {"code": 500, "message": "Unexpected API Error. Please 
report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if 
possible.\n"}}

  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin 
admin] Unexpected exception in API method: Attribut>
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi Traceback (most recent call last):
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi   File "/opt/stack/nova/nova/api/openstack/wsgi.py", 
line 658, in wrapped
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi return f(*args, **kwargs)
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi   File 
"/opt/stack/nova/nova/api/openstack/compute/security_groups.py", line 219, in 
create
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi group_name = security_group.get('name', None)
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi AttributeError: 'str' object has no attribute 'get'
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: ERROR 
nova.api.openstack.wsgi 
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: INFO 
nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin 
admin] HTTP exception thrown: Unexpected API Error. >
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: 
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]: DEBUG 
nova.api.openstack.wsgi [None req-910489b0-9e02-4748-84ed-f2b2574ec7bb admin 
admin] Returning 500 to user: Unexpected API Error.>
  Dec 09 10:27:16 master0 devstack@n-api.service[3644655]:  {{(pid=3644655) __call__ 
/opt/stack/nova/nova/api/openstack/wsgi.py:936}}

  reproducible on recent master with simple devstack setup
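
  A minimal sketch of the kind of type check that would turn this into a 400
  instead of a 500 (hypothetical snippet, not the actual nova handler code;
  "body" stands for the decoded JSON request body):

  from webob import exc

  security_group = body.get('security_group')
  if not isinstance(security_group, dict):
      raise exc.HTTPBadRequest(
          explanation="security_group must be a JSON object")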

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1953734/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1951983] Re: nova contains a regular expression that is vulnerable to ReDoS (Regular Expression Denial of Service).

2021-11-30 Thread Sylvain Bauza
If I understand correctly which module has this issue, this is about
hacking.py.

@dw1s, you say this predates SHA1
8f250f50446ca2d7aa84609d5144088aa4cded78, but I can't find that commit in the
nova repo.

Either way, this hacking.py module isn't run by our services and is just
used by our PEP8 jobs, so I don't see any problem here.


** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1951983

Title:
  nova contains a regular expression that is vulnerable to ReDoS
  (Regular Expression Denial of Service).

Status in OpenStack Compute (nova):
  Won't Fix
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  # Summary
  nova contains a regular expression that is vulnerable to ReDoS (Regular 
Expression Denial of Service).

  # Description

  ReDoS, or Regular Expression Denial of Service, is a vulnerability
  affecting inefficient regular expressions which can perform extremely
  badly when run on a crafted input string.

  # Proof of Concept
  To see that the regular expression is vulnerable, copy-paste it into a 
separate file & run the code as shown below.

  ```python
  import re

  log_remove_context = re.compile(
  r"(.)*LOG\.(.*)\(.*(context=[_a-zA-Z0-9].*)+.*\)")
  log_remove_context.match('LOG.' + '(' * 3456)
  ```
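
  As an illustration only (a hypothetical rewrite shown to contrast the
behaviour, not the upstream pattern), dropping the nested quantifiers makes
the same input return immediately:

  ```python
  import re

  # No quantified group wraps another quantifier, so there is no
  # catastrophic backtracking on 'LOG.' + '(' * 3456.
  safer_pattern = re.compile(r".*LOG\.\w*\(.*context=[_a-zA-Z0-9]")
  print(safer_pattern.match('LOG.' + '(' * 3456))  # None, returned quickly
  ```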

  # Impact
  This issue may lead to a denial of service.

  # References
  - 
https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1951983/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1951720] Re: Virtual interface creation failed

2021-11-23 Thread Sylvain Bauza
Yeah, I'll add the Neutron team and ask them to look at this bug.
If they say it's a Nova bug, please set the nova status back to "New".

Thanks.

** Also affects: neutron
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1951720

Title:
  Virtual interface creation failed

Status in neutron:
  New
Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Hi,

  I have a testing stack of OpenStack Wallaby (deployed with kolla-ansible with 
kolla source images)
  and found what is probably a weird bug in nova/neutron.

  I have a testing Heat template which starts about 6 instances with a
  bunch of network interfaces and a security group - nothing special.
  The testing OpenStack environment is clean, freshly installed, and Tempest
  passes. So what is going on?

  1. Sometimes the Heat stack is created successfully without errors
  2. Sometimes the Heat stack is created successfully - BUT with errors in 
nova-compute - so the retry mechanism works

  Errors in nova-compute :

  7d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
Took 0.10 seconds to destroy the instance on the hypervisor.
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager 
[req-5b21ebad-ffb7-46b0-8c37-fd665d01013e 
64fe2842ff8c6302c0450bee25600a10e54f2b9793e9c8776f956c993a7a7ee8 
0960461696f64f82ba108f8397bf508c - e01e19b257d241eaae5bb137aedc6fcc e01e19b2
  57d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: 
Virtual Interface creation failed
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6930, in 
_create_guest_with_network
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]guest = self._create_guest(
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3.9/contextlib.py", 
line 124, in __exit__
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]next(self.gen)
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/nova/compute/manager.py", line 479, in 
wait_for_instance_event
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]actual_event = event.wait()
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]result = hub.switch()
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]return self.greenlet.switch()
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] eventlet.timeout.Timeout: 300 seconds
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] During handling of the above exception, 
another exception occurred:
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2366, in 
_build_and_run_instance
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]self.driver.spawn(context, instance, 
image_meta,
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File 
"/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3885, in 
spawn
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 
69b8dd7e-a9db-44fb-ab41-832942cb9e7e]self._create_guest_with_network(
  2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 

[Yahoo-eng-team] [Bug 1947753] Re: Evacuated instances are not removed from the source

2021-10-27 Thread Sylvain Bauza
OK, let me get it right.

You say that when you want to evacuate an instance, you don't really know 
whether the original service is still running correctly, right?
That's basically why Nova verifies whether the host is non-operational and 
somehow 'failed'.
Sometimes, you're right, Nova thinks the compute service isn't faulty and then 
you can't evacuate. Other times, Nova thinks the compute service *is* 
faulty and then you can evacuate.

If you're doing so, then indeed you could have problems *if* the host is 
actually running. 
That's why we generally recommend that operators "fence" the original faulty 
host detected by Nova before evacuating.

Either way, if the service continues to run, it periodically verifies the
evacuation status and cleans up the evacuated instances on the host. So maybe
you're hitting a race when you evacuate while a compute fault is transient,
and then you see a problem.

If so, as I said, I'd recommend 'fencing' the host before evacuating 
instances... or waiting a little before evacuating the instances if the issue 
is transient.
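
As a rough illustration of that sequence (client syntax varies by release, and
the host must really be fenced out-of-band first):

```
openstack compute service set --down <failed-host> nova-compute
nova evacuate <server-uuid> <target-host>
```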
Maybe that's related to the healthcheck work we want to do: with a better 
status for a faulty compute service, you wouldn't issue evacuations unless 
you're sure it went down.

Putting the bug report as Opinion but I'm more than happy to discuss
with you, Belmiro, on #openstack-nova if you wish.

** Changed in: nova
   Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

** Tags added: evacuate

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1947753

Title:
  Evacuated instances are not removed from the source

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Instance "evacuation" is a great feature and we are trying to take advantage 
of it.
  But, it has some limitations, depending how "broken" is the node.

  Let me give some context...

  In the scenario where the compute node loses connectivity (broken
  switch port, loose network cable, ...) or nova-compute is stuck
  (filesystem issue), evacuating instances can have some unexpected
  consequences and lead to data corruption in the application (for
  example in a DB application).

  If a compute node loses connectivity (or an entire set of compute nodes), 
nova-compute and the instances are "not available".
  If the node runs critical applications (let's suppose a MySQL DB), the cloud 
operator could be tempted to "evacuate" the instance to recover the critical 
application for the user. At this point the cloud operator may not yet know the 
cause of the compute node issue, and maybe it won't be possible to shut it down 
(management network affected?, ...), or they simply don't want to interfere with 
the work of the repair team.

  The repair team fixes the issue (it can take a few minutes or hours...)
  and nova-compute and the instances are available again.

  The problem is that nova-compute doesn't destroy the evacuated
  instances in the source.

  ```
  2021-10-19 11:17:51.519 3050 WARNING nova.compute.resource_tracker 
[req-0ed10e35-2715-466a-918b-69eb1fc770e8 - - - - -] Instance 
fc3be091-56d3-4c69-8adb-2fdb8b0a35d2 has been moved to another host 
foo.cern.ch(foo.cern.ch). There are allocations remaining against the source 
host that might need to be removed: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 
1875}}.
  ```

  At this point we have 2 instances sharing the same IP and possibly
  writing into the same volume.

  Only when nova-compute is restarted (I guess that was always the
  assumption... the compute node was really broken) the evacuated
  instances in the affected node are removed.

  ```
  2021-10-19 15:39:49.257 21189 INFO nova.compute.manager 
[req-ded45b0c-20ab-4587-9533-8c613d977f79 - - - - -] Destroying instance as it 
has been evacuated from this host but still exists in the hypervisor
  2021-10-19 15:39:52.949 21189 INFO nova.virt.libvirt.driver [ ] Instance 
destroyed successfully.
  ```

  I would expect nova-compute to constantly check for evacuated instances and 
then remove them.
  Otherwise, this requires a lot of coordination between different support 
teams.

  Should this be moved to a periodic task?
  
https://github.com/openstack/nova/blob/e14eef0719eceef35e7e96b3e3d242ec79a80969/nova/compute/manager.py#L1440

  
  I'm running Stein, but looking into the code, we have the same behaviour in 
master.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1947753/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1947824] Re: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503)

2021-10-26 Thread Sylvain Bauza
*** This bug is a duplicate of bug 1947825 ***
https://bugs.launchpad.net/bugs/1947825

** This bug has been marked a duplicate of bug 1947825
   503 Service Unavailable: The server is currently unavailable. Please try 
again at a later time.: The Keystone service is temporarily unavailable. (HTTP 
503)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1947824

Title:
  503 Service Unavailable: The server is currently unavailable. Please
  try again at a later time.: The Keystone service is temporarily
  unavailable. (HTTP 503)

Status in OpenStack Compute (nova):
  New

Bug description:
  I am trying to install the rocky version of openstack, and while
  configuring the glance service and executing the command to upload the
  image we are facing the following error.

  503 Service Unavailable: The server is currently unavailable. Please
  try again at a later time.: The Keystone service is temporarily
  unavailable. (HTTP 503)

  and the keystone service status is empty; it's not displaying anything.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1947824/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1947825] Re: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.: The Keystone service is temporarily unavailable. (HTTP 503)

2021-10-26 Thread Sylvain Bauza
Looks like a Keystone configuration issue. Not really a Nova bug: the
nova-api service is telling you that the Keystone API service is not
available.
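
For a quick sanity check (hosts and paths follow the install-guide defaults
and may differ in your deployment):

```
curl -i http://controller:5000/v3          # should return a JSON version document
systemctl status apache2                   # keystone usually runs under Apache mod_wsgi
tail -n 50 /var/log/keystone/keystone.log  # log path may vary by distro
```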


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1947825

Title:
  503 Service Unavailable: The server is currently unavailable. Please
  try again at a later time.: The Keystone service is temporarily
  unavailable. (HTTP 503)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I am trying to install the rocky version of openstack, and while
  configuring the glance service and executing the command to upload the
  image we are facing the following error.

  503 Service Unavailable: The server is currently unavailable. Please
  try again at a later time.: The Keystone service is temporarily
  unavailable. (HTTP 503)

  and the keystone service status is empty; it's not displaying anything.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1947825/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1948393] Re: useless configuration options in 'nova.conf'

2021-10-26 Thread Sylvain Bauza
This looks like a glanceclient issue, no?


** Also affects: glance
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1948393

Title:
  useless configuration options in 'nova.conf'

Status in Glance:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I was trying to add a retry for glance operations via nova. I came
  across the options below defined in the conf:

  [glance]
  ..
  #connect_retries = 
  #connect_retry_delay = 
  #status_code_retries = 
  #status_code_retry_delay = 
  ===

  I tried to set a value for `connect_retries` and tried to reproduce a 
connection error for the snapshot upload. Somehow the `connect_retries` value 
is not getting picked up. I also tried to search for these options in 
  the code (nova/nova/conf/glance.py), but could not find them.

  let me know if this is a known issue. could not find any duplicate bug
  for this.

  Nova release version: Train
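
  For reference, these are keystoneauth adapter options; on releases where
Nova actually registers them for the [glance] section, they would simply be
set in nova.conf. The values below are illustrative only, and they may not be
honoured at all on Train:

  ```
  [glance]
  connect_retries = 3
  connect_retry_delay = 0.5
  status_code_retries = 3
  status_code_retry_delay = 1.0
  ```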

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1948393/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1948637] Re: nova should support deleting or adding tag when server 's status is error

2021-10-26 Thread Sylvain Bauza
This is not a bug, but rather a feature request.
If the instance is in ERROR, why would you want to modify the tag?

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1948637

Title:
  nova should support deleting or adding tag when server 's status is
  error

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  When adding or deleting a tag on an ERROR VM, it reports:

  b"Cannot 'update tag' instance b7c8cc1c-8c26-4767-bd11-7faf99cee2df while it 
is in vm_state error (HTTP 409) 
  b"Cannot 'delete tag' instance b7c8cc1c-8c26-4767-bd11-7faf99cee2df while it 
is in vm_state error (HTTP 409)

  Tags and names are user-defined data; the name of the virtual
  machine can be modified in an error state, so shouldn't tags support
  similar operations?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1948637/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1946298] Re: live-migration fails when option rom size is different in guest's memory

2021-10-07 Thread Sylvain Bauza
We discussed this bug report during the latest Asian-friendly Nova
meeting [1] and agreed that this report is about asking Nova to support
iPXE; for the moment Nova doesn't really expose it, even though libvirtd
does.

Please provide a blueprint explaining your needs and then we will
discuss it.


[1] 
https://meetings.opendev.org/meetings/nova_extra/2021/nova_extra.2021-10-07-08.04.log.html#l-59


** Changed in: nova
   Status: New => Invalid

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1946298

Title:
  live-migration fails when option rom size is different in guest's
  memory

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  This problem is found when doing live-migration across nova versions,
  especially across the ipxe versions that libvirt depends on. If an instance
  has an interface attached, an option ROM is loaded into the guest's ROM.
  When doing live migration, qemu will check the ROM size and try to resize
  the resizable memory region. However, the option ROM is not resizable. Once
  the destination node finds the option ROM size has changed when loaded into
  memory, an exception occurs and stops the migration process.

  Steps to reproduce
  ==
  A simple way to reproduce:
  * Prepare two nova-compute node, which can be the same version
  * Create an instance on Node A, and attach an interface to it
  * Check which ipxe rom is loaded into memory by its model type.
For example, if an interface is defined with ``, then
`/usr/lib/ipxe/qemu/efi-virtio.rom` is loaded to rom on ubuntu x86 system.
  * Change the rom's virtual size on the destination Node B. 
  Simply `echo "hello" > /usr/lib/ipxe/qemu/efi-virtio.rom`
  The virtual size is the maximum length when the ROM is loaded into the
  guest's memory, and it is a power of two. We can use the following command
  to get the ROM's virtual size.
`virsh qemu-monitor-command  --hmp 'info ramblock'` 
  * Do live-migration

`nova live-migration --block-migrate cirros1 cmp02`

  Expected result
  ===
  Normally, if the rom's virtual size is not changed, migration will succeed.

  Actual result
  =
  After the execution of the steps above, the live-migration will fail with 
  error.

  Environment
  ===
  Nova version:
  $ dpkg -l | grep nova
  ii  nova-common2:21.2.1-0ubuntu1 all
  ii  nova-compute   2:21.2.1-0ubuntu1 all
  ii  nova-compute-kvm   2:21.2.1-0ubuntu1 all
  ii  nova-compute-libvirt   2:21.2.1-0ubuntu1 all
  ii  python3-nova   2:21.2.1-0ubuntu1 all
  ii  python3-novaclient 2:17.0.0-0ubuntu1 all

  Hypervisor type: libvirt
  $ dpkg -l | grep libvirt
  ii  libvirt-clients6.0.0-0ubuntu8.13 amd64
  ii  libvirt-daemon 6.0.0-0ubuntu8.13 amd64
  ii  libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.13 amd64
  ii  libvirt-daemon-driver-storage-rbd  6.0.0-0ubuntu8.13 amd64
  ii  libvirt-daemon-system  6.0.0-0ubuntu8.13 amd64
  ii  libvirt-daemon-system-systemd  6.0.0-0ubuntu8.13 amd64
  ii  libvirt0:amd64 6.0.0-0ubuntu8.13 amd64
  ii  nova-compute-libvirt   2:21.2.1-0ubuntu1 all
  ii  python3-libvirt6.1.0-1   amd64

  Networking type: Neutron with OpenVSwitch
  $ dpkg -l | grep neutron
  ii  neutron-common 2:16.4.0-0ubuntu3 all
  ii  neutron-openvswitch-agent  2:16.4.0-0ubuntu3 all
  ii  python3-neutron2:16.4.0-0ubuntu3 all
  ii  python3-neutron-lib2.3.0-0ubuntu1all
  ii  python3-neutronclient  1:7.1.1-0ubuntu1  all

  Logs & Configs
  ==
  ```text
  2021-09-22 10:10:31.451 35235 ERROR nova.virt.libvirt.driver [-] [instance: 
6d91c241-75b8-4067-8874-c64970b87f6a] Migration operation has aborted
  2021-09-22 10:10:31.644 35235 INFO nova.compute.manager [-] [instance: 
6d91c241-75b8-4067-8874-c64970b87f6a] Swapping old allocation on 
dict_keys(['61b9a486-f53e-4b70-b54c-0db29f8ff978']) held by migration 
f5308871-0e91-48b0-8a68-a7d66239b3bd for instance
  2021-09-22 10:10:31.671 35235 ERROR nova.virt.libvirt.driver [-] [instance: 
6d91c241-75b8-4067-8874-c64970b87f6a] Live Migration failure: internal error: 
qemu unexpectedly closed the monitor: 2021-09-22T02:10:31.450377Z 
qemu-system-x86_64: Length mismatch: :00:03.0/virtio-net-pci.rom: 0x1000 in 
!= 0x8: Invalid argument
  2021-09-22T02:10:31.450414Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device 'ram'
  

[Yahoo-eng-team] [Bug 1945401] Re: scheduler can not filter node by storage backend

2021-10-05 Thread Sylvain Bauza
You need to use aggregates if you want to use different storage backends
per compute.
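
For example, with the AggregateInstanceExtraSpecsFilter enabled, something like
this (names are illustrative) pins Ceph-backed flavors to the Ceph hosts:

```
openstack aggregate create --property storage=ceph ceph-hosts
openstack aggregate add host ceph-hosts cmp01
openstack flavor set --property aggregate_instance_extra_specs:storage=ceph <ceph-flavor>
```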

For what it's worth, if you really want the scheduler to have a way to
verify storage backends, that would be a new feature, not a bug.


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1945401

Title:
  scheduler can not filter node by storage backend

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  
  If my aggregate has a Ceph-backend node cmp01 and an FC-SAN-backend node 
cmp02, and I create a Ceph-backed VM01 on cmp01 and then migrate it, 

  the migration will fail if the scheduler selects cmp02 or if I set the
  target node to cmp02.

  
  --Traceback--
  oslo_messaging.rpc.client.RemoteError: Remote error: 
  ClientException Unable to create attachment for volume 
  (Invalid input received: Connector doesn't have required information: wwpns). 
  (HTTP 500) 

  
  I think nova needs to pre-check the target node I set, or 
  filter compute nodes by available storage backend when selecting a destination.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1945401/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1945538] Re: Database permission configured properly but I have the Access deny

2021-10-05 Thread Sylvain Bauza
This looks to me like an issue unrelated to the Nova repository; rather, it
looks like an issue with the Ubuntu packages.
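
As a quick check that the grants really work (credentials taken from the
nova.conf below; adjust as needed):

```
mysql -h controller001 -u nova -popenstack -e 'USE nova_api; SHOW TABLES;'
mysql -h controller001 -u nova -popenstack -e 'USE nova; SHOW TABLES;'
```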

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1945538

Title:
  Database permission configured properly but I have the Access deny

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  I installed OpenStack using the [OpenStack Installation Guide][OpenStack 
Installation Guide]; all commands and configuration are on my [Github][My-Github].

  At this point there is **one controller** with **two compute** nodes.

  * Nova configuration on controller*
  user@controller001:~$ sudo !!
  sudo grep -v '^\s*$\|^\s*\#' /etc/nova/nova.conf
  [DEFAULT]
  log_dir = /var/log/nova
  lock_path = /var/lock/nova
  state_path = /var/lib/nova
  transport_url = rabbit://openstack:openstack@controller001:5672/
  my_ip = 192.168.56.50
  [api]
  auth_strategy = keystone
  [api_database]
  connection = mysql+pymysql://nova:openstack@controller001/nova_api
  [barbican]
  [cache]
  [cinder]
  [compute]
  [conductor]
  [console]
  [consoleauth]
  [cors]
  [cyborg]
  [database]
  connection = mysql+pymysql://nova:openstack@controller001/nova
  [devices]
  [ephemeral_storage_encryption]
  [filter_scheduler]
  [glance]
  api_servers = http://controller001:9292
  [guestfs]
  [healthcheck]
  [hyperv]
  [image_cache]
  [ironic]
  [key_manager]
  [keystone]
  [keystone_authtoken]
  www_authenticate_uri = http://controller001:5000/
  auth_url = http://controller001:5000/
  memcached_servers = controller001:11211
  auth_type = password
  project_domain_name = Default
  user_domain_name = Default
  project_name = service
  username = nova
  password = openstack
  [libvirt]
  [metrics]
  [mks]
  [neutron]
  [notifications]
  [oslo_concurrency]
  lock_path = /var/lib/nova/tmp
  [oslo_messaging_amqp]
  [oslo_messaging_kafka]
  [oslo_messaging_notifications]
  [oslo_messaging_rabbit]
  [oslo_middleware]
  [oslo_policy]
  [pci]
  [placement]
  region_name = RegionOne
  project_domain_name = Default
  project_name = service
  auth_type = password
  user_domain_name = Default
  auth_url = http://controller001:5000/v3
  username = placement
  password = openstack
  [powervm]
  [privsep]
  [profiler]
  [quota]
  [rdp]
  [remote_debug]
  [scheduler]
  discover_hosts_in_cells_interval = 300
  [serial_console]
  [service_user]
  [spice]
  [upgrade_levels]
  [vault]
  [vendordata_dynamic_auth]
  [vmware]
  [vnc]
  enabled = true
  server_listen = $my_ip
  server_proxyclient_address = $my_ip
  [workarounds]
  [wsgi]
  [zvm]
  [cells]
  enable = False
  [os_region_name]
  openstack = 

  
  * Nova configuration on compute:*
  user@compute001:~$ sudo grep -v '^\s*$\|^\s*\#' /etc/nova/nova.conf
  [DEFAULT]
  log_dir = /var/log/nova
  lock_path = /var/lock/nova
  state_path = /var/lib/nova
  transport_url = rabbit://openstack:openstack@controller001
  my_ip = 172.16.56.51
  [api]
  auth_strategy = keystone
  [api_database]
  connection = sqlite:var/lib/nova/nova_api.sqlite
  [barbican]
  [cache]
  [cinder]
  [compute]
  [conductor]
  [console]
  [consoleauth]
  [cors]
  [cyborg]
  [database]
  connection = sqlite:var/lib/nova/nova.sqlite
  [devices]
  [ephemeral_storage_encryption]
  [filter_scheduler]
  [glance]
  api_servers = http://controller001:9292
  [guestfs]
  [healthcheck]
  [hyperv]
  [image_cache]
  [ironic]
  [key_manager]
  [keystone]
  [keystone_authtoken]
  www_authenticate_uri = http://controller001:5000/
  auth_url = http://controller001:5000/
  memcached_servers = controller001:11211
  auth_type = password
  project_domain_name = Default
  user_domain_name = Default
  project_name = service
  username = nova
  password = openstack
  [libvirt]
  [metrics]
  [mks]
  [neutron]
  [notifications]
  [oslo_concurrency]
  lock_path = /var/lib/nova/tmp
  [oslo_messaging_amqp]
  [oslo_messaging_kafka]
  [oslo_messaging_notifications]
  [oslo_messaging_rabbit]
  [oslo_middleware]
  [oslo_policy]
  [pci]
  [placement]
  region_name = RegionOne
  project_domain_name = Default
  project_name = service
  auth_type = password
  user_domain_name = Default
  auth_url = http://controller001:5000/v3
  username = placement
  password = openstack
  [powervm]
  [privsep]
  [profiler]
  [quota]
  [rdp]
  [remote_debug]
  [scheduler]
  discover_hosts_in_cells_interval = 300
  [serial_console]
  [service_user]
  [spice]
  [upgrade_levels]
  [vault]
  [vendordata_dynamic_auth]
  [vmware]
  [vnc]
  enabled = true
  server_listen = 0.0.0.0
  server_proxyclient_address = $my_ip
  novncproxy_base_url = http://controller001:6080/vnc_auto.html
  [workarounds]
  [wsgi]
  [zvm]
  [cells]
  enable = False
  [os_region_name]
  openstack =

  
  The section below shows that the database permissions are configured correctly
  

[Yahoo-eng-team] [Bug 1944111] Re: Missing __init__.py in nova/db/api

2021-09-23 Thread Sylvain Bauza
** Changed in: nova/yoga
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1944111

Title:
  Missing __init__.py in nova/db/api

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) xena series:
  Fix Released
Status in OpenStack Compute (nova) yoga series:
  Fix Released

Bug description:
  Looks like nova/db/api is missing an __init__.py, which breaks *at
  least* my Debian packaging.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1944111/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1938765] [NEW] nova-lvm job constantly fails on Glance image upload

2021-08-03 Thread Sylvain Bauza
Public bug reported:

The nova-lvm job seems to fail for every change [1], which prevents us from
merging any change touching nova/virt/libvirt [2].

All failures seem to relate to the same Tempest tests (6 of them)
failing with the same problem: a Glance image upload issue where the Glance
API returns an HTTP 502.


Aug 01 03:50:12.036077 ubuntu-focal-ovh-bhs1-0025715861 nova-compute[106038]: 
ERROR os_brick.initiator.linuxscsi [None 
req-8fa95fb0-15e8-4bf7-8314-0d2dac1c9b9c 
tempest-VolumesAdminNegativeTest-243853426 
tempest-VolumesAdminNegativeTest-243853426-project] multipathd is not running: 
exit code None: oslo_concurrency.processutils.ProcessExecutionError: [Errno 2] 
No such file or directory

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server [None
req-76a536ab-394f-4347-8dec-30842c1ec1d2 tempest-
ListImageFiltersTestJSON-1499681503 tempest-
ListImageFiltersTestJSON-1499681503-project] Exception during message
handling: glanceclient.exc.HTTPBadGateway: HTTP 502 Bad Gateway: Bad
Gateway: The proxy server received an invalid: response from an upstream
server.: Apache/2.4.41 (Ubuntu) Server at 158.69.72.121 Port 80

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server Traceback (most recent
call last):

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 2916, in snapshot

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server
metadata['location'] = root_disk.direct_snapshot(

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/opt/stack/nova/nova/virt/libvirt/imagebackend.py", line 452, in
direct_snapshot

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server raise
NotImplementedError(_('direct_snapshot() is not implemented'))

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server NotImplementedError:
direct_snapshot() is not implemented

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server During handling of the
above exception, another exception occurred:

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server Traceback (most recent
call last):

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py",
line 165, in _process_incoming

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server res =
self.dispatcher.dispatch(message)

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/usr/local/lib/python3.8/dist-
packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server return
self._do_dispatch(endpoint, method, ctxt, args)

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/usr/local/lib/python3.8/dist-
packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server result = func(ctxt,
**new_args)

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/opt/stack/nova/nova/exception_wrapper.py", line 71, in wrapped

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server
_emit_versioned_exception_notification(

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line
227, in __exit__

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server
self.force_reraise()

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server   File
"/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line
200, in force_reraise

Aug 01 03:51:41.348545 ubuntu-focal-ovh-bhs1-0025715861 nova-
compute[106038]: ERROR oslo_messaging.rpc.server raise self.value

Aug 01 03:51:41.348545 

[Yahoo-eng-team] [Bug 1918340] Re: Fault Injection #1 - improve unit test effectiveness

2021-04-06 Thread Sylvain Bauza
Fixing unit tests or tech-debt concerns doesn't really need a bug report. 
That's also why we have Gerrit, for discussing whether the debt fix is good or 
not.
So, instead of discussing here what to do, please upload a new change 
fixing what you want and ask us on #openstack-nova to review it; we will.


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1918340

Title:
  Fault Injection #1 - improve unit test effectiveness

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  I have performed fault injection in OpenStack nova by changing the code of 
compute/api.py (inserting a representative/probable bug), then ran the unit, 
functional and integration tests, and discovered that some of the inserted bugs 
were not detected by the test suite.
  The referenced WIDS (Wrong String in Initial Data) is a type of fault where 
the string used in a variable initialization is set to an incorrect value.
   

  Steps to reproduce
  ==

  Line of code: 102
  Original code:  AGGREGATE_ACTION_UPDATE_META = 'UpdateMeta'
  Incorrect code: AGGREGATE_ACTION_UPDATE_META = 'NHZWTCGB'

  Refactor the line of code above to the incorrect code. Then execute
  the unit tests.

  Expected result
  ===
  The unit tests should detect the fault.

  Actual result
  ===
  The fault was not detected by the unit tests.
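
  For illustration, even a trivial test of the kind below (not part of the
Nova test suite; plain pytest style shown for brevity) would detect this
specific WIDS fault:

  ```python
  from nova.compute import api as compute_api

  def test_aggregate_update_meta_action_string():
      # Pins the action-name constant that the fault injection mutated.
      assert compute_api.AGGREGATE_ACTION_UPDATE_META == 'UpdateMeta'
  ```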

  Environment
  ===
  The code tested is on the stable/ussuri branch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1918340/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1900006] Re: Asking for different vGPU types is racey

2021-04-06 Thread Sylvain Bauza
Victoria backport candidate :
https://review.opendev.org/c/openstack/nova/+/784907

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Changed in: nova/victoria
   Status: New => Confirmed

** Changed in: nova/victoria
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1900006

Title:
  Asking for different vGPU types is racey

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:
  When testing on Victoria virtual GPUs, I wanted to have different
  types :

  [devices]
  enabled_vgpu_types = nvidia-320,nvidia-321

  [vgpu_nvidia-320]
  device_addresses = 0000:04:02.1,0000:04:02.2

  [vgpu_nvidia-321]
  device_addresses = 0000:04:02.3

  
  Unfortunately, I saw that only the first type was used.
  When restarting the nova-compute service, we got the log :
  WARNING nova.virt.libvirt.driver [None 
req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' 
was listed in '[devices] enabled_vgpu_types' but no corresponding 
'[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was 
defined. Only the first type 'nvidia-320' will be used.

  
  It's due to the fact that we call _get_supported_vgpu_types() first when 
creating the libvirt implementation [1], while we only register the new CONF 
options in init_host() [2], which is called afterwards.

  
  [1] 
https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

  [2]
  https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

  A simple fix would just be to make sure we have dynamic options within
  _get_supported_vgpu_types()
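
  A hedged sketch of that idea (the helper name register_dynamic_opts is
assumed to be the one in nova.conf.devices; this is not necessarily the merged
patch):

  ```python
  import nova.conf
  from nova.conf import devices as devices_conf

  CONF = nova.conf.CONF

  def _get_supported_vgpu_types():
      if not CONF.devices.enabled_vgpu_types:
          return []
      # Register the dynamic [vgpu_$type] groups *before* reading them, so
      # the driver no longer depends on init_host() having run first.
      devices_conf.register_dynamic_opts(CONF)
      return list(CONF.devices.enabled_vgpu_types)
  ```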

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1900006/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1920977] Re: Error 504 when disabling a nova-compute service recently down

2021-04-06 Thread Sylvain Bauza
You shouldn't disable the host by calling the host API, but rather
either wait for the periodic verification (indeed, around 60 seconds) or
call the force-down API.

https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-
detail#update-forced-down
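
For illustration, the forced-down flag can also be set from the client
(requires compute API microversion 2.11 or later; syntax may differ per client
release):

```
openstack --os-compute-api-version 2.11 compute service set --down <host> nova-compute
```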


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1920977

Title:
  Error 504 when disabling a nova-compute service recently down

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===

  When a host fails and the nova-compute service stops working, it takes
  some time for the nova control plane to detect it and mark the service
  as "down" (I believe up to 60 seconds by default?).

  During this time where nova-compute is dead but not marked as "down"
  in nova, if an operator tries to set the compute service as
  'disabled', the command hangs for quite some time before returning an
  error.

  Showing the status of compute services immediately after this error
  indicates that the service was actually updated and marked as
  disabled.

  If the host is already seen as "down" in nova-api when trying to
  update status, the command ends successfully

  Steps to reproduce
  ==

  - On a working and enabled nova-compute host, stop nova-compute service
  - Before host is reported as down in nova-api, run:

  $ openstack compute service set --disable  nova-compute

  Expected result
  ===

  - nova-compute service is marked as disabled in nova-api
  - command returns with a success
  - a nova-api log says something like "The trait will be synchronized 
automatically by the compute service when the update_available_resource 
periodic task runs or when the service is restarted."

  Actual result
  =

  - nova-compute service is marked as disabled in nova-api
  - command hangs for some time before returning an error:
  ```
  Failed to set service status to disabled
  Compute service nova-compute of host  failed to set.
  ```

  Logs & Configs
  ==

  When nova-api still thinks nova-compute is up and command fails, nova-api 
shows a stack trace with the following error:
  ```
  An error occurred while updating the COMPUTE_STATUS_DISABLED trait on compute 
node resource providers managed by host . The trait will be synchronized 
automatically by the compute service when the update_available_resource 
periodic task runs.: oslo_messaging.exceptions.MessagingTimeout: Timed out 
waiting for a reply to message ID 
  ```

  When nova-api already knows service is down, there is only an info log:
  ```
  Compute service on host  is down. The COMPUTE_STATUS_DISABLED trait 
will be synchronized when the service is restarted.
  ```

  Environment
  ===

  Encountered on ussuri

  Impact
  ==

  I would say disabling nova-compute may be one of the 1st actions an operator 
will try when a host is failing.
  This behavior also has a bad impact when using Masakari, as the 1st action 
taken by default is to disable the nova-compute service (see 
https://docs.openstack.org/masakari/latest/configuration/recovery_workflow_custom_task.html).
  As a result, the recovery process in Masakari ends up in error (even if a 
retry mechanism saves the day).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1920977/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1922264] Re: On a compute node with 3 GPUs and 2 vgpu groups, nova fails to load second group

2021-04-06 Thread Sylvain Bauza
*** This bug is a duplicate of bug 1900006 ***
https://bugs.launchpad.net/bugs/1900006

Marking this bug report as duplicate, so we can directly backport the
change down to stable/victoria.

** This bug has been marked a duplicate of bug 1900006
   Asking for different vGPU types is racey

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1922264

Title:
  On a compute node with 3 GPUs and 2 vgpu groups, nova fails to load
  second group

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Description
  ===
  We have multiple compute nodes with multiple NVIDIA GPU cards 
(RTX8000/RTX6000).
  Nodes with a mix of RTX8000 and RTX6000 cards have 2 gpu groups configured in 
nova.conf but nova-compute only creates resource providers for the first gpu 
group.

  Steps to reproduce
  ==

  For example, on a node with 2 RTX8000 and 1 RTX6000.

  $ lspci | grep -i nvidia
  21:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev 
a1)
  81:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev 
a1)
  e2:00.0 3D controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev 
a1)

  $ nvidia-smi
  Thu Apr  1 17:22:53 2021
  
+-+
  | NVIDIA-SMI 460.32.04Driver Version: 460.32.04CUDA Version: N/A  
|
  
|---+--+--+
  | GPU  NamePersistence-M| Bus-IdDisp.A | Volatile Uncorr. ECC 
|
  | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage | GPU-Util  Compute M. 
|
  |   |  |   MIG M. 
|
  
|===+==+==|
  |   0  Quadro RTX 8000 On   | :21:00.0 Off |0 
|
  | N/A   30CP827W / 250W |285MiB / 46079MiB |  0%  Default 
|
  |   |  |  N/A 
|
  
+---+--+--+
  |   1  Quadro RTX 8000 On   | :81:00.0 Off |0 
|
  | N/A   30CP827W / 250W |285MiB / 46079MiB |  0%  Default 
|
  |   |  |  N/A 
|
  
+---+--+--+
  |   2  Quadro RTX 6000 On   | :E2:00.0 Off |0 
|
  | N/A   30CP824W / 250W |150MiB / 23039MiB |  0%  Default 
|
  |   |  |  N/A 
|
  
+---+--+--+

  Extract from nova.conf :
  ...
  [devices]
  enabled_vgpu_types = nvidia-428, nvidia-387

  [vgpu_nvidia-428]
  device_addresses = 0000:21:00.0,0000:81:00.0

  [vgpu_nvidia-387]
  device_addresses = 0000:e2:00.0

  
  When nova-compute starts, log shows :
  2021-04-01 17:15:25.454 7 WARNING nova.virt.libvirt.driver 
[req-bebc8637-d231-435c-a6cc-4613e14e2f76 - - - - -] The vGPU type 'nvidia-428' 
was listed in '[devices] enabled_vgpu_types' but no corresponding 
'[vgpu_nvidia-428]' group or '[vgpu_nvidia-428] device_addresses' option was 
defined. Only the first type 'nvidia-428' will be used.

  And a listing of resource providers on this node shows that only nvidia-428 
GPUs were used :
  $ openstack resource provider list --os-placement-api-version 1.14 --in-tree 
f5d35bdc-b4b7-4764-a9d0-41f67fd95385
  
+--+++--+--+
  | uuid | name   | 
generation | root_provider_uuid   | parent_provider_uuid
 |
  
+--+++--+--+
  | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | cloud-lyse-cmp-02  | 
32 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | None
 |
  | 21a4a16e-8d33-4a23-a924-b00f8c31f0d0 | cloud-lyse-cmp-02_pci__81_00_0 | 
 4 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | 
f5d35bdc-b4b7-4764-a9d0-41f67fd95385 |
  | 76e1ee94-fbf2-410e-9711-fba71c709388 | cloud-lyse-cmp-02_pci__21_00_0 | 
 2 | f5d35bdc-b4b7-4764-a9d0-41f67fd95385 | 
f5d35bdc-b4b7-4764-a9d0-41f67fd95385 |
  
+--+++--+--+

  In nova.conf, if I swap nvidia-428 & nvidia-387 in 

[Yahoo-eng-team] [Bug 1921804] Re: leftover bdm when rabbitmq unstable

2021-03-30 Thread Sylvain Bauza
While I understand your concern, the nova community has a consensus that
nova services shouldn't verify the status of RabbitMQ and should expect
that the MQ is working.

Given there are workarounds for removing the attachment in case you hit
such a failure, I'll move this bug to Won't Fix.
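
As a hedged example of the usual manual cleanup path (requires volume API
microversion 3.27; IDs are placeholders):

```
cinder --os-volume-api-version 3.27 attachment-list --volume-id <volume-uuid>
cinder --os-volume-api-version 3.27 attachment-delete <attachment-uuid>
```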

Unfortunately, this consensus isn't captured in
https://docs.openstack.org/nova/latest/contributor/project-scope.html
but I'll propose a patch for writing it clearly.

** Changed in: nova
   Status: New => Won't Fix

** Tags added: volumes

** Tags added: oslo

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1921804

Title:
  leftover bdm when rabbitmq unstable

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Description
  ===

  When RabbitMQ is unstable, there is a chance that the method
  
https://github.com/openstack/nova/blob/7a1222a8654684262a8e589d91e67f2b9a9da336/nova/compute/api.py#L4741
  will time out even though the BDM is successfully created.

  In such cases, the volume will be shown in `server show`, but cannot be
  detached, and the volume status is 'available'.

  Steps to reproduce
  ==
  there might be no way to safely reproduce this failure, because when rabbitmq 
is
  unstable, many other services will also show unusual behavior.

  Expected result
  ===
  We should be able to remove such an attachment via the API without manually 
fixing the DB...

  ```console
  root@mgt02:~# openstack server show 4e5c3c7d-6b4c-4841-9e6e-9a3374036a3e
  
+-+---+
  | Field   | Value 
|
  
+-+---+
  | OS-DCF:diskConfig   | MANUAL
|
  | OS-EXT-AZ:availability_zone | cn-north-3a   
|
  | OS-EXT-SRV-ATTR:host| compute01 
|
  | OS-EXT-SRV-ATTR:hypervisor_hostname | compute01 
|
  | OS-EXT-SRV-ATTR:instance_name   | instance-ce4c 
|
  | OS-EXT-STS:power_state  | Running   
|
  | OS-EXT-STS:task_state   | None  
|
  | OS-EXT-STS:vm_state | active
|
  | OS-SRV-USG:launched_at  | 2021-03-29T09:06:38.00
|
  | OS-SRV-USG:terminated_at| None  
|
  | accessIPv4  |   
|
  | accessIPv6  |   
|
  | addresses   | newsql-net=192.168.1.217; 
service_mgt=100.114.3.41|
  | config_drive| True  
|
  | created | 2021-03-29T09:05:19Z  
|
  | flavor  | newsql_2C8G40G_general 
(51db3192-cece-4b9a-9969-7916b4543beb) |
  | hostId  | 
cf1f3937a3286677b3020d817541ac33d7c8f1ca74be49b26f128093
  |
  | id  | 4e5c3c7d-6b4c-4841-9e6e-9a3374036a3e  
|
  | image   | 
newsql-bini2.0.0alpha-ubuntu18.04-x64-20210112-pub 
(4531e3bf-0433-40c6-816b-6763f9d02c7a) |
  | key_name| None  
|
  | name| 
NewSQL-1abc5b28-b9e6-45cd-893d-5bb3a7732a43-3   
  |
  | progress| 0 
  

[Yahoo-eng-team] [Bug 1921381] Re: iSCSI: Flushing issues when multipath config has changed

2021-03-30 Thread Sylvain Bauza
** Also affects: nova
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => Critical

** Changed in: nova
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Tags added: wallaby-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1921381

Title:
  iSCSI: Flushing issues when multipath config has changed

Status in OpenStack Compute (nova):
  Confirmed
Status in os-brick:
  In Progress
Status in os-brick wallaby series:
  New
Status in os-brick xena series:
  In Progress

Bug description:
  OS-Brick disconnect_volume code assumes that the use_multipath
  parameter that is used to instantiate the connector has the same value
  than the connector that was used on the original connect_volume call.

  Unfortunately this is not necessarily true, because Nova can attach a
  volume, then its multipath configuration can be enabled or disabled,
  and then a detach can be issued.

  This leads to a series of serious issues such as:

  - Not flushing the single path on disconnect_volume (possible data loss) and 
leaving it as a leftover device on the host when Nova calls 
terminate-connection on Cinder.
  - Not flushing the multipath device (possible data loss) and leaving it as a 
lefover device similarly to the other case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1921381/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1463631] Re: 60_nova/resources.sh:106:ping_check_public fails intermittently

2021-03-23 Thread Sylvain Bauza
Putting it as Invalid as we can't really help here, but in case I'm
wrong, please set it back to New.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1463631

Title:
  60_nova/resources.sh:106:ping_check_public fails intermittently

Status in grenade:
  Confirmed
Status in neutron:
  New
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  http://logs.openstack.org/12/186112/17/gate/gate-grenade-
  dsvm/4da364e/logs/grenade.sh.txt.gz#_2015-06-09_22_42_15_929

  2015-06-09 22:42:13.960 | --- 172.24.5.1 ping statistics ---
  2015-06-09 22:42:13.960 | 1 packets transmitted, 0 received, 100% packet 
loss, time 0ms
  2015-06-09 22:42:13.960 | 
  2015-06-09 22:42:15.929 | + [[ True = \T\r\u\e ]]
  2015-06-09 22:42:15.929 | + die 67 '[Fail] Couldn'\''t ping server'
  2015-06-09 22:42:15.929 | + local exitcode=0
  2015-06-09 22:42:15.929 | [Call Trace]
  2015-06-09 22:42:15.929 | 
/opt/stack/new/grenade/projects/60_nova/resources.sh:134:verify
  2015-06-09 22:42:15.929 | 
/opt/stack/new/grenade/projects/60_nova/resources.sh:101:verify_noapi
  2015-06-09 22:42:15.929 | 
/opt/stack/new/grenade/projects/60_nova/resources.sh:106:ping_check_public
  2015-06-09 22:42:15.929 | /opt/stack/new/grenade/functions:67:die
  2015-06-09 22:42:15.931 | [ERROR] /opt/stack/new/grenade/functions:67 [Fail] 
Couldn't ping server
  2015-06-09 22:42:16.933 | 1 die /opt/stack/old/devstack/functions-common
  2015-06-09 22:42:16.933 | 67 ping_check_public 
/opt/stack/new/grenade/functions
  2015-06-09 22:42:16.933 | 106 verify_noapi 
/opt/stack/new/grenade/projects/60_nova/resources.sh
  2015-06-09 22:42:16.933 | 101 verify 
/opt/stack/new/grenade/projects/60_nova/resources.sh
  2015-06-09 22:42:16.933 | 134 main 
/opt/stack/new/grenade/projects/60_nova/resources.sh
  2015-06-09 22:42:16.933 | Exit code: 1
  2015-06-09 22:42:16.961 | World dumping... see 
/opt/stack/old/worlddump-2015-06-09-224216.txt for details
  2015-06-09 22:42:26.139 | [Call Trace]
  2015-06-09 22:42:26.139 | ./grenade.sh:250:resources
  2015-06-09 22:42:26.139 | /opt/stack/new/grenade/inc/plugin:82:die
  2015-06-09 22:42:26.141 | [ERROR] /opt/stack/new/grenade/inc/plugin:82 Failed 
to run /opt/stack/new/grenade/projects/60_nova/resources.sh verify

  I wonder if there is a race in setting up security groups.

  
http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiW0ZhaWxdIENvdWxkbid0IHBpbmcgc2VydmVyXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNS0wNS0yN1QwMDozMDoxNiswMDowMCIsInRvIjoiMjAxNS0wNi0xMFQwMDozMDoxNiswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDMzODk2MjUwNTAyfQ==

  This hits in nova-network and neutron grenade jobs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/grenade/+bug/1463631/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1902925] [NEW] Upgrades to compute RPC API 5.12 are broken

2020-11-04 Thread Sylvain Bauza
Public bug reported:

In change https://review.opendev.org/#/c/715326/ we added a new
argument to the rebuild_instance() RPC method named 'accel_uuids'.

In the same change, in order to manage different compute versions, we 
allowed this argument to be omitted if the destination RPC service is not able 
to speak 5.12.
That said, since we forgot to make the accel_uuids argument nullable, we 
then cast a call to the compute manager without this attribute 
while it expects it, which leads to a TypeError on the server side.

FWIW, this can happen with any RPC pin, even with the compute='auto'
default value as this value will elect to automatically pin a version
that both the source and destination can support.
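
A hedged sketch of the failure mode (signature heavily simplified, not the
actual manager code):

```python
class ComputeManager:
    def rebuild_instance(self, context, instance, accel_uuids=None, **kwargs):
        # Giving 'accel_uuids' a default (i.e. making it nullable) lets an
        # RPC client pinned below 5.12 omit the argument without the cast
        # blowing up with TypeError on the manager side.
        accel_uuids = accel_uuids or []
        return accel_uuids
```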

** Affects: nova
 Importance: Critical
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: Confirmed

** Affects: nova/victoria
 Importance: Critical
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: Confirmed


** Tags: compute upgrade

** Changed in: nova
   Status: New => Confirmed

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
   Importance: High => Critical

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Changed in: nova/victoria
   Importance: Undecided => Critical

** Changed in: nova
 Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza)

** Changed in: nova/victoria
 Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza)

** Changed in: nova/victoria
   Status: New => Confirmed

** Tags added: compute upgrade

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1902925

Title:
  Upgrades to compute RPC API 5.12 are broken

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:
  In change https://review.opendev.org/#/c/715326/ we allowed a new
  argument to the rebuild_instance() RPC method named 'accel_uuids'.

  In the same change, in order to support mixed compute versions, we allowed
  this argument to be omitted if the destination RPC service is not able to
  speak 5.12.
  That said, since we forgot to make the accel_uuids argument nullable, we
  then cast a call to the compute manager without this argument while the
  manager expects it, which leads to a TypeError on the server side.

  FWIW, this can happen with any RPC pin, even with the compute='auto'
  default value, since 'auto' will automatically pin a version
  that both the source and destination can support.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1902925/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1900006] [NEW] Asking for different vGPU types is racey

2020-10-15 Thread Sylvain Bauza
Public bug reported:

When testing virtual GPUs on Victoria, I wanted to have different types:

[devices]
enabled_vgpu_types = nvidia-320,nvidia-321

[vgpu_nvidia-320]
device_addresses = 0000:04:02.1,0000:04:02.2

[vgpu_nvidia-321]
device_addresses = 0000:04:02.3


Unfortunately, I saw that only the first type was used.
When restarting the nova-compute service, we get the following warning:
WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef 
None None] The vGPU type 'nvidia-320' was listed in '[devices] 
enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or 
'[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 
'nvidia-320' will be used.


This is because we call _get_supported_vgpu_types() first, when creating the
libvirt driver [1], while we only register the dynamic CONF options in
init_host() [2], which is called afterwards.


[1] 
https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

[2]
https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

A simple fix would be to make sure the dynamic options are registered within
_get_supported_vgpu_types().

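As a rough illustration (not the Nova code itself), the ordering problem and
the suggested fix can be sketched with oslo.config; the group and option
names follow this report, everything else is hypothetical:

from oslo_config import cfg

CONF = cfg.CONF
CONF.register_opts(
    [cfg.ListOpt('enabled_vgpu_types', default=[])], group='devices')


def register_dynamic_vgpu_opts(conf):
    # What init_host() effectively does: one [vgpu_<type>] group per enabled
    # type, each holding a device_addresses option.
    for vgpu_type in conf.devices.enabled_vgpu_types:
        conf.register_opts(
            [cfg.ListOpt('device_addresses', default=[])],
            group='vgpu_%s' % vgpu_type)


def get_supported_vgpu_types(conf):
    # If this runs before register_dynamic_vgpu_opts(), the per-type groups
    # do not exist yet and only the first enabled type can be honoured, which
    # is the behaviour described above. Registering the dynamic options here
    # first removes the ordering dependency.
    register_dynamic_vgpu_opts(conf)
    return {
        vgpu_type: conf['vgpu_%s' % vgpu_type].device_addresses
        for vgpu_type in conf.devices.enabled_vgpu_types
    }
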
** Affects: nova
 Importance: Medium
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: Confirmed


** Tags: libvirt vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1900006

Title:
  Asking for different vGPU types is racey

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  When testing virtual GPUs on Victoria, I wanted to have different
  types:

  [devices]
  enabled_vgpu_types = nvidia-320,nvidia-321

  [vgpu_nvidia-320]
  device_addresses = 0000:04:02.1,0000:04:02.2

  [vgpu_nvidia-321]
  device_addresses = 0000:04:02.3

  
  Unfortunately, I saw that only the first type was used.
  When restarting the nova-compute service, we get the following warning:
  WARNING nova.virt.libvirt.driver [None 
req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' 
was listed in '[devices] enabled_vgpu_types' but no corresponding 
'[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was 
defined. Only the first type 'nvidia-320' will be used.

  
  This is because we call _get_supported_vgpu_types() first, when creating the
  libvirt driver [1], while we only register the dynamic CONF options in
  init_host() [2], which is called afterwards.

  
  [1] 
https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

  [2]
  https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

  A simple fix would be to make sure the dynamic options are registered within
  _get_supported_vgpu_types().

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1900006/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute

2020-09-24 Thread Sylvain Bauza
** Changed in: nova/ussuri
   Status: New => Confirmed

** Also affects: nova/victoria
   Importance: Low
 Assignee: Sylvain Bauza (sylvain-bauza)
   Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896741

Title:
  Intel mediated device info doesn't provide a name attribute

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  Confirmed
Status in OpenStack Compute (nova) ussuri series:
  Confirmed
Status in OpenStack Compute (nova) victoria series:
  In Progress

Bug description:
  When testing a Xeon server for virtual GPU support, I saw that Nova
  raises an exception because the i915 driver doesn't provide a name for
  mdev types:

  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager Traceback (most recent call last):
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 
9824, in _update_available_resource_for_node
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 896, in update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_available_resource(context, resources, 
startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 
360, in inner
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return f(*args, **kwargs)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 981, in _update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update(context, cn, startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1233, in _update
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_to_placement(context, compute_node, 
startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return attempt.get(self._wrap_exception)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager six.reraise(self.value[0], self.value[1], 
self.value[2])
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/usr/local/lib/python3.7/site-packages/six.py", 
line 703, in reraise
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager raise value
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, 
False)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1169, in _update_to_placement
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
7857, in update_provider_tree
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager provider_tree, nodename, allocations=allocations)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.ma

[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute

2020-09-23 Thread Sylvain Bauza
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Confirmed

** Changed in: nova/ussuri
   Importance: Undecided => Low

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/train
 Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza)

** Changed in: nova/ussuri
 Assignee: (unassigned) => Sylvain Bauza (sylvain-bauza)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896741

Title:
  Intel mediated device info doesn't provide a name attribute

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  Confirmed
Status in OpenStack Compute (nova) ussuri series:
  New

Bug description:
  When testing a Xeon server for virtual GPU support, I saw that Nova
  raises an exception because the i915 driver doesn't provide a name for
  mdev types:

  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager Traceback (most recent call last):
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 
9824, in _update_available_resource_for_node
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 896, in update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_available_resource(context, resources, 
startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 
360, in inner
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return f(*args, **kwargs)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 981, in _update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update(context, cn, startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1233, in _update
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_to_placement(context, compute_node, 
startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return attempt.get(self._wrap_exception)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager six.reraise(self.value[0], self.value[1], 
self.value[2])
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/usr/local/lib/python3.7/site-packages/six.py", 
line 703, in reraise
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager raise value
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, 
False)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1169, in _update_to_placement
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
7857, in up

[Yahoo-eng-team] [Bug 1896741] [NEW] Intel mediated device info doesn't provide a name attribute

2020-09-23 Thread Sylvain Bauza
.py", line 
6984, in _count_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager types=enabled_vgpu_types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
7268, in _get_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager device = self._get_mdev_capabilities_for_dev(name, 
types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
7253, in _get_mdev_capabilities_for_dev
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager 'name': cap['name'],
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager KeyError: 'name'


For example :

[root@mymachine ~]# ll 
/sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_8/
total 0
-r--r--r--. 1 root root 4096 Sep 22 14:18 available_instances
--w---. 1 root root 4096 Sep 23 06:01 create
-r--r--r--. 1 root root 4096 Sep 23 05:43 description
-r--r--r--. 1 root root 4096 Sep 22 14:18 device_api
drwxr-xr-x. 2 root root 0 Sep 23 06:01 devices

When looking at the kernel driver API documentation
https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-
device.html it says that the "name" attribute is optional:

"name

This attribute should show human readable name. This is optional
attribute."


The fix should be easy: we don't need to use this attribute in Nova.

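A hedged sketch of that fix direction (an illustrative helper, not the actual
Nova patch): treat the optional 'name' key as absent-friendly instead of
assuming it is always present.

def get_mdev_capabilities_for_dev(device_xml_caps):
    """Build a type description without requiring the optional 'name' key."""
    types = []
    for cap in device_xml_caps:
        types.append({
            'type': cap['type'],
            # 'name' is optional per the vfio-mediated-device kernel docs,
            # so fall back to None instead of raising KeyError.
            'name': cap.get('name'),
            'available': int(cap.get('availableInstances', 0)),
        })
    return types


# Example: the i915 driver exposes no 'name' for its mdev types.
print(get_mdev_capabilities_for_dev(
    [{'type': 'i915-GVTg_V5_8', 'availableInstances': 2}]))
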
** Affects: nova
 Importance: Low
 Assignee: Sylvain Bauza (sylvain-bauza)
 Status: Triaged


** Tags: libvirt vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896741

Title:
  Intel mediated device info doesn't provide a name attribute

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  When testing a Xeon server for virtual GPU support, I saw that Nova
  raises an exception because the i915 driver doesn't provide a name for
  mdev types:

  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager Traceback (most recent call last):
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 
9824, in _update_available_resource_for_node
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 896, in update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_available_resource(context, resources, 
startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 
360, in inner
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return f(*args, **kwargs)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 981, in _update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update(context, cn, startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1233, in _update
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_to_placement(context, compute_node, 
startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return attempt.get(self._wrap_exception)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager six.reraise(self.value[0], self.value[1], 

[Yahoo-eng-team] [Bug 1887377] Re: nova does not loadbalance assignment of resources on a host based on availability of pci device, hugepages or pcpus.

2020-09-16 Thread Sylvain Bauza
While I totally understand the use case, I think this is a new feature
for performance reasons and not a bug. Closing it as Wishlist but of
course you can work on it if you wish ;)

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1887377

Title:
  nova does not loadbalance assignment of resources on a host based on
  availability of pci device, hugepages or pcpus.

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Nova has supported hugpages, cpu pinning and pci numa affintiy for a very long
  time. since its introduction the advice has always been to create a flavor 
that mimic your
  typeical hardware toplogy. i.e. if all your compute host have 2 numa nodes 
the you should create
  flavor that request 2 numa nodes. for along time operators have ignored this 
advice
  and continued to create singel numa node flavor sighting that after 5+ year 
of hardware venders
  working with VNF vendor to make there product numa aware, vnf often still do 
not optimize
  properly for a multi numa environment.

  as a result many operator still deploy single numa vms although that
  is becoming less common over time.  when you deploy a vm with a single
  numa node today we more or less iterate over the host numa node in
  order and assign the vm to the first numa nodes where it fits. on a
  host without any pci devices whitelisted for openstack management this
  behvaior result in numa nodes being filled linerally form numa 0 to
  numa n. that mean if a host had 100G of hugepage on both numa node 0
  and 1 and you schduled 101 1G singel numa vms to the host, 100 vm
  would spawn on numa0 and 1 vm would spwan on numa node 1.

  that means that the first 100 vms would all contened for cpu resouces
  on the first numa node while the last vm had all of the secound numa
  ndoe to its own use.

  the correct behavior woudl be for nova to round robin asign the vms
  attepmetin to keep the resouce avapiableity  blanced. this will
  maxiumise performance for indivigual vms while pessimisng the
  schduling of large vms on a host.

  to this end a new numa blancing config option (unset, pack or spread)
  should be added and we should sort numa nodes in decending(spread) or
  acending(pack) order based on pMEM, pCPUs, mempages and pci devices in
  that sequence.

  in future release when numa is in placment this sorting will need to
  be done in a weigher that sorts the allocation caindiates based on the
  same pack/spread cirtira.

  i am filing this as a bug not a feature as this will have a
  significant impact for existing deployment that either expected
  https://specs.openstack.org/openstack/nova-
  specs/specs/pike/implemented/reserve-numa-with-pci.html to implement
  this logic already or who do not follow our existing guidance on
  creating flavor that align to the host topology.

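A rough sketch of the pack/spread sorting idea (hypothetical structures and
option semantics, not an existing Nova interface; the report suggests
ordering by pMEM, pCPUs, mempages and PCI devices, the key below is
simplified):

from dataclasses import dataclass


@dataclass
class HostNumaCell:
    id: int
    free_pcpus: int
    free_hugepages_gb: int
    free_pci_devices: int


def sort_cells(cells, policy):
    """Order candidate NUMA cells before fitting a single-NUMA guest.

    'spread' tries the emptiest cell first (descending free resources),
    'pack' tries the fullest cell first (ascending free resources),
    anything else keeps the current in-order behaviour.
    """
    if policy not in ('pack', 'spread'):
        return list(cells)
    return sorted(
        cells,
        key=lambda c: (c.free_pcpus, c.free_hugepages_gb, c.free_pci_devices),
        reverse=(policy == 'spread'))


cells = [HostNumaCell(0, free_pcpus=2, free_hugepages_gb=10, free_pci_devices=0),
         HostNumaCell(1, free_pcpus=30, free_hugepages_gb=90, free_pci_devices=2)]
print([c.id for c in sort_cells(cells, 'spread')])  # [1, 0]
print([c.id for c in sort_cells(cells, 'pack')])    # [0, 1]
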
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1887377/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1893904] Re: Placement is not updated if a VGPU is re-created on a new GPU upon host reboot

2020-09-16 Thread Sylvain Bauza
This was a known issue that should have been fixed by
https://review.opendev.org/#/c/715489/ which was merged during the
Ussuri timeframe.

To be clear, since mdevs disappear when you reboot, Nova now tries
to find the already provided GPU by looking at the guest XML.
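
As a hedged illustration of that lookup (a hypothetical helper, not the Nova
implementation), the mdev UUIDs a guest was using can be recovered from its
libvirt domain XML:

from xml.etree import ElementTree


def mdev_uuids_from_domain_xml(domain_xml):
    """Return the mdev UUIDs referenced by <hostdev type='mdev'> elements."""
    root = ElementTree.fromstring(domain_xml)
    uuids = []
    for hostdev in root.findall("./devices/hostdev[@type='mdev']"):
        address = hostdev.find("./source/address")
        if address is not None and address.get('uuid'):
            uuids.append(address.get('uuid'))
    return uuids


sample = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source><address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/></source>
    </hostdev>
  </devices>
</domain>
"""
print(mdev_uuids_from_domain_xml(sample))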

Closing this bug as the master branch no longer has the bug, but please
reopen it in case you can reproduce the problem with master.

** Changed in: nova
   Status: New => Won't Fix

** Changed in: nova
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1893904

Title:
  Placement is not updated if a VGPU is re-created on a new GPU upon
  host reboot

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  First of all, I'm not really sure which project to "blame" for this
  bug, but here's the problem:

  When you reboot a compute-node with Nvidia GRID and guests running
  with a VGPU attached, the guests will often have their VGPU re-created
  on a different GPU than before the reboot. This is not updated in
  placement, causing the placement API to provide false information
  about which resource provider that is actually a valid allocation
  candidate for a new VGPU.

  Steps to reproduce:
  1. Create a new instance with a VGPU attached, take note of which GPU the VGPU 
is created on (with nvidia-smi vgpu)
  2. Reboot the compute-node
  3. Start the instance, and observe that its VGPU now lives on a different GPU
  4. Check "openstack allocation candidate list --resource VGPU=1" and 
correlate the resource provider id with "openstack resource provider list" to 
see that placement now will list the allocated GPU as free, and the inital GPU 
(from before the reboot) is still marked as used.

  This will obviously only be an issue on compute-nodes with multiple
  physical GPUs.

  Examples:
  https://paste.ubuntu.com/p/PZ6qgKtnRb/

  This will eventually cause scheduling of new VGPU instances to fail,
  because they will try to use a device that in reality is already used
  (but marked as available in placement)

  Expected results:
  Either that the GRID-driver and libvirt should ensure that an instance keeps 
the same GPU for its VGPU through reboots (effectively making this not a nova 
bug)

  OR

  nova-compute should notify placement of the change and update the
  allocations

  Versions:
  This was first observed in stein, but the issue is also present in train.
  # rpm -qa | grep nova
  python2-nova-20.3.0-1.el7.noarch
  python2-novaclient-15.1.1-1.el7.noarch
  openstack-nova-compute-20.3.0-1.el7.noarch
  openstack-nova-common-20.3.0-1.el7.noarch

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1893904/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1895092] Re: Error when trying block migration

2020-09-15 Thread Sylvain Bauza
This sounds like a configuration issue:
2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi 
[req-2c75be17-d5c4-4acd-a302-326388068067 170fdf1f861847fa995f2f0646ec4143 
85dd9df42f4e47b3b0fc5848ab947b62 - default default] Unexpected exception in API 
method: MigrationError_Remote: Migration error: Unable to establish connection 
to http://controller01:5000/v3/auth/tokens: ('Connection aborted.', 
BadStatusLine('No status line received - the server has closed the 
connection',))

When trying to get a token for the migration, Keystone closed the
connection abruptly. Please inspect the Keystone logs; either way, this is
unrelated to Nova itself.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1895092

Title:
  Error when trying block migration

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  When I try a live migration to a different host using block migration
  with the following command, an API error occurs.

  $ openstack server migrate --block-migration
  3bf28a9d-0545-4d30-8892-6e2af655db4a --live compute40

  --- error message ---
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-2c75be17-d5c4-4acd-a302-326388068067)
  -

  I attached the nova-api log and information about the OpenStack
  environment below.

  ### /var/log/nova/nova-api.log
  2020-09-10 14:27:47.614 3545 INFO nova.osapi_compute.wsgi.server 
[req-1028bb0a-a95a-41b1-9ab9-2ebf7b19039f 170fdf1f861847fa995f2f0646ec4143 
85dd9df42f4e47b3b0fc5848ab947b62 - default default] 10.81.0.2 "GET 
/v2.1/servers/3bf28a9d-0545-4d30-8892-6e2af655db4a HTTP/1.1" status: 200 len: 
2312 time: 0.7340441
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi 
[req-2c75be17-d5c4-4acd-a302-326388068067 170fdf1f861847fa995f2f0646ec4143 
85dd9df42f4e47b3b0fc5848ab947b62 - default default] Unexpected exception in API 
method: MigrationError_Remote: Migration error: Unable to establish connection 
to http://controller01:5000/v3/auth/tokens: ('Connection aborted.', 
BadStatusLine('No status line received - the server has closed the 
connection',))
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 801, in 
wrapped
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/migrate_server.py",
 line 111, in _migrate_live
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi async_)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 206, in inner
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
function(self, context, instance, *args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 214, in _wrapped
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return 
fn(self, context, instance, *args, **kwargs)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 154, in inner
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi return f(self, 
context, instance, *args, **kw)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 4550, in 
live_migrate
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi 
request_spec=request_spec, async_=async_)
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/dist-packages/nova/conductor/api.py", line 112, in 
live_migrate_instance
  2020-09-10 14:27:50.134 3545 ERROR nova.api.openstack.wsgi 

[Yahoo-eng-team] [Bug 1838309] Re: Live migration might fail when run after revert of previous live migration

2020-05-12 Thread Sylvain Bauza
Now that the minimum versions for Ussuri are libvirt 4.0.0 and QEMU 2.1,
I think we can close this one unless libvirt 4.0.0 with QEMU 2.5 has
the same issues.

Please reopen this one if you still see the issue.

** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838309

Title:
  Live migration might fail when run after revert of previous live
  migration

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  When migrating an instance between two computes on queens, running two
  different qemu versions, first live migration failed and was rolled
  back (traceback follows just in case, unrelated to this issue):

  2019-07-26 14:39:44.469 1576 ERROR nova.virt.libvirt.driver 
[req-26f3a831-8e4f-43a2-83ce-e60645264147 0aa8a4a6ed7d4733871ef79fa0302d43 
31ee6aa6bff7498fba21b9807697ec32 - default default] [instance: 
b0681d51-2924-44be-a8b7-36db0d86b92f] Live Migration failure: internal error: 
qemu unexpectedly closed the monitor: 2019-07-26 14:39:43.479+: Domain 
id=16 is tainted: shell-scripts
  2019-07-26T14:39:43.630545Z qemu-system-x86_64: -drive 
file=rbd:cinder/volume-df3d0060-451c-4b22-8d15-2c579fb47681:id=cinder:auth_supported=cephx\;none:mon_host=192.168.16.14\:6789\;192.168.16.15\:6789\;192.168.16.16\:6789,file.password-secret=virtio-disk2-secret0,format=raw,if=none,id=drive-virtio-disk2,serial=df3d0060-451c-4b22-8d15-2c579fb47681,cache=writeback,discard=unmap:
 'serial' is deprecated, please use the corresponding option of '-device' 
instead
  2019-07-26T14:39:44.075108Z qemu-system-x86_64: VQ 2 size 0x80 < 
last_avail_idx 0xedda - used_idx 0xeddd
  2019-07-26T14:39:44.075130Z qemu-system-x86_64: Failed to load 
virtio-balloon:virtio
  2019-07-26T14:39:44.075134Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device ':00:07.0/virtio-balloon'
  2019-07-26T14:39:44.075582Z qemu-system-x86_64: load of migration failed: 
Operation not permitted: libvirtError: internal error: qemu unexpectedly closed 
the monitor: 2019-07-26 14:39:43.479+: Domain id=16 is tainted: 
shell-scripts

  then, after revert, live migration was retried, and now it failed
  because of the following problem:

  {u'message': u'Requested operation is not valid: cannot undefine transient 
domain', u'code': 500, u'details': u'  File 
"/usr/lib/python2.7/dist-packages/nova/compute/manag
  er.py", line 202, in decorated_function\nreturn function(self, context, 
*args, **kwargs)\n  File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6438, in 
_post_live_migration\ndestroy_vifs=destroy_vifs)\n  File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1100, in 
cleanup\nself._undefine_domain(instance)\n  File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1012, in 
_undefine_domain\ninstance=instance)\n  File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__\nself.force_reraise()\n  File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise\nsix.reraise(self.type_, self.value, self.tb)\n  File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 999, in 
_undefine_domain\nguest.delete_configuration(support_uefi)\n  File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 271, in 
delete_configuration\nself._domain.undefine()\n  File 
"/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit\n
result = proxy_call(self._autowrap, f, *args, **kwargs)\n  File 
"/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call\n 
   rv = execute(f, *args, **kwargs)\n  File 
"/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute\n
six.reraise(c, e, tb)\n  File 
"/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker\n
rv = meth(*args, **kwargs)\n  File 
"/usr/lib/python2.7/dist-packages/libvirt.py", line 2701, in undefine\nif 
ret == -1: raise libvirtError (\'virDomainUndefine() failed\', dom=self)\n', 
u'created': u'2019-07-29T14:39:41Z'}

  It seems to happen because the domain was already undefined once on the
  first live migration attempt, and after that it cannot be undefined a
  second time. We might need to check whether the domain is persistent
  before undefining it in the live migration case.

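A minimal sketch of the suggested guard, assuming the libvirt-python
bindings (illustrative, not the Nova code):

import libvirt  # requires the libvirt-python bindings


def undefine_if_persistent(conn, instance_name):
    try:
        dom = conn.lookupByName(instance_name)
    except libvirt.libvirtError:
        return  # domain already gone, nothing to undefine
    if dom.isPersistent():
        dom.undefine()
    # A transient domain (e.g. left over from a reverted live migration) has
    # no persistent config, so calling undefine() on it would raise
    # "Requested operation is not valid: cannot undefine transient domain".
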
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838309/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1877281] [NEW] vGPU multiple instance creation test is racey

2020-05-07 Thread Sylvain Bauza
Public bug reported:

Zuul can sometimes fail with:
2020-05-05 09:07:46.656481 | ubuntu-bionic | ==
2020-05-05 09:07:46.656502 | ubuntu-bionic | Failed 1 tests - output below:
2020-05-05 09:07:46.656518 | ubuntu-bionic | ==
2020-05-05 09:07:46.656533 | ubuntu-bionic |
2020-05-05 09:07:46.656548 | ubuntu-bionic | 
nova.tests.functional.libvirt.test_vgpu.VGPUTests.test_multiple_instance_create
2020-05-05 09:07:46.656563 | ubuntu-bionic | 
---
2020-05-05 09:07:46.656577 | ubuntu-bionic |
2020-05-05 09:07:46.656594 | ubuntu-bionic | Captured traceback:
2020-05-05 09:07:46.656609 | ubuntu-bionic | ~~~
2020-05-05 09:07:46.656625 | ubuntu-bionic | Traceback (most recent call 
last):
2020-05-05 09:07:46.656651 | ubuntu-bionic |
2020-05-05 09:07:46.656669 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py",
 line 248, in test_multiple_instance_create
2020-05-05 09:07:46.656686 | ubuntu-bionic | 
self.assert_vgpu_usage_for_compute(self.compute1, expected=2)
2020-05-05 09:07:46.656701 | ubuntu-bionic |
2020-05-05 09:07:46.656716 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py",
 line 178, in assert_vgpu_usage_for_compute
2020-05-05 09:07:46.656732 | ubuntu-bionic | self.assertEqual(expected, 
len(mdevs))
2020-05-05 09:07:46.656784 | ubuntu-bionic |
2020-05-05 09:07:46.656803 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 415, in assertEqual
2020-05-05 09:07:46.656818 | ubuntu-bionic | self.assertThat(observed, 
matcher, message)
2020-05-05 09:07:46.656834 | ubuntu-bionic |
2020-05-05 09:07:46.656848 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 502, in assertThat
2020-05-05 09:07:46.656863 | ubuntu-bionic | raise mismatch_error
2020-05-05 09:07:46.656878 | ubuntu-bionic |
2020-05-05 09:07:46.656892 | ubuntu-bionic | 
testtools.matchers._impl.MismatchError: 2 != 1
2020-05-05 09:07:46.656907 | ubuntu-bionic |

Logstash query :
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22testtools.matchers._impl.MismatchError%3A%202%20!%3D%201%5C%22%20AND%20build_name%3A%5C%22nova-tox-functional-py36%5C%22

8 occurrences over 7 days.

** Affects: nova
 Importance: High
     Assignee: Sylvain Bauza (sylvain-bauza)
 Status: In Progress


** Tags: gate-failure vgpu

** Tags added: vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1877281

Title:
  vGPU multiple instance creation test is racey

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Zuul can sometimes fail with:
  2020-05-05 09:07:46.656481 | ubuntu-bionic | ==
  2020-05-05 09:07:46.656502 | ubuntu-bionic | Failed 1 tests - output below:
  2020-05-05 09:07:46.656518 | ubuntu-bionic | ==
  2020-05-05 09:07:46.656533 | ubuntu-bionic |
  2020-05-05 09:07:46.656548 | ubuntu-bionic | 
nova.tests.functional.libvirt.test_vgpu.VGPUTests.test_multiple_instance_create
  2020-05-05 09:07:46.656563 | ubuntu-bionic | 
---
  2020-05-05 09:07:46.656577 | ubuntu-bionic |
  2020-05-05 09:07:46.656594 | ubuntu-bionic | Captured traceback:
  2020-05-05 09:07:46.656609 | ubuntu-bionic | ~~~
  2020-05-05 09:07:46.656625 | ubuntu-bionic | Traceback (most recent call 
last):
  2020-05-05 09:07:46.656651 | ubuntu-bionic |
  2020-05-05 09:07:46.656669 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py",
 line 248, in test_multiple_instance_create
  2020-05-05 09:07:46.656686 | ubuntu-bionic | 
self.assert_vgpu_usage_for_compute(self.compute1, expected=2)
  2020-05-05 09:07:46.656701 | ubuntu-bionic |
  2020-05-05 09:07:46.656716 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/libvirt/test_vgpu.py",
 line 178, in assert_vgpu_usage_for_compute
  2020-05-05 09:07:46.656732 | ubuntu-bionic | self.assertEqual(expected, 
len(mdevs))
  2020-05-05 09:07:46.656784 | ubuntu-bionic |
  2020-05-05 09:07:46.656803 | ubuntu-bionic |   File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/testtools/testcase.py",
 line 415, in assertEqual
  2020-05-05 09:07:46.656818 | ubuntu-bionic | self.assertThat(observed, 
matcher, mess

[Yahoo-eng-team] [Bug 1780225] Re: Libvirt error when using --max > 1 with vGPU

2020-04-28 Thread Sylvain Bauza
In Stein, we merged the ability to have multiple Resource Providers, each of
them being a pGPU.
In Ussuri, we added support for a specific vGPU type per pGPU.

Now, I tested the above behaviour with https://review.opendev.org/723858
and it works, unless you ask for a specific total capacity.

I'll close this bug, which was only about libvirt vGPUs; please look at
https://bugs.launchpad.net/nova/+bug/1874664 for the related issue.

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1780225

Title:
  Libvirt error when using --max > 1 with vGPU

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===

  Using devstack Rocky with a NVIDIA Tesla M10 + GRID driver on RHEL 7.5.
  Profile used in nova: nvidia-35 (num_heads=2, frl_config=45, 
framebuffer=512M, max_resolution=2560x1600, max_instance=16)

  I can launch instances one by one without any issue.
  I cannot use a --max parameter greater than 1.

  Expected result
  ===

  Be able to use --max parameter with vGPU

  Steps to reproduce
  ==

  [root@host2 ~]# openstack server list
  
+--+---++-+++
  | ID   | Name  | Status | Networks
| Image  | Flavor |
  
+--+---++-+++
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | 
private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | 
vgpu   |
  
+--+---++-+++

  [root@host2 ~]# openstack server create --flavor vgpu --image rhel75 
--key-name myself --max 2 instance
  
+-+---+
  | Field   | Value 
|
  
+-+---+
  | OS-DCF:diskConfig   | MANUAL
|
  | OS-EXT-AZ:availability_zone |   
|
  | OS-EXT-SRV-ATTR:host| None  
|
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None  
|
  | OS-EXT-SRV-ATTR:instance_name   |   
|
  | OS-EXT-STS:power_state  | NOSTATE   
|
  | OS-EXT-STS:task_state   | scheduling
|
  | OS-EXT-STS:vm_state | building  
|
  | OS-SRV-USG:launched_at  | None  
|
  | OS-SRV-USG:terminated_at| None  
|
  | accessIPv4  |   
|
  | accessIPv6  |   
|
  | addresses   |   
|
  | adminPass   | iNiFmD6kNszw  
|
  | config_drive|   
|
  | created | 2018-07-05T09:19:25Z  
|
  | flavor  | vgpu (vgpu1)  
|
  | hostId  |   
|
  | id  | 5a8691a8-a18c-4c71-8541-be00f224fd82  
|
  | image   | rhel75 
(e63a49a8-4568-4b57-9d12-1eb1ede28438) |
  | key_name| myself
|
  | name| instance-1
|
  | progress| 0 
|
  | project_id  | fdea2c781db74ae593c5e9501e9290cc  
|
  | properties  |   
|
  | security_groups | name='default'
|
  | status  | BUILD 
|
  | updated | 2018-07-05T09:19:25Z  
|
  

[Yahoo-eng-team] [Bug 1874664] Re: Boot more than one instances failed with accelerators in its flavor

2020-04-28 Thread Sylvain Bauza
Given we are after RC1 (which means that we only accept regression
bugfixes for RC2 and later versions), I think we should just document
the current caveat in
https://docs.openstack.org/api-guide/compute/accelerator-support.html
and try to backport the bugfix to a later Ussuri release (say 21.0.1).


** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1874664

Title:
  Boot more than one instances failed with accelerators in its flavor

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) ussuri series:
  New

Bug description:
  When booting more than one instance with accelerators, and the
  accelerators are on one compute node, there are two problems, as
  described below:

  One problem is that we always take the first item (alloc_reqs[0]) in
  alloc_reqs, so when we iterate to the second instance, a conflict
  exception is thrown when putting the allocations.

  The other is that we always take the first item in
  alloc_reqs_by_rp_uuid.get(selected_host.uuid), so the selected_alloc_req
  is always the same, which causes the values in selections_to_return to
  be identical. That is not right for subsequent operations.

  More details:
  https://etherpad.opendev.org/p/filter_scheduler_issue_with_accelerators

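A toy illustration of the first problem (hypothetical data, not the
scheduler code): always taking index 0 of the candidate allocation requests
hands every instance the same candidate instead of consuming distinct ones.

alloc_reqs_by_rp_uuid = {
    'host-rp-uuid': [{'candidate': 'accel-A'}, {'candidate': 'accel-B'}],
}


def pick_always_first(rp_uuid):
    # Buggy behaviour described above: every instance gets the same request.
    return alloc_reqs_by_rp_uuid[rp_uuid][0]


def pick_and_consume(rp_uuid):
    # One possible fix direction: consume a distinct candidate per instance.
    return alloc_reqs_by_rp_uuid[rp_uuid].pop(0)


print([pick_always_first('host-rp-uuid') for _ in range(2)])  # same twice
print([pick_and_consume('host-rp-uuid') for _ in range(2)])   # A then B
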
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1874664/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1636825] Re: Instances for which rebuild failed get deleted from source host

2020-04-24 Thread Sylvain Bauza
This is indeed fixed upstream, as you can see in the source code here:
https://github.com/openstack/nova/blob/2cddf595a8cdedbdb844e800d853ea143817b545/nova/compute/manager.py#L721-L738

We only delete instances if the evacuation was either done or just pre-created.
If the migration did not succeed, we don't delete the instance.

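A hedged sketch of that guard (illustrative names and migration statuses, not
the actual Nova code): only instances whose evacuation migration completed,
and whose host has moved elsewhere, are cleaned up when the source comes back.

from dataclasses import dataclass

DELETABLE_EVACUATION_STATES = ('done', 'accepted')


@dataclass
class Migration:
    instance_uuid: str
    status: str


@dataclass
class Instance:
    uuid: str
    host: str


def instances_to_delete_on_startup(our_host, migrations, instances_by_uuid):
    """Pick local instances that were really evacuated away from this host."""
    to_delete = []
    for migration in migrations:
        if migration.status not in DELETABLE_EVACUATION_STATES:
            continue  # rebuild failed elsewhere: keep the local instance
        instance = instances_by_uuid.get(migration.instance_uuid)
        if instance is not None and instance.host != our_host:
            to_delete.append(instance)
    return to_delete


# The failed evacuation (status 'error') is kept; the completed one is removed.
instances = {'a': Instance('a', 'other-host'), 'b': Instance('b', 'this-host')}
migrations = [Migration('a', 'done'), Migration('b', 'error')]
print(instances_to_delete_on_startup('this-host', migrations, instances))
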

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1636825

Title:
  Instances for which rebuild failed get deleted from source host

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===
  In the current implementation we have the method 
'_destroy_evacuated_instances' in compute.manager which deletes any instances 
from source after they have been evacuated. This method is called as part of 
host initialization (init_host) and checks the migration records for VMs which 
were evacuated.

  There is a possibility that if a VM fails as part of rebuild operation
  on destination host after creating a migration record, then when the
  source host is brought back up it may end up deleting the VM from
  source as well.

  To fix this we should check the 'host' attribute in instances table
  before deleting the VM and delete VM from source only if the host has
  been updated in db after rebuild.

  Steps to reproduce
  ==
  * deploy a VM
  * Bring down the host where VM was deployed.
  * Evacuate the instance to another host where the rebuild operation may fail 
(insufficient resources or storage issue)
  * This will result in VM not being present on destination host.
  * Check that a migration record of type 'evacuation' is present in db.
  * Bring the source host up.

  Expected result
  ===
  The VM should be present on the source host.

  Actual result
  =
  VM gets deleted as part of evacuated instance cleanup on start-up of compute 
service on source host.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
  Openstack Newton

  2. Which hypervisor did you use?
   PowerVM

  3. Which networking type did you use?
     Neutron with OpenVSwitch

  Logs & Configs
  ==
  In the logs following message is seen on startup -
  2016-10-24 09:32:11.131 3169 INFO nova.compute.manager 
[req-6611fe85-0515-4cb4-b1c0-3f34f196a0c7 - - - - -] [instance: 
ed9ca4b9-8938-4d7b-9eec-1dd6ca7bc8c8] Deleting instance as it has been 
evacuated from this host

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1636825/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1738297] Re: Nova Destroys Local Disks for Instance with Broken iSCSI Connection to Cinder Volume Upon Resume from Suspend

2020-04-24 Thread Sylvain Bauza
I'm happy the main root cause (deleting the source disks) is fixed.

To be clear, you can configure Nova to resume guest state on compute service
restarts with this flag:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot

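For reference, a minimal nova.conf snippet enabling that behaviour (a sketch;
see the config reference above for the authoritative description):

[DEFAULT]
# Start guests that were running before the host rebooted or the
# nova-compute service restarted, instead of leaving them shut off.
resume_guests_state_on_host_boot = True
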
Closing the bug.


** Changed in: nova
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1738297

Title:
  Nova Destroys Local Disks for Instance with Broken iSCSI Connection to
  Cinder Volume Upon Resume from Suspend

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Background: Libvirt + KVM cloud running Newton (but relevant code
  appears the same on master). Earlier this week we had some issues with
  a Cinder storage server (it uses LVM+iSCSI). tgt service was consuming
  100% CPU (after running for several months) and Compute nodes lost
  iSCSI connection. I had to restart tgt, cinder-volume service, and a
  number of compute hosts + instances.

  Today, a user tried resuming their instance which was suspended before
  aforementioned trouble. (Note: this instance has root and ephemeral
  disks stored locally, third disk on shared Cinder storage). It appears
  (per below-linked logs) that the iSCSI connection from the compute
  host to the Cinder storage server was broken/missing, and because of
  this, Cinder apparently "cleaned up" the instance including
  *destroying its disk files*. Instance is now in error state.

  nova-compute.log: http://paste.openstack.org/show/628991/
  /var/log/syslog: http://paste.openstack.org/show/628992/

  We're still running Newton but the code appears the same on master.
  Based on the log messages ("Deleting instance files" and "Deletion of 
/var/lib/nova/instances/68058b22-e17f-42f7-80ff-aeb06cbc82cb_del complete"), it 
appears that we ended up in this function, `delete_instance_files`: 
https://github.com/openstack/nova/blob/stable/newton/nova/virt/libvirt/driver.py#L7745-L7801
  A trace wasn't logged for this, but I'm guessing we got here from the 
`cleanup` function: 
https://github.com/openstack/nova/blob/a0e4f627f0be48db65c23f4f180d4bc6dd68cc83/nova/virt/libvirt/driver.py#L933-L1032
  One of `cleanup`'s arguments is `destroy_disks=True`, so I'm guessing this 
was run with defaults or not overridden.
  (Someone, please correct me if the available data suggest otherwise!)

  Nobody requested a Delete action, so this appears to be Nova deciding
  to destroy an instance's local disks after encountering an otherwise-
  unhandled exception related to the iSCSI device being unavailable. I
  will try to reproduce and update the bug if successful.

  For us, losing an instance's data is a Problem -- our users
  (scientists) often store unique data on instances that are configured
  by hand. If an instance cannot be resumed, I would much rather Nova
  leave the instance's disks intact for investigation / data recovery,
  instead of throwing everything out. For deployments whose instances
  may contain important data, could this behavior be made configurable?
  Perhaps "destroy_disks_on_failed_resume = False" in nova.conf?

  Thank you!

  Chris Martin

  (P.S. actually a Cinder question, but someone here may know: is there
  something that can/should be done to re-initialize iSCSI connections
  between compute nodes and a Cinder storage server after a recovered
  failure of the iSCSI target service on the storage server?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1738297/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

