[Yahoo-eng-team] [Bug 1921306] [NEW] When instance is in paused state, live migration fails

2021-03-24 Thread Xinxin Shen
Public bug reported:

When the virtual machine is in the paused state and is live migrated twice
in a row, the instance state changes to ERROR.
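
The second migration's monitoring loop dies on the libvirtError shown in the
logs below. Purely as an illustrative sketch (not nova's actual fix), the job
polling could map "domain already gone" errors to a completed job instead of
erroring the instance; the error codes checked here are assumptions:

    import libvirt

    def safe_job_info(dom):
        # Hypothetical guard around virDomainGetJobStats(); nova's real
        # handling lives in nova/virt/libvirt/guest.py:get_job_info().
        try:
            return dom.jobStats()
        except libvirt.libvirtError as e:
            if e.get_error_code() in (libvirt.VIR_ERR_NO_DOMAIN,
                                      libvirt.VIR_ERR_OPERATION_INVALID):
                # The domain has already left the source host: report the
                # job as completed rather than bubbling the error up.
                return {'type': libvirt.VIR_DOMAIN_JOB_COMPLETED}
            raise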

2021-03-22 19:34:38.388 6 DEBUG nova.virt.libvirt.guest [- req-None - - - - -] 
Failed to get job stats: Unable to read from monitor: Connection reset by peer 
get_job_info 
/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:767
2021-03-22 19:34:38.389 6 WARNING nova.virt.libvirt.driver [- req-None - - - - 
-] [instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Error monitoring migration: 
Unable to read from monitor: Connection reset by peer: libvirtError: Unable to 
read from monitor: Connection reset by peer
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] Traceback (most recent call last):
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", 
line 7793, in _live_migration
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] finish_event, disk_paths)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", 
line 7593, in _live_migration_monitor
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] info = guest.get_job_info()
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", 
line 751, in get_job_info
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] stats = self._domain.jobStats()
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 186, 
in doit
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] result = proxy_call(self._autowrap, 
f, *args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 144, 
in proxy_call
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] rv = execute(f, *args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 125, 
in execute
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] six.reraise(c, e, tb)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/tpool.py", line 83, 
in tworker
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] rv = meth(*args, **kwargs)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/usr/lib64/python2.7/site-packages/libvirt.py", line 1433, in jobStats
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] if ret is None: raise libvirtError 
('virDomainGetJobStats() failed', dom=self)
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] libvirtError: Unable to read from 
monitor: Connection reset by peer
2021-03-22 19:34:38.389 6 ERROR nova.virt.libvirt.driver [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]
2021-03-22 19:34:38.394 6 DEBUG nova.virt.libvirt.driver [- req-None - - - - -] 
[instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Live migration monitoring is 
all done _live_migration 
/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7800
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [- req-None - - - - -] 
[instance: d1d2af1f-e973-438c-a6b6-b628091d3596] Live migration failed.: 
libvirtError: Unable to read from monitor: Connection reset by peer
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] Traceback (most recent call last):
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596]   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 
6510, in _do_live_migration
2021-03-22 19:34:38.394 6 ERROR nova.compute.manager [instance: 
d1d2af1f-e973-438c-a6b6-b628091d3596] block_migration, migrate_data)
2021-03-22 19:34:38.394 6 

[Yahoo-eng-team] [Bug 1921154] [NEW] os.kill(SIGTERM) does not finish and times out

2021-03-24 Thread Rodolfo Alonso
Public bug reported:

Since [1], process signals are sent using the os.kill() method.

In some cases the method never returns, and the caller eventually raises a
timeout.

Because os.kill() is a blocking method, we should find a way to monitor
this process and end it properly.
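
One common shape for that, sketched here with a made-up helper name and
timeouts (not necessarily what neutron should adopt), is to send SIGTERM,
poll for exit with a deadline, and escalate to SIGKILL:

    import os
    import signal
    import time

    def terminate(pid, timeout=10.0, interval=0.5):
        """Send SIGTERM, wait up to `timeout` seconds, then escalate."""
        os.kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                os.kill(pid, 0)        # signal 0 only probes pid existence
            except ProcessLookupError:
                return True            # process exited within the deadline
            time.sleep(interval)
        os.kill(pid, signal.SIGKILL)   # still alive: escalate
        return False

Note that for direct children os.waitpid() is also needed to reap the exit
status, otherwise the existence probe keeps seeing a zombie.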

Error logs:
- 
https://bec7cc51aed17868a822-df0c8eba46b67aa6178498e0cf2208c0.ssl.cf1.rackcdn.com/779310/6/check/neutron-functional-with-uwsgi/e6d8aae/testr_results.html
- 
https://b19b5ab35f9ced465dd5-2eb50734132c0e56282483bcdf57bf8a.ssl.cf5.rackcdn.com/782587/5/check/neutron-functional-with-uwsgi/8a6a6d3/testr_results.html
- 
https://52682a1fca01b92ead5c-4c2761265b996c18f02000b5e0f64005.ssl.cf5.rackcdn.com/782275/1/check/neutron-functional-with-uwsgi/b890243/testr_results.html

Snippet: http://paste.openstack.org/show/803873/

[1] https://review.opendev.org/c/openstack/neutron/+/681671

** Affects: neutron
 Importance: Undecided
 Status: New



[Yahoo-eng-team] [Bug 1921150] [NEW] Repeated ERROR log: Unable to save resource provider ... because: re-parenting a provider is not currently allowed

2021-03-24 Thread Balazs Gibizer
Public bug reported:

Description
===
If neutron is configured with QoS guaranteed minimum bandwidth and the
deployment is upgraded from Stein 14.0.4 or older, or Train 15.0.1 or older,
to any newer OpenStack version, the following stack trace appears repeatedly
in the neutron-server log:

Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin Traceback (most recent call last):
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 53, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin return f(self, *a, **k)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 232, in 
update_resource_provider
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin return self._put(url, 
update_body).json()
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 188, in _put
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin endpoint_filter=self._ks_filter, 
**kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/usr/local/lib/python3.6/dist-packages/keystoneauth1/session.py", line 1114, 
in put
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin return self.request(url, 'PUT', 
**kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/usr/local/lib/python3.6/dist-packages/keystoneauth1/session.py", line 943, in 
request
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin raise 
exceptions.from_response(resp, method, url)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
keystoneauth1.exceptions.http.BadRequest: Bad Request (HTTP 400) (Request-ID: 
req-31ef5696-dc60-4478-939b-a12d3d3bdf65)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin During handling of the above 
exception, another exception occurred:
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin Traceback (most recent call last):
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron/neutron/services/placement_report/plugin.py", line 163, in 
batch
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin deferred.execute()
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron/neutron/agent/common/placement_report.py", line 43, in 
execute
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin return self.func(*self.args, 
**self.kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 53, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin return f(self, *a, **k)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 254, in 
ensure_resource_provider
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
resource_provider=resource_provider)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin   File 
"/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 62, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
msg=exc.response.text.replace('\n', ' '))
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
neutron_lib.exceptions.placement.PlacementClientError: Placement Client Error 
(4xx): {"errors": [{"status": 400, "title": "Bad Request", "detail": "The 
server could not comply with the request since it is either malformed or 
otherwise incorrect.\n\n Unable to save resource provider 
af0bc0aa-525e-563f-bb4d-2f26f70371d6: Object action update failed because: 
re-parenting a provider is not currently allowed.  ", "request_id": 
"req-31ef5696-dc60-4478-939b-a12d3d3bdf65"}]}
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR 
neutron.services.placement_report.plugin 
Mar 
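
The HTTP 400 is Placement refusing to change an existing resource provider's
parent. A minimal sketch of the rejected call with plain requests (endpoint,
token, provider name and parent UUID are all placeholders; neutron really
goes through neutron_lib.placement.client):

    import requests

    PLACEMENT = 'http://placement.example:8778'      # placeholder endpoint
    HEADERS = {
        'X-Auth-Token': '<token>',                   # placeholder
        # parent_provider_uuid requires placement microversion >= 1.14
        'OpenStack-API-Version': 'placement 1.14',
    }
    rp_uuid = 'af0bc0aa-525e-563f-bb4d-2f26f70371d6'
    body = {
        'name': 'host0:NIC Switch agent',            # hypothetical name
        'parent_provider_uuid': '11111111-1111-1111-1111-111111111111',
    }
    # A parent that differs from the stored one yields HTTP 400,
    # "re-parenting a provider is not currently allowed".
    resp = requests.put('%s/resource_providers/%s' % (PLACEMENT, rp_uuid),
                        json=body, headers=HEADERS)
    print(resp.status_code, resp.text)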

[Yahoo-eng-team] [Bug 1916761] Re: [dvr] bound port permanent arp entries never deleted

2021-03-24 Thread Corey Bryant
This bug was fixed in the package neutron - 
2:17.1.0+git2021012815.0fb63f7297-0ubuntu4~cloud0
---

 neutron (2:17.1.0+git2021012815.0fb63f7297-0ubuntu4~cloud0) focal-wallaby; 
urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:17.1.0+git2021012815.0fb63f7297-0ubuntu4) hirsute; urgency=medium
 .
   * d/p/revert-dvr-remove-control-plane-arp-updates.patch: Cherry-picked
 from https://review.opendev.org/c/openstack/neutron/+/777903 to prevent
 permanent arp entries that never get deleted (LP: #1916761).


** Changed in: cloud-archive
   Status: Fix Committed => Fix Released


Title:
  [dvr] bound port permanent arp entries never deleted

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  See the original bug description, but in short: commit b3a42cddc5
  removed all the arp management code in favour of using the
  arp_responder, but missed the fact that DVR floating ips don't use the
  arp_responder. As a result it was possible to end up with permanent arp
  entries in qrouter namespaces, such that if you created a new port with
  the same IP as that of a previous port for which there was an arp
  entry, a fip associated with the new port would never be accessible
  until the stale arp entry was manually deleted. This patch adds the
  reverted code back in.

  [Test Plan]

    * deploy Openstack Train/Ussuri/Victoria
    * create port P1 with address A1 and create vm on node C1 with this port
    * associate floating ip with P1 and ping it
    * observe REACHABLE or PERMANENT arp entry for A1 in qrouter arp cache
    * delete vm and port
    * ensure arp entry for A1 in qrouter arp cache is deleted
    * create port P2 with address A1 and create vm on node C1 with this port
    * associate floating ip with P2 and ping it

  [Where problems could occur]

  No problems anticipated from re-introducing this code. Of course this
  code uses RPC notifications and as a result will incur some extra amqp
  load but is not anticipated to be a problem and it was not considered
  a problem when the code existed prior to removal.

  --

  With Openstack Ussuri using dvr-snat I do the following:

    * create port P1 with address A1 and create vm on node C1 with this port
    * associate floating ip with P1 and ping it
    * observe REACHABLE arp entry for A1 in qrouter arp cache
    * so far so good
    * restart the neutron-l3-agent
    * observe REACHABLE arp entry for A1 is now PERMANENT
    * delete vm and port
    * create port P2 with address A1 and create vm on node C1 with this port
    * vm is unreachable since arp cache contains PERMANENT entry for old port 
P1 mac/ip combo

  If I don't restart the l3-agent, then once I have deleted the port its
  arp entry goes REACHABLE -> STALE and will either be replaced or time
  out as expected; but once it is set to PERMANENT it will never
  disappear, which means any future use of that ip address (by a port
  with a different mac) will not work until that entry is manually
  deleted.
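
  To inspect the arp cache from Python rather than "ip neigh", a small
  sketch with pyroute2 (the namespace name is a placeholder):

      from pyroute2 import NetNS

      NUD_PERMANENT = 0x80  # neighbour state flag from linux/neighbour.h

      # Dump neighbour (arp) entries inside the qrouter namespace and
      # flag any that are stuck in the PERMANENT state.
      with NetNS('qrouter-<router-uuid>') as ns:   # placeholder name
          for n in ns.get_neighbours():
              if n['state'] & NUD_PERMANENT:
                  print('PERMANENT:', n.get_attr('NDA_DST'),
                        n.get_attr('NDA_LLADDR'))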



[Yahoo-eng-team] [Bug 1921126] [NEW] [RFE] Allow explicit management of default routes

2021-03-24 Thread Bence Romsics
Public bug reported:

This RFE proposes to allow explicit management of the default route(s)
of a Neutron router.  This is mostly useful for a user to install
multiple default routes for Equal Cost Multipath (ECMP) and treat all
these routes uniformly.

Since I have already written a spec proposal for this, please see the
details there:

https://review.opendev.org/c/openstack/neutron-specs/+/781475
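
For illustration only (the gateways are documentation addresses), an ECMP
default route is a single route carrying several next hops, e.g. with
pyroute2:

    from pyroute2 import IPRoute

    # One default route with two equal-cost next hops; the kernel
    # balances flows across them.
    with IPRoute() as ipr:
        ipr.route('add', dst='default',
                  multipath=[{'gateway': '192.0.2.1'},
                             {'gateway': '192.0.2.2'}])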

** Affects: neutron
 Importance: Wishlist
 Assignee: Bence Romsics (bence-romsics)
 Status: New


** Tags: rfe



[Yahoo-eng-team] [Bug 1921124] [NEW] Randomly-set credentials not available within the instance

2021-03-24 Thread Dan Watkins
Public bug reported:

As part of the fix for bug 1918303 in [0], we stopped emitting randomly-
generated passwords to /var/log/cloud-init-output.log.  This
functionality _is_ used by some cloud-init consumers, most notably
subiquity.

We should reintroduce saving these passwords somewhere in the instance,
securely, so that these use cases are not regressed.

[0] https://github.com/canonical/cloud-init/commit/b794d426b9ab43ea9d6371477466070d86e10668
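
As a sketch of the "somewhere in the instance, securely" part (the path and
helper are assumptions, not cloud-init's actual choice), the credentials
could go to a root-only file created with mode 0600:

    import os

    SENSITIVE_PATH = '/var/lib/cloud/instance/sensitive.json'  # hypothetical

    def write_sensitive(data: str) -> None:
        # Create the file with 0600 before writing, so the randomly
        # generated credentials are only ever readable by root.
        fd = os.open(SENSITIVE_PATH,
                     os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, 'w') as f:
            f.write(data)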

** Affects: cloud-init
 Importance: Undecided
 Assignee: Dan Watkins (oddbloke)
 Status: In Progress

** Changed in: cloud-init
 Assignee: (unassigned) => Dan Watkins (oddbloke)

** Changed in: cloud-init
   Status: New => In Progress



[Yahoo-eng-team] [Bug 1921098] [NEW] test_init_application_called_twice unit test fails intermittently

2021-03-24 Thread Balazs Gibizer
Public bug reported:

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_350/782634/3/check/openstack-tox-py38/35000f9/testr_results.html

Traceback (most recent call last):
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py38/lib/python3.8/site-packages/mock/mock.py",
 line 1346, in patched
return func(*newargs, **newkeywargs)
  File 
"/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/api/test_wsgi.py", 
line 65, in test_init_application_passes_sys_argv_to_config
mock_parse_args.assert_called_once_with(
  File 
"/home/zuul/src/opendev.org/openstack/nova/.tox/py38/lib/python3.8/site-packages/mock/mock.py",
 line 925, in assert_called_once_with
raise AssertionError(msg)
AssertionError: Expected 'parse_args' to be called once. Called 0 times.

It seems that after [1] merged, the two init tests
test_init_application_passes_sys_argv_to_config and
test_init_application_called_twice cannot run in the same executor
without the second one failing.

It is probably due to the global state introduced in [1] not being
handled properly in the test.

[1] https://review.opendev.org/c/openstack/nova/+/733627
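
A sketch of the failure shape with illustrative names (this is not nova's
actual module layout): once one test has flipped a module-level flag, a
later test never reaches parse_args() again unless the state is reset:

    import unittest
    from unittest import mock

    _config_loaded = False        # stand-in for the global added in [1]

    def parse_args():
        pass

    def init_application():
        global _config_loaded
        if not _config_loaded:    # a second call skips parse_args entirely
            parse_args()
            _config_loaded = True

    class TestInit(unittest.TestCase):
        def setUp(self):
            # Reset module-level state so ordering cannot leak between
            # tests; without this, whichever test runs second fails.
            global _config_loaded
            _config_loaded = False

        @mock.patch(__name__ + '.parse_args')
        def test_init_application_calls_parse_args(self, mock_parse_args):
            init_application()
            mock_parse_args.assert_called_once_with()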

** Affects: nova
 Importance: High
 Assignee: Balazs Gibizer (balazs-gibizer)
 Status: Triaged


** Tags: gate-failure

** Changed in: nova
 Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => High

** Tags added: gate-failure



[Yahoo-eng-team] [Bug 1918863] Re: find_secret does not handle usage_type='vtpm'

2021-03-24 Thread Balazs Gibizer
@Brin: thanks. I assume this is a new feature rather than a bug, so I am
marking it Invalid. Let's continue the work in the bp.

** Changed in: nova
   Status: Confirmed => Invalid


Title:
  find_secret does not handle usage_type='vtpm'

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Currently, nova supports creating a vtpm instance, but in the
  implementation only the create_secret() and delete_secret() interfaces
  can deal with the 'vtpm' type; find_secret(), which delete_secret()
  calls, cannot, so deleting the vtpm secret never finds it. See below.

  Use case: we want to live migrate a vtpm instance to another host,
  which requires cleaning up the vtpm secret on the old host, so we
  should fix this bug; we would also like to register a blueprint to
  support live migrating vtpm instances.

  like this:

  def create_secret(self, usage_type, usage_id, password=None, uuid=None):
      """Create a secret.

      :param usage_type: one of 'iscsi', 'ceph', 'rbd', 'volume', 'vtpm'.
                         'rbd' will be converted to 'ceph'. 'vtpm' secrets
                         are private and ephemeral; others are not.
      :param usage_id: name of resource in secret
      :param password: optional secret value to set
      :param uuid: optional UUID of the secret; else one is generated by
          libvirt
      """
      secret_conf = vconfig.LibvirtConfigSecret()
      secret_conf.ephemeral = usage_type == 'vtpm'
      secret_conf.private = usage_type == 'vtpm'
      secret_conf.usage_id = usage_id
      secret_conf.uuid = uuid
      if usage_type in ('rbd', 'ceph'):
          secret_conf.usage_type = 'ceph'
      elif usage_type == 'iscsi':
          secret_conf.usage_type = 'iscsi'
      elif usage_type == 'volume':
          secret_conf.usage_type = 'volume'
      elif usage_type == 'vtpm':
          secret_conf.usage_type = 'vtpm'
      else:
          msg = _("Invalid usage_type: %s")
          raise exception.InternalError(msg % usage_type)

      xml = secret_conf.to_xml()
      try:
          LOG.debug('Secret XML: %s', xml)
          conn = self.get_connection()
          secret = conn.secretDefineXML(xml)
          if password is not None:
              secret.setValue(password)
          return secret
      except libvirt.libvirtError:
          with excutils.save_and_reraise_exception():
              LOG.error('Error defining a secret with XML: %s', xml)

  def delete_secret(self, usage_type, usage_id):
      """Delete a secret.

      :param usage_type: one of 'iscsi', 'ceph', 'rbd', 'volume' or 'vtpm'
      :param usage_id: name of resource in secret
      """
      secret = self.find_secret(usage_type, usage_id)
      if secret is not None:
          secret.undefine()

  def find_secret(self, usage_type, usage_id):
      """Find a secret.

      usage_type: one of 'iscsi', 'ceph', 'rbd' or 'volume'
      usage_id: name of resource in secret
      """
      if usage_type == 'iscsi':
          usage_type_const = libvirt.VIR_SECRET_USAGE_TYPE_ISCSI
      elif usage_type in ('rbd', 'ceph'):
          usage_type_const = libvirt.VIR_SECRET_USAGE_TYPE_CEPH
      elif usage_type == 'volume':
          usage_type_const = libvirt.VIR_SECRET_USAGE_TYPE_VOLUME
      else:
          msg = _("Invalid usage_type: %s")
          raise exception.InternalError(msg % usage_type)

      try:
          conn = self.get_connection()
          return conn.secretLookupByUsage(usage_type_const, usage_id)
      except libvirt.libvirtError as e:
          if e.get_error_code() == libvirt.VIR_ERR_NO_SECRET:
              return None
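
  The missing branch in find_secret() would presumably look like the
  following (a sketch only, not the merged nova change; libvirt exposes
  the constant since 5.6):

      elif usage_type == 'vtpm':
          usage_type_const = libvirt.VIR_SECRET_USAGE_TYPE_VTPM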



[Yahoo-eng-team] [Bug 1921085] [NEW] neutron-server ovsdbapp timeout exceptions after intermittent connectivity issues

2021-03-24 Thread Hemanth Nakkina
Public bug reported:

Cloud environment: bionic-ussuri with 3 neutron-server and 3 ovn-central
units, each running on a separate rack (ovn-central runs the ovn-northd,
ovsdb-nb and ovsdb-sb services).

There was a network glitch between rack3 and the other racks for about a
minute, so neutron-server/2 could not communicate with ovn-central/0 and
ovn-central/1; the ovsdb-nb and ovsdb-sb leaders sit on one of
ovn-central/0 or ovn-central/1. neutron-server/2 could still connect to
the ovsdb-sb on ovn-central/2, but that member is not the leader.

Logs from neutron-server on neutron-server/2 unit
2021-02-15 14:20:08.119 15554 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: 
clustered database server is disconnected from cluster; trying another server
2021-02-15 14:20:08.121 15554 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: 
connection closed by client
2021-02-15 14:20:08.121 15554 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: 
continuing to reconnect in the background but suppressing further logging
2021-02-15 14:20:08.853 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.118:16642: connected
2021-02-15 14:20:08.864 15563 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.251:16642: connecting...
2021-02-15 14:20:08.869 15542 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.251:16642: connecting...
2021-02-15 14:20:08.872 15558 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-c047c84e-8fdc-404c-8284-bba80c34fe90 - - - - -] ssl:10.216.241.251:16642: 
connecting...
2021-02-15 14:20:08.877 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.118:16642: clustered database server is disconnected from 
cluster; trying another server
2021-02-15 14:20:08.879 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.118:16642: connection closed by client
2021-02-15 14:20:08.879 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.118:16642: continuing to reconnect in the background but 
suppressing further logging
2021-02-15 14:20:09.093 15548 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-b3fb3d36-3477-454e-97e0-11673e64eff5 - - - - -] ssl:10.216.241.251:6641: 
connecting...
2021-02-15 14:20:09.126 15558 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-3de7f22d-c26c-493b-9463-3140898e35f0 - - - - -] ssl:10.216.241.251:6641: 
connecting...
2021-02-15 14:20:09.129 15557 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-89da4e64-10f9-45c1-ba11-c0ff429961c9 - - - - -] ssl:10.216.241.251:6641: 
connecting...
2021-02-15 14:20:09.129 15571 INFO ovsdbapp.backend.ovs_idl.vlog 
[req-68cd67e7-592c-4869-bc11-2d18fc070c12 - - - - -] ssl:10.216.241.251:6641: 
connecting...
2021-02-15 14:20:09.132 15563 INFO ovsdbapp.backend.ovs_idl.vlog [-] 
ssl:10.216.241.251:6641: connecting...
2021-02-15 14:20:10.284 15546 ERROR ovsdbapp.backend.ovs_idl.connection [-] 
(113, 'EHOSTUNREACH'): OpenSSL.SSL.SysCallError: (113, 'EHOSTUNREACH')
... (and more EHOSTUNREACH messages probably from each thread) 

I believe network connectivity was then restored, after which Timeout
exceptions to ovsdb started appearing: any *_postcommit operation on
neutron-server/2 timed out.

2021-02-15 15:17:21.163 15554 ERROR neutron.api.v2.resource 
[req-6b3381c3-69ac-44fc-b71d-a3110714f32e 84fca387fca043b984358c34174e1070 
24471fcdff7e4cac9f7fe7b4ec0d04e3 - cb47060fffe34ed0a8913db979e06523 
cb47060fffe34ed0a8913db979e06523] index failed: No details.: 
ovsdbapp.exceptions.TimeoutException: Commands 
[] exceeded timeout 180 seconds
2021-02-15 16:03:18.018 15554 ERROR neutron.plugins.ml2.managers 
[req-3c4f2b06-2be3-4ccc-a00e-a91bf61b8473 - 6e3dac6cf8f14582be2c8a6fdc0a7458 - 
- -] Mechanism driver 'ovn' failed in create_port_postcommit: 
ovsdbapp.exceptions.TimeoutException: Commands 
[, 
, , 
, , 
, , 
, , 
, , ] exceeded timeout 180 seconds
...

One complete Timeout exception for reference (it points to the transaction queue being full):
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource 
[req-9fa36c11-fcaf-4716-8371-3d4e357b5154 2ae54808a32e4ba6baec08cbc3df6cec 
64f175c521c847c5a7d31a7443a861f2 - 8b226be7ba0a4e62a16072c0c08c6d8f 
8b226be7ba0a4e62a16072c0c08c6d8f] index failed: No details.: 
ovsdbapp.exceptions.TimeoutException: Commands 
[] exceeded timeout 180 seconds
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource Traceback (most 
recent call last):
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File 
"/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 
144, in queue_txn
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource 
self.txns.put(txn, timeout=self.timeout)
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File 
"/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 
50, in put
2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource 
super(TransactionQueue, self).put(*args, 
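
The traceback is cut off here, but the shape it points at is a bounded
transaction queue. A toy sketch (not ovsdbapp's exact code) of how a full
queue turns into the 180-second TimeoutException above:

    import queue

    txns = queue.Queue(maxsize=1)   # ovsdbapp keeps this queue very small
    txns.put('stuck txn')           # the connection thread never drains it
    try:
        txns.put('next txn', timeout=180)   # blocks the API worker
    except queue.Full:
        raise TimeoutError('Commands [...] exceeded timeout 180 seconds')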

[Yahoo-eng-team] [Bug 1896734] Re: A privsep daemon spawned by neutron-openvswitch-agent hangs when debug logging is enabled (large number of registered NICs) - an RPC response is too large for msgpack

2021-03-24 Thread Chris MacNaughton
This bug was fixed in the package neutron - 2:17.1.0-0ubuntu3~cloud0
---

 neutron (2:17.1.0-0ubuntu3~cloud0) focal-victoria; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:17.1.0-0ubuntu3) groovy; urgency=medium
 .
   * d/p/revert-dvr-remove-control-plane-arp-updates.patch: Cherry-picked
 from https://review.opendev.org/c/openstack/neutron/+/777903 to prevent
 permanent arp entries that never get deleted (LP: #1916761).
   * d/p/improve-get-devices-with-ip-performance.patch: Performance of
 get_devices_with_ip is improved to limit the amount of information
 to be sent and reduce the number of syscalls. (LP: #1896734).


** Changed in: cloud-archive/victoria
   Status: Fix Committed => Fix Released


Title:
  A privsep daemon spawned by neutron-openvswitch-agent hangs when debug
  logging is enabled (large number of registered NICs) - an RPC response
  is too large for msgpack

Status in OpenStack neutron-openvswitch charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in oslo.privsep:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in python-oslo.privsep package in Ubuntu:
  New
Status in neutron source package in Focal:
  Fix Released
Status in python-oslo.privsep source package in Focal:
  New
Status in neutron source package in Groovy:
  Fix Released
Status in python-oslo.privsep source package in Groovy:
  New
Status in neutron source package in Hirsute:
  Fix Released
Status in python-oslo.privsep source package in Hirsute:
  New

Bug description:
  [Impact]

  When there is a large number of netdevs registered in the kernel and
  debug logging is enabled, neutron-openvswitch-agent and the privsep
  daemon spawned by it hang since the RPC call result sent by the
  privsep daemon over a unix socket exceeds the message sizes that the
  msgpack library can handle.

  The impact of this is that enabling debug logging on the cloud
  completely stalls neutron-openvswitch-agents and makes them "dead"
  from the Neutron server perspective.

  The issue is summarized in detail in comment #5
  https://bugs.launchpad.net/oslo.privsep/+bug/1896734/comments/5
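
  A toy illustration of the failure class (the cap here is artificial;
  oslo.privsep's real framing and limits differ): an unpacker with a
  bounded buffer rejects any message larger than it will hold:

      import msgpack

      blob = msgpack.packb(['x' * 100] * 20000)   # one big RPC-style reply
      unpacker = msgpack.Unpacker(max_buffer_size=1024)  # tiny cap on purpose
      try:
          unpacker.feed(blob)
      except msgpack.exceptions.BufferFull:
          print('reply exceeds the unpacker buffer; the receiver stalls')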

  [Test Plan]

    * deploy Openstack Train/Ussuri/Victoria
    * need at least one compute host
    * enable neutron debug logging
    * create a load of interfaces on your compute host to create a large 'ip 
addr show' output
    * for ((i=0;i<400;i++)); do ip tuntap add mode tap tap-`uuidgen| cut 
-c1-11`; done
    * create a single vm
    * add floating ip
    * ping fip
    * create 20 ports and attach them to the vm
    * for ((i=0;i<20;i++)); do id=`uuidgen`; openstack port create --network 
private --security-group __SG__ X-$id; openstack server add port __VM__ X-$id; 
done
    * attaching ports should not result in errors

  [Where problems could occur]

  No problems are anticipated from this patchset.

  

  Old Description

  While trying to debug a different issue, I encountered a situation
  where privsep hangs in the process of handling a request from neutron-
  openvswitch-agent when debug logging is enabled (juju debug-log
  neutron-openvswitch=true):

  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/11
  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/12

  The issue gets reproduced reliably in the environment where I
  encountered it on all units. As a result, neutron-openvswitch-agent
  services hang while waiting for a response from the privsep daemon and
  do not progress past basic initialization. They never post any state
  back to the Neutron server and thus are marked dead by it.

  The processes, though, are shown as "active (running)" by systemd,
  which adds to the confusion since they do indeed start from systemd's
  perspective.

  systemctl --no-pager status neutron-openvswitch-agent.service
  ● neutron-openvswitch-agent.service - 

[Yahoo-eng-team] [Bug 1916761] Re: [dvr] bound port permanent arp entries never deleted

2021-03-24 Thread Chris MacNaughton
This bug was fixed in the package neutron - 2:17.1.0-0ubuntu3~cloud0
---

 neutron (2:17.1.0-0ubuntu3~cloud0) focal-victoria; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:17.1.0-0ubuntu3) groovy; urgency=medium
 .
   * d/p/revert-dvr-remove-control-plane-arp-updates.patch: Cherry-picked
 from https://review.opendev.org/c/openstack/neutron/+/777903 to prevent
 permanent arp entries that never get deleted (LP: #1916761).
   * d/p/improve-get-devices-with-ip-performance.patch: Performance of
 get_devices_with_ip is improved to limit the amount of information
 to be sent and reduce the number of syscalls. (LP: #1896734).


** Changed in: cloud-archive/victoria
   Status: Fix Committed => Fix Released


Title:
  [dvr] bound port permanent arp entries never deleted

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released



[Yahoo-eng-team] [Bug 1921075] [NEW] [arm64][libvirt] fail to load json from firmware metadata files

2021-03-24 Thread Rico Lin
Public bug reported:

Found this error in [3] for libvirt with Ubuntu Focal on arm64: we fail to
load JSON from the QEMU firmware metadata files with error [1][2]:

Instance failed to spawn: TypeError: can't concat str to bytes
Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 2620, in _build_resources
yield resources
  File "/opt/stack/nova/nova/compute/manager.py", line 2389, in 
_build_and_run_instance
self.driver.spawn(context, instance, image_meta,
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3877, in spawn
xml = self._get_guest_xml(context, instance, network_info,
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6721, in 
_get_guest_xml
conf = self._get_guest_config(instance, network_info, image_meta,
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6334, in 
_get_guest_config
self._configure_guest_by_virt_type(guest, instance, image_meta, flavor)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5943, in 
_configure_guest_by_virt_type
loader, nvram_template = self._host.get_loader(
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1636, in get_loader
for loader in self.loaders:
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1619, in loaders
self._loaders = _get_loaders()
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 112, in _get_loaders
spec = jsonutils.load(fh)
  File 
"/usr/local/lib/python3.8/dist-packages/oslo_serialization/jsonutils.py", line 
261, in load
return json.load(codecs.getreader(encoding)(fp), **kwargs)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
  File "/usr/lib/python3.8/codecs.py", line 500, in read
data = self.bytebuffer + newdata
TypeError: can't concat str to bytes


Enviro
[1] http://paste.openstack.org/show/803788/
[2] 
https://zuul.opendev.org/t/openstack/build/312d8e45b079460496d90f1d940c174c/log/controller/logs/screen-n-cpu.txt#22708
[3] https://review.opendev.org/c/openstack/devstack/+/708317
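
The last two frames of the traceback show the cause: the descriptor file
handle is opened in text mode, while jsonutils.load() wraps it in a codecs
reader that expects a byte stream. A minimal standalone reproduction, with
io.StringIO standing in for the text-mode file handle:

    import codecs
    import io

    fh = io.StringIO('{"description": "firmware"}')  # text-mode handle
    reader = codecs.getreader('utf-8')(fh)           # expects bytes
    reader.read()  # TypeError: can't concat str to bytes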

** Affects: nova
 Importance: Undecided
 Status: New


[Yahoo-eng-team] [Bug 1921073] [NEW] [arm64][libvirt] firmware metadata files not found for arm64 on ubuntu 18.04

2021-03-24 Thread Rico Lin
Public bug reported:

From the devstack arm64 job patch [1], I found this error [2][3] when using
bionic images in an arm64 environment:


Failed to build and run instance: nova.exception.InternalError: Failed to 
locate firmware descriptor files
Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 2393, in 
_build_and_run_instance
accel_info=accel_info)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3880, in spawn
mdevs=mdevs, accel_info=accel_info)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6723, in 
_get_guest_xml
context, mdevs, accel_info)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6334, in 
_get_guest_config
self._configure_guest_by_virt_type(guest, instance, image_meta, flavor)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5945, in 
_configure_guest_by_virt_type
has_secure_boot=guest.os_loader_secure)
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1636, in get_loader
for loader in self.loaders:
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1619, in loaders
self._loaders = _get_loaders()
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 102, in _get_loaders
raise exception.InternalError(msg)
nova.exception.InternalError: Failed to locate firmware descriptor files

As I moved [1] to use the focal version, this error message disappeared.
The weird part is that I can locate the firmware descriptor files on my AWS
bionic arm64 test environment at exactly the expected path, so I'm not sure
what exactly happened there.


[1] https://review.opendev.org/c/openstack/devstack/+/708317

[2] 
https://zuul.opendev.org/t/openstack/build/77b0d998c9f14e1b859467016dfb7852/log/controller/logs/screen-n-cpu.txt#9821
  
[3] http://paste.openstack.org/show/803786/
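
For reference, a sketch of what "locating firmware descriptor files" means
here, using the standard qemu firmware descriptor directories (nova's real
lookup is nova/virt/libvirt/host.py:_get_loaders() and may differ in
detail):

    import glob
    import os

    from oslo_serialization import jsonutils

    # Directories defined by the qemu firmware descriptor spec.
    SEARCH_PATHS = ('/etc/qemu/firmware', '/usr/share/qemu/firmware')

    loaders = []
    for d in SEARCH_PATHS:
        for path in sorted(glob.glob(os.path.join(d, '*.json'))):
            with open(path, 'rb') as fh:   # binary mode; see bug 1921075
                loaders.append(jsonutils.load(fh))
    if not loaders:
        raise RuntimeError('Failed to locate firmware descriptor files')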

** Affects: nova
 Importance: Undecided
 Status: New

** Summary changed:

- firmware metadata files not found for arm64 on ubuntu 18.04
+ [arm64][libvirt] firmware metadata files not found for arm64 on ubuntu 18.04
