[Yahoo-eng-team] [Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot

2021-03-17 Thread melanie witt
https://review.opendev.org/c/openstack/nova/+/765769 proposed to
stable/victoria

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/trunk
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** No longer affects: nova/trunk

** Changed in: nova/victoria
   Status: New => In Progress

** Changed in: nova/victoria
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1905701

Title:
  Do not recreate libvirt secret when one already exists on the host
  during a host reboot

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  In Progress

Bug description:
  Description
  ===========

  When [compute]/resume_guests_state_on_host_boot is enabled, the compute
  manager will attempt to restart instances on startup.

  When using the libvirt driver with instances that have LUKSv1 encrypted
  volumes attached, a call is made to _attach_encryptor that currently
  assumes no libvirt secret for the volume already exists on the host. As
  a result, this call attempts to look up encryption metadata, which
  fails because the compute service is using a bare-bones, local-only
  admin context to drive the restart of the instances.

  The libvirt secrets associated with LUKSv1 encrypted volumes actually
  persist across a host reboot, so the calls to fetch the encryption
  metadata, the symmetric key, etc. are not required. Removing these
  calls in this context should allow the compute service to start
  instances with these volumes attached.
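
  A rough sketch of the existence check this implies (the libvirt calls
  below are real python-libvirt APIs; using them directly here rather
  than nova's Host.find_secret() wrapper is purely illustrative):

    import libvirt

    def volume_secret_exists(conn, volume_id):
        # Secrets created for LUKSv1 volumes are persistent, so after a
        # host reboot they can simply be looked up by usage id instead
        # of re-fetching key material from the key manager.
        try:
            conn.secretLookupByUsage(
                libvirt.VIR_SECRET_USAGE_TYPE_VOLUME, volume_id)
            return True
        except libvirt.libvirtError:
            return False

  With a check like this at the top of _attach_encryptor, the encryption
  metadata and key lookups (which need a service catalog) can be skipped
  entirely on the resume path.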

  Steps to reproduce
  ==================
  * Enable [compute]/resume_guests_state_on_host_boot
  * Launch instances with encrypted LUKSv1 volumes attached
  * Reboot the underlying host

  Expected result
  ===============
  * The instances are restarted successfully by Nova as no external calls
    are made and the existing libvirt secret for any encrypted LUKSv1
    volume is reused.

  Actual result
  =============
  * The instances fail to restart as the initial calls made by the Nova
    service use an empty admin context without a service catalog etc.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     list for all releases: http://docs.openstack.org/releases/

     master

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 libvirt + QEMU/KVM

  3. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  4. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==============

  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in _connect_volume
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._attach_encryptor(context, connection_info, encryption)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in _attach_encryptor
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     key = keymgr.get(context, encryption['encryption_key_id'])
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 575, in get
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     secret = self._get_secret(context, managed_object_id)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 545, in _get_secret
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: 

[Yahoo-eng-team] [Bug 1905493] Re: cloud-init status --wait hangs indefinitely in a nested lxd container

2021-03-17 Thread Dan Streetman
it's interesting that apparmor appears to work ok in the first-level
container, but fails in the nested container, e.g.:

$ lxc shell lp1905493-f 
root@lp1905493-f:~# systemctl status apparmor
● apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
     Active: active (exited) since Wed 2021-03-17 18:17:44 UTC; 2h 53min ago
       Docs: man:apparmor(7)
             https://gitlab.com/apparmor/apparmor/wikis/home/
    Process: 118 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
   Main PID: 118 (code=exited, status=0/SUCCESS)

Mar 17 18:17:44 lp1905493-f systemd[1]: Starting Load AppArmor profiles...
Mar 17 18:17:44 lp1905493-f apparmor.systemd[118]: Restarting AppArmor
Mar 17 18:17:44 lp1905493-f apparmor.systemd[118]: Reloading AppArmor profiles
Mar 17 18:17:44 lp1905493-f apparmor.systemd[129]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
Mar 17 18:17:44 lp1905493-f systemd[1]: Finished Load AppArmor profiles.
root@lp1905493-f:~# lxc shell layer2
root@layer2:~# systemctl status apparmor
● apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-03-17 18:40:16 UTC; 2h 31min ago
       Docs: man:apparmor(7)
             https://gitlab.com/apparmor/apparmor/wikis/home/
   Main PID: 105 (code=exited, status=1/FAILURE)

Mar 17 18:40:15 layer2 apparmor.systemd[147]: /sbin/apparmor_parser: Unable to replace "nvidia_modprobe".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:15 layer2 apparmor.systemd[157]: /sbin/apparmor_parser: Unable to replace "/usr/bin/man".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:15 layer2 apparmor.systemd[164]: /sbin/apparmor_parser: Unable to replace "/usr/sbin/tcpdump".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:16 layer2 apparmor.systemd[150]: /sbin/apparmor_parser: Unable to replace "/usr/lib/NetworkManager/nm-dhcp-client.action".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:16 layer2 apparmor.systemd[161]: /sbin/apparmor_parser: Unable to replace "mount-namespace-capture-helper".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:16 layer2 apparmor.systemd[161]: /sbin/apparmor_parser: Unable to replace "/usr/lib/snapd/snap-confine".  Permission denied; attempted to load a profile while confined?
Mar 17 18:40:16 layer2 apparmor.systemd[105]: Error: At least one profile failed to load
Mar 17 18:40:16 layer2 systemd[1]: apparmor.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 18:40:16 layer2 systemd[1]: apparmor.service: Failed with result 'exit-code'.
Mar 17 18:40:16 layer2 systemd[1]: Failed to start Load AppArmor profiles.


** Also affects: apparmor
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1905493

Title:
  cloud-init status --wait hangs indefinitely in a nested lxd container

Status in AppArmor:
  New
Status in cloud-init:
  Invalid
Status in snapd:
  Confirmed
Status in dbus package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Invalid

Bug description:
  When booting a nested lxd container inside another lxd container (just
  a normal container, not a VM) (i.e. just L2), using cloud-init status
  --wait, the "." is just printed indefinitely and the command never
  returns.

To manage notifications about this bug go to:
https://bugs.launchpad.net/apparmor/+bug/1905493/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1905493] Re: cloud-init status --wait hangs indefinitely in a nested lxd container

2021-03-17 Thread Dan Watkins
Yep, that's what I've found; cloud-init is just waiting for its later
stages to run, which are blocked waiting for snapd.seeded.service to
exit.

** Changed in: cloud-init
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1905493

Title:
  cloud-init status --wait hangs indefinitely in a nested lxd container

Status in cloud-init:
  Invalid
Status in snapd:
  Confirmed
Status in dbus package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Invalid

Bug description:
  When booting a nested lxd container inside another lxd container (just
  a normal container, not a VM) (i.e. just L2), using cloud-init status
  --wait, the "." is just printed indefinitely and the command never
  returns.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1905493/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1905493] Re: cloud-init status --wait hangs indefinitely in a nested lxd container

2021-03-17 Thread Ian Johnson
FWIW I know what the snapd issue is: snapd does not and will not work in
a nested LXD container, so we need to add code to make
snapd.seeded.service exit gracefully in this situation.

** Also affects: snapd
   Importance: Undecided
   Status: New

** Changed in: snapd
   Status: New => Confirmed

** Changed in: snapd
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1905493

Title:
  cloud-init status --wait hangs indefinitely in a nested lxd container

Status in cloud-init:
  Invalid
Status in snapd:
  Confirmed
Status in dbus package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Invalid

Bug description:
  When booting a nested lxd container inside another lxd container (just
  a normal container, not a VM) (i.e. just L2), using cloud-init status
  --wait, the "." is just printed indefinitely and the command never
  returns.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1905493/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1552042] Re: Host data corruption through nova inject_key feature

2021-03-17 Thread Jeremy Stanley
Thanks for following up on this longstanding report. Given the fix is
unlikely to be backported to supported stable branches, the VMT considers
such reports class B1 (https://security.openstack.org/vmt-process.html#incident-report-taxonomy),
so there's no call for issuing an advisory.

** Changed in: ossa
   Status: Incomplete => Won't Fix

** Information type changed from Public Security to Public

** Tags added: security

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1552042

Title:
  Host data corruption through nova inject_key feature

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  Reported by Garth Mollett from Red Hat.

  The nova.virt.disk.vfs.VFSLocalFS has measures to prevent symlink
  traversal outside of the root of the images directory but it does not
  prevent access to device nodes inside the image itself. A simple fix
  should be to mount with the 'nodev' option.
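
  A minimal sketch of that mitigation (the helper below is hypothetical;
  the real VFSLocalFS mount path differs, this only shows the option):

    import subprocess

    def mount_image_partition(device, mountpoint):
        # 'nodev' makes the kernel ignore device nodes on the mounted
        # filesystem, so a crafted image can't reach host devices.
        subprocess.check_call(['mount', '-o', 'nodev', device, mountpoint])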

  Under certain circumstances, the boot process will fall back to
  VFSLocalFS when trying to inject the public key, for libvirt:

  * when libguestfs is not installed or can't be loaded.
  * use_cow_images=false and inject_partition for non-nbd
  * for loopback mount at least, there is a race condition to win in
    virt/disk/mount/api.py between kpartx and a /dev/mapper/ file
    creation: os.path.exists can run before the path exists even though
    it's there half a second later.

  The xenapi is also likely vulnerable, though untested.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1552042/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1919386] Re: Project administrators are allowed to view networks across projects

2021-03-17 Thread Slawek Kaplonski
Fix merged in neutron-lib.

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1919386

Title:
  Project administrators are allowed to view networks across projects

Status in neutron:
  Fix Released

Bug description:
  The new default policies in neutron help fix tenancy issues where
  users of one project are not allowed to view, create, modify, or
  delete resources within another project (enforcing hard tenancy).

  With the new policies enabled by default, I'm able to view networks
  for other projects as an administrator of another project.

  ╭─ubuntu@neutron-devstack /opt/stack/neutron ‹master›
  ╰─➤  $ openstack --os-cloud devstack-alt-admin network create alt-network
  /usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
    from cryptography.utils import int_from_bytes
  /usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
    from cryptography.utils import int_from_bytes
  +---------------------------+--------------------------------------+
  | Field                     | Value                                |
  +---------------------------+--------------------------------------+
  | admin_state_up            | UP                                   |
  | availability_zone_hints   |                                      |
  | availability_zones        |                                      |
  | created_at                | 2021-03-16T21:27:28Z                 |
  | description               |                                      |
  | dns_domain                | None                                 |
  | id                        | 84c7464b-3351-4a47-88d1-3b6615967e87 |
  | ipv4_address_scope        | None                                 |
  | ipv6_address_scope        | None                                 |
  | is_default                | False                                |
  | is_vlan_transparent       | None                                 |
  | mtu                       | 1450                                 |
  | name                      | alt-network                          |
  | port_security_enabled     | True                                 |
  | project_id                | 13bde21b76fe4744904785a9a61512b7     |
  | provider:network_type     | vxlan                                |
  | provider:physical_network | None                                 |
  | provider:segmentation_id  | 3                                    |
  | qos_policy_id             | None                                 |
  | revision_number           | 1                                    |
  | router:external           | Internal                             |
  | segments                  | None                                 |
  | shared                    | False                                |
  | status                    | ACTIVE                               |
  | subnets                   |                                      |
  | tags                      |                                      |
  | updated_at                | 2021-03-16T21:27:28Z                 |
  +---------------------------+--------------------------------------+
  ╭─ubuntu@neutron-devstack /opt/stack/neutron ‹master›
  ╰─➤  $ openstack --os-cloud devstack-admin-admin network show alt-network
  /usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
    from cryptography.utils import int_from_bytes
  /usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
    from cryptography.utils import int_from_bytes
  +---------------------------+--------------------------------------+
  | Field                     | Value                                |
  +---------------------------+--------------------------------------+
  | admin_state_up            | UP                                   |
  | availability_zone_hints   |                                      |
  | availability_zones        |                                      |
  | created_at                | 2021-03-16T21:27:28Z                 |
  | description               |                                      |
  | dns_domain                | None                                 |
  | id                        | 84c7464b-3351-4a47-88d1-3b6615967e87 |
  | ipv4_address_scope        | None                                 |
  | ipv6_address_scope        | None                                 |
  | is_default                | None                                 |
  | is_vlan_transparent       | None                                 |

[Yahoo-eng-team] [Bug 1905493] Re: cloud-init status --wait hangs indefinitely in a nested lxd container

2021-03-17 Thread Dan Streetman
The systemd-logind problem is due to dbus defaulting to apparmor mode
'enabled', but apparmor can't do much of anything inside a container so
it fails to start, and dbus can't contact it.

In the 2nd level container, create a file like '/etc/dbus-1/system.d/no-
apparmor.conf' with content:

<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <apparmor mode="disabled"/>
</busconfig>

Then restart the 2nd level container and recheck systemd-logind, which
should now work.

Of course, a proper fix for dbus should be a bit smarter, only disabling
its use of apparmor when it's inside a container.


However, cloud-init status --wait still hangs after systemd-logind starts
up, so that wasn't the original problem (or at least wasn't the only
problem).

** Also affects: dbus (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: systemd (Ubuntu)
   Status: New => Invalid

** Changed in: cloud-init
   Status: Invalid => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1905493

Title:
  cloud-init status --wait hangs indefinitely in a nested lxd container

Status in cloud-init:
  New
Status in dbus package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Invalid

Bug description:
  When booting a nested lxd container inside another lxd container (just
  a normal container, not a VM) (i.e. just L2), using cloud-init status
  --wait, the "." is just printed indefinitely and the command never
  returns.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1905493/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1919487] [NEW] virDomainBlockCommit called when deleting a snapshot via os-assisted-volume-snapshots even when instance is shutoff

2021-03-17 Thread Lee Yarwood
Public bug reported:

Description
===========

Attempting to delete an NFS volume snapshot (via c-api and the os-
assisted-volume-snapshots n-api) of a volume attached to a SHUTOFF
instance currently results in n-cpu attempting to fire off a
virDomainBlockCommit command even though the instance isn't running.


Steps to reproduce
==================
1. Create multiple volume snapshots against a volume.
2. Attach the volume to an ACTIVE instance.
3. Stop the instance and ensure it is SHUTOFF.
4. Attempt to delete the latest snapshot.

Expected result
===============
qemu-img commit or qemu-img rebase should be used to handle this offline.

Actual result
=============
virDomainBlockCommit is called even though the domain isn't running.
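
A hedged sketch of the branch being asked for (real libvirt and qemu-img
calls, but simplified relative to nova's actual driver code):

  import subprocess
  import libvirt

  def delete_snapshot_overlay(dom, disk_path):
      if dom.isActive():
          # Online: QEMU commits the overlay into its backing file.
          dom.blockCommit(disk_path, None, None,
                          flags=libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE)
      else:
          # Offline: virDomainBlockCommit is invalid for a stopped
          # domain, so fold the overlay down with qemu-img instead.
          subprocess.check_call(['qemu-img', 'commit', disk_path])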

Environment
===========

1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   libvirt + KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   NFS c-vol

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server [req-570281c6-566e-44a3-9953-eeb634513778 req-0fbbe87f-fd1d-4861-9fb3-21b8eb011e55 service nova] Exception during message handling: libvirt.libvirtError: Requested operation is not valid: domain is not >
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 193, in _do_dispatch
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     return func(*args, **kwargs)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", line 78, in wrapped
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     self.force_reraise()
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     raise value
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/exception_wrapper.py", line 69, in wrapped
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/compute/manager.py", line 3916, in volume_snapshot_delete
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR oslo_messaging.rpc.server     snapshot_id, delete_info)
Jul 03 09:37:57 localhost.localdomain nova-compute[127223]: ERROR 

[Yahoo-eng-team] [Bug 1896621] Re: instance corrupted after volume retype

2021-03-17 Thread melanie witt
https://review.opendev.org/c/openstack/nova/+/758732 has been released
in ussuri 21.2.0

** Changed in: nova/ussuri
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896621

Title:
  instance corrupted after volume retype

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released

Bug description:
  Description
  ===========

  Following a cinder volume retype on a volume attached to a running
  instance, the instance became corrupt and cannot boot into the guest
  operating system any more.

  Upon further investigation it seems the retype operation failed. The
  nova-compute logs registered the following error:

  Exception during message handling: libvirtError: block copy still
  active: domain has active block job

  see log extract: http://paste.openstack.org/show/798201/
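
  For reference, waiting out an active block-copy job before pivoting
  looks roughly like this (real python-libvirt APIs; the polling loop
  itself is an illustrative sketch, not nova's actual code):

    import time
    import libvirt

    def wait_then_pivot(dom, disk, timeout=300):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            info = dom.blockJobInfo(disk, 0)
            # Empty result means no job; cur == end means ready to pivot.
            if info and info.get('cur') == info.get('end'):
                dom.blockJobAbort(
                    disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
                return
            time.sleep(1)
        raise TimeoutError('block copy still active on %s' % disk)

  Attempting further block operations before the job completes is what
  produces the "block copy still active: domain has active block job"
  error above.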

  Steps to reproduce
  ==================

  I'm not sure how easy this would be to replicate the exact problem.

  As an admin user within the project, in Horizon go to Project | Volume
  | Volume, then from the context menu of the required volume select
  "change volume type".

  Select the new type and migration policy 'on-demand'.

  Following this it was reported that the instance was non-responsive;
  when checking in the console the instance was unable to boot from the
  volume.

  
  Environment
  ===========
  DISTRIB_ID="OSA"
  DISTRIB_RELEASE="18.1.5"
  DISTRIB_CODENAME="Rocky"
  DISTRIB_DESCRIPTION="OpenStack-Ansible"

  # nova-manage --version
  18.1.1

  # virsh version
  Compiled against library: libvirt 4.0.0
  Using library: libvirt 4.0.0
  Using API: QEMU 4.0.0
  Running hypervisor: QEMU 2.11.1

  
  Cinder v13.0.3 backed volumes using Zadara VPSA driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896621/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1885528] Re: snapshot delete fails on shutdown VM

2021-03-17 Thread Lee Yarwood
** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/trunk
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/victoria
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/rocky
   Status: New => In Progress

** Changed in: nova/trunk
 Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/ussuri
   Status: New => In Progress

** Changed in: nova/victoria
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1885528

Title:
  snapshot delete fails on shutdown VM

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) trunk series:
  New
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  In Progress

Bug description:
  Description:
  When we try to delete the last snapshot of a VM in shutdown state, this
  snapshot_delete will fail (and be stuck in state error-deleting). When
  setting state==available and re-deleting the snapshot, the volume will
  be corrupted and the VM will never start again. Volumes are stored on
  NFS.
  (for root cause and fix, see the bottom of this post)

  To reproduce:
  - storage on NFS
  - create a VM and some snapshots
  - shut down the VM (ie volume is still considered "attached" but vm is
    no longer "active")
  - delete the last snapshot

  Expected Result:
  snapshot is deleted, vm still works

  Actual result:
  The snapshot is stuck in error-deleting. After setting the snapshot
  state==available and deleting the snapshot again, the volume will be
  corrupted and the VM will never start again. (non-existing backing_file
  in qcow on disk)
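
  A small sketch of the backing-file handling involved here (the helper
  is hypothetical and only illustrates the failure mode: qemu-img gets
  pointed at a backing file name that doesn't resolve on the NFS share):

    import os

    def resolve_backing_file(disk_path, backing_name):
        # qcow2 headers often record the backing file relative to the
        # overlay, so resolve it against the overlay's directory before
        # probing it with qemu-img info; probing the bare name raises
        # DiskNotFound as in the logs below.
        if os.path.isabs(backing_name):
            return backing_name
        return os.path.join(os.path.dirname(disk_path), backing_name)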

  Environment:
  - openstack version: stein, deployed via kolla-ansible. I suspect this
    downloads from git but i don't know the exact version.
  - hypervisor: Libvirt + KVM
  - storage: NFS
  - networking: Neutron with OpenVSwitch

  Nova debug Logs:
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [req-d38b5ec8-afdb-4dfe-af12-0c47598c6a47 6dd1c995b2ea4ddfbeb0685bc52e5fbf 6bebb564667d4a75b9281fd826b32ecf - default default] [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Error occurred during volume_snapshot_delete, sending error status to Cinder.: DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] Traceback (most recent call last):
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2726, in volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     snapshot_id, delete_info=delete_info)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2686, in _volume_snapshot_delete
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     rebase_base)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2519, in _rebase_with_qemu_img
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     b_file_fmt = images.qemu_img_info(backing_file).file_format
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 58, in qemu_img_info
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e]     raise exception.DiskNotFound(location=path)
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 711651a3-8440-42dd-a210-e7e550a8624e] DiskNotFound: No disk at volume-86c06b12-699c-4b54-8bca-fb92c99a2bf0.63d1585e-eb76-4e8f-bc96-93960e9c9692
  2020-02-06 12:20:10.713 6 ERROR nova.virt.libvirt.driver [instance: 

[Yahoo-eng-team] [Bug 1886855] Re: Insufficient error handling when parsing iscsiadm -m node output with blank iscsi target

2021-03-17 Thread Lee Yarwood
** No longer affects: nova

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1886855

Title:
  Insufficient error handling when parsing iscsiadm -m node output with
  blank iscsi target

Status in Cinder:
  New
Status in os-brick:
  New

Bug description:
  We encountered the following error when attempting to reboot a VM with
  multiple attached volumes -

  2020-07-02 05:46:05.960 ERROR oslo_messaging.rpc.server [req-0c171deb-bc82-4687-91d6-76e8f95b8e19  service] Exception during message handling: IndexError: list index out of range
  ...
  2020-07-02 05:46:05.960 TRACE oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 157, in _get_iscsi_nodes
  2020-07-02 05:46:05.960 TRACE oslo_messaging.rpc.server     lines.append((info[0].split(',')[0], info[1]))
  2020-07-02 05:46:05.960 TRACE oslo_messaging.rpc.server IndexError: list index out of range

  This is observed on os-brick version - 1.15.9

  The same code in current master branch -
  
https://github.com/openstack/os-brick/blob/master/os_brick/initiator/connectors/iscsi.py#L136

  iscsiadm -m node output -
  172.30.0.191:3260,-1 iqn.2010-10.org.openstack:volume-f1ff35f1-9716-4929-831f-32e7b207c742
  172.30.0.193:3260,-1 iqn.2010-10.org.openstack:volume-5393e371-337f-4332-b39f-4926e4a1f9f7
  172.30.0.193:3260,-1 iqn.2010-10.org.openstack:volume-1520a7d6-4351-416a-a703-c82f1bc9839d
  []:3260,-1
  172.30.0.191:3260,-1 iqn.2010-10.org.openstack:volume-fd632af2-45d9-4266-be67-b84e61fb3cbb
  172.30.0.193:3260,-1 iqn.2010-10.org.openstack:volume-c1b325b9-7bd2-4d91-a3ef-295736e52eca
  172.30.0.191:3260,-1 iqn.2010-10.org.openstack:volume-6a1a112e-1140-482d-9064-fe1b03391f2b

  The blank target causes an unhandled exception. A simple python code
  snippet to show the same -
  >>> out = "172.30.0.193:3260,-1 iqn.2010-10.org.openstack:volume-1520a7d6-4351-416a-a703-c82f1bc9839d\n[]:3260,-1\n172.30.0.191:3260,-1 iqn.2010-10.org.openstack:volume-fd632af2-45d9-4266-be67-b84e61fb3cbb"
  >>> lines = []
  >>> out.splitlines()
  ['172.30.0.193:3260,-1 iqn.2010-10.org.openstack:volume-1520a7d6-4351-416a-a703-c82f1bc9839d', '[]:3260,-1', '172.30.0.191:3260,-1 iqn.2010-10.org.openstack:volume-fd632af2-45d9-4266-be67-b84e61fb3cbb']
  >>> for line in out.splitlines():
  ...     if line:
  ...         info = line.split()
  ...         lines.append((info[0].split(',')[0], info[1]))
  ...
  Traceback (most recent call last):
    File "<stdin>", line 4, in <module>
  IndexError: list index out of range

  The blank iscsi target was most probably due to corruption of the
  iscsi data during discovery.
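
  A defensive version of that parsing loop (an illustrative sketch, not
  the merged os-brick patch) would simply skip records that lack both a
  portal and a target name:

    def parse_iscsiadm_nodes(out):
        nodes = []
        for line in out.splitlines():
            info = line.split()
            if len(info) < 2:
                continue  # e.g. the blank '[]:3260,-1' target above
            nodes.append((info[0].split(',')[0], info[1]))
        return nodes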

  Using strace we could trace that the blank target belongs to an
  OpenStack volume -
  open("/var/lib/iscsi/nodes/iqn.2010-10.org.openstack:volume-d88869e6-d27b-4121-9bd1-d8c86ce9d7e1/172.30.0.193,3260", O_RDONLY) = 5

  Expected:
  Only the single volume associated with the blank iscsi target to be
  affected.

  Observed:
  None of the volumes or volume-backed VMs can be managed on the affected
  host.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1886855/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1907756] Re: ERROR: No matching distribution found for hacking<3.1.0, >=3.0.1

2021-03-17 Thread Lee Yarwood
** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1907756

Title:
  ERROR: No matching distribution found for hacking<3.1.0,>=3.0.1

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released

Bug description:
  The openstack-tox-lower-constraints job fails in stable/ussuri.

  
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0ce/765082/1/check
  /openstack-tox-lower-constraints/0ceb0d5/job-output.txt

  2020-12-11 04:45:08.261271 | ubuntu-bionic | 
== log start ===
  2020-12-11 04:45:08.261311 | ubuntu-bionic | Looking in indexes: 
https://mirror.dfw.rax.opendev.org/pypi/simple, 
https://mirror.dfw.rax.opendev.org/wheel/ubuntu-18.04-x86_64
  2020-12-11 04:45:08.261335 | ubuntu-bionic | ERROR: Could not find a version 
that satisfies the requirement hacking<3.1.0,>=3.0.1
  2020-12-11 04:45:08.261360 | ubuntu-bionic | ERROR: No matching distribution 
found for hacking<3.1.0,>=3.0.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1907756/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1918250] Re: live migration is failing with libvirt >= 6.8.0

2021-03-17 Thread Balazs Gibizer
@Martin: You reported this against the upstream nova project but you are
linking to the RDO-specific nova wrapper code. Does the reported problem
really affect the upstream nova project?

I'm marking this Invalid from the upstream nova perspective. If you
disagree then please set it back to New and help us point to the fault in
upstream nova.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1918250

Title:
  live migration is failing with libvirt >= 6.8.0

Status in OpenStack Compute (nova):
  Invalid
Status in tripleo:
  In Progress

Bug description:
  libvirt 6.8.0 introduced virt-ssh-helper:

  +  * remote: ``virt-ssh-helper`` replaces ``nc`` for SSH tunnelling
  +
  +Libvirt now provides a ``virt-ssh-helper`` binary on the server
  +side. The libvirt remote client will use this binary for setting
  +up an SSH tunnelled connection to hosts. If not present, it will
  +transparently fallback to the traditional ``nc`` tunnel. The new
  +binary makes it possible for libvirt to transparently connect
  +across hosts even if libvirt is built with a different installation
  +prefix on the client vs server. It also enables remote access to
  +the unprivileged per-user libvirt daemons (eg using a URI such as
  +``qemu+ssh://hostname/session``). The only requirement is that
  +``virt-ssh-helper`` is present in $PATH of the remote host.

  Libvirt first checks for the `virt-ssh-helper` binary; if it's not
  present, it falls back to `nc`.

  The code where the 'nova-migration-wrapper' script looks for the
  "nc" binary is here[1]

  libvirt used to first check for `nc` (netcat). But these two libvirt
  commits[2][3] -- which are present in the libvirt build used in this
  bug -- have now changed it to first look for `virt-ssh-helper` and, if
  it is not available, fall back to `nc`.

  The nova-migration-wrapper doesn't accept this command and denies
  the connection.

  Mar 08 16:52:39 overcloud-novacompute-1
  nova_migration_wrapper[240622]: Denying connection='192.168.24.18
  54668 192.168.24.9 2022' command=['sh', '-c', "'which", 'virt-ssh-
  helper', '1>/dev/null', '2>&1;', 'if', 'test', '$?', '=', '0;',
  'then', '', '', '', '', 'virt-ssh-helper', "'qemu:///system';",
  'else', '', '', '', 'if', "'nc'", '-q', '2>&1', '|', 'grep',
  '"requires', 'an', 'argument"', '>/dev/null', '2>&1;', 'then',
  'ARG=-q0;else', "ARG=;fi;'nc'", '$ARG', '-U', '/var/run/libvirt
  /libvirt-sock;', "fi'"]
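
  The wrapper's allow-list would need to accept the new helper invocation
  as well; a rough illustrative sketch (the real RDO wrapper code
  differs, and these command strings are assumptions):

    ALLOWED_COMMAND_FRAGMENTS = (
        "virt-ssh-helper 'qemu:///system'",     # libvirt >= 6.8.0
        "nc -U /var/run/libvirt/libvirt-sock",  # older fallback
    )

    def is_allowed(command):
        return any(frag in command for frag in ALLOWED_COMMAND_FRAGMENTS)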

  A possible workaround is to force-use "netcat" (`nc`) by appending
  "&proxy=netcat" to the migration URI, so the `diff` of the URL:

    - qemu+ssh://nova_migration@compute-0.ctlplane.redhat.local:2022/system?keyfile=/etc/nova/migration/identity
    + qemu+ssh://nova_migration@compute-0.ctlplane.redhat.local:2022/system?keyfile=/etc/nova/migration/identity&proxy=netcat

  But longer term we want to allow virt-ssh-helper, because that's needed
  to work properly with the split daemons, as the socket path has
  changed.

  [1] https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova-
  migration-wrapper#L32

  [2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=f8ec7c842d (rpc:
  use new virt-ssh-helper binary for remote tunnelling, 2020-07-08)

  [3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=7d959c302d (rpc:
  Fix virt-ssh-helper detection, 2020-10-27)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1918250/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1917409] Re: neutron-l3-agents won't become active

2021-03-17 Thread LIU Yulong
*** This bug is a duplicate of bug 1883089 ***
https://bugs.launchpad.net/bugs/1883089

** This bug has been marked a duplicate of bug 1883089
   [L3] floating IP failed to bind due to no agent gateway port(fip-ns)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1917409

Title:
  neutron-l3-agents won't become active

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  We have an Ubuntu Ussuri cloud deployed on Ubuntu 20.04 using the juju
  charms from the 20.08 bundle (planning to upgrade soon).

  The problem that is occurring is that all l3 agents for routers using a
  particular external network show up with their ha_state in standby.
  I've tried removing and re-adding, and we never see the state go to
  active.

  $ neutron l3-agent-list-hosting-router bradm-router
  neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
  +--------------------------------------+-------------+----------------+-------+----------+
  | id                                   | host        | admin_state_up | alive | ha_state |
  +--------------------------------------+-------------+----------------+-------+----------+
  | 09ae92c9-ae8f-4209-b1a8-d593cc6d6602 | oschv1.maas | True           | :-)   | standby  |
  | 4d9fe934-b1f8-4c2b-83ea-04971f827209 | oschv2.maas | True           | :-)   | standby  |
  | 70b8b60e-7fbd-4b3a-80a3-90875ca72ce6 | oschv4.maas | True           | :-)   | standby  |
  +--------------------------------------+-------------+----------------+-------+----------+

  This generates a stack trace:

  2021-03-01 02:59:47.344 3675486 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get'
  Traceback (most recent call last):

    File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
      res = self.dispatcher.dispatch(message)

    File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
      return self._do_dispatch(endpoint, method, ctxt, args)

    File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch
      result = func(ctxt, **new_args)

    File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
      setattr(e, '_RETRY_EXCEEDED', True)

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)

    File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
      raise value

    File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
      return f(*args, **kwargs)

    File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
      ectxt.value = e.inner_exc

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)

    File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
      raise value

    File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
      return f(*args, **kwargs)

    File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
      LOG.debug("Retry wrapper got retriable exception: %s", e)

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()

    File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)

    File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
      raise value

    File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
      return f(*dup_args, **dup_kwargs)

    File "/usr/lib/python3/dist-packages/neutron/api/rpc/handlers/l3_rpc.py", line 306, in get_agent_gateway_port
      agent_port = self.l3plugin.create_fip_agent_gw_port_if_not_exists(

    File "/usr/lib/python3/dist-packages/neutron/db/l3_dvr_db.py", line 1101, in create_fip_agent_gw_port_if_not_exists
      self._populate_mtu_and_subnets_for_ports(context, [agent_port])

    File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1772, in _populate_mtu_and_subnets_for_ports
      network_ids = [p['network_id']

    File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1772, in <listcomp>
      network_ids = [p['network_id']

    File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1720, in _each_port_having_fixed_ips
      fixed_ips = port.get('fixed_ips', [])
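
  The last frame iterates ports that can include None when the FIP agent
  gateway port was never created; a defensive sketch of that helper
  (illustrative only, not the actual neutron fix):

    def _each_port_having_fixed_ips(ports):
        for port in ports or []:
            # Skip missing ports so a failed gateway-port creation
            # doesn't crash with "'NoneType' object has no attribute
            # 'get'" as in the traceback above.
            if port and port.get('fixed_ips'):
                yield port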

  This system was running successfully after deployment, 

[Yahoo-eng-team] [Bug 1919357] Re: "Secure live migration with QEMU-native TLS in nova"-guide misses essential config option

2021-03-17 Thread Balazs Gibizer
** Changed in: nova
   Status: New => In Progress

** Changed in: nova
 Assignee: (unassigned) => Josephine Seifert (josei)

** Changed in: nova
   Importance: Undecided => High

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/stein
   Importance: Undecided => High

** Changed in: nova/train
   Importance: Undecided => High

** Changed in: nova/ussuri
   Importance: Undecided => High

** Changed in: nova/victoria
   Importance: Undecided => High

** Tags added: tls

** Tags added: live-migration

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1919357

Title:
  "Secure live migration with QEMU-native TLS in nova"-guide misses
  essential config option

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  New
Status in OpenStack Security Advisory:
  Won't Fix
Status in OpenStack Security Notes:
  New

Bug description:
  - [x] This doc is inaccurate in this way: __

  I followed the guide to set up qemu native tls for live migration.
  After checking that libvirt is able to use tls (using tcpdump to listen
  on the tls port), I also wanted to check that it works when I live
  migrate an instance. Apparently it didn't, but instead used the port
  for unencrypted TCP [1].

  After digging through documentation and code afterwards, I found this
  code part:
  https://github.com/openstack/nova/blob/stable/victoria/nova/virt/libvirt/driver.py#L1120

  @staticmethod
  def _live_migration_uri(dest):
      uris = {
          'kvm': 'qemu+%(scheme)s://%(dest)s/system',
          'qemu': 'qemu+%(scheme)s://%(dest)s/system',
          'xen': 'xenmigr://%(dest)s/system',
          'parallels': 'parallels+tcp://%(dest)s/system',
      }
      dest = oslo_netutils.escape_ipv6(dest)

      virt_type = CONF.libvirt.virt_type
      # TODO(pkoniszewski): Remove fetching live_migration_uri in Pike
      uri = CONF.libvirt.live_migration_uri
      if uri:
          return uri % dest

      uri = uris.get(virt_type)
      if uri is None:
          raise exception.LiveMigrationURINotAvailable(virt_type=virt_type)

      str_format = {
          'dest': dest,
          'scheme': CONF.libvirt.live_migration_scheme or 'tcp',
      }
      return uri % str_format

  the uri is calculated using the config parameter 'live_migration_scheme'
  or the hard-coded 'tcp' fallback. Coming from the guide for qemu native
  tls, there was no hint that this config option needs to be set.

  In fact, without setting this 'live_migration_scheme' config option to
  tls, there is no way to see that the live migration still uses the
  unencrypted tcp connection - one has to use tcpdump and listen for tcp
  or tls to recognize it. Neither in the logs nor in any debug output is
  there any hint that it is still unencrypted!

  - [x] This is a doc addition request.

  To fix this, the config parameter 'live_migration_scheme' should be set
  to tls, and there should be a warning in the documentation that without
  doing this the traffic is still unencrypted.
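
  Concretely, the settings the guide should call out look like this
  (both are real nova [libvirt] options; shown here as a minimal sketch
  of the relevant nova.conf fragment):

    [libvirt]
    live_migration_with_native_tls = true
    live_migration_scheme = tls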

  - [ ] I have a fix to the document that I can paste below including
  example: input and output.

  [1] without setting 'live_migration_scheme' in the nova.conf
  $ tcpdump -i INTERFACE -n -X port 16509 and '(tcp[((tcp[12] & 0xf0) >> 2)] < 0x14 || tcp[((tcp[12] & 0xf0) >> 2)] > 0x17)'
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on INTERFACE, link-type EN10MB (Ethernet), capture size 262144 bytes
  17:10:56.387407 IP 192.168.70.101.50900 > 192.168.70.100.16509: Flags [P.], seq 304:6488, ack 285, win 502, options [nop,nop,TS val 424149655 ecr 1875309961], length 6184
   0x0000:  4500 185c ad05 4000 4006 677c c0a8 4665  E..\..@.@.g|..Fe
   0x0010:  c0a8 4664 c6d4 407d a407 70a6 15ad 0a5a  ..Fd..@}..p....Z
   0x0020:  8018 01f6 2669  0101 080a 1948 0297  
   0x0030:  6fc6 f589  1828 2000 8086  0001  o..(
   0x0040:   012f    0009    .../
   0x0050:   0001  000f 6465 7374 696e 6174  destinat
   0x0060:  696f 6e5f 786d 6c00  0007  129b  ion_xml.
   0x0070:  3c64 6f6d 6169 6e20 7479 7065 3d27 6b76  <domain type='kv
   0x0090:  

[Yahoo-eng-team] [Bug 1552042] Re: Host data corruption through nova inject_key feature

2021-03-17 Thread Balazs Gibizer
The fix merged to master
https://review.opendev.org/c/openstack/nova/+/324720

** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1552042

Title:
  Host data corruption through nova inject_key feature

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Security Advisory:
  Incomplete

Bug description:
  Reported by Garth Mollett from Red Hat.

  The nova.virt.disk.vfs.VFSLocalFS has measures to prevent symlink
  traversal outside of the root of the images directory but it does not
  prevent access to device nodes inside the image itself. A simple fix
  should be to mount with the 'nodev' option.

  Under certain circumstances, the boot process will fall back to
  VFSLocalFS when trying to inject the public key, for libvirt:

  * when libguestfs is not installed or can't be loaded.
  * use_cow_images=false and inject_partition for non-nbd
  * for loopback mount at least, there is a race condition to win in
    virt/disk/mount/api.py between kpartx and a /dev/mapper/ file
    creation: os.path.exists can run before the path exists even though
    it's there half a second later.

  The xenapi is also likely vulnerable, though untested.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1552042/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp