[Yahoo-eng-team] [Bug 1853632] Re: designate dns driver does not use domain settings for auth
[Expired for neutron because there has been no activity for 60 days.] ** Changed in: neutron Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1853632
Title: designate dns driver does not use domain settings for auth
Status in neutron: Expired
Bug description: The designate external dns driver does not use domain settings for authentication **if** there is more than one openstack domain. If you have only the 'Default' domain, the authentication system has no doubt about which domain to use, so it will use that. In our deployment we support federated authentication and we hit this issue. The issue lies in the external dns driver, which also does not support all of the documented options. You can test this by setting one (or all) of the following options in the [designate] section of /etc/neutron/neutron.conf to invalid values:
user_domain_name
project_domain_name
project_name
Authentication should still succeed, which shows these options are being ignored. @oammis initially found this issue, so credit where it's due. We have a Queens deployment, although from what we can see in the code the issue should apply to all releases. I'll post a fix soon.
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1853632/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
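For reference, the domain-related options from the report map to the [designate] section of /etc/neutron/neutron.conf roughly as follows. The values shown are placeholders, not a tested configuration:

```ini
[designate]
; Placeholder values -- substitute your deployment's credentials.
; Per the report, these documented options were being ignored by the
; external DNS driver when more than one Keystone domain exists.
user_domain_name = Default
project_domain_name = Default
project_name = service
```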
[Yahoo-eng-team] [Bug 1896592] Re: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
[Expired for neutron because there has been no activity for 60 days.] ** Changed in: neutron Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896592
Title: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
Status in neutron: Expired
Status in tempest: Expired
Bug description: The following three test cases clash when they create, at the same time, an IPv6 subnet with the same CIDR:
- test_dhcpv6_stateless_eui64
- test_dhcpv6_stateless_no_ra
- test_dhcpv6_stateless_no_ra_no_dhcp
Log: https://61069af11b09b96273ad-d5a2c2135ef34e5fcff72992ca5eb476.ssl.cf2.rackcdn.com/662869/6/check/neutron-tempest-with-uwsgi/9b9c086/controller/logs/tempest_log.txt
Snippet: http://paste.openstack.org/show/798195/
Error: "Invalid input for operation: Requested subnet with cidr: 2001:db8::/64 for network: 31e04aec-34df-49dc-8a05-05813a37be98 overlaps with another subnet."
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896592/+subscriptions
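One generic way to avoid this kind of clash (a sketch, not the actual neutron-tempest-plugin fix) is to carve a distinct /64 out of a common prefix for each test case instead of hard-coding 2001:db8::/64. The helper name below is hypothetical:

```python
import ipaddress

def disjoint_v6_subnets(base_cidr, count):
    """Return `count` non-overlapping /64 networks carved from base_cidr.

    Hypothetical helper: each concurrently running test case takes its
    own slice, so no two requests present the same CIDR to neutron.
    """
    base = ipaddress.ip_network(base_cidr)
    gen = base.subnets(new_prefix=64)
    return [next(gen) for _ in range(count)]

# e.g. one subnet per clashing test case
cidrs = disjoint_v6_subnets("2001:db8::/48", 3)
```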
[Yahoo-eng-team] [Bug 1894839] Re: hostname not getting set as per the dns
[Expired for cloud-init because there has been no activity for 60 days.] ** Changed in: cloud-init Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1894839
Title: hostname not getting set as per the dns
Status in cloud-init: Expired
Bug description: Environment details:
Management Control Plane: OpenStack (Ussuri release)
cloud-init version: 19.1 (community)
Data Source: Config Drive
OS/platform of deployed VM: RHEL 8.2
I am using cloud-init v19.1, where the control plane (the OpenStack nova service) passes information (the data source) via config drive during VM deployment. I am using the set_hostname and update_hostname modules under the cloud_init_modules section in [1]. As per the documentation, I have put preserve_hostname: true in the cfg file and rebooted the system (VM). The hostname is preserved and does not change after multiple reboots. This, I believe, is working as expected. When using preserve_hostname: false and fqdn: in [1], the value given to fqdn is assigned as the hostname of the VM. But when we just set preserve_hostname: false in [1] and deploy the VM, the deployed VM name becomes the hostname. [root@host cloudinit]# hostname host
Expectation: if we set preserve_hostname: false, it should set the hostname as per what DNS (nslookup) returns. If we use an image where preserve_hostname: false (with no hostname specified in the cfg file) and the set_hostname and update_hostname modules enabled in [1], and deploy a few VMs (say, 5) with it - how will the hostname get configured on all the deployed VMs? In this use case, we want the hostname configured on each VM to be DNS-resolvable and aligned with the IP address associated with the VM. How can this be achieved?
[1] /etc/cloud/cloud.cfg
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1894839/+subscriptions
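For context, the configuration the reporter describes lives in /etc/cloud/cloud.cfg. A minimal sketch of the relevant pieces (module list abbreviated; the fqdn value is a placeholder, not from the report):

```yaml
# /etc/cloud/cloud.cfg (excerpt)
preserve_hostname: false
# fqdn: host.example.com   # if set, this value is used as the hostname
cloud_init_modules:
  - set_hostname
  - update_hostname
```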
[Yahoo-eng-team] [Bug 1896592] Re: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
[Expired for tempest because there has been no activity for 60 days.] ** Changed in: tempest Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896592
Title: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
Status in neutron: Expired
Status in tempest: Expired
Bug description: The following three test cases clash when they create, at the same time, an IPv6 subnet with the same CIDR:
- test_dhcpv6_stateless_eui64
- test_dhcpv6_stateless_no_ra
- test_dhcpv6_stateless_no_ra_no_dhcp
Log: https://61069af11b09b96273ad-d5a2c2135ef34e5fcff72992ca5eb476.ssl.cf2.rackcdn.com/662869/6/check/neutron-tempest-with-uwsgi/9b9c086/controller/logs/tempest_log.txt
Snippet: http://paste.openstack.org/show/798195/
Error: "Invalid input for operation: Requested subnet with cidr: 2001:db8::/64 for network: 31e04aec-34df-49dc-8a05-05813a37be98 overlaps with another subnet."
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896592/+subscriptions
[Yahoo-eng-team] [Bug 1885527] Re: cloud-init regenerating ssh-keys
This bug was fixed in the package cloud-init - 20.4-0ubuntu1

---
cloud-init (20.4-0ubuntu1) hirsute; urgency=medium

  * d/control: add gnupg to Recommends as cc_apt_configure requires it to be installed for some operations.
  * New upstream release.
    - Release 20.4 (#686) [James Falcon] (LP: #1905440)
    - tox: avoid tox testenv subsvars for xenial support (#684)
    - Ensure proper root permissions in integration tests (#664) [James Falcon]
    - LXD VM support in integration tests (#678) [James Falcon]
    - Integration test for fallocate falling back to dd (#681) [James Falcon]
    - .travis.yml: correctly integration test the built .deb (#683)
    - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
    - Support configuring SSH host certificates. (#660) [Jonathan Lung]
    - add integration test for LP: #1900837 (#679)
    - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
    - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
    - Make mount in place for tests work (#667) [James Falcon]
    - integration_tests: restore emission of settings to log (#657)
    - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
    - tox.ini: only select "ci" marked tests for CI runs (#677)
    - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
    - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
    - test_persistence: simplify VersionIsPoppedFromState (#674)
    - only run a subset of integration tests in CI (#672)
    - cli: add --system param to allow validating system user-data on a machine (#575)
    - test_persistence: add VersionIsPoppedFromState test (#673)
    - introduce an upgrade framework and related testing (#659)
    - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
    - Pin pycloudlib to a working commit (#666) [James Falcon]
    - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
    - cloud_tests: add hirsute release definition (#662)
    - split integration and cloud_tests requirements (#652)
    - faq.rst: add warning to answer that suggests running `clean` (#661)
    - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
    - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
    - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
    - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
    - Prevent timeout on travis integration tests. (#651) [James Falcon]
    - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
    - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
    - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
    - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
    - Make some language improvements in growpart documentation (#649) [Shane Frasier]
    - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
    - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
    - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
    - cloudinit: move dmi functions out of util (#622) [Scott Moser]
    - integration_tests: various launch improvements (#638)
    - test_lp1886531: don't assume /etc/fstab exists (#639)
    - Remove Ubuntu restriction from PR template (#648) [James Falcon]
    - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
    - conftest: improve docstring for disable_subp_usage (#644)
    - doc: add example query commands to debug Jinja templates (#645)
    - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
    - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
    - .travis.yml: use a known-working version of lxd (#643)
    - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
    - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
    - Explicit set IPV6_AUTOCONF and IPV6_FORCE_ACCEPT_RA on static6 (#634) [Eduardo Otubo]
    - get_interfaces: don't exclude Open vSwitch bridge/bond members (#608) [Lukas Märdian] (LP: #1898997)
    - Add config modules for controlling IBM PowerVM RMC. (#584) [Aman306] (LP: #1895979)
    - Update network config docs to clarify MAC address quoting (#623) [dermotbradley]
    - gentoo: fix hostname rendering when value has a comment (#611) [Manuel Aguilera]
    - refactor integration testing infrastructure (#610) [James Falcon]
    - stages: don't reset permissions of cloud-init.log every boot (#624) (LP: #1900837)
    - docs: Add how to use cloud-localds to boot qemu (#617) [Joshua Powers]
    - Drop vestigial
[Yahoo-eng-team] [Bug 1905493] [NEW] cloud-init status --wait hangs indefinitely in a nested lxd container
Public bug reported: When booting a nested lxd container inside another lxd container (just a normal container, not a VM), i.e. just L2, using cloud-init status --wait, the "." is printed indefinitely and the command never returns. ** Affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1905493
Title: cloud-init status --wait hangs indefinitely in a nested lxd container
Status in cloud-init: New
Bug description: When booting a nested lxd container inside another lxd container (just a normal container, not a VM), i.e. just L2, using cloud-init status --wait, the "." is printed indefinitely and the command never returns.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1905493/+subscriptions
[Yahoo-eng-team] [Bug 1905447] [NEW] ds-identify OpenStack is odd
Public bug reported: ds-identify's OpenStack check is odd:

    # LP: #1715241 : arch other than intel are not identified properly.
    case "$DI_UNAME_MACHINE" in
        i?86|x86_64) :;;
        *) return ${DS_MAYBE};;
    esac

It has that, which is not nice. Also I think the above is no longer true; I think arm64, ppc64le and s390x do have better OpenStack identification these days. Also, returning DS_MAYBE is a bit harmful on arches that are known not to have OpenStack yet - i.e. riscv64. ** Affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1905447
Title: ds-identify OpenStack is odd
Status in cloud-init: New
Bug description: ds-identify's OpenStack check is odd:

    # LP: #1715241 : arch other than intel are not identified properly.
    case "$DI_UNAME_MACHINE" in
        i?86|x86_64) :;;
        *) return ${DS_MAYBE};;
    esac

It has that, which is not nice. Also I think the above is no longer true; I think arm64, ppc64le and s390x do have better OpenStack identification these days. Also, returning DS_MAYBE is a bit harmful on arches that are known not to have OpenStack yet - i.e. riscv64.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1905447/+subscriptions
[Yahoo-eng-team] [Bug 1788915] Re: sysconfig renders vlan with TYPE=Ethernet
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1788915
Title: sysconfig renders vlan with TYPE=Ethernet
Status in cloud-init: Fix Released
Bug description: Distribution: Fedora 28. Cloud provider: None. Network content of /etc/cloud/cloud.cfg.d/99_datasource.cfg (omitting users, etc.):

    network:
      version: 1
      config:
        - type: physical
          name: lan1
          mac_address: 0c:c4:7a:db:dc:b0
        - type: vlan
          name: lan1.100
          vlan_link: lan1
          vlan_id: 100
          subnets:
            - type: static
              address: 192.168.0.2/24
              gateway: 192.168.0.1
              dns_nameservers:
                - 8.8.8.8
                - 8.8.4.4
        - type: vlan
          name: lan1.3900
          vlan_link: lan1
          vlan_id: 3900
          subnets:
            - type: static
              address: 10.1.0.2/16
              gateway:

I am unable to attach logs (no network connection).

    $ cloud-init --version
    /usr/bin/cloud-init 17.1

The sysconfig renderer leaves the configured "kind" set to the default (ethernet), which results in a config file with "TYPE=Ethernet", which is incorrect and results in the VLAN interface not being created.

    $ cat ifcfg-lan1.100
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEFROUTE=yes
    DEVICE=lan1.100
    GATEWAY=192.168.0.1
    IPADDR=192.168.0.2
    NETMASK=255.255.255.0
    ONBOOT=yes
    PHYSDEV=lan1
    TYPE=Ethernet
    USERCTL=no
    VLAN=yes

    $ ifup lan1.100
    Error: Connection activation failed: No suitable device found for this connection.

Removing the offending "TYPE=Ethernet" line from the config file resolves the problem (as does changing it to "TYPE=Vlan"). I altered my configuration to use version 2 of the network configuration data with identical results (the problem is in the renderer).
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1788915/+subscriptions
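Based on the reporter's observation that changing the line to TYPE=Vlan also resolves the problem, a corrected rendering would look like this (a sketch derived from the reported workaround, not the actual fixed renderer output):

```ini
; ifcfg-lan1.100 -- corrected per the reporter's workaround
BOOTPROTO=none
DEFROUTE=yes
DEVICE=lan1.100
GATEWAY=192.168.0.1
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
PHYSDEV=lan1
TYPE=Vlan
USERCTL=no
VLAN=yes
```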
[Yahoo-eng-team] [Bug 1776958] Re: error creating lxdbr0.
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1776958
Title: error creating lxdbr0.
Status in cloud-init: Fix Released
Status in cloud-init package in Ubuntu: Fix Released
Bug description: $ cat > my.yaml <) failed cloudinit.util.ProcessExecutionError: Unexpected error while running command. Stderr: Error: The network already exists
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1776958/+subscriptions
[Yahoo-eng-team] [Bug 1813396] Re: gpg called without no-tty
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Confirmed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1813396
Title: gpg called without no-tty
Status in cloud-init: Fix Released
Bug description: I am running cloud-init on a libvirt/virsh with this image: https://cdimage.debian.org/cdimage/openstack/archive/9.6.5-20190122/debian-9.6.5-20190122-openstack-amd64.qcow2
The relevant lines are:

    apt:
      sources:
        docker:
          source: 'deb [arch=amd64] https://download.docker.com/linux/debian stretch stable'
          keyserver: keyserver.ubuntu.com
          keyid: 0EBFCD88

(sorry for not attaching any logs, but the triggered command just does not find "/dev/tty") The gpg wrapper should have the "no-tty" argument for receiving keys, at least on debian systems. Otherwise cloud-init fails when specifying key ids on debian cloud images (with manually added dirmngr and apt-transport-https; it is quite a mess on the openstack debian images...) I would naively propose the following patch:

    diff --git a/cloudinit/gpg.py b/cloudinit/gpg.py
    index 7fe17a2..21d598e 100644
    --- a/cloudinit/gpg.py
    +++ b/cloudinit/gpg.py
    @@ -42,7 +42,7 @@ def recv_key(key, keyserver, retries=(1, 1)):
         @param retries: an iterable of sleep lengths for retries.
             Use None to indicate no retries."""
         LOG.debug("Importing key '%s' from keyserver '%s'", key, keyserver)
    -    cmd = ["gpg", "--keyserver=%s" % keyserver, "--recv-keys", key]
    +    cmd = ["gpg", "--no-tty", "--keyserver=%s" % keyserver, "--recv-keys", key]
         if retries is None:
             retries = []
         trynum = 0

BR Till
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1813396/+subscriptions
[Yahoo-eng-team] [Bug 1897915] Re: ntp service on centos is ntp.service, but cloud-init uses nptd.service
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1897915
Title: ntp service on centos is ntp.service, but cloud-init uses nptd.service
Status in cloud-init: Fix Released
Bug description: The ntp (client) service file installed by the centos 7 'ntp' package is named 'ntpd' (note the d), but cloud-init's cc_ntp module identifies that 'service_name' as 'ntp'. See below on centos 7. For centos 8, there is no 'ntp' package that I see; it seems to have been replaced by chrony.

    [root@cent71 ~]# rpm -ql ntp | grep systemd
    /usr/lib/systemd/ntp-units.d/60-ntpd.list
    /usr/lib/systemd/system/ntpd.service
    [root@cent71 ~]# systemctl status ntp.service
    Unit ntp.service could not be found.
    [root@cent71 ~]# systemctl cat ntp.service
    No files found for ntp.service.
    [root@cent71 ~]# systemctl cat ntpd.service
    # /usr/lib/systemd/system/ntpd.service
    [Unit]
    Description=Network Time Service
    After=syslog.target ntpdate.service sntp.service

    [Service]
    Type=forking
    EnvironmentFile=-/etc/sysconfig/ntpd
    ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS
    PrivateTmp=true

    [Install]
    WantedBy=multi-user.target

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1897915/+subscriptions
[Yahoo-eng-team] [Bug 1885527] Re: cloud-init regenerating ssh-keys
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1885527
Title: cloud-init regenerating ssh-keys
Status in cloud-init: Fix Released
Status in cloud-init package in Ubuntu: In Progress
Bug description: Hi, I made some experiments with virtual machines with Ubuntu-20.04 at a German cloud provider (Hetzner), which uses cloud-init to initialize machines with a basic setup such as IP and SSH access. During my installation tests I had to reboot the virtual machines several times after installing or removing packages. Occasionally (not always) I noticed that the ssh host keys had changed, and ssh complained. After accepting the new host keys (insecure!) I found that all key files in /etc/ssh had fresh mod times, i.e. were freshly regenerated. This reminds me of a bug I reported about cloud-init some time ago, where I could not change the host name permanently because cloud-init reset it to its initial configuration at every boot (highly dangerous, because it seemed to reset passwords to their original state as well). Although cloud-init is intended to do an initial configuration for the first boot only, it seems to remain on the system and – even worse: occasionally – change configurations. I've never understood the purpose of cloud-init remaining active after the machine is up and running.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1885527/+subscriptions
[Yahoo-eng-team] [Bug 1826608] Re: sysconfig rendering ignores vlan name
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1826608
Title: sysconfig rendering ignores vlan name
Status in cloud-init: Fix Released
Bug description: sysconfig rendering currently just does not pay attention to the vlan device's name. Instead it attempts to set the name to the backing device with .* stripped from the end. Here is an example of current master output. The 'PHYSDEV' entry should be 'eth0', not 'infra'.

    $ cat my2.yaml
    version: 2
    ethernets:
      eth0:
        addresses: ["192.10.1.2/24"]
        match:
          macaddress: "00:16:3e:60:7c:df"
    vlans:
      infra0:
        id: 1001
        link: eth0
        addresses: ["10.0.1.2/16"]

    $ tox-venv py3 python3 -m cloudinit.cmd.main devel net-convert \
        --mac en0,00:16:3e:60:7c:df \
        --network-data=my2.yaml --kind=yaml \
        --distro=centos --output-kind=sysconfig \
        --directory=out.test

    $ cat out.test/etc/sysconfig/network-scripts/ifcfg-eth0
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEVICE=eth0
    HWADDR=00:16:3e:60:7c:df
    IPADDR=192.10.1.2
    NETMASK=255.255.255.0
    NM_CONTROLLED=no
    ONBOOT=yes
    STARTMODE=auto
    TYPE=Ethernet
    USERCTL=no

    $ cat out.test/etc/sysconfig/network-scripts/ifcfg-infra0
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEVICE=infra0
    IPADDR=10.0.1.2
    NETMASK=255.255.0.0
    NM_CONTROLLED=no
    ONBOOT=yes
    PHYSDEV=infra
    STARTMODE=auto
    TYPE=Ethernet
    USERCTL=no
    VLAN=yes

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1826608/+subscriptions
[Yahoo-eng-team] [Bug 1895976] Re: Fail to get http openstack metadata if the Linux instance runs on Hyper-V
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1895976
Title: Fail to get http openstack metadata if the Linux instance runs on Hyper-V
Status in cloud-init: Fix Released
Status in compute-hyperv: New
Status in OpenStack Compute (nova): In Progress
Status in os-win: Fix Committed
Bug description: Because of the commit that introduced platform checks for enabling / using http openstack metadata (https://github.com/canonical/cloud-init/commit/1efa8a0a030794cec68197100f31a856d0d264ab), cloud-init on Linux machines will stop loading http metadata when running on "unsupported" platforms / hypervisors like Hyper-V, XEN, OracleCloud, VMware, OpenTelekomCloud - leading to a whole suite of bug reports and fixes for a non-issue. Let's try to solve this problem once for all the upcoming platforms / hypervisors by adding a configuration option on the metadata level: perform_platform_check or check_if_platform_is_supported (suggestions are welcome for the naming). The value of the config option should be true in order to maintain backwards compatibility. When set to true, cloud-init will check if the platform is supported. No one would like to patch well-working OpenStack environments for this kind of issue, and it is always easier to control / build the images you use on a private OpenStack.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1895976/+subscriptions
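The report proposes a datasource-level option (naming undecided at the time: perform_platform_check or check_if_platform_is_supported). A hedged sketch of how such a setting might look in cloud.cfg, assuming the first proposed name and placement under the OpenStack datasource; this illustrates the proposal, not a released cloud-init option:

```yaml
datasource:
  OpenStack:
    # Proposed option (name not final): defaults to true to keep
    # backwards-compatible behaviour; set false to skip the
    # hypervisor/platform check on Hyper-V, XEN, etc.
    perform_platform_check: false
```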
[Yahoo-eng-team] [Bug 1895979] Re: New cloud-init config modules for PowerVM Hypervisor based VMs
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1895979
Title: New cloud-init config modules for PowerVM Hypervisor based VMs
Status in cloud-init: Fix Released
Bug description: Linux virtual machines deployed (with the ppc64le architecture) using the IBM PowerVM [1] hypervisor on IBM Power System hosts need an additional component (referred to as RMC - Remote Management Console) to be installed and successfully running on the VM. This RMC module/service must be installed and functioning on a ppc64le VM for the PowerVM hypervisor to be able to communicate with / manage these virtual machines. When a VM boots, there is a basic set of steps (generation of a unique RMC node id, subsequent restart of the RMC service, etc.) that must be performed on a ppc64le VM to ensure that the communication between the VM and the PowerVM hypervisor is intact. RMC has to be active on the VM for the hypervisor to be able to perform many operations successfully; thus a healthy RMC is a prerequisite for a PowerVM hypervisor based VM. To enable the healthy functioning of RMC services on ppc64le Linux based VMs, there are a couple of cloud-init config modules that we have been maintaining downstream. As part of this LP bug, we would like to upstream them, so that they benefit the larger community using PowerVM based VMs.
References: [1] https://developer.ibm.com/depmodels/cloud/articles/cl-hypervisorcompare-powervm/
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1895979/+subscriptions
[Yahoo-eng-team] [Bug 1897099] Re: create_swap do not fallback to dd when fallocate fails
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1897099
Title: create_swap do not fallback to dd when fallocate fails
Status in cloud-init: Fix Released
Bug description: Name: cloud-init. Version: 20.2-1. Code in question (cloudinit/config/cc_mounts.py):

    try:
        create_swap(fname, size, "fallocate")
    except util.ProcessExecutionError as e:
        LOG.warning(errmsg, fname, size, "dd", e)
        LOG.warning("Will attempt with dd.")
        create_swap(fname, size, "dd")

There is a kernel bug in recent Linux versions where fallocate creates swap images with holes. The workaround is to remove fallocate (making the create_swap call fail) so that cloud-init falls back to dd. I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on my system, but according to the logs it did not fall back to dd as it should. Probably the error raised was not ProcessExecutionError. Logs:
/var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command.
/var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' /var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No such file or directory /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: changed default device swap => None /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: suggest 2048.0 MB swap for 1983.953125 MB memory with '9030.296875 MB' disk given max=2048.0 MB [max=2048.0 MB]' /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: Creating swapfile in '/swapfile' on fstype 'ext4' using 'fallocate' /var/log/cloud-init.log:2020-09-24 09:13:16,461 - util.py[DEBUG]: Running command ['fallocate', '-l', '2048M', '/swapfile'] with allowed return codes [0] (she ll=False, capture=True) /var/log/cloud-init.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command. 
/var/log/cloud-init.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Attempting to remove /swapfile /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Setting up swap file took 0.019 seconds /var/log/cloud-init.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1897099/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
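The report boils down to the except clause being too narrow: a missing binary raises FileNotFoundError (an OSError), not ProcessExecutionError, so the dd branch never runs. A minimal, self-contained sketch of the fixed pattern (not cloud-init's actual code; the `runner` hook and `fake_runner` exist only to make the sketch testable without touching the system):

```python
import subprocess


def run_with_fallback(primary_cmd, fallback_cmd, runner=subprocess.run):
    """Run primary_cmd; if it fails OR its binary is missing, run fallback_cmd.

    Catching only CalledProcessError (the stand-in for cloud-init's
    ProcessExecutionError) would miss FileNotFoundError, which is what
    a removed /usr/bin/fallocate raises, exactly as in this report.
    """
    try:
        runner(primary_cmd, check=True)
        return "primary"
    except (subprocess.CalledProcessError, OSError):
        runner(fallback_cmd, check=True)
        return "fallback"


def fake_runner(cmd, check=True):
    """Simulate the reporter's setup: the fallocate binary is gone."""
    if cmd[0] == "fallocate":
        raise FileNotFoundError(2, "No such file or directory", cmd[0])


# With the broadened except clause, the dd fallback actually triggers:
assert run_with_fallback(["fallocate", "-l", "2048M", "/swapfile"],
                         ["dd", "if=/dev/zero", "of=/swapfile"],
                         runner=fake_runner) == "fallback"
```

This matches the changelog entry "cc_mounts: correctly fallback to dd if fallocate fails (#585)" in spirit: widen what counts as a fallocate failure.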
[Yahoo-eng-team] [Bug 1898997] Re: MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1898997

Title: MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC

Status in cloud-init: Fix Released
Status in netplan: Fix Released
Status in netplan.io package in Ubuntu: Fix Released
Status in netplan.io source package in Focal: Fix Released
Status in netplan.io source package in Groovy: Fix Released

Bug description:
  Problem description:

  If we try to deploy a single-NIC machine via MAAS, configuring an
  Open vSwitch bridge as the primary/PXE interface, the machine will
  install and boot Ubuntu 20.04, but it cannot finish the whole
  configuration (e.g. copying of SSH keys) and cannot be
  accessed/controlled via MAAS. It ends up in a "Failed" state.

  This is because systemd-networkd-wait-online.service fails (for some
  reason) before netplan can fully set up and configure the OVS
  bridge. Because of the broken networking, cloud-init cannot complete
  its final stages, like setting up SSH keys or signaling its state
  back to MAAS. If we wait a little longer, the OVS bridge will
  actually come online and networking works; SSH is still not set up
  and the MAAS state remains "Failed", though.

  Steps to reproduce:

  * Set up a (virtual) MAAS system, e.g. inside a LXD container using
    a KVM host, as described here:
    https://discourse.maas.io/t/setting-up-a-flexible-virtual-maas-test-environment/142
  * Install & set up the maas[-cli] snap from the 2.9/beta channel
    (instead of the deb/PPA from the discourse post)
  * Configure the netplan PPA+key for testing via "Settings" ->
    "Package repos": https://launchpad.net/~slyon/+archive/ubuntu/ovs
  * Prepare the curtin preseed in
    /var/snap/maas/current/preseeds/curtin_userdata, inside the LXD
    container (so you can access the broken machine afterwards):

    ==
    #cloud-config
    debconf_selections:
      maas: |
        {{for line in str(curtin_preseed).splitlines()}}
        {{line}}
        {{endfor}}
    late_commands:
      maas: [wget, '--no-proxy', '{{node_disable_pxe_url}}', '--post-data', '{{node_disable_pxe_data}}', '-O', '/dev/null']
      90_create_user: ["curtin", "in-target", "--", "sh", "-c", "sudo useradd test -g 0 -G sudo"]
      92_set_user_password: ["curtin", "in-target", "--", "sh", "-c", "echo 'test:test' | sudo chpasswd"]
      94_cat: ["curtin", "in-target", "--", "sh", "-c", "cat /etc/passwd"]
      98_cloud_init: ["curtin", "in-target", "--", "apt-get", "-y", "install", "cloud-init"]
    ==

  * Compose a new virtual machine via MAAS' "KVM" menu, named e.g. "test1"
  * Watch it being commissioned via MAAS' "Machines" menu
  * Once it's ready, select your machine (e.g. "test1.maas") -> Network
  * Select the single network interface (e.g. "ens4") -> Create bridge
  * Choose "Bridge type: Open vSwitch (ovs)", select "Subnet" and "IP mode", save.
  * Deploy the machine to Ubuntu 20.04 via the "Take action" button

  The machine will install the OS and boot, but will end up in a
  "Failed" state inside MAAS due to network/OVS not being set up
  correctly. MAAS/SSH has no control over it. You can access the
  (broken) machine via serial console from the KVM host (i.e. the LXD
  container) via "virsh console test1", using the "test:test"
  credentials.
=== SRU/Focal/netplan.io ===

[Impact]
This update contains bug fixes and packaging improvements, and we would like to make sure all of our supported customers have access to these improvements. The notable ones are:
 * Setup OVS early in network-pre.target to avoid delays (LP: #1898997)
See the changelog entry below for a full list of changes and bugs.

[Test Case]
The following development and SRU process was followed: https://wiki.ubuntu.com/NetplanUpdates
Netplan contains an extensive integration test suite that is run using the SRU package for each release. This test suite's results are available here: http://autopkgtest.ubuntu.com/packages/n/netplan.io
A successful run is required before the proposed netplan package can be let into -updates. The netplan team will be in charge of attaching the artifacts and console output of the appropriate run to the bug. Netplan team members will not mark 'verification-done' until this has happened.

[Regression Potential]
In order to mitigate the regression potential, the results of the aforementioned integration tests are attached to this bug.
Focal:
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_amd64.log
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_arm64.log
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_armhf.log
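For context, the failing configuration is an OVS bridge on the machine's only (PXE) NIC. The netplan config MAAS renders for such a machine is of roughly the following shape (illustrative sketch only; the interface name ens4 and bridge name br-ens4 are assumptions, not taken from the bug, and the `openvswitch` key requires a netplan version with OVS support):

```yaml
network:
  version: 2
  ethernets:
    ens4: {}
  bridges:
    br-ens4:
      openvswitch: {}      # marks the bridge as Open vSwitch-backed
      interfaces: [ens4]   # the single PXE NIC is enslaved here
      dhcp4: true
```

The fix referenced above moves OVS setup into network-pre.target so the bridge exists before systemd-networkd-wait-online.service gives up.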
[Yahoo-eng-team] [Bug 1900837] Re: cloud-init resets permissions on log file after reboot
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1900837

Title: cloud-init resets permissions on log file after reboot

Status in cloud-init: Fix Released

Bug description:
  In attempting to apply CIS security guidelines to an Ubuntu system,
  it was found that after changing the permissions of the log files in
  /var/log to 640, cloud-init would reset the permissions to 644 on
  reboot. As long as cloud-init can write to the file, it should be OK
  to alter the permissions without issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1900837/+subscriptions
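The requested behaviour is easy to sketch: apply a default mode only when the file is created, and leave an existing file's permissions alone (a sketch under those assumptions, not cloud-init's actual stages code):

```python
import os
import stat


def ensure_log_file(path, default_mode=0o644):
    """Create path with default_mode if missing; never chmod an existing file.

    cloud-init only needs write access, so an admin-tightened mode such
    as 0o640 (e.g. for CIS hardening) should survive reboots.
    Returns the file's resulting permission bits.
    """
    if not os.path.exists(path):
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, default_mode)
        os.close(fd)
    # existing files: leave the mode untouched
    return stat.S_IMODE(os.stat(path).st_mode)
```

After an admin runs `chmod 640 /var/log/cloud-init.log`, calling this on the next boot reports 0o640 instead of resetting it to 0o644.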
[Yahoo-eng-team] [Bug 1901958] Re: FreeBSD fix fs related bugs
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1901958

Title: FreeBSD fix fs related bugs

Status in cloud-init: Fix Released

Bug description:
  1) FreeBSD does not support 'vfat'; use 'msdosfs' instead.

     Original report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250496
     "Feel free to submit upstream if you have signed the CLA. I do
     not want to sign it."

  2) If the filesystem has the trim (-t) or MAC multilabel (-l) flag,
     resizing the FS fails.

     https://www.freebsd.org/cgi/man.cgi?query=tunefs=8

     2020-10-28 17:15:07,015 - handlers.py[DEBUG]: finish: init-network/config-resizefs: FAIL: running config-resizefs with frequency always
     ...
       File "/usr/local/lib/python3.7/site-packages/cloudinit/config/cc_resizefs.py", line 114, in _can_skip_resize_ufs
         optlist, _args = getopt.getopt(newfs_cmd[1:], opt_value)
       File "/usr/local/lib/python3.7/getopt.py", line 95, in getopt
         opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
       File "/usr/local/lib/python3.7/getopt.py", line 195, in do_shorts
         if short_has_arg(opt, shortopts):
       File "/usr/local/lib/python3.7/getopt.py", line 211, in short_has_arg
         raise GetoptError(_('option -%s not recognized') % opt, opt)
     getopt.GetoptError: option -t not recognized

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1901958/+subscriptions
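The traceback comes straight from getopt's strictness: any flag absent from the option string raises GetoptError rather than being ignored. A small reproduction (the option strings here are hypothetical, not the exact ones cc_resizefs used):

```python
import getopt


def parse_newfs_opts(argv, optstring):
    """Parse a recorded newfs command line, as _can_skip_resize_ufs did.

    Returns (opts_dict, None) on success or (None, error_message)
    when getopt rejects a flag missing from optstring.
    """
    try:
        opts, _args = getopt.getopt(argv, optstring)
        return dict(opts), None
    except getopt.GetoptError as e:
        return None, str(e)


# '-t' (TRIM) is not in the option string, so the resize check blows up:
opts, err = parse_newfs_opts(["-t", "-U", "-O", "2"], "UO:")
assert opts is None and "option -t not recognized" in err

# Adding the flag to the option string lets the same command parse fine:
opts, err = parse_newfs_opts(["-t", "-U", "-O", "2"], "tUO:")
assert err is None and opts == {"-t": "", "-U": "", "-O": "2"}
```

The fix in #655 amounts to teaching the parser about the full set of newfs flags instead of letting GetoptError abort the resize.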
[Yahoo-eng-team] [Bug 1905440] Re: Release 20.4
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1905440

Title: Release 20.4

Status in cloud-init: Fix Released

Bug description:
  == Release Notes ==

  Cloud-init release 20.4 is now available

  The 20.4 release:
   * spanned about 3 months
   * had 29 contributors from 31 domains
   * fixed 14 Launchpad issues

  Highlights:
   - Azure ability to hot-attach NICs to preprovisioned VMs before reprovisioning
   - Additional Azure failure handling
   - Add NoCloud seed from vendordata
   - Ability to blacklist network interfaces based on driver
   - New IBM PowerVM specific RMC module
   - Allow OVS bridge as primary interface
   - add cli "--system" param to allow validating system user-data
   - Support configuring SSH host certificates
   - New integration testing framework

  == Changelog ==
   - tox: avoid tox testenv subsvars for xenial support (#684)
   - Ensure proper root permissions in integration tests (#664) [James Falcon]
   - LXD VM support in integration tests (#678) [James Falcon]
   - Integration test for fallocate falling back to dd (#681) [James Falcon]
   - .travis.yml: correctly integration test the built .deb (#683)
   - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
   - Support configuring SSH host certificates. (#660) [Jonathan Lung]
   - add integration test for LP: #1900837 (#679)
   - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
   - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
   - Make mount in place for tests work (#667) [James Falcon]
   - integration_tests: restore emission of settings to log (#657)
   - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
   - tox.ini: only select "ci" marked tests for CI runs (#677)
   - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
   - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
   - test_persistence: simplify VersionIsPoppedFromState (#674)
   - only run a subset of integration tests in CI (#672)
   - cli: add --system param to allow validating system user-data on a machine (#575)
   - test_persistence: add VersionIsPoppedFromState test (#673)
   - introduce an upgrade framework and related testing (#659)
   - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
   - Pin pycloudlib to a working commit (#666) [James Falcon]
   - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
   - cloud_tests: add hirsute release definition (#662)
   - split integration and cloud_tests requirements (#652)
   - faq.rst: add warning to answer that suggests running `clean` (#661)
   - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
   - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
   - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
   - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
   - Prevent timeout on travis integration tests. (#651) [James Falcon]
   - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
   - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
   - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
   - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
   - Make some language improvements in growpart documentation (#649) [Shane Frasier]
   - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
   - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
   - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
   - cloudinit: move dmi functions out of util (#622) [Scott Moser]
   - integration_tests: various launch improvements (#638)
   - test_lp1886531: don't assume /etc/fstab exists (#639)
   - Remove Ubuntu restriction from PR template (#648) [James Falcon]
   - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
   - conftest: improve docstring for disable_subp_usage (#644)
   - doc: add example query commands to debug Jinja templates (#645)
   - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
   - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
   - .travis.yml: use a known-working version of lxd (#643)
   - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
   - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
[Yahoo-eng-team] [Bug 1905440] [NEW] Release 20.4
Public bug reported:

== Release Notes ==

Cloud-init release 20.4 is now available

The 20.4 release:
 * spanned about 3 months
 * had 29 contributors from 31 domains
 * fixed 14 Launchpad issues

Highlights:

== Changelog ==
 - tox: avoid tox testenv subsvars for xenial support (#684)
 - Ensure proper root permissions in integration tests (#664) [James Falcon]
 - LXD VM support in integration tests (#678) [James Falcon]
 - Integration test for fallocate falling back to dd (#681) [James Falcon]
 - .travis.yml: correctly integration test the built .deb (#683)
 - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
 - Support configuring SSH host certificates. (#660) [Jonathan Lung]
 - add integration test for LP: #1900837 (#679)
 - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
 - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
 - Make mount in place for tests work (#667) [James Falcon]
 - integration_tests: restore emission of settings to log (#657)
 - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
 - tox.ini: only select "ci" marked tests for CI runs (#677)
 - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
 - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
 - test_persistence: simplify VersionIsPoppedFromState (#674)
 - only run a subset of integration tests in CI (#672)
 - cli: add --system param to allow validating system user-data on a machine (#575)
 - test_persistence: add VersionIsPoppedFromState test (#673)
 - introduce an upgrade framework and related testing (#659)
 - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
 - Pin pycloudlib to a working commit (#666) [James Falcon]
 - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
 - cloud_tests: add hirsute release definition (#662)
 - split integration and cloud_tests requirements (#652)
 - faq.rst: add warning to answer that suggests running `clean` (#661)
 - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
 - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
 - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
 - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
 - Prevent timeout on travis integration tests. (#651) [James Falcon]
 - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
 - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
 - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
 - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
 - Make some language improvements in growpart documentation (#649) [Shane Frasier]
 - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
 - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
 - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
 - cloudinit: move dmi functions out of util (#622) [Scott Moser]
 - integration_tests: various launch improvements (#638)
 - test_lp1886531: don't assume /etc/fstab exists (#639)
 - Remove Ubuntu restriction from PR template (#648) [James Falcon]
 - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
 - conftest: improve docstring for disable_subp_usage (#644)
 - doc: add example query commands to debug Jinja templates (#645)
 - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
 - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
 - .travis.yml: use a known-working version of lxd (#643)
 - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
 - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
 - Explicit set IPV6_AUTOCONF and IPV6_FORCE_ACCEPT_RA on static6 (#634) [Eduardo Otubo]
 - get_interfaces: don't exclude Open vSwitch bridge/bond members (#608) [Lukas Märdian] (LP: #1898997)
 - Add config modules for controlling IBM PowerVM RMC. (#584) [Aman306] (LP: #1895979)
 - Update network config docs to clarify MAC address quoting (#623) [dermotbradley]
 - gentoo: fix hostname rendering when value has a comment (#611) [Manuel Aguilera]
 - refactor integration testing infrastructure (#610) [James Falcon]
 - stages: don't reset permissions of cloud-init.log every boot (#624) (LP: #1900837)
 - docs: Add how to use cloud-localds to boot qemu (#617) [Joshua Powers]
 - Drop vestigial update_resolve_conf_file function (#620) [Scott Moser]
 - cc_mounts: correctly fallback to dd if fallocate fails (#585) (LP: #1897099)
 - .travis.yml: add integration-tests to Travis matrix (#600)
 - ssh_util: handle non-default AuthorizedKeysFile config (#586) [Eduardo Otubo]
 - Multiple file fix for AuthorizedKeysFile config (#60)
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1550919

Title: [Libvirt]Evacuate fail may cause disk image be deleted

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  I checked the latest source of nova on the master branch; this
  problem still exists.

  When we are doing evacuate, eventually _do_rebuild_instance will be
  called. As rebuild is not implemented in the libvirt driver,
  _rebuild_default_impl is called instead:

      try:
          with instance.mutated_migration_context():
              self.driver.rebuild(**kwargs)
      except NotImplementedError:
          # NOTE(rpodolyaka): driver doesn't provide specialized version
          # of rebuild, fall back to the default implementation
          self._rebuild_default_impl(**kwargs)

  _rebuild_default_impl will call self.driver.spawn to boot up the
  instance, and spawn will in turn call _create_domain_and_network.
  When VirtualInterfaceCreateException or Timeout happens,
  self.cleanup will be called:

      except exception.VirtualInterfaceCreateException:
          # Neutron reported failure and we didn't swallow it, so
          # bail here
          with excutils.save_and_reraise_exception():
              if guest:
                  guest.poweroff()
              self.cleanup(context, instance, network_info=network_info,
                           block_device_info=block_device_info)
      except eventlet.timeout.Timeout:
          # We never heard from Neutron
          LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                       'instance %(uuid)s'), {'uuid': instance.uuid},
                   instance=instance)
          if CONF.vif_plugging_is_fatal:
              if guest:
                  guest.poweroff()
              self.cleanup(context, instance, network_info=network_info,
                           block_device_info=block_device_info)
              raise exception.VirtualInterfaceCreateException()

  Because the default value for the destroy_disks parameter is True:

      def cleanup(self, context, instance, network_info,
                  block_device_info=None, destroy_disks=True,
                  migrate_data=None, destroy_vifs=True):

  if an error occurs while waiting for neutron's event during an
  evacuation, the instance's disk files will be deleted unexpectedly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions
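The hazard is the default value itself. A toy model of the cleanup call (illustrative only; nova's real cleanup() does far more than this) shows why the error path deletes the disks an evacuation was meant to preserve:

```python
def cleanup(instance, destroy_disks=True):
    """Toy stand-in for the libvirt driver's cleanup(); returns actions taken."""
    actions = ["destroy domain", "unplug vifs"]
    if destroy_disks:
        # with the default, the error path also wipes the disk files
        actions.append("delete disk files")
    return actions


# An evacuation that hits the vif-plugging timeout calls cleanup() with
# the defaults, so the only remaining copy of the instance's disk is lost:
assert "delete disk files" in cleanup("instance-0001")

# The safe call on this path would opt out explicitly:
assert "delete disk files" not in cleanup("instance-0001", destroy_disks=False)
```

The upstream fix follows the second shape: the rebuild error path must not request disk destruction by default.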
[Yahoo-eng-team] [Bug 1868033] Re: Booting instance with pci_device fails during rocky->stein live upgrade
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1868033

Title: Booting instance with pci_device fails during rocky->stein live upgrade

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Environment:
    Stein nova-conductor, having set upgrade_levels to rocky
    Rocky nova-compute

  Boot an instance with a flavour that has a pci_device.

  Error:
    Failed to publish message to topic 'nova': maximum recursion depth
    exceeded: RuntimeError: maximum recursion depth exceeded

  Tracked this down to it continually trying to backport the
  InstancePCIRequests object. It gets as arguments:

    objinst={u'nova_object.version': u'1.1',
             u'nova_object.name': u'InstancePCIRequests',
             u'nova_object.data': {
                 u'instance_uuid': u'08212b12-8fa8-42d9-8d3e-52ed60a64135',
                 u'requests': [{
                     u'nova_object.version': u'1.3',
                     u'nova_object.name': u'InstancePCIRequest',
                     u'nova_object.data': {
                         u'count': 1, u'is_new': False,
                         u'numa_policy': None, u'request_id': None,
                         u'requester_id': None,
                         u'alias_name': u'V100-32G',
                         u'spec': [{u'vendor_id': u'10de',
                                    u'product_id': u'1db6'}]},
                     u'nova_object.namespace': u'nova'}]},
             u'nova_object.namespace': u'nova'}
    object_versions={u'InstancePCIRequests': '1.1',
                     'InstancePCIRequest': '1.2'}

  It fails because it doesn't backport the individual
  InstancePCIRequest inside the InstancePCIRequests object and so
  keeps trying. The error it shows is:

    IncompatibleObjectVersion: Version 1.3 of InstancePCIRequest is
    not supported, supported version is 1.2

  I have fixed this in our setup by altering obj_make_compatible to
  downgrade the individual requests to version 1.2, which seems to
  work and all is good.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1868033/+subscriptions
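The reporter's fix can be sketched like this (hypothetical classes standing in for the oslo.versionedobjects machinery; field names are taken from the payload above but the mechanics are simplified): the parent's obj_make_compatible must also walk its child objects, otherwise the receiver keeps requesting a backport forever.

```python
class InstancePCIRequest:
    """Toy child object; the real ovo class has more fields and plumbing."""
    VERSION = "1.3"

    def __init__(self, data):
        self.version = self.VERSION
        self.data = data

    def obj_make_compatible(self, target):
        if target == "1.2" and self.version == "1.3":
            # drop the field introduced in 1.3 so a 1.2 consumer accepts it
            self.data.pop("requester_id", None)
            self.version = "1.2"


class InstancePCIRequests:
    def __init__(self, requests):
        self.requests = requests

    def obj_make_compatible(self, child_target):
        # the missing piece from the report: downgrade each child too
        for req in self.requests:
            req.obj_make_compatible(child_target)


reqs = InstancePCIRequests([InstancePCIRequest(
    {"count": 1, "alias_name": "V100-32G", "requester_id": None})])
reqs.obj_make_compatible("1.2")
assert reqs.requests[0].version == "1.2"
assert "requester_id" not in reqs.requests[0].data
```

Without the loop in the parent, the child stays at 1.3, the receiver raises IncompatibleObjectVersion again, and the backport cycle recurses until the RuntimeError above.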
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1869050

Title: migration of anti-affinity server fails due to stale scheduler instance info

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: Invalid
Status in OpenStack Compute (nova) queens series: Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Steps to reproduce
  ==================
  Have a deployment with 3 compute nodes:
   * make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
   * create a server group with anti-affinity policy
   * boot server1 into the group
   * boot server2 into the group
   * migrate server2
   * confirm the migration
   * boot server3

  Make sure that between the last two steps there was no periodic
  _sync_scheduler_instance_info run on the compute that hosted server2
  before the migration. This can be done by performing the last two
  steps right after each other without waiting too long, as the
  interval of that periodic task (scheduler_instance_sync_interval)
  defaults to 120 sec.

  Expected result
  ===============
  server3 is booted on the host that server2 was moved away from.

  Actual result
  =============
  server3 cannot be booted (NoValidHost).

  Triage
  ======
  The confirm resize call on the source compute does not inform the
  scheduler that the instance has been removed from that host. This
  makes the scheduler's instance info stale, causing the subsequent
  scheduling error.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions
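Why the stale cache yields NoValidHost can be shown with a toy version of the anti-affinity filter (a hypothetical data model, not nova's scheduler code):

```python
def allowed_hosts(all_hosts, group_members, host_instances):
    """Hosts still eligible for a new member of an anti-affinity group."""
    occupied = {host for host, instances in host_instances.items()
                if any(srv in group_members for srv in instances)}
    return [h for h in all_hosts if h not in occupied]


hosts = ["compute1", "compute2", "compute3"]
group = {"server1", "server2"}

# Fresh view after server2 migrated compute2 -> compute3: compute2 is free.
fresh = {"compute1": ["server1"], "compute3": ["server2"]}
assert allowed_hosts(hosts, group, fresh) == ["compute2"]

# Stale view: compute2 still appears to host server2, so every host is
# "occupied" by a group member and the scheduler raises NoValidHost.
stale = {"compute1": ["server1"], "compute2": ["server2"],
         "compute3": ["server2"]}
assert allowed_hosts(hosts, group, stale) == []
```

The fix in the triage direction is for confirm-resize to push the updated instance list for the source host, so the cached map matches `fresh` rather than `stale`.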
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Description
  ===========
  The nova-compute service keeps a local image cache for glance images
  used for nova servers, to avoid downloading the same image from
  glance multiple times. The disk usage of this cache is not
  calculated as local disk usage in nova and not reported to placement
  as used DISK_GB. This leads to disk over-allocation. Also, the size
  of that cache cannot be limited by nova configuration, so the
  deployer cannot reserve disk space for the cache with the
  reserved_host_disk_mb config option.

  Steps to reproduce
  ==================
   * Set up a single node devstack
   * Create and upload an image with a not too small physical size,
     like an image with 1G physical size.
   * Check the current disk usage of the host OS and configure
     reserved_host_disk_mb in nova-cpu.conf accordingly.
   * Boot two servers from that image with a flavor, like d1 (disk=5G)
   * Nova will download the glance image once to the local cache,
     which results in 1GB of disk usage
   * Nova will create two root file systems, one for each VM. Those
     disks initially have minimal physical size, but a 5G virtual
     size.
   * At this point nova has allocated 5G + 5G of DISK_GB in placement,
     but due to the image in the cache the total disk usage of the two
     VMs + cache can be 5G + 5G + 1G, if both VMs overwrite and fill
     the content of their own disks.

  Expected result
  ===============
  Option A) Nova maintains a DISK_GB allocation in placement for the
  images in its cache. This way the expected DISK_GB allocation in
  placement is 5G + 5G + 1G at the end.

  Option B) Nova provides a config option to limit the maximum size of
  the image cache, so the deployer can include the maximum image cache
  size in reserved_host_disk_mb when dimensioning the disk space of
  the compute.

  Actual result
  =============
  Only 5G + 5G was allocated from placement, so disk space is
  over-allocated by the image cache.

  Environment
  ===========
  Devstack from recent master:
    stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
    4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"
  libvirt driver with file based image backend

  Logs & Configs
  ==============
  http://paste.openstack.org/show/793388/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions
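The over-allocation in the description is just this bookkeeping (values taken from the report):

```python
flavor_disk_gb = 5        # flavor d1's disk size
servers = 2
cached_image_gb = 1       # physical size of the cached glance image

placement_allocation = servers * flavor_disk_gb             # what nova reports
worst_case_usage = placement_allocation + cached_image_gb   # what the disk can hit

assert placement_allocation == 10
assert worst_case_usage == 11
# the cache is exactly the unaccounted difference:
assert worst_case_usage - placement_allocation == cached_image_gb
```

Option A above would fold `cached_image_gb` into the placement allocation; option B would fold it into reserved_host_disk_mb instead.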
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Description
  ===========
  Unable to createImage/snapshot paused volume backed instances.

  Steps to reproduce
  ==================
  - Pause a volume backed instance
  - Attempt to snapshot the instance using the createImage API

  Expected result
  ===============
  A snapshot image is successfully created, as is the case for paused
  instances that are not volume backed.

  Actual result
  =============
  n-api returns the following error:

    {'code': 409, 'message': "Cannot 'createImage' instance
    bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state
    paused"}

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     list for all releases: http://docs.openstack.org/releases/
     master
  2. Which hypervisor did you use? (For example: Libvirt + KVM,
     Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
     N/A
  3. Which storage type did you use? (For example: Ceph, LVM, GPFS,
     ...) What's the version of that?
     N/A
  4. Which networking type did you use? (For example: nova-network,
     Neutron with OpenVSwitch, ...)
     N/A

  Logs & Configs
  ==============
  As above.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions
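The 409 comes from the API layer's vm_state gate, not from anything the backend cannot do. A toy model of that gate (hypothetical state tables; nova's real check lives in API decorators and is more involved) shows the asymmetry the report complains about:

```python
# Allowed vm_states for the createImage action (illustrative values only).
ALLOWED_IMAGE_BACKED = {"active", "stopped", "paused", "suspended"}
ALLOWED_VOLUME_BACKED = {"active", "stopped"}  # 'paused' missing: the bug


def can_snapshot(vm_state, volume_backed):
    """Return True if the createImage action would be accepted."""
    allowed = ALLOWED_VOLUME_BACKED if volume_backed else ALLOWED_IMAGE_BACKED
    return vm_state in allowed


assert can_snapshot("paused", volume_backed=False)     # image-backed: works
assert not can_snapshot("paused", volume_backed=True)  # the reported 409
```

The fix amounts to adding 'paused' to the volume-backed set so both paths behave the same.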
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  The function which counts resources using the legacy method involves
  getting a list of all cell mappings assigned to a specific project:

  https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

  This code can be very heavy on a database which contains a lot of
  instances (but not a lot of mappings), potentially scanning millions
  of rows to gather 1-2 cell mappings. In a single cell environment,
  it is just extra CPU usage with exactly the same outcome.

  The [api]/instance_list_per_project_cells option was introduced to
  work around this:

  https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

  However, the quota code does not implement it, which means quota
  counts take a big toll on the database server. We should ideally
  mirror the same behaviour in the quota code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions
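What the report asks for can be sketched as follows (hypothetical helper names, not nova's API): honour the same option the instance-list path honours when deciding which cells to run quota counts against.

```python
def cells_to_count_in(project_id, all_cells, cells_by_project,
                      instance_list_per_project_cells=False):
    """Pick the cells to run quota counts against.

    When the option is off (the default), skip the expensive scan of
    instance mappings and just query every cell; in a single-cell
    deployment the result is identical either way.
    """
    if instance_list_per_project_cells:
        return cells_by_project.get(project_id, [])
    return all_cells


all_cells = ["cell1"]
mappings = {"project-a": ["cell1"]}

# Single-cell deployment: both strategies yield the same cells, but the
# default path avoids the per-project mapping scan entirely.
assert cells_to_count_in("project-a", all_cells, mappings) == ["cell1"]
assert cells_to_count_in("project-a", all_cells, mappings,
                         instance_list_per_project_cells=True) == ["cell1"]
```

The point of the bug is that `quota.py` hard-codes the per-project branch; mirroring this switch lets single-cell operators skip the costly scan.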
[Yahoo-eng-team] [Bug 1879964] Re: Invalid value for 'hw:mem_page_size' raises confusing error
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1879964 Title: Invalid value for 'hw:mem_page_size' raises confusing error Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Configure a flavor like so: openstack flavor create hugepage --ram 1024 --disk 10 --vcpus 1 openstack flavor set hugepage --property hw:mem_page_size=2M Attempt to boot an instance. It will fail with the following error message: Invalid memory page size '0' (HTTP 400) (Request-ID: req-338bf619-3a54-45c5-9c59-ad8c1d425e91) You wouldn't know from reading it, but this is because the property should read 'hw:mem_page_size=2MB' (note the extra 'B'). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879964/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
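The confusing message comes from the parsing of the extra spec value. A minimal, hypothetical sketch (not nova's actual parser; the function and unit-table names are invented) of validation that reports the offending input instead of a derived '0':

```python
import re

# Accepted unit suffixes and their multiplier to KiB. Note '2M' is not a
# valid suffix here -- only '2MB' is -- which is the trap the bug describes.
UNITS = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}

def parse_page_size(value):
    """Return the page size in KiB, or raise ValueError naming the input."""
    m = re.match(r"^(\d+)([A-Z]+)?$", value.upper())
    if not m or (m.group(2) and m.group(2) not in UNITS):
        # Report the original string, not a derived value such as '0'.
        raise ValueError("Invalid memory page size %r" % value)
    return int(m.group(1)) * UNITS[m.group(2) or "KB"]
```

With this shape, `parse_page_size("2MB")` yields 2048 KiB while `parse_page_size("2M")` fails with a message that actually names the bad input.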
[Yahoo-eng-team] [Bug 1882233] Re: Libvirt driver always reports 'memory_mb_used' of 0
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882233 Title: Libvirt driver always reports 'memory_mb_used' of 0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: The nova-compute service periodically logs a summary of the free RAM, disk and vCPUs as reported by the hypervisor. For example: Hypervisor/Node resource view: name=vtpm-f31.novalocal free_ram=7960MB free_disk=11.379043579101562GB free_vcpus=7 pci_devices=[{...}] On a recent deployment using the libvirt driver, it's observed that the 'free_ram' value never changes despite instances being created and destroyed. This is because the 'get_memory_mb_used' function in 'nova.virt.libvirt.host' always returns 0 unless the host platform, as reported by 'sys.platform', is either 'linux2' or 'linux3'. Since Python 3.3, the major version is no longer included in this return value because it was misleading [1]. This is low priority because the value only appears to be used for logging purposes and the values stored in e.g. the 'ComputeNode' object and reported to placement are calculated based on config options and number of instances on the node. We may wish to stop reporting this information instead. [1] https://stackoverflow.com/a/10429736/613428 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882233/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
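The portability idiom for this class of bug is a prefix match rather than an exact comparison, since it covers both the old and new return values. A minimal sketch:

```python
import sys

# Since Python 3.3, sys.platform is just 'linux' -- no major-version
# suffix -- so an exact comparison against 'linux2'/'linux3' silently
# never matches on Python 3 and the memory query is skipped.
def host_is_linux():
    # Prefix match covers 'linux2' (Python 2) and 'linux' (Python 3.3+).
    return sys.platform.startswith('linux')
```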
[Yahoo-eng-team] [Bug 1882821] Re: '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' are incompatible
** Also affects: nova/ussuri Importance: Undecided Status: New ** Changed in: nova/ussuri Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882821 Title: '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' are incompatible Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Per title, the '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' config options are incompatible. Not only does '[DEFAULT] reserved_host_memory_mb' not really make sense for file-backed memory (if you want to reserve "memory", configure a lower '[libvirt] file_backed_memory' value), but configuring a value for '[libvirt] file_backed_memory' that is lower than the value for '[DEFAULT] reserved_host_memory_mb', which currently defaults to 512MB, will break nova's resource reporting to placement: nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/f39bde61-6f73-4ccb-9488-6efb9689730f/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n Unable to update inventory for resource provider f39bde61-6f73-4ccb-9488-6efb9689730f: Invalid inventory for 'MEMORY_MB' on resource provider 'f39bde61-6f73-4ccb-9488-6efb9689730f'. The reserved value is greater than total. ", "code": "placement.undefined_code", "request_id": "req-977e43e7-1a7c-4309-96ec-49a75bdea58a"}]} Ideally we should error out if both values are configured; however, doing so would be a breaking change. Instead, we can warn if these are incompatible and then error out in a future release.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882821/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
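The proposed warning could look roughly like the following sketch (the function name and plumbing are illustrative, not nova's code; the real values would come from the two config options, both in MiB):

```python
import logging

LOG = logging.getLogger(__name__)

def check_memory_config(file_backed_memory_mb, reserved_host_memory_mb):
    """Warn when placement would reject the MEMORY_MB inventory.

    With file-backed memory, the file size *is* the total; a reserved
    value at or above it makes reserved > total and the inventory update
    fails. Returns False when the combination is broken.
    """
    if file_backed_memory_mb and reserved_host_memory_mb >= file_backed_memory_mb:
        LOG.warning(
            "[DEFAULT] reserved_host_memory_mb (%d MB) should be 0 when "
            "[libvirt] file_backed_memory is enabled; a value >= "
            "file_backed_memory (%d MB) breaks reporting to placement",
            reserved_host_memory_mb, file_backed_memory_mb)
        return False
    return True
```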
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882919 Title: e1000e interface reported as unsupported Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF which nova does not explicitly support. There doesn't appear to be any reason not to support this, since libvirt, and specifically QEMU/KVM, support it. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1884214 Title: reserve disk usage for image cache fails on a fresh hypervisor Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement. [1] http://paste.openstack.org/show/794993/ This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1753676] Re: Live migration not working as Expected when Restarting nova-compute service while migration from source node
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1753676 Title: Live migration not working as Expected when Restarting nova-compute service while migration from source node Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Description === Environment: Ubuntu 16.04 Openstack Version: Pike I am trying to migrate a VM ( live migration ( block migration ) ) from one compute node to another compute node... Everything looks good unless I restart the nova-compute service: live migration keeps running underneath with the help of libvirt, but once the vm reaches the destination, the database is not updated properly. Steps to reproduce: === nova.conf ( libvirt setting on both compute nodes ) [libvirt] live_migration_bandwidth=1200 live_migration_downtime=100 live_migration_downtime_steps =3 live_migration_downtime_delay=10 live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE virt_type = kvm inject_password = False disk_cachemodes = network=writeback live_migration_uri = "qemu+tcp://nova@%s/system" live_migration_tunnelled = False block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC ( default openstack live migration configuration ( pre-copy with no tunneling ) ) Source vm root disk ( boot from volume with one ephemeral disk (160GB) ) Trying to migrate vm from compute1 to compute2, below is my source vm. 
| OS-EXT-SRV-ATTR:host | compute1 | | OS-EXT-SRV-ATTR:hostname | testcase1-all-ephemernal-boot-from-vol | | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 | | OS-EXT-SRV-ATTR:instance_name| instance-0153 1) nova live-migration --block-migrate compute2 [req-48a3df61-3974-46ac-8019-c4c4a0f8a8c8 4a8150eb246a4450829331e993f8c3fd f11a5d3631f14c4f879a2e7dddb96c06 - default default] pre_live_migration data is LibvirtLiveMigrateData(bdms=,block_migration=True,disk_available_mb=6900736,disk_over_commit=,filename='tmpW5ApOS',graphics_listen_addr_spice=x.x.x.x,graphics_listen_addr_vnc=127.0.0.1,image_type='default',instance_relative_path='504028fc-1381 -42ca-ad7c- def7f749a722',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=True,migration=,serial_listen_addr=None,serial_listen_ports=,supported_perf_events=,target_connect_addr=) pre_live_migration /openstack/venvs/nova-16.0.6/lib/python2.7/site- packages/nova/compute/manager.py:5437 Migration started, able to see the data and memory transfer ( using iftop ) Data transfer between compute nodes using iftop <= 4.94Gb 4.99Gb 5.01Gb Restarted Nova-compute service on source compute node ( where the vm is migrating) Live migration still it is going, once migration completes, below is my total data transfer ( using iftop ) TX: cum: 17.3MB peak: 2.50Mb rates: 11.1Kb 7.11Kb 463Kb RX:97.7GB 4.97Gb 3.82Kb 1.93Kb 1.87Gb TOTAL: 97.7GB 4.97Gb Once migration completes, from the destination compute node ( we can able to see the virsh domain running) root@compute2:~# virsh list --all IdName State 3 instance-0153 running From the nova-compute.log Instance has been moved to another host compute1(compute1). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 23808, u'DISK_GB': 180}}. 
_remove_deleted_instances_allocations /openstack/venvs/nova-16.0.6/lib/python2.7/site- packages/nova/compute/resource_tracker.py:123 Nova compute still showing 0 vcpus ( but 8 core vm was there ) Total usable vcpus: 56, total allocated vcpus: 0 _report_final_resource_view /openstack/venvs/nova-16.0.6/lib/python2.7 /site-packages/nova/compute/resource_tracker.py:792 nova show ( still nova db shows src hostname, db is
[Yahoo-eng-team] [Bug 1805767] Re: The new numa topology in the new flavor extra specs weren't parsed when resize
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1805767 Title: The new numa topology in the new flavor extra specs weren't parsed when resize Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Env: host with 2 numa nodes. flavor n2 request two instance numa nodes flavor n3 request three instance numa nodes Reproduce: Boot an instance with flavor n2, which scheduled to the host. Resize the instance with n3. The scheduler logs: Nov 28 18:27:16 jfz1r03h15 nova-scheduler[47260]: DEBUG nova.virt.hardware [None req-953d07bf-8ead-4f21-bd64-1ab12244eec1 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy=None,cpu_thread_policy=None,cpu_topology=,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None) on host_cell NUMACell(cpu_usage=1,cpuset=set([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]),id=0,memory=128835,memory_usage=256,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pinned_cpus=set([]),siblings=[set([43,7]),set([16,52]),set([2,38]),set([8,44]),set([50,14]),set([0,36]),set([51,15]),set([1,37]),set([10,46]),set([11,47]),set([42,6]),set([41,5]),set([9,45]),set([3,39]),set([48,12]),set([49,13]),set([17,53]),set([40,4])]) {{(pid=48606) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1019}} Nov 28 18:27:16 jfz1r03h15 nova-scheduler[47260]: DEBUG nova.virt.hardware [None req-953d07bf-8ead-4f21-bd64-1ab12244eec1 admin admin] Attempting to fit instance cell 
InstanceNUMACell(cpu_pinning_raw=None,cpu_policy=None,cpu_thread_policy=None,cpu_topology=,cpuset=set([1]),cpuset_reserved=None,id=1,memory=256,pagesize=None) on host_cell NUMACell(cpu_usage=1,cpuset=set([18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71]),id=1,memory=129009,memory_usage=256,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pinned_cpus=set([]),siblings=[set([59,23]),set([65,29]),set([18,54]),set([34,70]),set([24,60]),set([33,69]),set([58,22]),set([67,31]),set([66,30]),set([26,62]),set([35,71]),set([57,21]),set([25,61]),set([19,55]),set([64,28]),set([32,68]),set([27,63]),set([56,20])]) {{(pid=48606) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1019}} As above, the scheduler only see two instance numa nodes. It means the new flavor extra specs weren't parsed. The nova-compute log: Nov 28 18:27:27 jfz1r03h15 nova-scheduler[47260]: DEBUG oslo_service.periodic_task [None req-7aff4535-fe99-48b4-bab9-d206d35412ff None None] Running periodic task SchedulerManager._run_periodic_tasks {{(pid=48606) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:219}} Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 79, in wrapped Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server self.force_reraise() Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 69, in wrapped Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 187, in decorated_function Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR
[Yahoo-eng-team] [Bug 1843708] Re: Key-pair is not updated during the rebuild
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1843708 Title: Key-pair is not updated during the rebuild Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === When we want to rebuild an instance and change the keypair, we can specify it with: openstack --os-compute-api-version 2.54 server rebuild --image "Debian 10" --key-name key1 instance1 This comes from this implementation: https://review.opendev.org/#/c/379128/ https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/rebuild-keypair-reset.html But when rebuilding the instance, Cloud-Init will set the key in authorized_keys from http://169.254.169.254/openstack/latest/meta_data.json And this meta_data.json uses the keys from the instance_extra table But the keypair will be updated in the 'instances' table, not in the 'instance_extra' table. 
So the keypair is not updated inside the VM Maybe this is the function for saving the keypair, but the save() does nothing: https://opendev.org/openstack/nova/src/branch/master/nova/objects/instance.py#L714 Steps to reproduce == - Deploy a DevStack - Boot an instance with keypair key1 - Rebuild it with key2 - A nova show will show the key_name key2, but the keypairs object in the instance_extra table is not updated and you cannot connect to the instance with key2 Expected result === Connect to the VM with the new keypair added during the rebuild call Actual result = The keypair added during the rebuild call is not set in the VM Environment === I tested it on a Devstack from master and we see this behaviour. NOVA : commit 5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1843708/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1847054] Re: kolla-ansible CI: nova-compute-ironic reports errors in the ironic scenario
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1847054 Title: kolla-ansible CI: nova-compute-ironic reports errors in the ironic scenario Status in kolla-ansible: Invalid Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: /var/log/kolla/nova/nova-compute-ironic.log 2019-10-07 07:32:21.268 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:33:22.454 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:34:22.416 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:35:22.422 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:36:24.422 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 
2019-10-07 07:37:26.423 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:38:27.419 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:39:29.430 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:40:30.420 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:41:32.420 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. To manage notifications about this bug go to: https://bugs.launchpad.net/kolla-ansible/+bug/1847054/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1848308] Re: Impossible to set instance CPU policy to 'shared' through flavor image property
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1848308 Title: Impossible to set instance CPU policy to 'shared' through flavor image property Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: According to the content defined in doc https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#customizing-instance-cpu-pinning-policies, the instance CPU allocation policy can be explicitly set through the instance flavor property 'hw:cpu_policy', or its corresponding image property 'hw_cpu_policy'. One general rule to solve the conflict between these two properties, if I understand it correctly, is that if 'hw:cpu_policy' is not given any policy, it should take the policy from 'hw_cpu_policy'. But currently, if 'hw_cpu_policy' is set to 'shared' and no value is set in 'hw:cpu_policy', the logic in 'hardware.get_cpu_policy_constraint' reports that there is no CPU policy (through returning a value of None). In this case, it should return the "shared" policy. This issue was first reported in https://review.opendev.org/#/c/688603/ but without tracking with a bug ID. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1848308/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
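The intended precedence can be sketched as follows. This is an illustrative re-implementation, not the actual hardware.get_cpu_policy_constraint code; the conflict handling follows the documented rules (a 'dedicated' flavor policy always wins, and a 'shared' flavor policy with a 'dedicated' image policy is an error):

```python
def get_cpu_policy(flavor_policy, image_policy):
    """Resolve 'hw:cpu_policy' (flavor) against 'hw_cpu_policy' (image)."""
    if flavor_policy == 'dedicated':
        return 'dedicated'
    if flavor_policy == 'shared' and image_policy == 'dedicated':
        raise ValueError('Image and flavor CPU policies conflict')
    # Fall back to the image property -- including 'shared', the case the
    # bug describes, where None was returned instead.
    return flavor_policy or image_policy
```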
[Yahoo-eng-team] [Bug 1879500] Re: Unable to rescue using volume snapshot based images
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1879500 Title: Unable to rescue using volume snapshot based images Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === While working on bug #1876330 it was observed that attempts to rescue an instance with a volume snapshot image were permitted but would ultimately fail to boot the instance with file-based imagebackends *or* fail outright with the rbd imagebackend. This is due to these images being metadata containers containing no image data, thus resulting in Nova attempting to rescue with zero-length images. Steps to reproduce == * Launch a volume-backed instance * Snapshot the instance using the imageCreate API. * Attempt to rescue the instance using the created image. Expected result === The request is rejected as there is no support for rescuing using a volume snapshot based image. Actual result = The request is accepted and either fails to boot the instance or fails earlier due to the zero-length image. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == See bug #1876330. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879500/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
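The expected up-front rejection could look roughly like this sketch. The dict layout loosely mirrors a Glance image record, where a snapshot of a volume-backed instance carries a 'block_device_mapping' property and no image data; the helper name and exact fields are illustrative, not nova's actual validation code:

```python
def validate_rescue_image(image):
    """Reject volume-backed snapshot images before attempting a rescue.

    Such images are metadata-only (zero bytes of data plus a
    'block_device_mapping' property), so booting a rescue instance from
    them can never work.
    """
    if not image.get('size') and 'block_device_mapping' in image.get('properties', {}):
        raise ValueError(
            'Image %s is a volume-backed snapshot and cannot be used to '
            'rescue an instance' % image.get('id'))
```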
[Yahoo-eng-team] [Bug 1891547] Re: AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING'
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1891547 Title: AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING' Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === I7eb86edc130d186a66c04b229d46347ec5c0b625 introduced support for the libvirt hot unplug error code VIR_ERR_DEVICE_MISSING that was itself introduced in libvirt v4.1.0. The change did not, however, cover versions < v4.1.0, such as v4.0.0 installed in our bionic-based CI test envs, causing attribute errors when we attempt to reference it. Steps to reproduce == * Attempt to detach a busy or missing device from an instance with libvirt < v4.1.0 installed. Expected result === The correct error codes are referenced and checked to confirm what happened. Actual result = AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING' as the error code is not available prior to libvirt v4.1.0. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + * 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) 
N/A Logs & Configs == https://zuul.opendev.org/t/openstack/build/2d57acc8c90741e6ba5a6795195e3ffd/log/controller/logs/screen-n-cpu.txt?severity=4 Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server Traceback (most recent call last): Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/dispatcher.py", line 193, in _do_dispatch Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 78, in wrapped Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 
nova-compute[32162]: ERROR oslo_messaging.rpc.server self.force_reraise() Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server raise value Aug 13 13:50:50.038271
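The conventional guard for a constant that only exists in newer bindings is a getattr() lookup with a default. In this sketch the libvirt module is stubbed so the example is self-contained and simulates bindings < v4.1.0; with python-libvirt installed you would simply import the real module:

```python
import types

# Stand-in for the libvirt module (simulates libvirt < v4.1.0, where the
# constant does not exist). VIR_ERR_NO_DOMAIN is just a sample attribute.
libvirt = types.SimpleNamespace(VIR_ERR_NO_DOMAIN=42)

# Guarded lookup: a direct attribute reference would raise AttributeError
# on older bindings, which is exactly the failure in this bug.
VIR_ERR_DEVICE_MISSING = getattr(libvirt, 'VIR_ERR_DEVICE_MISSING', None)

def is_device_missing(error_code):
    """True only when the bindings define the code and it matches."""
    return (VIR_ERR_DEVICE_MISSING is not None
            and error_code == VIR_ERR_DEVICE_MISSING)
```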
[Yahoo-eng-team] [Bug 1870357] Re: raw disk usage is not correctly reported during resource update
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1870357 Title: raw disk usage is not correctly reported during resource update Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description:

Description ===

disk_available_least (free disk for a new instance) does not seem to be calculated correctly when an instance uses raw disks (images_type=raw) and space preallocation is not enabled. This may lead placement/scheduler to make wrong decisions regarding space availability on hosts. When the total amount of over_committed_disk_size is evaluated on a host, it appears that in the raw disk case it is always set to 0.

Steps to reproduce ===

On a master devstack:

. devstack/openrc admin admin
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 381

# create an instance with 80GB of disk, qcow2 by default:
$ openstack server create --flavor m1.large --image cirros-0.4.0-x86_64-disk --nic net-id=private alex

# a few seconds later we can see the available disk has decreased by 80, all is fine:
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 301

# delete instance
$ openstack server delete xxx

# Now set images_type = raw in the [libvirt] section of /etc/nova/nova-cpu.conf
$ grep images_type /etc/nova/nova-cpu.conf
images_type = raw

# restart compute
$ sudo service devstack@n-cpu restart

# respawn the same instance; it will now be created with a raw disk
$ openstack server create --flavor m1.large --image cirros-0.4.0-x86_64-disk --nic net-id=private alex

# a few seconds later we can see the available disk has decreased by only 3GB, which is not correct:
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 378

# only the allocated size in use is subtracted:
$ ls -lhs /opt/stack/data/nova/instances/31e46f53-6223-40c3-ad84-0f19d10b52be/disk
2.6G -rw-r--r-- 1 libvirt-qemu kvm 80G Apr 1 10:00 /opt/stack/data/nova/instances/31e46f53-6223-40c3-ad84-0f19d10b52be/disk

Expected result ===

over_committed_disk_size must also be calculated for raw disks (at least for non-preallocated ones).

Actual result =

over_committed_disk_size is set to 0 in all cases for raw disks.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1870357/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
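The over-commit accounting this report expects can be sketched in a few lines. This is only an illustration of the idea, not nova's actual implementation: for a (possibly sparse) raw file, the bytes actually allocated on disk (`st_blocks * 512`) are subtracted from the virtual size promised to the guest, and the difference is the over-committed amount.

```python
import os

def over_committed_size(path, virtual_size):
    """Illustrative sketch: over-committed bytes for one raw disk file."""
    # Bytes actually allocated on disk; sparse regions count as zero.
    allocated = os.stat(path).st_blocks * 512
    # Anything promised to the guest but not yet allocated is
    # over-committed; a fully preallocated file yields roughly 0.
    return max(0, virtual_size - allocated)
```

With the reporter's 80G raw file holding only 2.6G of data, this would report roughly 77G of over-commit instead of the 0 the buggy code produced.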
[Yahoo-eng-team] [Bug 1887946] Re: Unable to detach volume from instance when previously removed from the inactive config
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887946 Title: Unable to detach volume from instance when previously removed from the inactive config Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === $subject, can often be encountered when previous attempts to detach a volume have failed due to the device still being used within the guestOS. This initial attempt will remove the device from the inactive config but fail to remove it from the active config. Any subsequent attempt will then fail as the initial call continues to attempt to remove the device from both the inactive and live configs. Prior to libvirt v4.1.0 this raised either a VIR_ERR_INVALID_ARG or VIR_ERR_OPERATION_FAILED error code from libvirt that n-cpu would handle, retrying the detach against the live config. Since libvirt v4.1.0 however this now raises a VIR_ERR_DEVICE_MISSING error code. This is not handled by Nova resulting in no attempt being made to detach the device from the live config. 
Steps to reproduce ==

# Start with a volume attached as vdb (ignore the source ;))
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# Detach from the inactive config
$ sudo virsh detach-disk --config 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 vdb
Disk detached successfully

# Confirm the device is still listed in the live config
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# and removed from the persistent config
$ sudo virsh domblklist --inactive 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk

# Attempt to detach the volume
$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test

Expected result ===

The initial attempt to detach the device fails as the device isn't present in the inactive config, but we continue to ensure the device is removed from the live config.

Actual result =

n-cpu doesn't handle the initial failure as the raised libvirt error code isn't recognised.

Environment ===

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ b7161fe9b92f0045e97c300a80e58d32b6f49be1

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + KVM

2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A

3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs ==

$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test ; journalctl -u devstack@n-cpu -f
[..]
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: DEBUG oslo_concurrency.lockutils [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Lock "4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8" released by "nova.compute.manager.ComputeManager.detach_volume..do_detach_volume" :: held 0.141s {{(pid=190210) inner /usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py:371}}
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Exception during message handling: libvirt.libvirtError: device not found: no target device vdb
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server   File
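The missing handling boils down to a predicate over libvirt error codes. A minimal sketch of that idea follows; the numeric values are stand-ins so it runs without the python-libvirt bindings (with the real bindings you would use `libvirt.VIR_ERR_INVALID_ARG` etc.), and the fix described above amounts to treating VIR_ERR_DEVICE_MISSING the same way the pre-4.1.0 codes were treated, i.e. retrying the detach against the live config:

```python
# Stand-in values; use the constants from the real libvirt module in practice.
VIR_ERR_INVALID_ARG = 8
VIR_ERR_OPERATION_FAILED = 9
VIR_ERR_DEVICE_MISSING = 99

# Before the fix only the first two codes triggered a live-config retry;
# libvirt >= 4.1.0 raises DEVICE_MISSING instead, so it must be included.
RETRY_LIVE_DETACH_CODES = frozenset([
    VIR_ERR_INVALID_ARG,
    VIR_ERR_OPERATION_FAILED,
    VIR_ERR_DEVICE_MISSING,
])

def should_retry_live_detach(err_code):
    """True if a failed persistent detach should be retried live-only."""
    return err_code in RETRY_LIVE_DETACH_CODES
```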
[Yahoo-eng-team] [Bug 1889108] Re: failures during driver.pre_live_migration remove source attachments during rollback
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889108 Title: failures during driver.pre_live_migration remove source attachments during rollback Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === $subject, the initial rollback and removal of any destination volume attachments is then repeated for the source volume attachments, leaving the volumes connected on the host but listed as `available` in cinder. Steps to reproduce == Cause a failure during the call to driver.pre_live_migration with volumes attached. Expected result === Any volume attachments for the destination host are deleted during the rollback. Actual result = Both sets of volumes attachments for the destination *and* the source are removed. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ eeeb964a5f65e6ac31dfb34b1256aaf95db5ba3a 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) 
N/A Logs & Configs == When live-migration fails with attached volume changed to active and still in nova https://bugzilla.redhat.com/show_bug.cgi?id=1860914 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1889108/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
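The intended rollback scope can be sketched as filtering attachments by host before deleting anything. This is a toy illustration of the principle, not nova's data model (the dict keys are invented for the example):

```python
def attachments_to_delete(attachments, dest_host):
    """Pick only the attachments created for the migration destination.

    On a pre_live_migration failure, rollback must delete only what was
    created on the destination; the source attachment has to survive so
    cinder keeps the volume 'in-use' rather than flipping it to
    'available' while it is still connected to the source host.
    """
    return [a for a in attachments if a['host'] == dest_host]
```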
[Yahoo-eng-team] [Bug 1889257] Re: Live migration of realtime instances is broken
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889257 Title: Live migration of realtime instances is broken Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Attempting to live migrate an instance with realtime enabled fails on master (commit d4c857dfcb1). This appears to be a bug with the live migration of pinned instances feature introduced in Train.

# Steps to reproduce

Create a server using realtime attributes and then attempt to live migrate it. For example:

$ openstack flavor create --ram 1024 --disk 0 --vcpu 4 \
    --property 'hw:cpu_policy=dedicated' \
    --property 'hw:cpu_realtime=yes' \
    --property 'hw:cpu_realtime_mask=^0-1' \
    realtime

$ openstack server create --os-compute-api-version=2.latest \
    --flavor realtime --image cirros-0.5.1-x86_64-disk --nic none \
    --boot-from-volume 1 --wait \
    test.realtime

$ openstack server migrate --live-migration test.realtime

# Expected result

Instance should be live migrated.

# Actual result

The live migration never happens.
Looking at the logs we see the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/eventlet/hubs/hub.py", line 461, in fire_timers
    timer()
  File "/usr/local/lib/python3.6/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 175, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 221, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 670, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8966, in _live_migration_operation
    # is still ongoing, or failed
  File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8959, in _live_migration_operation
    # 2. src==running, dst==paused
  File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 658, in migrate
    destination, params=params, flags=flags)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 1745, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirt.libvirtError: vcpussched attributes 'vcpus' must not overlap

Looking further, we see there are issues with the XML we are generating for the destination. Compare what we have on the source before updating the XML for the destination:

DEBUG nova.virt.libvirt.migration [-] _update_numa_xml input xml= ... 4096 {{(pid=12600) _update_numa_xml /opt/stack/nova/nova/virt/libvirt/migration.py:97}

To what we have after the update:

DEBUG nova.virt.libvirt.migration [-] _update_numa_xml output xml= ... 4096 ... {{(pid=12600) _update_numa_xml /opt/stack/nova/nova/virt/libvirt/migration.py:131}}

The issue is the 'vcpusched' elements. We're assuming there is only one of these elements when updating the XML for the destination [1]. We have to figure out why there are multiple elements and how best to handle this (likely by deleting and recreating everything). I suspect the reason we didn't spot this is because libvirt is rewriting the XML on us. This is what nova is providing libvirt upon boot:

DEBUG
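One way to avoid the overlap, along the lines the report suggests ("deleting and recreating everything"), is to drop every existing <vcpusched> element and write a fresh set, instead of editing only the first one. A hedged sketch using the stdlib xml.etree (nova itself uses lxml; the element and attribute names follow the libvirt domain XML, but this is not nova's actual code):

```python
import xml.etree.ElementTree as ET

def rewrite_vcpusched(dom_xml, entries):
    """Replace all <cputune>/<vcpusched> elements with `entries`,
    a list of (vcpus, scheduler, priority) string tuples."""
    root = ET.fromstring(dom_xml)
    cputune = root.find('cputune')
    # libvirt may have split one element into several; remove them all
    # so a rewritten element cannot overlap a stale leftover.
    for el in cputune.findall('vcpusched'):
        cputune.remove(el)
    for vcpus, scheduler, priority in entries:
        ET.SubElement(cputune, 'vcpusched', vcpus=vcpus,
                      scheduler=scheduler, priority=priority)
    return ET.tostring(root, encoding='unicode')
```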
[Yahoo-eng-team] [Bug 1890428] Re: format_message() is specific to NovaException and should not be used for generic exceptions
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1890428 Title: format_message() is specific to NovaException and should not be used for generic exceptions Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: In [1] we used format_message() to print the exception info, but format_message() is specific to NovaException; we should not call it on generic exceptions, where simply printing the exception is enough. [1] https://review.opendev.org/#/c/631244/69/nova/compute/manager.py@2599 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1890428/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
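The fix amounts to logging the exception generically: str() works on every exception, while format_message() exists only on NovaException subclasses. A self-contained illustration (the NovaException class here is a minimal stand-in for nova.exception.NovaException, not the real one):

```python
class NovaException(Exception):
    """Minimal stand-in for nova.exception.NovaException."""

    def format_message(self):
        # The real implementation renders a msg_fmt template.
        return self.args[0] if self.args else 'An unknown exception occurred.'

def describe(exc):
    # Safe for NovaException *and* generic exceptions alike; calling
    # exc.format_message() here would raise AttributeError on, say,
    # a plain ValueError.
    return str(exc)
```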
[Yahoo-eng-team] [Bug 1866373] Re: URLS in os-keypairs 'links' body are incorrect
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866373 Title: URLS in os-keypairs 'links' body are incorrect Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Similar to https://bugs.launchpad.net/nova/+bug/1864428, the URLs in the 'links' element of the response are incorrect. They read '/keypairs', not '/os-keypairs'. From the current api-ref (2020-03-06):

{
    "keypairs": [
        {
            "keypair": {
                "fingerprint": "7e:eb:ab:24:ba:d1:e1:88:ae:9a:fb:66:53:df:d3:bd",
                "name": "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
                "type": "ssh",
                "public_key": "ssh-rsa B3NzaC1yc2EDAQABAAABAQCkF3MX59OrlBs3dH5CU7lNmvpbrgZxSpyGjlnE8Flkirnc/Up22lpjznoxqeoTAwTW034k7Dz6aYIrZGmQwe2TkE084yqvlj45Dkyoj95fW/sZacm0cZNuL69EObEGHdprfGJQajrpz22NQoCD8TFB8Wv+8om9NH9Le6s+WPe98WC77KLw8qgfQsbIey+JawPWl4O67ZdL5xrypuRjfIPWjgy/VH85IXg/Z/GONZ2nxHgSShMkwqSFECAC5L3PHB+0+/12M/iikdatFSVGjpuHvkLOs3oe7m6HlOfluSJ85BzLWBbvva93qkGmLg4ZAc8rPh2O+YIsBUHNLLMM/oQp Generated-by-Nova\n"
            }
        }
    ],
    "keypairs_links": [
        {
            "href": "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572/keypairs?limit=1&marker=keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
            "rel": "next"
        }
    ]
}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866373/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866380] Re: Ironic driver hash ring treats hostnames differing only by case as different hostnames
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866380 Title: Ironic driver hash ring treats hostnames differing only by case as different hostnames Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Recently we had a customer case where attempts to add new ironic nodes to an existing undercloud resulted in half of the nodes failing to be detected and added to nova. Ironic API returned all of the newly added nodes when called by the driver, but half of the nodes were not returned to the compute manager by the driver. There was only one nova-compute service managing all of the ironic nodes of the all-in-one typical undercloud deployment. After days of investigation and examination of a database dump from the customer, we noticed that at some point the customer had changed the hostname of the machine from something containing uppercase letters to the same name but all lowercase. The nova-compute service record had the mixed case name and the CONF.host (socket.gethostname()) had the lowercase name. The hash ring logic adds all of the nova-compute service hostnames plus CONF.host to hash ring, then the ironic driver reports only the nodes it owns by retrieving a service hostname from the ring based on a hash of each ironic node UUID. Because of the machine hostname change, the hash ring contained, for example: {'MachineHostName', 'machinehostname'} when it should have contained only one hostname. 
And because the hash ring contained two hostnames, the driver was able to retrieve only half of the nodes as nodes that it owned. So half of the new nodes were excluded and not added as new compute nodes. I propose adding some logging to the driver related to the hash ring to help with debugging in the future. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
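The failure mode is easy to demonstrate with a toy ring. Case-normalizing hostnames before they enter the ring collapses the duplicate member; the hashing below is purely illustrative, not ironic's actual hash-ring implementation:

```python
import hashlib

def ring_members(service_hosts, conf_host):
    """Candidate hash-ring members, case-normalized (sketch of the fix).

    Lower-casing before membership means 'MachineHostName' and
    'machinehostname' cannot coexist as two ring members that split
    ownership of the ironic nodes between them.
    """
    return sorted({h.lower() for h in service_hosts} | {conf_host.lower()})

def owner(members, node_uuid):
    # Toy stable mapping from an ironic node UUID to one ring member.
    digest = int(hashlib.md5(node_uuid.encode()).hexdigest(), 16)
    return members[digest % len(members)]
```

With normalization the ring has a single member, so that one nova-compute service owns every node, rather than silently claiming only about half of them.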
[Yahoo-eng-team] [Bug 1866937] Re: Requests to neutron API do not use retries
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866937 Title: Requests to neutron API do not use retries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We have a customer bug report downstream [1] where nova occasionally fails to carry out server actions requiring calls to the neutron API if haproxy happens to close a connection after an idle time of 10 seconds at nearly the same time as an incoming request attempts to re-use the connection while it is being torn down. Here is an excerpt from [1]:

The result of our investigation, the cause is as follows:

1. The neutron client in nova uses a connection pool (urllib3/requests) for http.
2. Sometimes, an http connection is reused for different requests.
3. The connection between the neutron client and haproxy is closed by haproxy when it has been idle for 10 seconds.
4. If reusing the connection from the client side and closing the connection from the haproxy side happen at almost the same time, the client gets an RST and ends with "bad status line".

To address this problem, we can add a new config option for the neutron client (similar to the existing retry config options we have for the cinder and glance clients) to be more resilient during such scenarios.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1788853

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866937/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
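The proposed option boils down to retrying idempotent HTTP calls that die on a torn-down keep-alive connection. A generic pure-Python sketch of that retry loop follows; the real fix wires a retry count into the keystoneauth session used by the neutron client, and the nova-specific option name and plumbing are deliberately omitted here:

```python
import time

def call_with_retries(func, retries=3, delay=0.0,
                      retry_on=(ConnectionError,)):
    """Call func(), retrying on transient connection errors.

    A server-side idle-timeout RST (as in the haproxy scenario above)
    surfaces as a connection error; retrying once on a fresh connection
    is usually enough, but up to `retries` attempts are allowed.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts; propagate the failure
            time.sleep(delay)
```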
[Yahoo-eng-team] [Bug 1867380] Re: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867380 Title: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Description === $subject, it appears the current check of using grep to find active n-cpu processes isn't enough and we actually need to wait for the services to report as UP before starting to run Tempest. In the following we can see Tempest starting at 2020-03-13 13:01:19.528 while n-cpu within the instance isn't marked as UP for another ~20 seconds: https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log /job-output.txt#6305 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/screen-n-cpu.txt#3825 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/subnode-2/screen-n-cpu.txt#3534 I've only seen this on stable/pike at present but it could potentially hit all branches with slow enough CI nodes. Steps to reproduce == Run nova-live-migration on slow CI nodes. Expected result === nova/tests/live_migration/hooks/ceph.sh waits until hosts are marked as UP before running Tempest. Actual result = nova/tests/live_migration/hooks/ceph.sh checks for running n-cpu processes and then immediately starts Tempest. Environment === 1. Exact version of OpenStack you are running. 
See the following list for all releases: http://docs.openstack.org/releases/ stable/pike, but it could be present on other branches with slow enough CI nodes. 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt / KVM. 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs ==

Mar 13 13:01:39.170201 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 74932102-3737-4f8f-9002-763b2d580c3a] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}
Mar 13 13:01:39.255008 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 042afab0-fbef-4506-84e2-1f54cb9d67ca] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}
Mar 13 13:01:39.322508 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: cc293f53-7428-4e66-9841-20cce219e24f] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1867380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe :
https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1774249] Re: update_available_resource will raise DiskNotFound after resize but before confirm
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1774249 Title: update_available_resource will raise DiskNotFound after resize but before confirm Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Original reported in RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315 Tested on OSP12 (Pike), but appears to be still present on master. Should only occur if nova compute is configured to use local file instance storage. 
Create instance A on compute X
Resize instance A to compute Y
Domain is powered off
/var/lib/nova/instances/<uuid> renamed to <uuid>_resize on X
Domain is *not* undefined

On compute X:
update_available_resource runs as a periodic task
First action is to update self
rt calls driver.get_available_resource()
...calls _get_disk_over_committed_size_total
...iterates over all defined domains, including the ones whose disks we renamed
...fails because a referenced disk no longer exists

Results in errors in nova-compute.log:

2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     config, block_device_info)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     raise exception.DiskNotFound(location=path)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

And the resource tracker is no longer updated. We can find lots of these in the gate. Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly mitigates this, but doesn't because task_state is not set while the instance is awaiting confirm.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1831771] Re: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1831771 Title: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: This was originally reported in Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1668159 The 'UnexpectedDeletingTaskStateError' exception can be raised by something like aborting a large heat stack, where the instance hasn't finished setting up before the stack is aborted and the instances deleted. https://github.com/openstack/nova/blob/19.0.0/nova/db/sqlalchemy/api.py#L2864 We handle this in the compute manager and as part of that handling, we clean up the resource tracking of network interfaces. https://github.com/openstack/nova/blob/19.0.0/nova/compute/manager.py#L2034-L2040 However, we don't unplug these interfaces. This can result in things being left over on the host. We should attempt to unplug VIFs as part of this cleanup. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1831771/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
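The proposed cleanup is best-effort VIF unplugging during the build-abort path. A sketch of that idea follows; `unplug` stands in for the virt driver's unplug call and the return value is illustrative, not nova's actual interface:

```python
def cleanup_aborted_build(network_info, unplug):
    """Unplug every VIF plugged before the build was aborted.

    Cleanup must not mask the original UnexpectedDeletingTaskStateError,
    so failures are collected and reported rather than raised.
    """
    failed = []
    for vif in network_info:
        try:
            unplug(vif)
        except Exception:
            failed.append(vif)  # remember for logging; keep going
    return failed
```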
[Yahoo-eng-team] [Bug 1844929] Re: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844929 Title: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler Status in grenade: Invalid Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[18043]: WARNING nova.context [None req-1929039e-1517-4326-9700-738d4b570ba6 tempest-AttachInterfacesUnderV243Test-2009753731 tempest-AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

Looks like something is causing timeouts reaching cell1 during grenade runs.
The only errors I see in the rabbit logs are these for the uwsgi (API) servers:

=ERROR REPORT 22-Sep-2019::00:35:30 ===
closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-f0605979ed7d):
missed heartbeats from client, timeout: 60s

It looks like we don't have mysql logs in this grenade run; maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

All of these are failures in the check and gate queues. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performance/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding.

To manage notifications about this bug go to: https://bugs.launchpad.net/grenade/+bug/1844929/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1837199] Re: nova-manage Tracebeck on missing arg
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1837199 Title: nova-manage Tracebeck on missing arg Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Status in OpenStack Compute (nova) train series: Fix Released Bug description:

# nova-manage cell_v2
An error has occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/oslo_config/cfg.py", line 3179, in __getattr__
    return getattr(self._conf._namespace, name)
AttributeError: '_Namespace' object has no attribute 'action_fn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/nova/nova/cmd/manage.py", line 2205, in main
    fn, fn_args, fn_kwargs = cmd_common.get_action_fn()
  File "/opt/stack/nova/nova/cmd/common.py", line 169, in get_action_fn
    fn = CONF.category.action_fn
  File "/usr/local/lib/python3.7/site-packages/oslo_config/cfg.py", line 3181, in __getattr__
    raise NoSuchOptError(name)
oslo_config.cfg.NoSuchOptError: no such option action_fn in group [DEFAULT]

# nova-manage cell_v2 help
usage: nova-manage cell_v2 [-h] {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance} ...
nova-manage cell_v2: error: argument action: invalid choice: 'help' (choose from 'create_cell', 'delete_cell', 'delete_host', 'discover_hosts', 'list_cells', 'list_hosts', 'map_cell0', 'map_cell_and_hosts', 'map_instances', 'simple_cell_setup', 'update_cell', 'verify_instance')

# nova-manage cell_v2 -h
usage: nova-manage cell_v2 [-h] {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance} ...

positional arguments:
  {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance}

optional arguments:
  -h, --help  show this help message and exit

python version: /usr/bin/python3 --version Python 3.7.3
nova version: $ git log -1 commit 78f9961d293e3b3e0ac62345b78abb1c9e2bd128 (HEAD -> master, origin/master, origin/HEAD)
oslo.config 6.11.0

Instead of printing a traceback, nova-manage should give the user a hint about the valid choices.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1837199/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
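The behaviour the reporter asks for is what `argparse` already does when the subcommand is marked required. A small sketch (it mimics nova-manage's CLI shape only loosely, with a subset of the actions): with `required=True` on the subparser, a bare `nova-manage cell_v2` exits with a usage hint instead of the `NoSuchOptError` traceback above.

```python
import argparse

# Hedged sketch: a required subparser makes argparse emit a usage hint
# ("the following arguments are required: action") instead of a traceback.
parser = argparse.ArgumentParser(prog="nova-manage cell_v2")
sub = parser.add_subparsers(dest="action", required=True)  # Python 3.7+
for action in ("create_cell", "delete_cell", "list_cells", "discover_hosts"):
    sub.add_parser(action)

try:
    parser.parse_args([])  # simulates "nova-manage cell_v2" with no action
    exited = False
except SystemExit:
    # argparse printed the usage hint to stderr and exited.
    exited = True
print("usage hint shown:", exited)
```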
[Yahoo-eng-team] [Bug 1852458] Re: "create" instance action not created when instance is buried in cell0
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1852458 Title: "create" instance action not created when instance is buried in cell0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: Triaged Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Before cell0 was introduced the API would create the "create" instance action for each instance in the nova cell database before casting off to conductor to do scheduling: https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1180

Note that conductor failed to "complete" the action with a failure event: https://github.com/openstack/nova/blob/mitaka-eol/nova/conductor/manager.py#L374

But at least the action was created. Since then, with cell0, if scheduling fails the instance is buried in the cell0 database but no instance action is created. To illustrate, I disabled the single nova-compute service on my devstack host and created a server which failed with NoValidHost:

$ openstack server show build-fail1 -f value -c fault
{u'message': u'No valid host was found. 
', u'code': 500, u'created': u'2019-11-13T15:57:13Z'}

When listing instance actions I expected to see a "create" action but there were none:

$ nova instance-action-list 008a7d52-dd83-4f52-a720-b3cfcc498259
+--------+------------+---------+------------+------------+
| Action | Request_ID | Message | Start_Time | Updated_At |
+--------+------------+---------+------------+------------+
+--------+------------+---------+------------+------------+

This is because the "create" action is only created when the instance is scheduled to a specific cell: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L1460

Solution: The ComputeTaskManager._bury_in_cell0 method should also create a "create" action in cell0 like it does for the instance BDMs and tags. This goes back to Ocata: https://review.opendev.org/#/c/319379/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1852458/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
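The proposed solution above can be sketched in miniature. This is a hedged illustration, not nova's actual `_bury_in_cell0`: the data structures are invented, and the only point is that burying an instance should also record a "create" action (with the fault message) the way the normal scheduling path does.

```python
# Hedged sketch: record a "create" instance action when burying in cell0.
# cell0_actions and bury_in_cell0 are illustrative stand-ins.

cell0_actions = []

def bury_in_cell0(instance_uuid, fault_message):
    # ... existing code creates the instance record, BDMs and tags in cell0 ...
    # Proposed addition: the "create" action, so instance-action-list is not empty.
    cell0_actions.append({
        "instance_uuid": instance_uuid,
        "action": "create",
        "message": fault_message,
    })

bury_in_cell0("008a7d52-dd83-4f52-a720-b3cfcc498259", "No valid host was found.")
print(cell0_actions[0]["action"])
```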
[Yahoo-eng-team] [Bug 1854126] Re: s390x: failed to live migrate VM
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1854126 Title: s390x: failed to live migrate VM Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We see the following logs when doing live migration on the s390x platform with KVM:

openstack server migrate --live kvm02 --block-migration d28caa4a-215b-44c8-bed0-e0e7faca07e5

Logs:
2019-10-10 12:03:25.710 19003 ERROR nova.virt.libvirt.driver [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] CPU doesn't have compatibility. XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult: libvirtError: XML error: Missing CPU model name
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] Exception during message handling: MigrationPreCheckError: Migration pre-check error: CPU doesn't have compatibility. 
XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return self.do_dispatch(endpoint, method, ctxt, args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in do_dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in exit
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1418, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in exit
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6262, in check_can_live_migrate_destination
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server
[Yahoo-eng-team] [Bug 1862633] Re: unshelve leak allocation if update port fails
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1862633 Title: unshelve leak allocation if update port fails Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: If updating the port binding during unshelve of an offloaded server fails, then nova leaks the placement allocation.

Steps to reproduce
==
1) boot a server with a neutron port
2) shelve and offload the server
3) disable the original host of the server to force scheduling during unshelve to select a different host. This is important as it triggers a non-empty port update during unshelve
4) unshelve the server and inject a network fault in the communication between nova and neutron. You can also try to simply shut down neutron-server at the right moment; "right" means just before the target compute tries to send the port update
5) observe that the unshelve fails and the server goes back to offloaded state, but the placement allocation on the target host remains.

Triage: the problem is caused by missing fault handling code in the compute manager[1]. The compute manager has proper error handling if the unshelve fails in the virt driver spawn call, but it does not handle failure if the neutron communication fails. The compute manager method simply logs and re-raises the neutron exceptions. This means that the exception is dropped, as the unshelve_instance compute RPC is a cast. 
[1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
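The missing error handling can be sketched in a self-contained way. This is hedged and illustrative, not nova's real code: `NeutronError`, `update_port_bindings`, and the `allocations` dict are stand-ins. The shape of the fix is a `try/except` around the neutron call that releases the freshly claimed placement allocation on the target host before re-raising, instead of leaking it.

```python
# Hedged sketch of the fix direction: release the target-host allocation
# when the port-binding update fails during unshelve.

class NeutronError(Exception):
    pass

allocations = {"target-host": {"DISK_GB": 5}}  # claimed during scheduling

def update_port_bindings(fail):
    if fail:
        raise NeutronError("neutron unreachable")

def unshelve(fail_network):
    try:
        update_port_bindings(fail_network)
    except NeutronError:
        # Proposed fix: clean up before re-raising so the server that goes
        # back to SHELVED_OFFLOADED does not leak placement resources.
        allocations.pop("target-host", None)
        raise

try:
    unshelve(fail_network=True)
except NeutronError:
    pass
print(allocations)
```

Without the `except` branch, the allocation dict would still hold the target-host entry after the failure, which is exactly the leak described in the triage.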
[Yahoo-eng-team] [Bug 1856925] Re: Nova compute service exception that performs cold migration virtual machine stuck in resize state.
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1856925 Title: Nova compute service exception that performs cold migration virtual machine stuck in resize state. Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description:

Description: If the nova-compute service is in an exceptional state, such as being down, an instance gets stuck in the resize state during cold migration and cannot be evacuated. The command request to the nova API is still issued, and server_status and Task State are changed, but the compute service cannot receive the request, so the server State remains in the resize state. When nova-compute is restarted, the server State becomes ERROR. It is recommended to add validation to prevent instances from entering inoperable states. This can also happen with commands such as stop/rebuild/reboot.

Environment:
1. openstack-Q; nova -version: 9.1.1
2. hypervisor: Libvirt + KVM
3. One control node, two compute nodes.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1856925/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1860990] Re: RBD image backend tries to flatten images even if they are already flat
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1860990 Title: RBD image backend tries to flatten images even if they are already flat Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: When the [DEFAULT]show_multiple_locations option is not set in glance, and both glance and nova use ceph as their backend, with properly configured accesses, nova will fail with the following exception:

2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [req-8021fd76-d5ab-4a9b-bd17-f5eb4d4faf62 0e96a04f360644818632b7e46fe8d3e7 ac01daacc7424a40b8b464a163902dcb - default default] [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Instance failed to spawn: rbd.InvalidArgument: [errno 22] error flattening b'fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6_disk'
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Traceback (most recent call last):
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/compute/manager.py", line 5757, in _unshelve_instance
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     block_device_info=block_device_info)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3457, in spawn
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     block_device_info=block_device_info)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3832, in _create_image
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     fallback_from_host)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3923, in _create_and_inject_local_root
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     instance, size, fallback_from_host)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9267, in _try_fetch_image_cache
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     image.flatten()
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 983, in flatten
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     self.driver.flatten(self.rbd_name, pool=self.driver.pool)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 290, in flatten
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     vol.flatten()
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     rv = execute(f, *args, **kwargs)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]
[Yahoo-eng-team] [Bug 1864428] Re: Hypervisors collection_name affects pagination query
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1864428 Title: Hypervisors collection_name affects pagination query Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack SDK: New Bug description: In nova, the hypervisor view builder's _collection_name is 'hypervisors':

```
# nova.api.openstack.compute.views.hypervisors.ViewBuilder
class ViewBuilder(common.ViewBuilder):
    _collection_name = "hypervisors"

    def get_links(self, request, hypervisors, detail=False):
        coll_name = (self._collection_name + '/detail' if detail
                     else self._collection_name)
        return self._get_collection_links(request, hypervisors,
                                          coll_name, 'id')
```

So when we do a paginated query via openstacksdk, we get a response like this:

```
{u'hypervisors': [{u'status': u'enabled', u'state': u'up', u'id': u'53fb5bdc-f9a4-4fc4-a4be-8eb33cd236b1', u'hypervisor_hostname': u'gd02-compute-11e115e64e19'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'a4db6ea8-2a91-45e7-a4b4-cb26c2dbc514', u'hypervisor_hostname': u'gd02-compute-11e115e64e11'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'd92c5452-ea75-4f58-8e0b-b4a6823850d8', u'hypervisor_hostname': u'gd02-compute-11e115e64e12'}],
 u'hypervisors_links': [{u'href': u'http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=d92c5452-ea75-4f58-8e0b-b4a6823850d8', u'rel': u'next'}]}
```

And openstacksdk uses the wrong hypervisors_links to query the next page, and gets an error:

```
Traceback (most recent call last):
  File "p_hypervisors-admin.py", line 48, in
    do_operation(limit, marker)
  File "p_hypervisors-admin.py", line 38, in do_operation
    srvs = [i for i in info]
  File "/usr/lib/python2.7/site-packages/openstack/resource.py", line 898, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python2.7/site-packages/openstack/exceptions.py", line 212, in raise_from_response
    http_status=http_status, request_id=request_id
openstack.exceptions.NotFoundException: NotFoundException: 404: Client Error for url: http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=ea9d857a-6328-47a8-abde-dc38972f4ca2, Not Found
```

The right URI should be `/v2.1/os-hypervisors`.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1864428/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
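The problem above reduces to which collection name goes into the pagination link. A minimal sketch (the `next_link` helper is illustrative, not nova's `_get_collection_links`): the link must be built with the real route prefix `os-hypervisors`, not the view builder's short collection name, or the SDK's next-page fetch 404s.

```python
# Hedged sketch: build a next-page link with the correct route prefix.

def next_link(base, collection_name, limit, marker):
    return "%s/%s?limit=%d&marker=%s" % (base, collection_name, limit, marker)

base = "http://nova-api.cty.os:11010/v2.1"
marker = "d92c5452-ea75-4f58-8e0b-b4a6823850d8"

buggy = next_link(base, "hypervisors", 3, marker)     # no such route: 404
fixed = next_link(base, "os-hypervisors", 3, marker)  # real API route
print(fixed)
```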
[Yahoo-eng-team] [Bug 1863605] Re: live migration with vpmem will go to error in Train
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1863605 Title: live migration with vpmem will go to error in Train Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We introduced vpmem support in the Train release, including create/resize/cold migration, but not live migration (with libvirt/qemu). Since live migration essentially relies on a libvirt XML, for vpmem there will be backend files configured in the XML. If we live migrate an instance with vpmem under the Train release, we may get two unexpected results: 1. If the dest host has the same vpmem backend files as those used by the instance on the source host, the live migration will succeed but the vpmems consumed on the dest host will not be tracked. 2. If the dest host doesn't have those vpmems, the live migration will fail. We need to reject live migration with vpmem in the nova conductor when doing the pre-check, and backport this to the T release. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1863605/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
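The conductor pre-check proposed above can be sketched as a guard over the instance's resource classes. This is hedged and illustrative: the `CUSTOM_PMEM` prefix follows nova's convention for vpmem resource classes, but `check_live_migratable` itself is an invented stand-in, not the real conductor code.

```python
# Hedged sketch: reject live migration when the resource request
# includes virtual persistent memory (vpmem).

class MigrationPreCheckError(Exception):
    pass

def check_live_migratable(resource_classes):
    if any(rc.startswith("CUSTOM_PMEM") for rc in resource_classes):
        raise MigrationPreCheckError(
            "Migration pre-check error: live migration with vPMEM "
            "is not supported")

check_live_migratable(["VCPU", "MEMORY_MB", "DISK_GB"])  # passes silently
try:
    check_live_migratable(["VCPU", "CUSTOM_PMEM_NAMESPACE_4GB"])
    rejected = False
except MigrationPreCheckError:
    rejected = True
print("vpmem migration rejected:", rejected)
```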
[Yahoo-eng-team] [Bug 1862633] Re: unshelve leak allocation if update port fails
** Also affects: nova/pike Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/pike Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1862633 Title: unshelve leak allocation if update port fails Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: If updating the port binding during unshelve of an offloaded server fails, then nova leaks the placement allocation.

Steps to reproduce
==
1) boot a server with a neutron port
2) shelve and offload the server
3) disable the original host of the server to force scheduling during unshelve to select a different host. This is important as it triggers a non-empty port update during unshelve
4) unshelve the server and inject a network fault in the communication between nova and neutron. You can also try to simply shut down neutron-server at the right moment; "right" means just before the target compute tries to send the port update
5) observe that the unshelve fails and the server goes back to offloaded state, but the placement allocation on the target host remains.

Triage: the problem is caused by missing fault handling code in the compute manager[1]. The compute manager has proper error handling if the unshelve fails in the virt driver spawn call, but it does not handle failure if the neutron communication fails. The compute manager method simply logs and re-raises the neutron exceptions. This means that the exception is dropped, as the unshelve_instance compute RPC is a cast.

[1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866373] Re: URLS in os-keypairs 'links' body are incorrect
** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866373 Title: URLS in os-keypairs 'links' body are incorrect Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Similar to https://bugs.launchpad.net/nova/+bug/1864428, the URLs in the 'links' element of the response are incorrect. They read '/keypairs', not '/os-keypairs'. From the current api-ref (2020-03-06):

{
    "keypairs": [
        {
            "keypair": {
                "fingerprint": "7e:eb:ab:24:ba:d1:e1:88:ae:9a:fb:66:53:df:d3:bd",
                "name": "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
                "type": "ssh",
                "public_key": "ssh-rsa B3NzaC1yc2EDAQABAAABAQCkF3MX59OrlBs3dH5CU7lNmvpbrgZxSpyGjlnE8Flkirnc/Up22lpjznoxqeoTAwTW034k7Dz6aYIrZGmQwe2TkE084yqvlj45Dkyoj95fW/sZacm0cZNuL69EObEGHdprfGJQajrpz22NQoCD8TFB8Wv+8om9NH9Le6s+WPe98WC77KLw8qgfQsbIey+JawPWl4O67ZdL5xrypuRjfIPWjgy/VH85IXg/Z/GONZ2nxHgSShMkwqSFECAC5L3PHB+0+/12M/iikdatFSVGjpuHvkLOs3oe7m6HlOfluSJ85BzLWBbvva93qkGmLg4ZAc8rPh2O+YIsBUHNLLMM/oQp Generated-by-Nova\n"
            }
        }
    ],
    "keypairs_links": [
        {
            "href": "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572/keypairs?limit=1&marker=keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
            "rel": "next"
        }
    ]
}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866373/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : 
yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
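The fix has the same shape as the os-hypervisors bug it references: build the pagination link with the `os-keypairs` route prefix. The helper below is illustrative only, not nova's actual view-builder code.

```python
# Hedged sketch: pagination links for keypairs must use the real
# "os-keypairs" route, not the short collection name "keypairs".

def keypairs_next_link(base, limit, marker, collection="os-keypairs"):
    return "%s/%s?limit=%d&marker=%s" % (base, collection, limit, marker)

link = keypairs_next_link(
    "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572",
    1, "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3")
print(link)
```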
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===
The nova-compute service keeps a local image cache of glance images used for nova servers, to avoid multiple downloads of the same image from glance. The disk usage of this cache is not counted as local disk usage in nova and is not reported to placement as used DISK_GB. This leads to disk over-allocation. Also, the size of the cache cannot be limited by nova configuration, so the deployer cannot reserve disk space for it with the reserved_host_disk_mb config option.

Steps to reproduce
==
* Set up a single-node devstack
* Create and upload an image with a not-too-small physical size, e.g. an image with a 1G physical size
* Check the current disk usage of the host OS and configure reserved_host_disk_mb in nova-cpu.conf accordingly
* Boot two servers from that image with a flavor such as d1 (disk=5G)
* Nova will download the glance image once to the local cache, which results in 1GB of disk usage
* Nova will create two root file systems, one for each VM. These disks initially have a minimal physical size but a 5G virtual size.
* At this point nova has allocated 5G + 5G of DISK_GB in placement, but due to the image in the cache the total disk usage of the two VMs + cache can be 5G + 5G + 1G, if both VMs overwrite and fill the contents of their own disks.

Expected result
===
Option A) Nova maintains a DISK_GB allocation in placement for the images in its cache. This way the expected DISK_GB allocation in placement is 5G + 5G + 1G at the end.
Option B) Nova provides a config option to limit the maximum size of the image cache, so the deployer can include the maximum image cache size in reserved_host_disk_mb when dimensioning the disk space of the compute.

Actual result
=
Only 5G + 5G was allocated from placement, so disk space is over-allocated by the image cache.

Environment
===
Devstack from recent master
stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"
libvirt driver with file based image backend

Logs & Configs
==
http://paste.openstack.org/show/793388/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
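The over-allocation in this report is simple arithmetic; the snippet below just restates the reproduction's figures (two 5G flavors plus a 1G cached image) to make the gap explicit. All variable names are illustrative, not nova code.

```python
# Figures from the reproduction above; names are illustrative, not nova's.
FLAVOR_DISK_GB = 5   # flavor d1 root disk
NUM_SERVERS = 2
IMAGE_CACHE_GB = 1   # physical size of the cached glance image

# What nova reports to placement as used DISK_GB:
placement_allocation_gb = FLAVOR_DISK_GB * NUM_SERVERS

# Worst-case real usage once both guests fully fill their root disks;
# the cached image still sits on the same disk:
worst_case_usage_gb = FLAVOR_DISK_GB * NUM_SERVERS + IMAGE_CACHE_GB

# The cache's contribution is invisible to placement:
over_allocation_gb = worst_case_usage_gb - placement_allocation_gb
print(placement_allocation_gb, worst_case_usage_gb, over_allocation_gb)  # 10 11 1
```

Both proposed options close this gap, one by accounting for the cache in placement, the other by bounding it so it can be covered by reserved_host_disk_mb.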
[Yahoo-eng-team] [Bug 1824858] Re: nova instance remnant left behind after cold migration completes
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1824858

Title: nova instance remnant left behind after cold migration completes

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in StarlingX: Fix Released

Bug description:

Brief Description
-
After cold migration to a new worker node, instance remnants are left behind.

Severity
standard

Steps to Reproduce
--
worker nodes compute-1 and compute-2 have label remote-storage enabled
1. Launch instance on compute-1
2. cold migrate to compute-2
3. confirm cold migration to complete

Expected Behavior
--
Migration to compute-2 and cleanup of files on compute-1

Actual Behavior
At 16:35:24 cold migration for instance a416ead6-a17f-4bb9-9a96-3134b426b069 completed to compute-2 but the following path is left behind on compute-1:
compute-1:/var/lib/nova/instances/a416ead6-a17f-4bb9-9a96-3134b426b069
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base locks a416ead6-a17f-4bb9-9a96-3134b426b069_resize compute_nodes lost+found
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base compute_nodes locks lost+found
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base compute_nodes locks lost+found
2019-04-15T16:35:24.646749 clear 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:24.482575 log 700.168 Cold-Migrate-Confirm complete for
instance tenant2-migration_test-1 enabled on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:16.815223 log 700.163 Cold-Migrate-Confirm issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:10.030068 clear 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.971414 set 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.970212 log 700.162 Cold-Migrate complete for instance tenant2-migration_test-1 now enabled on host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637687 set 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637636 log 700.158 Cold-Migrate inprogress for instance tenant2-migration_test-1 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.478442 log 700.157 Cold-Migrate issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:20.181155 log 700.101 Instance tenant2-migration_test-1 is enabled on host compute-1
tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical see nova-compute.log (compute-1) compute-1 nova-compute log [instance: a416ead6-a17f-4bb9-9a96-3134b426b069 claimed and spawned here on compute-1] {"log":"2019-04-15 16:34:04,617.617 60908 INFO nova.compute.claims [req-f1195bbb-d5b0-4a75-a598-ff287d247643 3fd3229d3e6248cf9b5411b2ecec86e9 7f1d42233341428a918855614770e676 - default default]
[Yahoo-eng-team] [Bug 1834659] Re: Volume not removed on instance deletion
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1834659

Title: Volume not removed on instance deletion

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released

Bug description:

Description
===
When we deploy a non-ephemeral instance (i.e. creating a new volume), indicate "YES" in "Delete Volume on Instance delete", then delete the instance, and the volume driver's terminate connection call in cinder takes too long to return, the volume is not removed. The volume status remains "In-use" and "Attached to None on /dev/vda". For example:

abcfa1db-1748-4f04-9a29-128cf22efcc5 - 130GiB In-use - Attached to None on /dev/vda

Steps to reproduce
==
Please refer to this bug comment #2 below

Expected result
===
Volume gets removed

Actual result
=
Volume remains attached

Environment
===
Issue was initially reported downstream against the Newton release (see comment #1 below). The customer was using the hitachi volume driver: volume_driver = cinder.volume.drivers.hitachi.hbsd.hbsd_fc.HBSDFCDriver
As a note, the hitachi drivers are unsupported as of Pike (see cinder commit 595c8d3f8523a9612ccc64ff4147eab993493892).
Issue was reproduced in a devstack environment running the Stein release. The volume driver used was lvm (the devstack default).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1834659/+subscriptions
[Yahoo-eng-team] [Bug 1864428] Re: Hypervisors collection_name affects pagination query
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/rocky Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1864428

Title: Hypervisors collection_name affects pagination query

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack SDK: New

Bug description:

In nova, the hypervisor view builder's _collection_name is 'hypervisors':

```
# nova.api.openstack.compute.views.hypervisors.ViewBuilder
class ViewBuilder(common.ViewBuilder):
    _collection_name = "hypervisors"

    def get_links(self, request, hypervisors, detail=False):
        coll_name = (self._collection_name + '/detail' if detail
                     else self._collection_name)
        return self._get_collection_links(request, hypervisors,
                                          coll_name, 'id')
```

So when we do a paginated query via openstacksdk, we get a response like this:

```
{u'hypervisors': [{u'status': u'enabled', u'state': u'up', u'id': u'53fb5bdc-f9a4-4fc4-a4be-8eb33cd236b1', u'hypervisor_hostname': u'gd02-compute-11e115e64e19'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'a4db6ea8-2a91-45e7-a4b4-cb26c2dbc514', u'hypervisor_hostname': u'gd02-compute-11e115e64e11'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'd92c5452-ea75-4f58-8e0b-b4a6823850d8', u'hypervisor_hostname': u'gd02-compute-11e115e64e12'}],
 u'hypervisors_links': [{u'href': u'http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=d92c5452-ea75-4f58-8e0b-b4a6823850d8', u'rel': u'next'}]}
```

And openstacksdk uses the wrong hypervisors_links to query the next page, and gets an error:

```
Traceback (most recent call last):
  File "p_hypervisors-admin.py", line 48, in <module>
    do_operation(limit, marker)
  File "p_hypervisors-admin.py", line 38, in do_operation
    srvs = [i for i in info]
  File "/usr/lib/python2.7/site-packages/openstack/resource.py", line 898, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python2.7/site-packages/openstack/exceptions.py", line 212, in raise_from_response
    http_status=http_status, request_id=request_id
openstack.exceptions.NotFoundException: NotFoundException: 404: Client Error for url: http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=ea9d857a-6328-47a8-abde-dc38972f4ca2, Not Found
```

The right URI should be `/v2.1/os-hypervisors`.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1864428/+subscriptions
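The broken link can be reproduced without nova at all: the view builder joins its collection name onto the API prefix, but the route that is actually registered is os-hypervisors. A minimal sketch follows; build_next_link is an illustrative stand-in for nova's _get_collection_links, not nova code.

```python
def build_next_link(prefix, collection_name, limit, marker):
    # Mimics how a view builder composes the 'next' href from its
    # collection name; stand-in code, not nova's implementation.
    return f"{prefix}/{collection_name}?limit={limit}&marker={marker}"

prefix = "http://nova-api.cty.os:11010/v2.1"
marker = "d92c5452-ea75-4f58-8e0b-b4a6823850d8"

# With _collection_name = "hypervisors" the link points at a route that
# is not registered, so following it returns a 404:
broken = build_next_link(prefix, "hypervisors", 3, marker)

# The registered route is "os-hypervisors", so this is the link the SDK
# could actually follow:
working = build_next_link(prefix, "os-hypervisors", 3, marker)

print(broken)
print(working)
```

This is why the fix is confined to the collection name the view builder uses, while the SDK's behaviour of trusting the server-provided `next` link is correct as-is.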
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1869050

Title: migration of anti-affinity server fails due to stale scheduler instance info

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: Invalid
Status in OpenStack Compute (nova) queens series: Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===

Steps to reproduce
==
Have a deployment with 3 compute nodes
* make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
* create a server group with the anti-affinity policy
* boot server1 into the group
* boot server2 into the group
* migrate server2
* confirm the migration
* boot server3

Make sure that between the last two steps there was no periodic _sync_scheduler_instance_info run on the compute that hosted server2 before the migration. This can be done by performing the last two steps right after each other, without waiting too much, as the interval of that periodic task (scheduler_instance_sync_interval) defaults to 120 sec.

Expected result
===
server3 is booted on the host server2 was moved away from

Actual result
=
server3 cannot be booted (NoValidHost)

Triage
==
The confirm resize call on the source compute does not tell the scheduler that the instance has been removed from that host. This makes the scheduler instance info stale, causing the subsequent scheduling error.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The function which counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

This code can be very heavy on a database which contains a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single-cell environment, it is just extra CPU usage with exactly the same outcome.

The [api]/instance_list_per_project_cells option was introduced to work around this: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

However, the quota code does not implement it, which means quota counting can take a big toll on the database server. We should ideally mirror the same behaviour in the quota code.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions
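In outline, "mirror the same behaviour" means gating the expensive per-project cell-mapping lookup on the same option, as nova.compute.instance_list already does. The function below is an illustrative stand-in for that decision, not actual nova code or the merged fix.

```python
def cells_to_count_in(per_project_cells, all_cells, cells_for_project):
    """Pick which cells quota counting should query.

    per_project_cells stands in for CONF.api.instance_list_per_project_cells;
    cells_for_project stands in for the heavy lookup that scans
    instance_mappings to find the 1-2 cells hosting the project.
    """
    if per_project_cells:
        # Targeted path: worth the instance_mappings scan when there are
        # many cells and the project only lives in a few of them.
        return cells_for_project()
    # Default path: just query every cell. In a single-cell deployment
    # this gives the same result without the expensive scan.
    return all_cells
```

With the default (False), a single-cell deployment skips the millions-of-rows scan entirely; operators with many cells can opt in to the targeted lookup.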
[Yahoo-eng-team] [Bug 1867075] Re: Arm64: Instance with Configure Drive attach volume failed
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867075

Title: Arm64: Instance with Configure Drive attach volume failed

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released

Bug description:

Arm64. Image: cirros-0.5.1
hw_cdrom_bus='scsi', hw_disk_bus='scsi', hw_machine_type='virt', hw_rng_model='virtio', hw_scsi_model='virtio-scsi', os_command_line='console=ttyAMA0'

Boot a VM.
Create a volume: openstack volume create --size 1 test
Attach: openstack server add volume cirros-test test

Error:
DEBUG nova.virt.libvirt.guest [None req-8dfbf677-50bb-42be-869f-52c9ac638d59 admin admin] attach device xml: b9abb789-1c55-4210-ab5c-78b0e3619405
libvirtError: Requested operation is not valid: Domain already contains a disk with that address
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] Traceback (most recent call last):
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/block_device.py", line 599, in _volume_attach
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] device_type=self['device_type'], encryption=encryption)
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info)
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self.force_reraise() ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] six.reraise(self.type_, self.value, self.tb) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] raise value ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 293, in attach_device ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self._domain.attachDeviceFlags(device_xml, flags=flags) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] result = proxy_call(self._autowrap, f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] rv = execute(f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute ERROR
[Yahoo-eng-team] [Bug 1854126] Re: s390x: failed to live migrate VM
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1854126

Title: s390x: failed to live migrate VM

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed

Bug description:

See the following logs when doing live migration on the s390x platform with KVM:

openstack server migrate --live kvm02 --block-migration d28caa4a-215b-44c8-bed0-e0e7faca07e5

Logs:
2019-10-10 12:03:25.710 19003 ERROR nova.virt.libvirt.driver [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] CPU doesn't have compatibility. XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult: libvirtError: XML error: Missing CPU model name
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] Exception during message handling: MigrationPreCheckError: Migration pre-check error: CPU doesn't have compatibility.
XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1418, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR
[Yahoo-eng-team] [Bug 1844929] Re: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844929

Title: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler

Status in grenade: Invalid
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed

Bug description:

Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[18043]: WARNING nova.context [None req-1929039e-1517-4326-9700-738d4b570ba6 tempest-AttachInterfacesUnderV243Test-2009753731 tempest-AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

Looks like something is causing timeouts reaching cell1 during grenade runs.
The only errors I see in the rabbit logs are these for the uwsgi (API) servers:

=ERROR REPORT==== 22-Sep-2019::00:35:30 ===
closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-f0605979ed7d):
missed heartbeats from client, timeout: 60s

It looks like we don't have mysql logs in this grenade run; maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

check and gate queues, all failures. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performance/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding.

To manage notifications about this bug go to: https://bugs.launchpad.net/grenade/+bug/1844929/+subscriptions
[Yahoo-eng-team] [Bug 1860990] Re: RBD image backend tries to flatten images even if they are already flat
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1860990 Title: RBD image backend tries to flatten images even if they are already flat Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: When [DEFAULT]show_multiple_locations option is not set in glance, and both glance and nova use ceph as their backend, with properly configured accesses, nova will fail with the following exception: 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [req-8021fd76-d5ab-4a9b-bd17-f5eb4d4faf62 0e96a04f360644818632b7e46fe8d3e7 ac01daacc7424a40b8b464a163902dcb - default default] [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Instance failed to spawn: rbd.InvalidArgument: [errno 22] error flattening b'fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6_disk' 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Traceback (most recent call last): 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/compute/manager.py", line 5757, in _unshelve_instance 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] block_device_info=block_device_info) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3457, in spawn 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: 
fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] block_device_info=block_device_info) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3832, in _create_image 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] fallback_from_host) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3923, in _create_and_inject_local_root 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] instance, size, fallback_from_host) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9267, in _try_fetch_image_cache 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] image.flatten() 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 983, in flatten 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] self.driver.flatten(self.rbd_name, pool=self.driver.pool) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 290, in flatten 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] vol.flatten() 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File 
"/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] result = proxy_call(self._autowrap, f, *args, **kwargs) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] rv = execute(f, *args, **kwargs) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance:
[Yahoo-eng-team] [Bug 1859766] Re: functional tests intermittently fail with "ReadOnlyFieldError: Cannot modify readonly field uuid"
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1859766

Title: functional tests intermittently fail with "ReadOnlyFieldError: Cannot modify readonly field uuid"

Status in OpenStack Compute (nova): Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released

Bug description:

On stable/stein and stable/rocky multiple functional tests fail randomly with the following stack trace[1]:

Traceback (most recent call last):
  File "nova/compute/manager.py", line 2322, in _build_and_run_instance
    with self.rt.instance_claim(context, instance, node, limits):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
    return f(*args, **kwargs)
  File "nova/compute/resource_tracker.py", line 235, in instance_claim
    self._update(elevated, cn)
  File "nova/compute/resource_tracker.py", line 1034, in _update
    self.old_resources[nodename] = old_compute
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "nova/compute/resource_tracker.py", line 1028, in _update
    compute_node.save()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
    return fn(self, *args, **kwargs)
  File "nova/objects/compute_node.py", line 341, in save
    self._from_db_object(self._context, self, db_compute)
  File "nova/objects/compute_node.py", line 214, in _from_db_object
setattr(compute, key, value) File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 77, in setter raise exception.ReadOnlyFieldError(field=name) ReadOnlyFieldError: Cannot modify readonly field uuid logstash signature: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ReadOnlyFieldError%3A%20Cannot%20modify%20readonly%20field%20uuid%5C%22 [1] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_234/702181/1/check/nova-tox-functional/2341192/testr_results.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1859766/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866380] Re: Ironic driver hash ring treats hostnames differing only by case as different hostnames
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866380 Title: Ironic driver hash ring treats hostnames differing only by case as different hostnames Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Recently we had a customer case where attempts to add new ironic nodes to an existing undercloud resulted in half of the nodes failing to be detected and added to nova. Ironic API returned all of the newly added nodes when called by the driver, but half of the nodes were not returned to the compute manager by the driver. There was only one nova-compute service managing all of the ironic nodes of the all-in-one typical undercloud deployment. After days of investigation and examination of a database dump from the customer, we noticed that at some point the customer had changed the hostname of the machine from something containing uppercase letters to the same name but all lowercase. The nova-compute service record had the mixed case name and the CONF.host (socket.gethostname()) had the lowercase name. The hash ring logic adds all of the nova-compute service hostnames plus CONF.host to hash ring, then the ironic driver reports only the nodes it owns by retrieving a service hostname from the ring based on a hash of each ironic node UUID. Because of the machine hostname change, the hash ring contained, for example: {'MachineHostName', 'machinehostname'} when it should have contained only one hostname. 
And because the hash ring contained two hostnames, the driver was able to retrieve only half of the nodes as nodes that it owned. So half of the new nodes were excluded and not added as new compute nodes. I propose adding some logging to the driver related to the hash ring to help with debugging in the future. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
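The split described above can be demonstrated with a toy stand-in for the hash ring (the real driver uses a consistent hash ring library; the rendezvous-style `owner` function below is purely illustrative of how a stale mixed-case member claims roughly half the nodes):

```python
import hashlib

def owner(node_uuid, hosts):
    # Rendezvous-style stand-in for the driver's hash ring: the host with
    # the highest hash(host + node) "owns" the node.
    return max(sorted(hosts),
               key=lambda h: hashlib.md5((h + node_uuid).encode()).hexdigest())

nodes = ["ironic-node-%03d" % i for i in range(100)]

# Stale service record plus current CONF.host, differing only by case:
ring = {"MachineHostName", "machinehostname"}
owned = [n for n in nodes if owner(n, ring) == "machinehostname"]

# After normalizing case there is a single member, so the one
# nova-compute service owns every node again:
ring_fixed = {h.lower() for h in ring}
owned_fixed = [n for n in nodes if owner(n, ring_fixed) == "machinehostname"]
```

With the mixed-case ring, `owned` covers only part of `nodes`; the remainder hash to the stale entry and are silently skipped, matching the symptom in the report.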
[Yahoo-eng-team] [Bug 1866937] Re: Requests to neutron API do not use retries
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866937 Title: Requests to neutron API do not use retries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: We have a customer bug report downstream [1] where nova occasionally fails to carry out server actions requiring calls to neutron API if haproxy happens to close a connection after an idle time of 10 seconds at nearly the same time as an incoming request that attempts to re-use the connection while it is being torn down. Here is an excerpt from [1]: The result of our investigation, the cause is as follows. 1. neutron-client in nova uses a connection pool ( urllib3/requests ) for http. 2. Sometimes, an http connection is reused for different requests. 3. The connection between neutron-client and haproxy is closed by haproxy when it is idle for 10 seconds. 4. If reusing the connection from the client side and closing the connection from the haproxy side happen at almost the same time, the client gets an RST and ends with "bad status line". To address this problem, we can add a new config option for neutron client (similar to the existing config options we have for cinder client and glance client retries) to be more resilient during such scenarios. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1788853 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866937/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
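The proposed remedy (a retry count for the neutron client, mirroring the existing cinder/glance options) boils down to retrying the call when the pooled connection turns out to be dead. A minimal sketch, not the actual nova code; `flaky_list_ports` is a made-up function simulating a connection that haproxy closed just before reuse:

```python
import time

def call_with_retries(call, retries=3, delay=0.0,
                      retriable=(ConnectionResetError,)):
    # Retry the call when the reused connection was torn down under us
    # (the RST / "bad status line" case from the report).
    for attempt in range(retries + 1):
        try:
            return call()
        except retriable:
            if attempt == retries:
                raise
            time.sleep(delay)

attempts = {"n": 0}

def flaky_list_ports():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return ["port-1", "port-2"]

result = call_with_retries(flaky_list_ports)
```

Note that such a retry is only safe for idempotent requests, which is why it makes sense as an opt-in config option rather than a blanket behaviour.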
[Yahoo-eng-team] [Bug 1867380] Re: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867380 Title: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Description === $subject, it appears the current check of using grep to find active n-cpu processes isn't enough and we actually need to wait for the services to report as UP before starting to run Tempest. In the following we can see Tempest starting at 2020-03-13 13:01:19.528 while n-cpu within the instance isn't marked as UP for another ~20 seconds: https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/job-output.txt#6305 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/screen-n-cpu.txt#3825 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/subnode-2/screen-n-cpu.txt#3534 I've only seen this on stable/pike at present but it could potentially hit all branches with slow enough CI nodes. Steps to reproduce == Run nova-live-migration on slow CI nodes. Expected result === nova/tests/live_migration/hooks/ceph.sh waits until hosts are marked as UP before running Tempest. Actual result = nova/tests/live_migration/hooks/ceph.sh checks for running n-cpu processes and then immediately starts Tempest. Environment === 1. Exact version of OpenStack you are running. 
See the following list for all releases: http://docs.openstack.org/releases/ stable/pike but it could be present on other branches with slow enough CI nodes. 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt / KVM. 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == Mar 13 13:01:39.170201 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 74932102-3737-4f8f-9002-763b2d580c3a] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} Mar 13 13:01:39.255008 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 042afab0-fbef-4506-84e2-1f54cb9d67ca] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} Mar 13 13:01:39.322508 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: cc293f53-7428-4e66-9841-20cce219e24f] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1867380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : 
https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
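The expected behaviour above (wait for services to report UP rather than just grep for processes) amounts to a polling loop. The actual hook is a shell script; this is a Python sketch of the idea, with the hypothetical `list_services` callable standing in for what `openstack compute service list` would return:

```python
import time

def wait_for_computes_up(list_services, timeout=60, interval=0.01):
    # Poll until every nova-compute service reports state 'up',
    # instead of merely checking that the process exists.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(s["state"] == "up" for s in list_services()):
            return
        time.sleep(interval)
    raise TimeoutError("compute services not up within %ss" % timeout)

# Fake service listing: down on the first poll, up afterwards.
polls = {"n": 0}

def fake_list_services():
    polls["n"] += 1
    state = "down" if polls["n"] == 1 else "up"
    return [{"binary": "nova-compute", "state": state}]

wait_for_computes_up(fake_list_services, timeout=5)
```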
[Yahoo-eng-team] [Bug 1774249] Re: update_available_resource will raise DiskNotFound after resize but before confirm
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1774249 Title: update_available_resource will raise DiskNotFound after resize but before confirm Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Bug description: Originally reported in RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315 Tested on OSP12 (Pike), but appears to be still present on master. Should only occur if nova compute is configured to use local file instance storage. Create instance A on compute X Resize instance A to compute Y Domain is powered off /var/lib/nova/instances/ renamed to _resize on X Domain is *not* undefined On compute X: update_available_resource runs as a periodic task First action is to update self rt calls driver.get_available_resource() ...calls _get_disk_over_committed_size_total ...iterates over all defined domains, including the ones whose disks we renamed ...fails because a referenced disk no longer exists Results in errors in nova-compute.log: 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last): 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node 2018-05-30 02:17:08.647 1 ERROR 
nova.compute.manager rt.update_available_resource(context, nodename) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total() 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager config, block_device_info) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager dk_size = disk_api.get_allocated_disk_size(path) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager return images.qemu_img_info(path).disk_size 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager raise exception.DiskNotFound(location=path) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk And resource tracker is no longer updated. We can find lots of these in the gate. 
Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly mitigates this, but doesn't because task_state is not set while the instance is awaiting confirm. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
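A hedged illustration of making the accounting tolerant of disks that disappear mid-scan (toy function names, not the real libvirt driver code, which lives in `_get_disk_over_committed_size_total`; as the note above says, a complete fix must also consider task_state):

```python
def disk_over_committed_total(disk_paths, allocated_size):
    # Sum allocated disk sizes across all defined domains, skipping disks
    # that vanished underneath us (e.g. renamed to *_resize while the
    # instance awaits resize confirmation).
    total = 0
    for path in disk_paths:
        try:
            total += allocated_size(path)
        except FileNotFoundError:
            continue
    return total

# Fake filesystem: two disks present, one renamed away by a resize.
sizes = {"/var/lib/nova/instances/a/disk": 10,
         "/var/lib/nova/instances/b/disk": 20}

def fake_allocated_size(path):
    if path not in sizes:
        raise FileNotFoundError(path)
    return sizes[path]

paths = list(sizes) + ["/var/lib/nova/instances/gone/disk"]
total = disk_over_committed_total(paths, fake_allocated_size)
```

Skipping the missing disk keeps the periodic task (and thus the resource tracker) alive instead of aborting the whole update.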
[Yahoo-eng-team] [Bug 1834659] Re: Volume not removed on instance deletion
** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed ** Changed in: nova/train Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1834659 Title: Volume not removed on instance deletion Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Description === When we deploy a non-ephemeral instance (i.e. creating a new volume), indicate "YES" in "Delete Volume on Instance delete", then delete the instance, and the volume driver's terminate_connection call in cinder takes too long to return, the volume is not removed. The volume status remains as "In-use" and "Attached to None on /dev/vda". For example: abcfa1db-1748-4f04-9a29-128cf22efcc5 - 130GiB In-use - Attached to None on /dev/vda Steps to reproduce == Please refer to this bug comment #2 below Expected result === Volume gets removed Actual result = Volume remains attached Environment === Issue was initially reported downstream against the Newton release (see comment #1 below). 
Customer was using the hitachi volume driver: volume_driver = cinder.volume.drivers.hitachi.hbsd.hbsd_fc.HBSDFCDriver As a note, the hitachi drivers are unsupported as of Pike (see cinder commit 595c8d3f8523a9612ccc64ff4147eab993493892). Issue was reproduced in a devstack environment running the Stein release. Volume driver used was lvm (devstack default) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1834659/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1831771] Re: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host
** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1831771 Title: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: This was originally reported in Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1668159 The 'UnexpectedDeletingTaskStateError' exception can be raised by something like aborting a large heat stack, where the instance hasn't finished setting up before the stack is aborted and the instances deleted. https://github.com/openstack/nova/blob/19.0.0/nova/db/sqlalchemy/api.py#L2864 We handle this in the compute manager and as part of that handling, we clean up the resource tracking of network interfaces. https://github.com/openstack/nova/blob/19.0.0/nova/compute/manager.py#L2034-L2040 However, we don't unplug these interfaces. This can result in things being left over on the host. We should attempt to unplug VIFs as part of this cleanup. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1831771/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1550919 Title: [Libvirt]Evacuate fail may cause disk image be deleted Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: I checked the latest source of nova on the master branch; this problem still exists. When we are doing evacuate, eventually _do_rebuild_instance will be called. As rebuild is not implemented in the libvirt driver, in fact _rebuild_default_impl is called.

    try:
        with instance.mutated_migration_context():
            self.driver.rebuild(**kwargs)
    except NotImplementedError:
        # NOTE(rpodolyaka): driver doesn't provide specialized version
        # of rebuild, fall back to the default implementation
        self._rebuild_default_impl(**kwargs)

_rebuild_default_impl will call self.driver.spawn to boot up the instance, and spawn will in turn call _create_domain_and_network. When VirtualInterfaceCreateException or Timeout happens, self.cleanup will be called.
    except exception.VirtualInterfaceCreateException:
        # Neutron reported failure and we didn't swallow it, so
        # bail here
        with excutils.save_and_reraise_exception():
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
    except eventlet.timeout.Timeout:
        # We never heard from Neutron
        LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                     'instance %(uuid)s'), {'uuid': instance.uuid},
                 instance=instance)
        if CONF.vif_plugging_is_fatal:
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
            raise exception.VirtualInterfaceCreateException()

Because the default value for the parameter destroy_disks is True:

    def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True, migrate_data=None, destroy_vifs=True):

So if an error occurs during evacuate while waiting for neutron's event, the instance's disk files will be deleted unexpectedly. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
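One direction a fix could take (a toy model, not the actual upstream patch) is to have the failure paths opt out of the dangerous default and pass destroy_disks=False explicitly; all class and method names below are illustrative:

```python
class FakeDriver:
    """Toy driver modelling the report's dangerous cleanup() default."""

    def __init__(self):
        self.disks = {"instance-1": ["disk", "disk.config"]}

    def cleanup(self, instance, destroy_disks=True):
        # Mirrors the report: disks are destroyed unless the caller opts out.
        if destroy_disks:
            self.disks.pop(instance, None)

    def spawn(self, instance, vif_plug_ok):
        if not vif_plug_ok:
            # On a vif-plugging failure during evacuate, keep the disks so a
            # retry of the evacuation still has the instance's data.
            self.cleanup(instance, destroy_disks=False)
            raise RuntimeError("vif plugging failed")

driver = FakeDriver()
try:
    driver.spawn("instance-1", vif_plug_ok=False)
except RuntimeError:
    pass
```

In the sketch, the failed spawn still raises, but the instance directory survives for a retry.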
[Yahoo-eng-team] [Bug 1868033] Re: Booting instance with pci_device fails during rocky->stein live upgrade
** Also affects: nova/ussuri Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released ** Changed in: nova/ussuri Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Committed ** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1868033 Title: Booting instance with pci_device fails during rocky->stein live upgrade Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Environment: Stein nova-conductor having set upgrade_levels to rocky Rocky nova-compute Boot an instance with a flavour that has a pci_device Error: Failed to publish message to topic 'nova': maximum recursion depth exceeded: RuntimeError: maximum recursion depth exceeded Tracked this down to it continually trying to backport the InstancePCIRequests: It gets as arguments: objinst={u'nova_object.version': u'1.1', u'nova_object.name': u'InstancePCIRequests', u'nova_object.data': {u'instance_uuid': u'08212b12-8fa8-42d9-8d3e-52ed60a64135', u'requests': [{u'nova_object.version': u'1.3', u'nova_object.name': u'InstancePCIRequest', u'nova_object.data': {u'count': 1, u'is_new': False, u'numa_policy': None, u'request_id': None, u'requester_id': None, u'alias_name': u'V100-32G', u'spec': [{u'vendor_id': u'10de', u'product_id': u'1db6'}]}, u'nova_object.namespace': u'nova'}]}, u'nova_object.namespace': u'nova'}, object_versions={u'InstancePCIRequests': '1.1', 'InstancePCIRequest': '1.2'} It fails because it doesn't backport the individual InstancePCIRequest inside 
the InstancePCIRequests object and so keeps trying. The error it shows is: IncompatibleObjectVersion: Version 1.3 of InstancePCIRequest is not supported, supported version is 1.2 I have fixed this in our setup by altering obj_make_compatible to downgrade the individual requests to version 1.2, which seems to work and all is good. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1868033/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
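The workaround described amounts to making the parent's obj_make_compatible also downgrade each embedded request primitive. A toy model of that recursion over plain dicts, with a made-up per-version field table (the real objects track field history in nova's versioned-object machinery, and the field names here are illustrative):

```python
# Illustrative: fields added to the child object at each version.
CHILD_FIELDS_SINCE = {"1.3": ("requester_id",)}

def downgrade_child(child, target):
    # Strip fields introduced after `target` and stamp the lower version.
    data = dict(child["nova_object.data"])
    versions = ["1.1", "1.2", "1.3"]
    for v in versions[versions.index(target) + 1:]:
        for field in CHILD_FIELDS_SINCE.get(v, ()):
            data.pop(field, None)
    return dict(child, **{"nova_object.version": target,
                          "nova_object.data": data})

def make_compatible(parent, child_target):
    # The missing step from the bug: backport every embedded
    # InstancePCIRequest primitive, not just the parent object.
    data = dict(parent["nova_object.data"])
    data["requests"] = [downgrade_child(r, child_target)
                        for r in data["requests"]]
    return dict(parent, **{"nova_object.data": data})

parent = {"nova_object.version": "1.1",
          "nova_object.data": {"requests": [
              {"nova_object.version": "1.3",
               "nova_object.data": {"count": 1, "requester_id": None}}]}}

out = make_compatible(parent, "1.2")
```

Without the child downgrade, the conductor keeps re-sending the same too-new child primitive, which is what produced the recursion in the report.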
[Yahoo-eng-team] [Bug 1867075] Re: Arm64: Instance with Configure Drive attach volume failed failed
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867075 Title: Arm64: Instance with Configure Drive attach volume failed failed Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Arm64. Image: cirros-0.5.1 hw_cdrom_bus='scsi', hw_disk_bus='scsi', hw_machine_type='virt', hw_rng_model='virtio', hw_scsi_model='virtio-scsi', os_command_line=''console=ttyAMA0'' Boot a vm. Create a volume: openstack volume create --size 1 test Attach: openstack server add volume cirros-test test Error: DEBUG nova.virt.libvirt.guest [None req-8dfbf677-50bb-42be-869f-52c9ac638d59 admin admin] attach device xml: b9abb789-1c55-4210-ab5c-78b0e3619405 ror: Requested operation is not valid: Domain already contains a disk with that address ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] Traceback (most recent call last): ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/block_device.py", line 599, in _volume_attach ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] device_type=self['device_type'], encryption=encryption) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self.force_reraise() ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] six.reraise(self.type_, self.value, self.tb) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] raise value ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 293, in attach_device ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self._domain.attachDeviceFlags(device_xml, flags=flags) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] result = proxy_call(self._autowrap, f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] rv = execute(f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute ERROR
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1869050 Title: migration of anti-affinity server fails due to stale scheduler instance info Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Invalid Status in OpenStack Compute (nova) queens series: Invalid Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Description === Steps to reproduce == Have a deployment with 3 compute nodes * make sure that the deployment is configured with tracks_instance_changes=True (True is the default) * create a server group with anti-affinity policy * boot server1 into the group * boot server2 into the group * migrate server2 * confirm the migration * boot server3 Make sure that between the last two steps there was no periodic _sync_scheduler_instance_info running on the compute that hosted server2 before the migration. This can be done by performing the last two steps one after the other without waiting too long, as the interval of that periodic task (scheduler_instance_sync_interval) defaults to 120 sec. Expected result === server3 is booted on the host that server2 was moved away from Actual result = server3 cannot be booted (NoValidHost) Triage == The confirm resize call on the source compute does not tell the scheduler that the instance has been removed from this host. This makes the scheduler instance info stale, causing the subsequent scheduling error. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
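The triage can be modelled with a tiny in-memory version of the scheduler's per-host instance map (toy code, not nova's actual HostState tracking): if confirming the migration never removes the instance from the source host's entry, the anti-affinity check rejects every host.

```python
host_instances = {"compute-1": {"server1"},
                  "compute-2": {"server2"},   # stale: server2 already left
                  "compute-3": {"server2"}}   # actual location after migrate

group = {"server1", "server2"}   # anti-affinity group members

def hosts_passing_anti_affinity():
    # A host passes only if it hosts no member of the group.
    return [h for h, insts in sorted(host_instances.items())
            if not insts & group]

no_hosts = hosts_passing_anti_affinity()   # stale info -> NoValidHost

def confirm_migration(instance, source):
    # The missing step: tell the scheduler the instance left the source.
    host_instances[source].discard(instance)

confirm_migration("server2", "compute-2")
ok_hosts = hosts_passing_anti_affinity()   # server3 now has a home
```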
[Yahoo-eng-team] [Bug 1852458] Re: "create" instance action not created when instance is buried in cell0
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1852458 Title: "create" instance action not created when instance is buried in cell0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: Triaged Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Before cell0 was introduced the API would create the "create" instance action for each instance in the nova cell database before casting off to conductor to do scheduling: https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1180 Note that conductor failed to "complete" the action with a failure event: https://github.com/openstack/nova/blob/mitaka-eol/nova/conductor/manager.py#L374 But at least the action was created. Since then, with cell0, if scheduling fails the instance is buried in the cell0 database but no instance action is created. To illustrate, I disabled the single nova-compute service on my devstack host and created a server which failed with NoValidHost: $ openstack server show build-fail1 -f value -c fault {u'message': u'No valid host was found. 
', u'code': 500, u'created': u'2019-11-13T15:57:13Z'} When listing instance actions I expected to see a "create" action but there were none: $ nova instance-action-list 008a7d52-dd83-4f52-a720-b3cfcc498259 +++-+++ | Action | Request_ID | Message | Start_Time | Updated_At | +++-+++ +++-+++ This is because the "create" action is only created when the instance is scheduled to a specific cell: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L1460 Solution: The ComputeTaskManager._bury_in_cell0 method should also create a "create" action in cell0 like it does for the instance BDMs and tags. This goes back to Ocata: https://review.opendev.org/#/c/319379/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1852458/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
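The proposed fix can be sketched as follows (an illustrative model only, not nova's actual _bury_in_cell0; the dict-backed "database" stands in for the cell0 DB): when scheduling fails, the burial path should record the "create" action alongside the instance, BDMs and tags, so instance-action-list is never empty.

```python
# Illustrative model of burying a failed instance in cell0 while still
# recording the "create" instance action; not nova's actual code.

def bury_in_cell0(cell0_db, instance, bdms, tags):
    """Persist the failed instance and its related records in cell0."""
    cell0_db.setdefault('instances', []).append(instance)
    cell0_db.setdefault('bdms', []).extend(bdms)
    cell0_db.setdefault('tags', []).extend(tags)
    # The fix: also start the "create" action in cell0 so the API can
    # report when the boot was attempted and why it failed.
    cell0_db.setdefault('instance_actions', []).append(
        {'instance_uuid': instance['uuid'], 'action': 'create'})


db = {}
bury_in_cell0(db, {'uuid': 'abc-123'}, bdms=[], tags=[])
print(db['instance_actions'])
# [{'instance_uuid': 'abc-123', 'action': 'create'}]
```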
[Yahoo-eng-team] [Bug 1824858] Re: nova instance remnant left behind after cold migration completes
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1824858

Title: nova instance remnant left behind after cold migration completes

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in StarlingX: Fix Released

Bug description:

Brief Description
-----------------
After cold migration to a new worker node, instance remnants are left behind.

Severity
--------
Standard

Steps to Reproduce
------------------
Worker nodes compute-1 and compute-2 have the remote-storage label enabled.
1. Launch an instance on compute-1
2. Cold migrate it to compute-2
3. Confirm the cold migration

Expected Behavior
-----------------
Migration to compute-2 and cleanup of files on compute-1.

Actual Behavior
---------------
At 16:35:24 cold migration for instance a416ead6-a17f-4bb9-9a96-3134b426b069 completed to compute-2, but the following path is left behind on compute-1:

compute-1:/var/lib/nova/instances/a416ead6-a17f-4bb9-9a96-3134b426b069

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069         _base          locks
a416ead6-a17f-4bb9-9a96-3134b426b069_resize  compute_nodes  lost+found

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069  _base  compute_nodes  locks  lost+found

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069  _base  compute_nodes  locks  lost+found

Alarm/event log (newest first):

2019-04-15T16:35:24.646749 clear 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:24.482575 log 700.168 Cold-Migrate-Confirm complete for instance tenant2-migration_test-1 enabled on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:16.815223 log 700.163 Cold-Migrate-Confirm issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:10.030068 clear 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.971414 set 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.970212 log 700.162 Cold-Migrate complete for instance tenant2-migration_test-1 now enabled on host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637687 set 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637636 log 700.158 Cold-Migrate inprogress for instance tenant2-migration_test-1 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.478442 log 700.157 Cold-Migrate issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:20.181155 log 700.101 Instance tenant2-migration_test-1 is enabled on host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical

See nova-compute.log (compute-1). Instance a416ead6-a17f-4bb9-9a96-3134b426b069 was claimed and spawned here on compute-1:

{"log":"2019-04-15 16:34:04,617.617 60908 INFO nova.compute.claims [req-f1195bbb-d5b0-4a75-a598-ff287d247643 3fd3229d3e6248cf9b5411b2ecec86e9 7f1d42233341428a918855614770e676 - default default]
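The cleanup expected after a confirmed cold migration can be sketched as follows (a simplified illustration, not nova's actual confirm-resize code; the directory layout mirrors the listing above, and real nova must additionally guard against shared storage and in-flight tasks): both the instance directory and its <uuid>_resize sibling should be removed from the source host.

```python
import os
import shutil
import tempfile

def cleanup_after_confirm(instances_path, uuid):
    """Remove an instance's leftover directories from the source host.

    Simplified sketch only: deletes the instance dir and the
    '<uuid>_resize' dir created during the cold migration.
    """
    for name in (uuid, uuid + '_resize'):
        path = os.path.join(instances_path, name)
        if os.path.isdir(path):
            shutil.rmtree(path)

# Demonstration against a throwaway directory tree.
root = tempfile.mkdtemp()
uuid = 'a416ead6-a17f-4bb9-9a96-3134b426b069'
for name in (uuid, uuid + '_resize', '_base', 'locks'):
    os.makedirs(os.path.join(root, name))

cleanup_after_confirm(root, uuid)
print(sorted(os.listdir(root)))   # ['_base', 'locks']
```

The bug is precisely that the first directory in that pair survives the confirm on compute-1.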
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
The nova-compute service keeps a local cache of glance images used for nova servers, to avoid downloading the same image from glance multiple times. The disk usage of this cache is not counted as local disk usage in nova and is not reported to placement as used DISK_GB. This leads to disk over-allocation. Also, the size of the cache cannot be limited by nova configuration, so the deployer cannot reserve disk space for it with the reserved_host_disk_mb config option.

Steps to reproduce
==================
* Set up a single-node devstack.
* Create and upload an image with a not-too-small physical size, e.g. an image with 1G physical size.
* Check the current disk usage of the host OS and configure reserved_host_disk_mb in nova-cpu.conf accordingly.
* Boot two servers from that image with a flavor like d1 (disk=5G).
* Nova downloads the glance image once to the local cache, which results in 1GB of disk usage.
* Nova creates two root file systems, one for each VM. Those disks initially have minimal physical size but a 5G virtual size.
* At this point nova has allocated 5G + 5G of DISK_GB in placement, but due to the image in the cache the total disk usage of the two VMs plus the cache can reach 5G + 5G + 1G if both VMs overwrite and fill the content of their own disks.

Expected result
===============
Option A) Nova maintains a DISK_GB allocation in placement for the images in its cache. This way the expected DISK_GB allocation in placement is 5G + 5G + 1G at the end.

Option B) Nova provides a config option to limit the maximum size of the image cache, so the deployer can include the maximum image cache size in reserved_host_disk_mb when dimensioning the disk space of the compute.

Actual result
=============
Only 5G + 5G was allocated in placement, so disk space is over-allocated by the size of the image cache.

Environment
===========
Devstack from recent master:

stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"

libvirt driver with file-based image backend.

Logs & Configs
==============
http://paste.openstack.org/show/793388/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
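Either option needs a way to measure the cache's physical footprint. A minimal sketch, assuming a flat _base directory of image files (this is not nova's implementation; it uses POSIX st_blocks so sparse cached images are counted by what they actually occupy on disk):

```python
import os
import tempfile

def cache_physical_size_bytes(cache_dir):
    """Sum the physical (allocated) size of files in an image cache dir.

    Sketch only: st_blocks is in 512-byte units per POSIX, so this
    reflects real on-disk usage rather than apparent (virtual) size.
    """
    total = 0
    for entry in os.scandir(cache_dir):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_blocks * 512
    return total

# Demonstration: one 1 MiB cached image in a throwaway cache dir.
cache = tempfile.mkdtemp()
with open(os.path.join(cache, 'image1'), 'wb') as f:
    f.write(b'\x01' * 1024 * 1024)

used = cache_physical_size_bytes(cache)
print(used > 0)   # True
```

A number like this is what could either be reported to placement (option A) or bounded by configuration and folded into reserved_host_disk_mb (option B).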
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
Unable to createImage/snapshot paused volume-backed instances.

Steps to reproduce
==================
- Pause a volume-backed instance
- Attempt to snapshot the instance using the createImage API

Expected result
===============
A snapshot image is successfully created, as is the case for paused instances that are not volume backed.

Actual result
=============
n-api returns the following error:

{'code': 409, 'message': "Cannot 'createImage' instance bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state paused"}

Environment
===========
1. Exact version of OpenStack you are running: master (see http://docs.openstack.org/releases/ for all releases)
2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? N/A
3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A
4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs
==============
As above.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
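The failure mode can be illustrated with a toy policy check (illustrative only; nova's real gate is a state-checking decorator on the API method, and the exact state sets below are assumptions): the volume-backed snapshot path rejected 'paused' even though the regular snapshot path allowed it.

```python
# Toy model of the createImage vm_state gate; not nova's real code,
# and the state sets here are assumptions for illustration.
ALLOWED_STATES = {'active', 'stopped', 'paused', 'suspended'}
ALLOWED_STATES_VOLUME_BACKED = {'active', 'stopped'}   # the bug: no 'paused'

def can_create_image(vm_state, volume_backed, fixed=False):
    """Would the createImage request pass the vm_state gate?"""
    if volume_backed and not fixed:
        return vm_state in ALLOWED_STATES_VOLUME_BACKED
    return vm_state in ALLOWED_STATES

print(can_create_image('paused', volume_backed=False))             # True
print(can_create_image('paused', volume_backed=True))              # False -> HTTP 409
print(can_create_image('paused', volume_backed=True, fixed=True))  # True
```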
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The function that counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

This code can be very heavy on a database containing a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single-cell environment it is just extra CPU usage with exactly the same outcome.

The [api]/instance_list_per_project_cells option was introduced to work around this:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

However, the quota code does not honour it, which means quota counting can take a big toll on the database server. We should ideally mirror the same behaviour in the quota code.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1879964] Re: Invalid value for 'hw:mem_page_size' raises confusing error
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/ussuri
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1879964

Title: Invalid value for 'hw:mem_page_size' raises confusing error

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Configure a flavor like so:

  openstack flavor create hugepage --ram 1024 --disk 10 --vcpus 1 test
  openstack flavor set hugepage --property hw:mem_page_size=2M test

Attempt to boot an instance. It will fail with the following error message:

  Invalid memory page size '0' (HTTP 400) (Request-ID: req-338bf619-3a54-45c5-9c59-ad8c1d425e91)

You wouldn't know from reading it, but this is because the property should read 'hw:mem_page_size=2MB' (note the extra 'B').

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879964/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
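A sketch of the kind of parsing involved and the clearer error this bug asks for (simplified and assumed: the real implementation lives in nova's hardware utilities, and the unit table below is an assumption, not nova's exact accepted set): an unrecognised suffix like 'M' should produce an error that names the offending value and the accepted forms, rather than a mysterious size of '0'.

```python
import re

# Assumed unit table for illustration; nova's real accepted suffixes may differ.
_UNITS_TO_KIB = {'': 1, 'KB': 1, 'MB': 1024, 'GB': 1024 * 1024}
_KEYWORDS = {'small', 'large', 'any'}

def parse_mem_page_size(value):
    """Return the page size in KiB (or a keyword), with a helpful error."""
    if value in _KEYWORDS:
        return value
    match = re.fullmatch(r'(\d+)([A-Z]*)', value)
    if not match or match.group(2) not in _UNITS_TO_KIB:
        raise ValueError(
            "Invalid memory page size %r: expected 'small', 'large', 'any', "
            "or a number with an optional KB/MB/GB suffix (e.g. '2MB')" % value)
    return int(match.group(1)) * _UNITS_TO_KIB[match.group(2)]

print(parse_mem_page_size('2MB'))   # 2048
try:
    parse_mem_page_size('2M')       # the value from the bug report
except ValueError as exc:
    print(exc)
```

An error phrased this way would have pointed straight at the missing 'B'.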
[Yahoo-eng-team] [Bug 1882233] Re: Libvirt driver always reports 'memory_mb_used' of 0
** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/ussuri
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882233

Title: Libvirt driver always reports 'memory_mb_used' of 0

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The nova-compute service periodically logs a summary of the free RAM, disk and vCPUs as reported by the hypervisor. For example:

  Hypervisor/Node resource view: name=vtpm-f31.novalocal free_ram=7960MB free_disk=11.379043579101562GB free_vcpus=7 pci_devices=[{...}]

On a recent deployment using the libvirt driver, it was observed that the 'free_ram' value never changes despite instances being created and destroyed. This is because the 'get_memory_mb_used' function in 'nova.virt.libvirt.host' always returns 0 unless the host platform, as reported by 'sys.platform', is either 'linux2' or 'linux3'. Since Python 3.3 the major version is no longer included in this value because it was misleading [1].

This is low priority because the value only appears to be used for logging purposes; the values stored in e.g. the 'ComputeNode' object and reported to placement are calculated from config options and the number of instances on the node. We may wish to stop reporting this information instead.
[1] https://stackoverflow.com/a/10429736/613428 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882233/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
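The broken check and the conventional fix can be shown directly (the function below is a simplified stand-in for the platform test inside 'get_memory_mb_used'):

```python
def is_linux(platform):
    """Platform check that survives the Python 3.3 change.

    On Python 2, sys.platform was 'linux2' or 'linux3'; on Python 3.3+
    it is just 'linux', so an exact-match check silently fails there.
    """
    return platform.startswith('linux')

# The buggy exact-match check vs the robust prefix check:
for value in ('linux2', 'linux3', 'linux', 'darwin'):
    buggy = value in ('linux2', 'linux3')
    print(value, buggy, is_linux(value))
```

Under Python 3, sys.platform is 'linux', so the exact-match version returns False and the function falls through to the 0 path described above.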
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1884214

Title: reserve disk usage for image cache fails on a fresh hypervisor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement.

[1] http://paste.openstack.org/show/794993/

This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
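The defensive fix can be sketched as follows (function and layout are illustrative, not nova's actual helper): on a fresh hypervisor, _base does not exist until the first image is downloaded, so a missing cache directory should count as an empty cache instead of raising and aborting the resource update.

```python
import os
import tempfile

def image_cache_size_bytes(base_dir):
    """Physical size of the image cache; a missing cache dir counts as 0.

    Sketch only: the key point is tolerating FileNotFoundError rather
    than letting it propagate into the resource-update path.
    """
    try:
        entries = list(os.scandir(base_dir))
    except FileNotFoundError:
        return 0
    return sum(e.stat(follow_symlinks=False).st_blocks * 512
               for e in entries if e.is_file(follow_symlinks=False))

fresh = os.path.join(tempfile.mkdtemp(), '_base')   # intentionally never created
print(image_cache_size_bytes(fresh))   # 0 instead of a stack trace
```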
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882919

Title: e1000e interface reported as unsupported

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF, which nova does not explicitly support. There doesn't appear to be any reason not to support it, since libvirt, and specifically QEMU/KVM, support it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1550919

Title: [Libvirt]Evacuate fail may cause disk image be deleted

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

I checked the latest nova source on the master branch; this problem still exists.

When we do an evacuate, _do_rebuild_instance is eventually called. As rebuild is not implemented in the libvirt driver, _rebuild_default_impl is called instead:

    try:
        with instance.mutated_migration_context():
            self.driver.rebuild(**kwargs)
    except NotImplementedError:
        # NOTE(rpodolyaka): driver doesn't provide specialized version
        # of rebuild, fall back to the default implementation
        self._rebuild_default_impl(**kwargs)

_rebuild_default_impl calls self.driver.spawn to boot the instance, and spawn in turn calls _create_domain_and_network. When a VirtualInterfaceCreateException or a Timeout happens there, self.cleanup is called:

    except exception.VirtualInterfaceCreateException:
        # Neutron reported failure and we didn't swallow it, so
        # bail here
        with excutils.save_and_reraise_exception():
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
    except eventlet.timeout.Timeout:
        # We never heard from Neutron
        LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                     'instance %(uuid)s'), {'uuid': instance.uuid},
                 instance=instance)
        if CONF.vif_plugging_is_fatal:
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
            raise exception.VirtualInterfaceCreateException()

The default value of the destroy_disks parameter is True:

    def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True, migrate_data=None, destroy_vifs=True):

So if an error occurs during evacuate while waiting for neutron's event, the instance's disk files will be deleted unexpectedly.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
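The hazard reduces to a destructive default argument, sketched below (an illustrative toy model, not nova's real driver API): any error-path caller that does not pass destroy_disks=False inherits the destructive default, which is exactly what the evacuate failure path must avoid.

```python
# Toy model of the destructive-default hazard; not nova's real API.

def cleanup(instance, disks, destroy_disks=True):
    """Tear down a failed instance; destroys disks unless told otherwise."""
    if destroy_disks:
        disks.clear()   # in the real bug this deletes image files on disk

# Evacuation failure path: these disks may be the only surviving copy of
# the instance's data, so the error handler must opt out explicitly.
disks = ['root.qcow2']
cleanup('server1', disks, destroy_disks=False)   # the safe call
print(disks)   # ['root.qcow2'] - the disks survive the failed evacuate
```

Calling cleanup() here without the explicit flag would empty the list, mirroring the unexpected deletion described in the report.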
[Yahoo-eng-team] [Bug 1818798] Re: Should not skip volume_size check for bdm.image_id == image_ref case
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1818798

Title: Should not skip volume_size check for bdm.image_id == image_ref case

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released

Bug description:

The volume size should be checked in the bdm.source_type=image, destination_type=volume case no matter what the image is, but in:

https://github.com/openstack/nova/blob/5a09c81af3b438ecbcf27fa653095ff55abb3ed4/nova/compute/api.py#L1452-L1453

we skip the check if bdm.image_id == image_ref. This was meant to skip only the _get_image() check, since the image has already been checked earlier, but it skipped the volume_size check too.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1818798/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
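The intended control flow can be sketched like this (simplified; function and field names are illustrative, not nova's): only the image lookup may be short-circuited for the boot image, never the size validation.

```python
# Toy model of the bdm validation split; not nova's real code.

def validate_bdm(bdm, boot_image_ref, image_catalog):
    """Validate an image->volume bdm; raises ValueError when too small."""
    if bdm['image_id'] == boot_image_ref:
        # Only the *lookup/validation of the image itself* is redundant
        # here (it was checked earlier) - the size check must still run.
        image = image_catalog[boot_image_ref]
    else:
        image = image_catalog[bdm['image_id']]
    if bdm['volume_size'] < image['min_disk']:
        raise ValueError('volume_size %d is smaller than image min_disk %d'
                         % (bdm['volume_size'], image['min_disk']))

catalog = {'img-1': {'min_disk': 5}}
try:
    validate_bdm({'image_id': 'img-1', 'volume_size': 1}, 'img-1', catalog)
except ValueError as exc:
    print(exc)   # rejected even though image_id == image_ref
```

The bug was equivalent to returning early in the first branch, before the size comparison.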
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882919

Title: e1000e interface reported as unsupported

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF, which nova does not explicitly support. There doesn't appear to be any reason not to support it, since libvirt, and specifically QEMU/KVM, support it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
Unable to createImage/snapshot paused volume-backed instances.

Steps to reproduce
==================
- Pause a volume-backed instance
- Attempt to snapshot the instance using the createImage API

Expected result
===============
A snapshot image is successfully created, as is the case for paused instances that are not volume backed.

Actual result
=============
n-api returns the following error:

{'code': 409, 'message': "Cannot 'createImage' instance bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state paused"}

Environment
===========
1. Exact version of OpenStack you are running: master (see http://docs.openstack.org/releases/ for all releases)
2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? N/A
3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A
4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs
==============
As above.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1843708] Re: Key-pair is not updated during the rebuild
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843708

Title: Key-pair is not updated during the rebuild

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
When we want to rebuild an instance and change the keypair, we can specify it with:

  openstack --os-compute-api-version 2.54 server rebuild --image "Debian 10" --key-name key1 instance1

This comes from this implementation:
https://review.opendev.org/#/c/379128/
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/rebuild-keypair-reset.html

But when rebuilding the instance, cloud-init sets the key in authorized_keys from http://169.254.169.254/openstack/latest/meta_data.json, and this meta_data.json uses the keys from the instance_extra table. The keypair is updated in the 'instances' table but not in the 'instance_extra' table, so the keypair is not updated inside the VM.

This may be the function responsible for saving the keypair, but the save() does nothing:
https://opendev.org/openstack/nova/src/branch/master/nova/objects/instance.py#L714

Steps to reproduce
==================
- Deploy a DevStack
- Boot an instance with keypair key1
- Rebuild it with key2
- 'nova show' will show key_name key2, but the keypairs object in the instance_extra table is not updated and you cannot connect to the instance with key2

Expected result
===============
Connect to the VM with the new keypair supplied during the rebuild call.

Actual result
=============
The keypair supplied during the rebuild call is not set in the VM.

Environment
===========
Tested on a DevStack from master; the behaviour is present there.
NOVA: commit 5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1843708/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
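The two-table split can be modelled with a toy sketch (dicts stand in for the 'instances' and 'instance_extra' tables; none of this is nova's real object code): the metadata service reads one record, rebuild updated only the other.

```python
# Toy model of the two tables involved; not nova's real objects.

instances = {'vm1': {'key_name': 'key1'}}
instance_extra = {'vm1': {'keypairs': [{'name': 'key1', 'public_key': 'AAAA-key1'}]}}

def rebuild(uuid, new_key_name, new_public_key):
    instances[uuid]['key_name'] = new_key_name
    # The fix: the metadata service reads instance_extra, so rebuild must
    # refresh it too - this is the write the buggy code path was missing.
    instance_extra[uuid]['keypairs'] = [
        {'name': new_key_name, 'public_key': new_public_key}]

def meta_data(uuid):
    """What cloud-init fetches from 169.254.169.254."""
    return {'keys': instance_extra[uuid]['keypairs']}

rebuild('vm1', 'key2', 'AAAA-key2')
print(meta_data('vm1'))   # now serves key2, so cloud-init installs it
```

Without the instance_extra write, 'nova show' reports key2 while the metadata (and hence the VM's authorized_keys) still carries key1, matching the symptoms above.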
[Yahoo-eng-team] [Bug 1825584] Re: eventlet monkey-patching breaks AMQP heartbeat on uWSGI
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825584

Title: eventlet monkey-patching breaks AMQP heartbeat on uWSGI

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: In Progress
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:

Stein nova-api running under uWSGI presents an AMQP issue. The first API call that requires RPC creates an AMQP connection and successfully completes. Normally regular heartbeats would be sent from this point on, to maintain the connection. This is not happening. After a few minutes, the AMQP server (rabbitmq, in my case) notices that there have been no heartbeats and drops the connection. A later nova API call that requires RPC tries to use the old connection and throws a "connection reset by peer" exception, and the API call fails.

A mailing-list response suggests that this is affecting mod_wsgi also:
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html

I've discovered that this problem seems to be caused by eventlet monkey-patching, which was introduced in:
https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae

It was later rearranged in:
https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04

but this problem remains. If I comment out the import of nova.monkey_patch in nova/api/openstack/__init__.py the problem goes away. Seems that eventlet monkey-patching and uWSGI are not getting along for some reason...
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1825584/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
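One mitigation pattern for this class of problem can be sketched as follows (an assumption-laden illustration, not the fix nova actually merged): skip eventlet monkey-patching when the process is a uWSGI worker, which can be detected because the 'uwsgi' module is only importable inside uWSGI itself.

```python
def running_under_uwsgi():
    """Detect a uWSGI worker: the 'uwsgi' module exists only there."""
    try:
        import uwsgi  # noqa: F401 - provided by uWSGI itself, not installable
        return True
    except ImportError:
        return False

def maybe_monkey_patch():
    """Apply eventlet patching only for eventlet-based (non-WSGI) services.

    Sketch only: under uWSGI the native threads that drive things like
    AMQP heartbeats must not be replaced by eventlet greenthreads.
    """
    if running_under_uwsgi():
        return False
    import eventlet
    eventlet.monkey_patch()
    return True

print(running_under_uwsgi())   # False in a plain interpreter
```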
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1884214

Title: reserve disk usage for image cache fails on a fresh hypervisor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement.

[1] http://paste.openstack.org/show/794993/

This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1887946] Re: Unable to detach volume from instance when previously removed from the inactive config
** Changed in: nova/stein Status: Fix Committed => Fix Released ** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887946 Title: Unable to detach volume from instance when previously removed from the inactive config Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Description === $subject, can often be encountered when previous attempts to detach a volume have failed due to the device still being used within the guestOS. This initial attempt will remove the device from the inactive config but fail to remove it from the active config. Any subsequent attempt will then fail as the initial call continues to attempt to remove the device from both the inactive and live configs. Prior to libvirt v4.1.0 this raised either a VIR_ERR_INVALID_ARG or VIR_ERR_OPERATION_FAILED error code from libvirt that n-cpu would handle, retrying the detach against the live config. Since libvirt v4.1.0, however, this raises a VIR_ERR_DEVICE_MISSING error code. This is not handled by Nova, resulting in no attempt being made to detach the device from the live config. 
Steps to reproduce
==================

# Start with a volume attached as vdb (ignore the source ;))
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# Detach from the inactive config
$ sudo virsh detach-disk --config 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 vdb
Disk detached successfully

# Confirm the device is still listed on the live config
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# and removed from the persistent config
$ sudo virsh domblklist --inactive 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk

# Attempt to detach the volume
$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test

Expected result
===============

The initial attempt to detach the device fails as the device isn't present in the inactive config, but we continue to ensure the device is removed from the live config.

Actual result
=============

n-cpu doesn't handle the initial failure as the raised libvirt error code isn't recognised.

Environment
===========

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/
   b7161fe9b92f0045e97c300a80e58d32b6f49be1

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
   libvirt + KVM

3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that?
   N/A

4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)
   N/A

Logs & Configs
==============

$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test ; journalctl -u devstack@n-cpu -f
[..] 
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: DEBUG oslo_concurrency.lockutils [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Lock "4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8" released by "nova.compute.manager.ComputeManager.detach_volume..do_detach_volume" :: held 0.141s {{(pid=190210) inner /usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py:371}}
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Exception during message handling: libvirt.libvirtError: device not found: no target device vdb
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]:
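A sketch of the error handling this bug calls for — the constants and exception class below are stand-ins mirroring libvirt's names (this is not nova's actual patch, and the numeric values are illustrative): treat VIR_ERR_DEVICE_MISSING from the persistent-config detach the same way the pre-4.1.0 error codes were treated, and still attempt the live detach:

```python
# Stand-ins for libvirt's error codes and exception type; values are
# illustrative, not authoritative.
VIR_ERR_INVALID_ARG = 8
VIR_ERR_OPERATION_FAILED = 9
VIR_ERR_DEVICE_MISSING = 99   # raised by libvirt >= 4.1.0

class libvirtError(Exception):
    """Minimal stand-in for the real libvirt.libvirtError."""
    def __init__(self, code):
        super().__init__("libvirt error %d" % code)
        self._code = code
    def get_error_code(self):
        return self._code

def detach_device(detach_from_persistent, detach_from_live):
    """If the device is already gone from the persistent config,
    swallow the error and still try the live config, restoring the
    pre-4.1.0 retry behaviour for VIR_ERR_DEVICE_MISSING."""
    try:
        detach_from_persistent()
    except libvirtError as e:
        if e.get_error_code() not in (VIR_ERR_DEVICE_MISSING,
                                      VIR_ERR_INVALID_ARG,
                                      VIR_ERR_OPERATION_FAILED):
            raise
        # Device absent from the inactive config: fall through and
        # still detach from the live config.
    detach_from_live()

calls = []
def persistent_detach():
    # Simulates the reproducer: vdb already removed via `virsh
    # detach-disk --config`, so the persistent detach fails.
    raise libvirtError(VIR_ERR_DEVICE_MISSING)

detach_device(persistent_detach, lambda: calls.append("live"))
print(calls)  # → ['live']
```

With this handling, the `openstack server remove volume` call in the reproducer would still remove vdb from the running guest instead of erroring out.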
[Yahoo-eng-team] [Bug 1889108] Re: failures during driver.pre_live_migration remove source attachments during rollback
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889108 Title: failures during driver.pre_live_migration remove source attachments during rollback Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description:

Description
===========

$subject: the initial rollback and removal of any destination volume attachments is then repeated for the source volume attachments, leaving the volumes connected on the host but listed as `available` in cinder.

Steps to reproduce
==================

Cause a failure during the call to driver.pre_live_migration with volumes attached.

Expected result
===============

Any volume attachments for the destination host are deleted during the rollback.

Actual result
=============

Both sets of volume attachments, for the destination *and* the source, are removed.

Environment
===========

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/
   eeeb964a5f65e6ac31dfb34b1256aaf95db5ba3a

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
   libvirt + KVM

3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that?
   N/A

4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)
N/A

Logs & Configs
==============

See the downstream report "When live-migration fails with attached volume changed to active and still in nova": https://bugzilla.redhat.com/show_bug.cgi?id=1860914

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1889108/+subscriptions
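A minimal sketch of the corrected rollback described above (the data shapes and names are hypothetical, not nova's internal objects): delete only the attachments that were created for the destination host, leaving the source host's attachments untouched:

```python
def rollback_attachments(attachments, dest_host, delete):
    """Hypothetical sketch of the fix: during a pre_live_migration
    rollback, delete only the volume attachments created for the
    destination host; the source attachments must survive so the
    volumes stay `in-use` in cinder and connected to the instance."""
    for att in attachments:
        if att["host"] == dest_host:
            delete(att["id"])

deleted = []
attachments = [
    {"id": "a1", "host": "src"},  # original source attachment
    {"id": "a2", "host": "dst"},  # created for the (failed) migration
]
rollback_attachments(attachments, "dst", deleted.append)
print(deleted)  # → ['a2']
```

The buggy behaviour corresponds to deleting every entry regardless of host, which is what left the volumes `available` in cinder while still connected on the source hypervisor.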