[Yahoo-eng-team] [Bug 1853632] Re: designate dns driver does not use domain settings for auth
[Expired for neutron because there has been no activity for 60 days.] ** Changed in: neutron Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1853632
Title: designate dns driver does not use domain settings for auth
Status in neutron: Expired
Bug description: The designate external dns driver does not use domain settings for authentication **if** there is more than one openstack domain. If you have only the 'Default' domain, the authentication system has no doubt about which domain to use, so it will use that. In our deployment we support federated authentication and we hit this issue. The issue lies in the external dns driver, which also does not support all of the documented options. You can test this by setting one (or all) of the following options in the [designate] section of /etc/neutron/neutron.conf to invalid values:
user_domain_name
project_domain_name
project_name
Authentication should still succeed, which shows these options are being ignored. @oammis initially found this issue, so credit where it's due. We have a Queens deployment, although from what we can see in the code the issue should apply to all releases. I'll post a fix soon.
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1853632/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
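For reference, the domain-related options from the report map to the [designate] section of /etc/neutron/neutron.conf roughly as follows. The values shown are placeholders, not a tested configuration:

```ini
[designate]
; Placeholder values -- substitute your deployment's credentials.
; Per the report, these documented options were being ignored by the
; external DNS driver when more than one Keystone domain exists.
user_domain_name = Default
project_domain_name = Default
project_name = service
```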
[Yahoo-eng-team] [Bug 1896592] Re: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
[Expired for neutron because there has been no activity for 60 days.] ** Changed in: neutron Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896592
Title: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
Status in neutron: Expired
Status in tempest: Expired
Bug description: The following three test cases clash when they create, at the same time, an IPv6 subnet with the same CIDR:
- test_dhcpv6_stateless_eui64
- test_dhcpv6_stateless_no_ra
- test_dhcpv6_stateless_no_ra_no_dhcp
Log: https://61069af11b09b96273ad-d5a2c2135ef34e5fcff72992ca5eb476.ssl.cf2.rackcdn.com/662869/6/check/neutron-tempest-with-uwsgi/9b9c086/controller/logs/tempest_log.txt
Snippet: http://paste.openstack.org/show/798195/
Error: "Invalid input for operation: Requested subnet with cidr: 2001:db8::/64 for network: 31e04aec-34df-49dc-8a05-05813a37be98 overlaps with another subnet."
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896592/+subscriptions
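One generic way to avoid this kind of clash (a sketch, not the actual neutron-tempest-plugin fix) is to carve a distinct /64 out of a common prefix for each test case instead of hard-coding 2001:db8::/64. The helper name below is hypothetical:

```python
import ipaddress

def disjoint_v6_subnets(base_cidr, count):
    """Return `count` non-overlapping /64 networks carved from base_cidr.

    Hypothetical helper: each concurrently running test case takes its
    own slice, so no two requests present the same CIDR to neutron.
    """
    base = ipaddress.ip_network(base_cidr)
    gen = base.subnets(new_prefix=64)
    return [next(gen) for _ in range(count)]

# e.g. one subnet per clashing test case
cidrs = disjoint_v6_subnets("2001:db8::/48", 3)
```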
[Yahoo-eng-team] [Bug 1894839] Re: hostname not getting set as per the dns
[Expired for cloud-init because there has been no activity for 60 days.] ** Changed in: cloud-init Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1894839
Title: hostname not getting set as per the dns
Status in cloud-init: Expired
Bug description: Environment details:
Management Control Plane: OpenStack (Ussuri release)
cloud-init version: 19.1 (community)
Data Source: Config Drive
OS/platform of deployed VM: RHEL 8.2
I am using cloud-init v19.1, where the control plane (the OpenStack nova service) passes information (the data source) via config drive during VM deployment. I am using the set_hostname and update_hostname modules under the cloud_init_modules section in [1]. As per the documentation, I have put preserve_hostname: true in the cfg file and rebooted the system (VM). The hostname is preserved and does not change after multiple reboots. This, I believe, is working as expected. When using preserve_hostname: false and fqdn: in [1], the value given to fqdn is assigned as the hostname of the VM. But when we just set preserve_hostname: false in [1] and deploy the VM, the deployed VM name becomes the hostname. [root@host cloudinit]# hostname host
Expectation: if we set preserve_hostname: false, it should set the hostname as per what DNS (nslookup) returns. If we use an image where preserve_hostname: false (with no hostname specified in the cfg file) and the set_hostname and update_hostname modules enabled in [1], and deploy a few VMs (say, 5) with it - how will the hostname get configured on all the deployed VMs? In this use case, we want the hostname configured on each VM to be DNS-resolvable and aligned with the IP address associated with the VM. How can this be achieved?
[1] /etc/cloud/cloud.cfg
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1894839/+subscriptions
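For context, the configuration the reporter describes lives in /etc/cloud/cloud.cfg. A minimal sketch of the relevant pieces (module list abbreviated; the fqdn value is a placeholder, not from the report):

```yaml
# /etc/cloud/cloud.cfg (excerpt)
preserve_hostname: false
# fqdn: host.example.com   # if set, this value is used as the hostname
cloud_init_modules:
  - set_hostname
  - update_hostname
```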
[Yahoo-eng-team] [Bug 1896592] Re: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
[Expired for tempest because there has been no activity for 60 days.] ** Changed in: tempest Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896592
Title: [neutron-tempest-plugin] test_dhcpv6_stateless_* clashing when creating an IPv6 subnet
Status in neutron: Expired
Status in tempest: Expired
Bug description: The following three test cases clash when they create, at the same time, an IPv6 subnet with the same CIDR:
- test_dhcpv6_stateless_eui64
- test_dhcpv6_stateless_no_ra
- test_dhcpv6_stateless_no_ra_no_dhcp
Log: https://61069af11b09b96273ad-d5a2c2135ef34e5fcff72992ca5eb476.ssl.cf2.rackcdn.com/662869/6/check/neutron-tempest-with-uwsgi/9b9c086/controller/logs/tempest_log.txt
Snippet: http://paste.openstack.org/show/798195/
Error: "Invalid input for operation: Requested subnet with cidr: 2001:db8::/64 for network: 31e04aec-34df-49dc-8a05-05813a37be98 overlaps with another subnet."
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896592/+subscriptions
[Yahoo-eng-team] [Bug 1885527] Re: cloud-init regenerating ssh-keys
This bug was fixed in the package cloud-init - 20.4-0ubuntu1

---
cloud-init (20.4-0ubuntu1) hirsute; urgency=medium

  * d/control: add gnupg to Recommends as cc_apt_configure requires it to be installed for some operations.
  * New upstream release.
    - Release 20.4 (#686) [James Falcon] (LP: #1905440)
    - tox: avoid tox testenv subsvars for xenial support (#684)
    - Ensure proper root permissions in integration tests (#664) [James Falcon]
    - LXD VM support in integration tests (#678) [James Falcon]
    - Integration test for fallocate falling back to dd (#681) [James Falcon]
    - .travis.yml: correctly integration test the built .deb (#683)
    - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
    - Support configuring SSH host certificates. (#660) [Jonathan Lung]
    - add integration test for LP: #1900837 (#679)
    - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
    - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
    - Make mount in place for tests work (#667) [James Falcon]
    - integration_tests: restore emission of settings to log (#657)
    - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
    - tox.ini: only select "ci" marked tests for CI runs (#677)
    - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
    - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
    - test_persistence: simplify VersionIsPoppedFromState (#674)
    - only run a subset of integration tests in CI (#672)
    - cli: add --system param to allow validating system user-data on a machine (#575)
    - test_persistence: add VersionIsPoppedFromState test (#673)
    - introduce an upgrade framework and related testing (#659)
    - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
    - Pin pycloudlib to a working commit (#666) [James Falcon]
    - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
    - cloud_tests: add hirsute release definition (#662)
    - split integration and cloud_tests requirements (#652)
    - faq.rst: add warning to answer that suggests running `clean` (#661)
    - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
    - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
    - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
    - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
    - Prevent timeout on travis integration tests. (#651) [James Falcon]
    - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
    - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
    - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
    - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
    - Make some language improvements in growpart documentation (#649) [Shane Frasier]
    - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
    - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
    - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
    - cloudinit: move dmi functions out of util (#622) [Scott Moser]
    - integration_tests: various launch improvements (#638)
    - test_lp1886531: don't assume /etc/fstab exists (#639)
    - Remove Ubuntu restriction from PR template (#648) [James Falcon]
    - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
    - conftest: improve docstring for disable_subp_usage (#644)
    - doc: add example query commands to debug Jinja templates (#645)
    - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
    - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
    - .travis.yml: use a known-working version of lxd (#643)
    - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
    - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
    - Explicit set IPV6_AUTOCONF and IPV6_FORCE_ACCEPT_RA on static6 (#634) [Eduardo Otubo]
    - get_interfaces: don't exclude Open vSwitch bridge/bond members (#608) [Lukas Märdian] (LP: #1898997)
    - Add config modules for controlling IBM PowerVM RMC. (#584) [Aman306] (LP: #1895979)
    - Update network config docs to clarify MAC address quoting (#623) [dermotbradley]
    - gentoo: fix hostname rendering when value has a comment (#611) [Manuel Aguilera]
    - refactor integration testing infrastructure (#610) [James Falcon]
    - stages: don't reset permissions of cloud-init.log every boot (#624) (LP: #1900837)
    - docs: Add how to use cloud-localds to boot qemu (#617) [Joshua Powers]
    - Drop vestigial
[Yahoo-eng-team] [Bug 1905493] [NEW] cloud-init status --wait hangs indefinitely in a nested lxd container
Public bug reported: When booting a nested lxd container inside another lxd container (just a normal container, not a VM), i.e. just L2, using cloud-init status --wait, the "." is printed indefinitely and the command never returns. ** Affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1905493
Title: cloud-init status --wait hangs indefinitely in a nested lxd container
Status in cloud-init: New
Bug description: When booting a nested lxd container inside another lxd container (just a normal container, not a VM), i.e. just L2, using cloud-init status --wait, the "." is printed indefinitely and the command never returns.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1905493/+subscriptions
[Yahoo-eng-team] [Bug 1905447] [NEW] ds-identify OpenStack is odd
Public bug reported: ds-identify's OpenStack check is odd:

    # LP: #1715241 : arch other than intel are not identified properly.
    case "$DI_UNAME_MACHINE" in
        i?86|x86_64) :;;
        *) return ${DS_MAYBE};;
    esac

It has that, which is not nice. Also I think the above is no longer true; I think arm64, ppc64le and s390x do have better OpenStack identification these days. Also, returning DS_MAYBE is a bit harmful on arches that are known not to have OpenStack yet - i.e. riscv64. ** Affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1905447
Title: ds-identify OpenStack is odd
Status in cloud-init: New
Bug description: ds-identify's OpenStack check is odd:

    # LP: #1715241 : arch other than intel are not identified properly.
    case "$DI_UNAME_MACHINE" in
        i?86|x86_64) :;;
        *) return ${DS_MAYBE};;
    esac

It has that, which is not nice. Also I think the above is no longer true; I think arm64, ppc64le and s390x do have better OpenStack identification these days. Also, returning DS_MAYBE is a bit harmful on arches that are known not to have OpenStack yet - i.e. riscv64.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1905447/+subscriptions
[Yahoo-eng-team] [Bug 1788915] Re: sysconfig renders vlan with TYPE=Ethernet
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1788915
Title: sysconfig renders vlan with TYPE=Ethernet
Status in cloud-init: Fix Released
Bug description: Distribution: Fedora 28. Cloud provider: None. Network content of /etc/cloud/cloud.cfg.d/99_datasource.cfg (omitting users, etc.):

    network:
      version: 1
      config:
        - type: physical
          name: lan1
          mac_address: 0c:c4:7a:db:dc:b0
        - type: vlan
          name: lan1.100
          vlan_link: lan1
          vlan_id: 100
          subnets:
            - type: static
              address: 192.168.0.2/24
              gateway: 192.168.0.1
              dns_nameservers:
                - 8.8.8.8
                - 8.8.4.4
        - type: vlan
          name: lan1.3900
          vlan_link: lan1
          vlan_id: 3900
          subnets:
            - type: static
              address: 10.1.0.2/16
              gateway:

I am unable to attach logs (no network connection).

    $ cloud-init --version
    /usr/bin/cloud-init 17.1

The sysconfig renderer leaves the configured "kind" set to the default (ethernet), which results in a config file with "TYPE=Ethernet", which is incorrect and results in the VLAN interface not being created.

    $ cat ifcfg-lan1.100
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEFROUTE=yes
    DEVICE=lan1.100
    GATEWAY=192.168.0.1
    IPADDR=192.168.0.2
    NETMASK=255.255.255.0
    ONBOOT=yes
    PHYSDEV=lan1
    TYPE=Ethernet
    USERCTL=no
    VLAN=yes

    $ ifup lan1.100
    Error: Connection activation failed: No suitable device found for this connection.

Removing the offending "TYPE=Ethernet" line from the config file resolves the problem (as does changing it to "TYPE=Vlan"). I altered my configuration to use version 2 of the network configuration data with identical results (the problem is in the renderer).
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1788915/+subscriptions
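Based on the reporter's observation that changing the line to TYPE=Vlan also resolves the problem, a corrected rendering would look like this (a sketch derived from the reported workaround, not the actual fixed renderer output):

```ini
; ifcfg-lan1.100 -- corrected per the reporter's workaround
BOOTPROTO=none
DEFROUTE=yes
DEVICE=lan1.100
GATEWAY=192.168.0.1
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
PHYSDEV=lan1
TYPE=Vlan
USERCTL=no
VLAN=yes
```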
[Yahoo-eng-team] [Bug 1776958] Re: error creating lxdbr0.
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1776958
Title: error creating lxdbr0.
Status in cloud-init: Fix Released
Status in cloud-init package in Ubuntu: Fix Released
Bug description: $ cat > my.yaml <) failed cloudinit.util.ProcessExecutionError: Unexpected error while running command. Stderr: Error: The network already exists
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1776958/+subscriptions
[Yahoo-eng-team] [Bug 1813396] Re: gpg called without no-tty
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Confirmed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1813396
Title: gpg called without no-tty
Status in cloud-init: Fix Released
Bug description: I am running cloud-init on a libvirt/virsh with this image: https://cdimage.debian.org/cdimage/openstack/archive/9.6.5-20190122/debian-9.6.5-20190122-openstack-amd64.qcow2
The relevant lines are:

    apt:
      sources:
        docker:
          source: 'deb [arch=amd64] https://download.docker.com/linux/debian stretch stable'
          keyserver: keyserver.ubuntu.com
          keyid: 0EBFCD88

(sorry for not attaching any logs, but the triggered command just does not find "/dev/tty") The gpg wrapper should have the "no-tty" argument for receiving keys, at least on debian systems. Otherwise cloud-init fails when specifying key ids on debian cloud images (with manually added dirmngr and apt-transport-https; it is quite a mess on the openstack debian images...) I would naively propose the following patch:

    diff --git a/cloudinit/gpg.py b/cloudinit/gpg.py
    index 7fe17a2..21d598e 100644
    --- a/cloudinit/gpg.py
    +++ b/cloudinit/gpg.py
    @@ -42,7 +42,7 @@ def recv_key(key, keyserver, retries=(1, 1)):
         @param retries: an iterable of sleep lengths for retries.
             Use None to indicate no retries."""
         LOG.debug("Importing key '%s' from keyserver '%s'", key, keyserver)
    -    cmd = ["gpg", "--keyserver=%s" % keyserver, "--recv-keys", key]
    +    cmd = ["gpg", "--no-tty", "--keyserver=%s" % keyserver, "--recv-keys", key]
         if retries is None:
             retries = []
         trynum = 0

BR Till
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1813396/+subscriptions
[Yahoo-eng-team] [Bug 1897915] Re: ntp service on centos is ntp.service, but cloud-init uses nptd.service
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1897915
Title: ntp service on centos is ntp.service, but cloud-init uses nptd.service
Status in cloud-init: Fix Released
Bug description: The ntp (client) service file installed by the centos 7 'ntp' package is named 'ntpd' (note the d), but cloud-init's cc_ntp module identifies that 'service_name' as 'ntp'. See below on centos 7. For centos 8, there is no 'ntp' package that I see; it seems to have been replaced by chrony.

    [root@cent71 ~]# rpm -ql ntp | grep systemd
    /usr/lib/systemd/ntp-units.d/60-ntpd.list
    /usr/lib/systemd/system/ntpd.service
    [root@cent71 ~]# systemctl status ntp.service
    Unit ntp.service could not be found.
    [root@cent71 ~]# systemctl cat ntp.service
    No files found for ntp.service.
    [root@cent71 ~]# systemctl cat ntpd.service
    # /usr/lib/systemd/system/ntpd.service
    [Unit]
    Description=Network Time Service
    After=syslog.target ntpdate.service sntp.service

    [Service]
    Type=forking
    EnvironmentFile=-/etc/sysconfig/ntpd
    ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS
    PrivateTmp=true

    [Install]
    WantedBy=multi-user.target

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1897915/+subscriptions
[Yahoo-eng-team] [Bug 1885527] Re: cloud-init regenerating ssh-keys
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1885527
Title: cloud-init regenerating ssh-keys
Status in cloud-init: Fix Released
Status in cloud-init package in Ubuntu: In Progress
Bug description: Hi, I made some experiments with virtual machines with Ubuntu-20.04 at a German cloud provider (Hetzner), which uses cloud-init to initialize machines with a basic setup such as IP and SSH access. During my installation tests I had to reboot the virtual machines several times after installing or removing packages. Occasionally (not always) I noticed that the ssh host keys had changed, and ssh complained. After accepting the new host keys (insecure!) I found that all key files in /etc/ssh had fresh mod times, i.e. were freshly regenerated. This reminds me of a bug I reported about cloud-init some time ago, where I could not change the host name permanently because cloud-init reset it to its initial configuration at every boot (highly dangerous, because it seemed to reset passwords to their original state as well). Although cloud-init is intended to do an initial configuration for the first boot only, it seems to remain on the system and – even worse: occasionally – change configurations. I've never understood the purpose of cloud-init remaining active after the machine is up and running.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1885527/+subscriptions
[Yahoo-eng-team] [Bug 1826608] Re: sysconfig rendering ignores vlan name
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1826608
Title: sysconfig rendering ignores vlan name
Status in cloud-init: Fix Released
Bug description: sysconfig rendering currently just does not pay attention to the vlan device's name. Instead it attempts to set the name to the backing device with .* stripped from the end. Here is an example of current master output. The 'PHYSDEV' entry should be 'eth0', not 'infra'.

    $ cat my2.yaml
    version: 2
    ethernets:
      eth0:
        addresses: ["192.10.1.2/24"]
        match:
          macaddress: "00:16:3e:60:7c:df"
    vlans:
      infra0:
        id: 1001
        link: eth0
        addresses: ["10.0.1.2/16"]

    $ tox-venv py3 python3 -m cloudinit.cmd.main devel net-convert \
        --mac en0,00:16:3e:60:7c:df \
        --network-data=my2.yaml --kind=yaml \
        --distro=centos --output-kind=sysconfig \
        --directory=out.test

    $ cat out.test/etc/sysconfig/network-scripts/ifcfg-eth0
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEVICE=eth0
    HWADDR=00:16:3e:60:7c:df
    IPADDR=192.10.1.2
    NETMASK=255.255.255.0
    NM_CONTROLLED=no
    ONBOOT=yes
    STARTMODE=auto
    TYPE=Ethernet
    USERCTL=no

    $ cat out.test/etc/sysconfig/network-scripts/ifcfg-infra0
    # Created by cloud-init on instance boot automatically, do not edit.
    #
    BOOTPROTO=none
    DEVICE=infra0
    IPADDR=10.0.1.2
    NETMASK=255.255.0.0
    NM_CONTROLLED=no
    ONBOOT=yes
    PHYSDEV=infra
    STARTMODE=auto
    TYPE=Ethernet
    USERCTL=no
    VLAN=yes

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1826608/+subscriptions
[Yahoo-eng-team] [Bug 1895976] Re: Fail to get http openstack metadata if the Linux instance runs on Hyper-V
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1895976
Title: Fail to get http openstack metadata if the Linux instance runs on Hyper-V
Status in cloud-init: Fix Released
Status in compute-hyperv: New
Status in OpenStack Compute (nova): In Progress
Status in os-win: Fix Committed
Bug description: Because of the commit that introduced platform checks for enabling / using http openstack metadata (https://github.com/canonical/cloud-init/commit/1efa8a0a030794cec68197100f31a856d0d264ab), cloud-init on Linux machines will stop loading http metadata when running on "unsupported" platforms / hypervisors like Hyper-V, XEN, OracleCloud, VMware, OpenTelekomCloud - leading to a whole suite of bug reports and fixes for a non-issue. Let's try to solve this problem once for all the upcoming platforms / hypervisors by adding a configuration option on the metadata level: perform_platform_check or check_if_platform_is_supported (suggestions are welcome for the naming). The value of the config option should be true in order to maintain backwards compatibility. When set to true, cloud-init will check if the platform is supported. No one would like to patch well-working OpenStack environments for this kind of issue, and it is always easier to control / build the images you use on a private OpenStack.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1895976/+subscriptions
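The report proposes a datasource-level option (naming undecided at the time: perform_platform_check or check_if_platform_is_supported). A hedged sketch of how such a setting might look in cloud.cfg, assuming the first proposed name and placement under the OpenStack datasource; this illustrates the proposal, not a released cloud-init option:

```yaml
datasource:
  OpenStack:
    # Proposed option (name not final): defaults to true to keep
    # backwards-compatible behaviour; set false to skip the
    # hypervisor/platform check on Hyper-V, XEN, etc.
    perform_platform_check: false
```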
[Yahoo-eng-team] [Bug 1895979] Re: New cloud-init config modules for PowerVM Hypervisor based VMs
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1895979
Title: New cloud-init config modules for PowerVM Hypervisor based VMs
Status in cloud-init: Fix Released
Bug description: Linux virtual machines deployed (with the ppc64le architecture) using the IBM PowerVM [1] hypervisor on IBM Power System hosts need an additional component (referred to as RMC - Remote Management Console) to be installed and successfully running on the VM. This RMC module/service must be installed and functioning on a ppc64le VM for the PowerVM hypervisor to be able to communicate with / manage these virtual machines. When a VM boots, there is a basic set of steps (generation of a unique RMC node id, subsequent restart of the RMC service, etc.) that must be performed on a ppc64le VM to ensure that the communication between the VM and the PowerVM hypervisor is intact. RMC has to be active on the VM for the hypervisor to be able to perform many operations successfully; thus a healthy RMC is a prerequisite for a PowerVM hypervisor based VM. To enable the healthy functioning of RMC services on ppc64le Linux based VMs, there are a couple of cloud-init config modules that we have been maintaining downstream. As part of this LP bug, we would like to upstream them, so that they benefit the larger community using PowerVM based VMs.
References: [1] https://developer.ibm.com/depmodels/cloud/articles/cl-hypervisorcompare-powervm/
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1895979/+subscriptions
[Yahoo-eng-team] [Bug 1897099] Re: create_swap do not fallback to dd when fallocate fails
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you. ** Changed in: cloud-init Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1897099
Title: create_swap do not fallback to dd when fallocate fails
Status in cloud-init: Fix Released
Bug description: Name: cloud-init. Version: 20.2-1. Code in question (cloudinit/config/cc_mounts.py):

    try:
        create_swap(fname, size, "fallocate")
    except util.ProcessExecutionError as e:
        LOG.warning(errmsg, fname, size, "dd", e)
        LOG.warning("Will attempt with dd.")
        create_swap(fname, size, "dd")

There is a kernel bug in recent Linux versions where fallocate creates swap images with holes. The workaround is to remove fallocate (making the create_swap call fail) so that cloud-init falls back to dd. I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on my system, but according to the logs it did not fall back to dd as it should. Probably the error raised was not ProcessExecutionError. Logs:
/var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command.
/var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' /var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No such file or directory /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: changed default device swap => None /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: suggest 2048.0 MB swap for 1983.953125 MB memory with '9030.296875 MB' disk given max=2048.0 MB [max=2048.0 MB]' /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: Creating swapfile in '/swapfile' on fstype 'ext4' using 'fallocate' /var/log/cloud-init.log:2020-09-24 09:13:16,461 - util.py[DEBUG]: Running command ['fallocate', '-l', '2048M', '/swapfile'] with allowed return codes [0] (she ll=False, capture=True) /var/log/cloud-init.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command. 
/var/log/cloud-init.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Attempting to remove /swapfile /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Setting up swap file took 0.019 seconds /var/log/cloud-init.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1897099/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
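The report boils down to the except clause being too narrow: a missing binary raises FileNotFoundError (an OSError), not ProcessExecutionError, so the dd branch never runs. A minimal, self-contained sketch of the fixed pattern (not cloud-init's actual code; the `runner` hook and `fake_runner` exist only to make the sketch testable without touching the system):

```python
import subprocess


def run_with_fallback(primary_cmd, fallback_cmd, runner=subprocess.run):
    """Run primary_cmd; if it fails OR its binary is missing, run fallback_cmd.

    Catching only CalledProcessError (the stand-in for cloud-init's
    ProcessExecutionError) would miss FileNotFoundError, which is what
    a removed /usr/bin/fallocate raises, exactly as in this report.
    """
    try:
        runner(primary_cmd, check=True)
        return "primary"
    except (subprocess.CalledProcessError, OSError):
        runner(fallback_cmd, check=True)
        return "fallback"


def fake_runner(cmd, check=True):
    """Simulate the reporter's setup: the fallocate binary is gone."""
    if cmd[0] == "fallocate":
        raise FileNotFoundError(2, "No such file or directory", cmd[0])


# With the broadened except clause, the dd fallback actually triggers:
assert run_with_fallback(["fallocate", "-l", "2048M", "/swapfile"],
                         ["dd", "if=/dev/zero", "of=/swapfile"],
                         runner=fake_runner) == "fallback"
```

This matches the changelog entry "cc_mounts: correctly fallback to dd if fallocate fails (#585)" in spirit: widen what counts as a fallocate failure.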
[Yahoo-eng-team] [Bug 1898997] Re: MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1898997

Title: MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC

Status in cloud-init: Fix Released
Status in netplan: Fix Released
Status in netplan.io package in Ubuntu: Fix Released
Status in netplan.io source package in Focal: Fix Released
Status in netplan.io source package in Groovy: Fix Released

Bug description:
  Problem description:

  If we try to deploy a single-NIC machine via MAAS, configuring an
  Open vSwitch bridge as the primary/PXE interface, the machine will
  install and boot Ubuntu 20.04, but it cannot finish the whole
  configuration (e.g. copying of SSH keys) and cannot be
  accessed/controlled via MAAS. It ends up in a "Failed" state.

  This is because systemd-networkd-wait-online.service fails (for some
  reason) before netplan can fully set up and configure the OVS
  bridge. Because of the broken networking, cloud-init cannot complete
  its final stages, like setting up SSH keys or signaling its state
  back to MAAS. If we wait a little longer, the OVS bridge will
  actually come online and networking works; SSH is still not set up
  and the MAAS state remains "Failed", though.

  Steps to reproduce:

  * Set up a (virtual) MAAS system, e.g. inside a LXD container using
    a KVM host, as described here:
    https://discourse.maas.io/t/setting-up-a-flexible-virtual-maas-test-environment/142
  * Install & set up the maas[-cli] snap from the 2.9/beta channel
    (instead of the deb/PPA from the discourse post)
  * Configure the netplan PPA+key for testing via "Settings" ->
    "Package repos": https://launchpad.net/~slyon/+archive/ubuntu/ovs
  * Prepare the curtin preseed in
    /var/snap/maas/current/preseeds/curtin_userdata, inside the LXD
    container (so you can access the broken machine afterwards):

    ==
    #cloud-config
    debconf_selections:
      maas: |
        {{for line in str(curtin_preseed).splitlines()}}
        {{line}}
        {{endfor}}
    late_commands:
      maas: [wget, '--no-proxy', '{{node_disable_pxe_url}}', '--post-data', '{{node_disable_pxe_data}}', '-O', '/dev/null']
      90_create_user: ["curtin", "in-target", "--", "sh", "-c", "sudo useradd test -g 0 -G sudo"]
      92_set_user_password: ["curtin", "in-target", "--", "sh", "-c", "echo 'test:test' | sudo chpasswd"]
      94_cat: ["curtin", "in-target", "--", "sh", "-c", "cat /etc/passwd"]
      98_cloud_init: ["curtin", "in-target", "--", "apt-get", "-y", "install", "cloud-init"]
    ==

  * Compose a new virtual machine via MAAS' "KVM" menu, named e.g. "test1"
  * Watch it being commissioned via MAAS' "Machines" menu
  * Once it's ready, select your machine (e.g. "test1.maas") -> Network
  * Select the single network interface (e.g. "ens4") -> Create bridge
  * Choose "Bridge type: Open vSwitch (ovs)", select "Subnet" and "IP mode", save.
  * Deploy the machine to Ubuntu 20.04 via the "Take action" button

  The machine will install the OS and boot, but will end up in a
  "Failed" state inside MAAS due to network/OVS not being set up
  correctly. MAAS/SSH has no control over it. You can access the
  (broken) machine via serial console from the KVM host (i.e. the LXD
  container) via "virsh console test1", using the "test:test"
  credentials.
=== SRU/Focal/netplan.io ===

[Impact]
This update contains bug fixes and packaging improvements, and we would like to make sure all of our supported customers have access to these improvements. The notable ones are:
 * Setup OVS early in network-pre.target to avoid delays (LP: #1898997)
See the changelog entry below for a full list of changes and bugs.

[Test Case]
The following development and SRU process was followed: https://wiki.ubuntu.com/NetplanUpdates
Netplan contains an extensive integration test suite that is run using the SRU package for each release. This test suite's results are available here: http://autopkgtest.ubuntu.com/packages/n/netplan.io
A successful run is required before the proposed netplan package can be let into -updates. The netplan team will be in charge of attaching the artifacts and console output of the appropriate run to the bug. Netplan team members will not mark 'verification-done' until this has happened.

[Regression Potential]
In order to mitigate the regression potential, the results of the aforementioned integration tests are attached to this bug.
Focal:
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_amd64.log
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_arm64.log
 https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_armhf.log
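For context, the failing configuration is an OVS bridge on the machine's only (PXE) NIC. The netplan config MAAS renders for such a machine is of roughly the following shape (illustrative sketch only; the interface name ens4 and bridge name br-ens4 are assumptions, not taken from the bug, and the `openvswitch` key requires a netplan version with OVS support):

```yaml
network:
  version: 2
  ethernets:
    ens4: {}
  bridges:
    br-ens4:
      openvswitch: {}      # marks the bridge as Open vSwitch-backed
      interfaces: [ens4]   # the single PXE NIC is enslaved here
      dhcp4: true
```

The fix referenced above moves OVS setup into network-pre.target so the bridge exists before systemd-networkd-wait-online.service gives up.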
[Yahoo-eng-team] [Bug 1900837] Re: cloud-init resets permissions on log file after reboot
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1900837

Title: cloud-init resets permissions on log file after reboot

Status in cloud-init: Fix Released

Bug description:
  In attempting to apply CIS security guidelines to an Ubuntu system,
  it was found that after changing the permissions of the log files in
  /var/log to 640, cloud-init would reset the permissions to 644 on
  reboot. As long as cloud-init can write to the file, it should be OK
  to alter the permissions without issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1900837/+subscriptions
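The requested behaviour is easy to sketch: apply a default mode only when the file is created, and leave an existing file's permissions alone (a sketch under those assumptions, not cloud-init's actual stages code):

```python
import os
import stat


def ensure_log_file(path, default_mode=0o644):
    """Create path with default_mode if missing; never chmod an existing file.

    cloud-init only needs write access, so an admin-tightened mode such
    as 0o640 (e.g. for CIS hardening) should survive reboots.
    Returns the file's resulting permission bits.
    """
    if not os.path.exists(path):
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, default_mode)
        os.close(fd)
    # existing files: leave the mode untouched
    return stat.S_IMODE(os.stat(path).st_mode)
```

After an admin runs `chmod 640 /var/log/cloud-init.log`, calling this on the next boot reports 0o640 instead of resetting it to 0o644.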
[Yahoo-eng-team] [Bug 1901958] Re: FreeBSD fix fs related bugs
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1901958

Title: FreeBSD fix fs related bugs

Status in cloud-init: Fix Released

Bug description:
  1) FreeBSD does not support 'vfat'; use 'msdosfs' instead.

     Original report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250496
     "Feel free to submit upstream if you have signed the CLA. I do
     not want to sign it."

  2) If the filesystem has the trim (-t) or MAC multilabel (-l) flag,
     resizing the FS fails.

     https://www.freebsd.org/cgi/man.cgi?query=tunefs=8

     2020-10-28 17:15:07,015 - handlers.py[DEBUG]: finish: init-network/config-resizefs: FAIL: running config-resizefs with frequency always
     ...
       File "/usr/local/lib/python3.7/site-packages/cloudinit/config/cc_resizefs.py", line 114, in _can_skip_resize_ufs
         optlist, _args = getopt.getopt(newfs_cmd[1:], opt_value)
       File "/usr/local/lib/python3.7/getopt.py", line 95, in getopt
         opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
       File "/usr/local/lib/python3.7/getopt.py", line 195, in do_shorts
         if short_has_arg(opt, shortopts):
       File "/usr/local/lib/python3.7/getopt.py", line 211, in short_has_arg
         raise GetoptError(_('option -%s not recognized') % opt, opt)
     getopt.GetoptError: option -t not recognized

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1901958/+subscriptions
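The traceback comes straight from getopt's strictness: any flag absent from the option string raises GetoptError rather than being ignored. A small reproduction (the option strings here are hypothetical, not the exact ones cc_resizefs used):

```python
import getopt


def parse_newfs_opts(argv, optstring):
    """Parse a recorded newfs command line, as _can_skip_resize_ufs did.

    Returns (opts_dict, None) on success or (None, error_message)
    when getopt rejects a flag missing from optstring.
    """
    try:
        opts, _args = getopt.getopt(argv, optstring)
        return dict(opts), None
    except getopt.GetoptError as e:
        return None, str(e)


# '-t' (TRIM) is not in the option string, so the resize check blows up:
opts, err = parse_newfs_opts(["-t", "-U", "-O", "2"], "UO:")
assert opts is None and "option -t not recognized" in err

# Adding the flag to the option string lets the same command parse fine:
opts, err = parse_newfs_opts(["-t", "-U", "-O", "2"], "tUO:")
assert err is None and opts == {"-t": "", "-U": "", "-O": "2"}
```

The fix in #655 amounts to teaching the parser about the full set of newfs flags instead of letting GetoptError abort the resize.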
[Yahoo-eng-team] [Bug 1905440] Re: Release 20.4
This bug is believed to be fixed in cloud-init in version 20.4. If this is still a problem for you, please make a comment and set the state back to New. Thank you.

** Changed in: cloud-init
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1905440

Title: Release 20.4

Status in cloud-init: Fix Released

Bug description:
  == Release Notes ==

  Cloud-init release 20.4 is now available

  The 20.4 release:
   * spanned about 3 months
   * had 29 contributors from 31 domains
   * fixed 14 Launchpad issues

  Highlights:
   - Azure ability to hot-attach NICs to preprovisioned VMs before reprovisioning
   - Additional Azure failure handling
   - Add NoCloud seed from vendordata
   - Ability to blacklist network interfaces based on driver
   - New IBM PowerVM specific RMC module
   - Allow OVS bridge as primary interface
   - add cli "--system" param to allow validating system user-data
   - Support configuring SSH host certificates
   - New integration testing framework

  == Changelog ==
   - tox: avoid tox testenv subsvars for xenial support (#684)
   - Ensure proper root permissions in integration tests (#664) [James Falcon]
   - LXD VM support in integration tests (#678) [James Falcon]
   - Integration test for fallocate falling back to dd (#681) [James Falcon]
   - .travis.yml: correctly integration test the built .deb (#683)
   - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
   - Support configuring SSH host certificates. (#660) [Jonathan Lung]
   - add integration test for LP: #1900837 (#679)
   - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
   - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
   - Make mount in place for tests work (#667) [James Falcon]
   - integration_tests: restore emission of settings to log (#657)
   - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
   - tox.ini: only select "ci" marked tests for CI runs (#677)
   - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
   - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
   - test_persistence: simplify VersionIsPoppedFromState (#674)
   - only run a subset of integration tests in CI (#672)
   - cli: add --system param to allow validating system user-data on a machine (#575)
   - test_persistence: add VersionIsPoppedFromState test (#673)
   - introduce an upgrade framework and related testing (#659)
   - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
   - Pin pycloudlib to a working commit (#666) [James Falcon]
   - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
   - cloud_tests: add hirsute release definition (#662)
   - split integration and cloud_tests requirements (#652)
   - faq.rst: add warning to answer that suggests running `clean` (#661)
   - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
   - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
   - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
   - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
   - Prevent timeout on travis integration tests. (#651) [James Falcon]
   - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
   - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
   - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
   - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
   - Make some language improvements in growpart documentation (#649) [Shane Frasier]
   - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
   - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
   - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
   - cloudinit: move dmi functions out of util (#622) [Scott Moser]
   - integration_tests: various launch improvements (#638)
   - test_lp1886531: don't assume /etc/fstab exists (#639)
   - Remove Ubuntu restriction from PR template (#648) [James Falcon]
   - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
   - conftest: improve docstring for disable_subp_usage (#644)
   - doc: add example query commands to debug Jinja templates (#645)
   - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
   - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
   - .travis.yml: use a known-working version of lxd (#643)
   - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
   - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
[Yahoo-eng-team] [Bug 1905440] [NEW] Release 20.4
Public bug reported:

== Release Notes ==

Cloud-init release 20.4 is now available

The 20.4 release:
 * spanned about 3 months
 * had 29 contributors from 31 domains
 * fixed 14 Launchpad issues

Highlights:

== Changelog ==
 - tox: avoid tox testenv subsvars for xenial support (#684)
 - Ensure proper root permissions in integration tests (#664) [James Falcon]
 - LXD VM support in integration tests (#678) [James Falcon]
 - Integration test for fallocate falling back to dd (#681) [James Falcon]
 - .travis.yml: correctly integration test the built .deb (#683)
 - Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613) [aswinrajamannar]
 - Support configuring SSH host certificates. (#660) [Jonathan Lung]
 - add integration test for LP: #1900837 (#679)
 - cc_resizefs on FreeBSD: Fix _can_skip_ufs_resize (#655) [Mina Galić] (LP: #1901958, #1901958)
 - DataSourceAzure: push dmesg log to KVP (#670) [Anh Vo]
 - Make mount in place for tests work (#667) [James Falcon]
 - integration_tests: restore emission of settings to log (#657)
 - DataSourceAzure: update password for defuser if exists (#671) [Anh Vo]
 - tox.ini: only select "ci" marked tests for CI runs (#677)
 - Azure helper: Increase Azure Endpoint HTTP retries (#619) [Johnson Shi]
 - DataSourceAzure: send failure signal on Azure datasource failure (#594) [Johnson Shi]
 - test_persistence: simplify VersionIsPoppedFromState (#674)
 - only run a subset of integration tests in CI (#672)
 - cli: add --system param to allow validating system user-data on a machine (#575)
 - test_persistence: add VersionIsPoppedFromState test (#673)
 - introduce an upgrade framework and related testing (#659)
 - add --no-tty option to gpg (#669) [Till Riedel] (LP: #1813396)
 - Pin pycloudlib to a working commit (#666) [James Falcon]
 - DataSourceOpenNebula: exclude SRANDOM from context output (#665)
 - cloud_tests: add hirsute release definition (#662)
 - split integration and cloud_tests requirements (#652)
 - faq.rst: add warning to answer that suggests running `clean` (#661)
 - Fix stacktrace in DataSourceRbxCloud if no metadata disk is found (#632) [Scott Moser]
 - Make wakeonlan Network Config v2 setting actually work (#626) [dermotbradley]
 - HACKING.md: unify network-refactoring namespace (#658) [Mina Galić]
 - replace usage of dmidecode with kenv on FreeBSD (#621) [Mina Galić]
 - Prevent timeout on travis integration tests. (#651) [James Falcon]
 - azure: enable pushing the log to KVP from the last pushed byte (#614) [Moustafa Moustafa]
 - Fix launch_kwargs bug in integration tests (#654) [James Falcon]
 - split read_fs_info into linux & freebsd parts (#625) [Mina Galić]
 - PULL_REQUEST_TEMPLATE.md: expand commit message section (#642)
 - Make some language improvements in growpart documentation (#649) [Shane Frasier]
 - Revert ".travis.yml: use a known-working version of lxd (#643)" (#650)
 - Fix not sourcing default 50-cloud-init ENI file on Debian (#598) [WebSpider]
 - remove unnecessary reboot from gpart resize (#646) [Mina Galić]
 - cloudinit: move dmi functions out of util (#622) [Scott Moser]
 - integration_tests: various launch improvements (#638)
 - test_lp1886531: don't assume /etc/fstab exists (#639)
 - Remove Ubuntu restriction from PR template (#648) [James Falcon]
 - util: fix mounting of vfat on *BSD (#637) [Mina Galić]
 - conftest: improve docstring for disable_subp_usage (#644)
 - doc: add example query commands to debug Jinja templates (#645)
 - Correct documentation and testcase data for some user-data YAML (#618) [dermotbradley]
 - Hetzner: Fix instance_id / SMBIOS serial comparison (#640) [Markus Schade]
 - .travis.yml: use a known-working version of lxd (#643)
 - tools/build-on-freebsd: fix comment explaining purpose of the script (#635) [Mina Galić]
 - Hetzner: initialize instance_id from system-serial-number (#630) [Markus Schade] (LP: #1885527)
 - Explicit set IPV6_AUTOCONF and IPV6_FORCE_ACCEPT_RA on static6 (#634) [Eduardo Otubo]
 - get_interfaces: don't exclude Open vSwitch bridge/bond members (#608) [Lukas Märdian] (LP: #1898997)
 - Add config modules for controlling IBM PowerVM RMC. (#584) [Aman306] (LP: #1895979)
 - Update network config docs to clarify MAC address quoting (#623) [dermotbradley]
 - gentoo: fix hostname rendering when value has a comment (#611) [Manuel Aguilera]
 - refactor integration testing infrastructure (#610) [James Falcon]
 - stages: don't reset permissions of cloud-init.log every boot (#624) (LP: #1900837)
 - docs: Add how to use cloud-localds to boot qemu (#617) [Joshua Powers]
 - Drop vestigial update_resolve_conf_file function (#620) [Scott Moser]
 - cc_mounts: correctly fallback to dd if fallocate fails (#585) (LP: #1897099)
 - .travis.yml: add integration-tests to Travis matrix (#600)
 - ssh_util: handle non-default AuthorizedKeysFile config (#586) [Eduardo Otubo]
 - Multiple file fix for AuthorizedKeysFile config (#60)
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1550919

Title: [Libvirt]Evacuate fail may cause disk image be deleted

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  I checked the latest source of nova on the master branch; this
  problem still exists.

  When we are doing evacuate, eventually _do_rebuild_instance will be
  called. As rebuild is not implemented in the libvirt driver,
  _rebuild_default_impl is called instead:

      try:
          with instance.mutated_migration_context():
              self.driver.rebuild(**kwargs)
      except NotImplementedError:
          # NOTE(rpodolyaka): driver doesn't provide specialized version
          # of rebuild, fall back to the default implementation
          self._rebuild_default_impl(**kwargs)

  _rebuild_default_impl will call self.driver.spawn to boot up the
  instance, and spawn will in turn call _create_domain_and_network.
  When VirtualInterfaceCreateException or Timeout happens,
  self.cleanup will be called:

      except exception.VirtualInterfaceCreateException:
          # Neutron reported failure and we didn't swallow it, so
          # bail here
          with excutils.save_and_reraise_exception():
              if guest:
                  guest.poweroff()
              self.cleanup(context, instance, network_info=network_info,
                           block_device_info=block_device_info)
      except eventlet.timeout.Timeout:
          # We never heard from Neutron
          LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                       'instance %(uuid)s'), {'uuid': instance.uuid},
                   instance=instance)
          if CONF.vif_plugging_is_fatal:
              if guest:
                  guest.poweroff()
              self.cleanup(context, instance, network_info=network_info,
                           block_device_info=block_device_info)
              raise exception.VirtualInterfaceCreateException()

  Because the default value for the destroy_disks parameter is True:

      def cleanup(self, context, instance, network_info,
                  block_device_info=None, destroy_disks=True,
                  migrate_data=None, destroy_vifs=True):

  if an error occurs while waiting for neutron's event during an
  evacuation, the instance's disk files will be deleted unexpectedly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions
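The hazard is the default value itself. A toy model of the cleanup call (illustrative only; nova's real cleanup() does far more than this) shows why the error path deletes the disks an evacuation was meant to preserve:

```python
def cleanup(instance, destroy_disks=True):
    """Toy stand-in for the libvirt driver's cleanup(); returns actions taken."""
    actions = ["destroy domain", "unplug vifs"]
    if destroy_disks:
        # with the default, the error path also wipes the disk files
        actions.append("delete disk files")
    return actions


# An evacuation that hits the vif-plugging timeout calls cleanup() with
# the defaults, so the only remaining copy of the instance's disk is lost:
assert "delete disk files" in cleanup("instance-0001")

# The safe call on this path would opt out explicitly:
assert "delete disk files" not in cleanup("instance-0001", destroy_disks=False)
```

The upstream fix follows the second shape: the rebuild error path must not request disk destruction by default.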
[Yahoo-eng-team] [Bug 1868033] Re: Booting instance with pci_device fails during rocky->stein live upgrade
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1868033

Title: Booting instance with pci_device fails during rocky->stein live upgrade

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Environment:
    Stein nova-conductor, having set upgrade_levels to rocky
    Rocky nova-compute

  Boot an instance with a flavour that has a pci_device.

  Error:
    Failed to publish message to topic 'nova': maximum recursion depth
    exceeded: RuntimeError: maximum recursion depth exceeded

  Tracked this down to it continually trying to backport the
  InstancePCIRequests object. It gets as arguments:

    objinst={u'nova_object.version': u'1.1',
             u'nova_object.name': u'InstancePCIRequests',
             u'nova_object.data': {
                 u'instance_uuid': u'08212b12-8fa8-42d9-8d3e-52ed60a64135',
                 u'requests': [{
                     u'nova_object.version': u'1.3',
                     u'nova_object.name': u'InstancePCIRequest',
                     u'nova_object.data': {
                         u'count': 1, u'is_new': False,
                         u'numa_policy': None, u'request_id': None,
                         u'requester_id': None,
                         u'alias_name': u'V100-32G',
                         u'spec': [{u'vendor_id': u'10de',
                                    u'product_id': u'1db6'}]},
                     u'nova_object.namespace': u'nova'}]},
             u'nova_object.namespace': u'nova'}
    object_versions={u'InstancePCIRequests': '1.1',
                     'InstancePCIRequest': '1.2'}

  It fails because it doesn't backport the individual
  InstancePCIRequest inside the InstancePCIRequests object and so
  keeps trying. The error it shows is:

    IncompatibleObjectVersion: Version 1.3 of InstancePCIRequest is
    not supported, supported version is 1.2

  I have fixed this in our setup by altering obj_make_compatible to
  downgrade the individual requests to version 1.2, which seems to
  work and all is good.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1868033/+subscriptions
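The reporter's fix can be sketched like this (hypothetical classes standing in for the oslo.versionedobjects machinery; field names are taken from the payload above but the mechanics are simplified): the parent's obj_make_compatible must also walk its child objects, otherwise the receiver keeps requesting a backport forever.

```python
class InstancePCIRequest:
    """Toy child object; the real ovo class has more fields and plumbing."""
    VERSION = "1.3"

    def __init__(self, data):
        self.version = self.VERSION
        self.data = data

    def obj_make_compatible(self, target):
        if target == "1.2" and self.version == "1.3":
            # drop the field introduced in 1.3 so a 1.2 consumer accepts it
            self.data.pop("requester_id", None)
            self.version = "1.2"


class InstancePCIRequests:
    def __init__(self, requests):
        self.requests = requests

    def obj_make_compatible(self, child_target):
        # the missing piece from the report: downgrade each child too
        for req in self.requests:
            req.obj_make_compatible(child_target)


reqs = InstancePCIRequests([InstancePCIRequest(
    {"count": 1, "alias_name": "V100-32G", "requester_id": None})])
reqs.obj_make_compatible("1.2")
assert reqs.requests[0].version == "1.2"
assert "requester_id" not in reqs.requests[0].data
```

Without the loop in the parent, the child stays at 1.3, the receiver raises IncompatibleObjectVersion again, and the backport cycle recurses until the RuntimeError above.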
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1869050

Title: migration of anti-affinity server fails due to stale scheduler instance info

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: Invalid
Status in OpenStack Compute (nova) queens series: Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Steps to reproduce
  ==================
  Have a deployment with 3 compute nodes:
   * make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
   * create a server group with anti-affinity policy
   * boot server1 into the group
   * boot server2 into the group
   * migrate server2
   * confirm the migration
   * boot server3

  Make sure that between the last two steps there was no periodic
  _sync_scheduler_instance_info run on the compute that hosted server2
  before the migration. This can be done by performing the last two
  steps right after each other without waiting too long, as the
  interval of that periodic task (scheduler_instance_sync_interval)
  defaults to 120 sec.

  Expected result
  ===============
  server3 is booted on the host that server2 was moved away from.

  Actual result
  =============
  server3 cannot be booted (NoValidHost).

  Triage
  ======
  The confirm resize call on the source compute does not inform the
  scheduler that the instance has been removed from that host. This
  makes the scheduler's instance info stale, causing the subsequent
  scheduling error.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions
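Why the stale cache yields NoValidHost can be shown with a toy version of the anti-affinity filter (a hypothetical data model, not nova's scheduler code):

```python
def allowed_hosts(all_hosts, group_members, host_instances):
    """Hosts still eligible for a new member of an anti-affinity group."""
    occupied = {host for host, instances in host_instances.items()
                if any(srv in group_members for srv in instances)}
    return [h for h in all_hosts if h not in occupied]


hosts = ["compute1", "compute2", "compute3"]
group = {"server1", "server2"}

# Fresh view after server2 migrated compute2 -> compute3: compute2 is free.
fresh = {"compute1": ["server1"], "compute3": ["server2"]}
assert allowed_hosts(hosts, group, fresh) == ["compute2"]

# Stale view: compute2 still appears to host server2, so every host is
# "occupied" by a group member and the scheduler raises NoValidHost.
stale = {"compute1": ["server1"], "compute2": ["server2"],
         "compute3": ["server2"]}
assert allowed_hosts(hosts, group, stale) == []
```

The fix in the triage direction is for confirm-resize to push the updated instance list for the source host, so the cached map matches `fresh` rather than `stale`.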
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Description
  ===========
  The nova-compute service keeps a local image cache for glance images
  used for nova servers, to avoid downloading the same image from
  glance multiple times. The disk usage of this cache is not
  calculated as local disk usage in nova and not reported to placement
  as used DISK_GB. This leads to disk over-allocation. Also, the size
  of that cache cannot be limited by nova configuration, so the
  deployer cannot reserve disk space for the cache with the
  reserved_host_disk_mb config option.

  Steps to reproduce
  ==================
   * Set up a single node devstack
   * Create and upload an image with a not too small physical size,
     like an image with 1G physical size.
   * Check the current disk usage of the host OS and configure
     reserved_host_disk_mb in nova-cpu.conf accordingly.
   * Boot two servers from that image with a flavor, like d1 (disk=5G)
   * Nova will download the glance image once to the local cache,
     which results in 1GB of disk usage
   * Nova will create two root file systems, one for each VM. Those
     disks initially have minimal physical size, but a 5G virtual
     size.
   * At this point nova has allocated 5G + 5G of DISK_GB in placement,
     but due to the image in the cache the total disk usage of the two
     VMs + cache can be 5G + 5G + 1G, if both VMs overwrite and fill
     the content of their own disks.

  Expected result
  ===============
  Option A) Nova maintains a DISK_GB allocation in placement for the
  images in its cache. This way the expected DISK_GB allocation in
  placement is 5G + 5G + 1G at the end.

  Option B) Nova provides a config option to limit the maximum size of
  the image cache, so the deployer can include the maximum image cache
  size in reserved_host_disk_mb when dimensioning the disk space of
  the compute.

  Actual result
  =============
  Only 5G + 5G was allocated from placement, so disk space is
  over-allocated by the image cache.

  Environment
  ===========
  Devstack from recent master:
    stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
    4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"
  libvirt driver with file based image backend

  Logs & Configs
  ==============
  http://paste.openstack.org/show/793388/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions
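The over-allocation in the description is just this bookkeeping (values taken from the report):

```python
flavor_disk_gb = 5        # flavor d1's disk size
servers = 2
cached_image_gb = 1       # physical size of the cached glance image

placement_allocation = servers * flavor_disk_gb             # what nova reports
worst_case_usage = placement_allocation + cached_image_gb   # what the disk can hit

assert placement_allocation == 10
assert worst_case_usage == 11
# the cache is exactly the unaccounted difference:
assert worst_case_usage - placement_allocation == cached_image_gb
```

Option A above would fold `cached_image_gb` into the placement allocation; option B would fold it into reserved_host_disk_mb instead.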
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  Description
  ===========
  Unable to createImage/snapshot paused volume backed instances.

  Steps to reproduce
  ==================
  - Pause a volume backed instance
  - Attempt to snapshot the instance using the createImage API

  Expected result
  ===============
  A snapshot image is successfully created, as is the case for paused
  instances that are not volume backed.

  Actual result
  =============
  n-api returns the following error:

    {'code': 409, 'message': "Cannot 'createImage' instance
    bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state
    paused"}

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     list for all releases: http://docs.openstack.org/releases/
     master
  2. Which hypervisor did you use? (For example: Libvirt + KVM,
     Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
     N/A
  3. Which storage type did you use? (For example: Ceph, LVM, GPFS,
     ...) What's the version of that?
     N/A
  4. Which networking type did you use? (For example: nova-network,
     Neutron with OpenVSwitch, ...)
     N/A

  Logs & Configs
  ==============
  As above.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions
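The 409 comes from the API layer's vm_state gate, not from anything the backend cannot do. A toy model of that gate (hypothetical state tables; nova's real check lives in API decorators and is more involved) shows the asymmetry the report complains about:

```python
# Allowed vm_states for the createImage action (illustrative values only).
ALLOWED_IMAGE_BACKED = {"active", "stopped", "paused", "suspended"}
ALLOWED_VOLUME_BACKED = {"active", "stopped"}  # 'paused' missing: the bug


def can_snapshot(vm_state, volume_backed):
    """Return True if the createImage action would be accepted."""
    allowed = ALLOWED_VOLUME_BACKED if volume_backed else ALLOWED_IMAGE_BACKED
    return vm_state in allowed


assert can_snapshot("paused", volume_backed=False)     # image-backed: works
assert not can_snapshot("paused", volume_backed=True)  # the reported 409
```

The fix amounts to adding 'paused' to the volume-backed set so both paths behave the same.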
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/ussuri
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:
  The function which counts resources using the legacy method involves
  getting a list of all cell mappings assigned to a specific project:

  https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

  This code can be very heavy on a database which contains a lot of
  instances (but not a lot of mappings), potentially scanning millions
  of rows to gather 1-2 cell mappings. In a single cell environment,
  it is just extra CPU usage with exactly the same outcome.

  The [api]/instance_list_per_project_cells option was introduced to
  work around this:

  https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

  However, the quota code does not implement it, which means quota
  counts take a big toll on the database server. We should ideally
  mirror the same behaviour in the quota code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions
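What the report asks for can be sketched as follows (hypothetical helper names, not nova's API): honour the same option the instance-list path honours when deciding which cells to run quota counts against.

```python
def cells_to_count_in(project_id, all_cells, cells_by_project,
                      instance_list_per_project_cells=False):
    """Pick the cells to run quota counts against.

    When the option is off (the default), skip the expensive scan of
    instance mappings and just query every cell; in a single-cell
    deployment the result is identical either way.
    """
    if instance_list_per_project_cells:
        return cells_by_project.get(project_id, [])
    return all_cells


all_cells = ["cell1"]
mappings = {"project-a": ["cell1"]}

# Single-cell deployment: both strategies yield the same cells, but the
# default path avoids the per-project mapping scan entirely.
assert cells_to_count_in("project-a", all_cells, mappings) == ["cell1"]
assert cells_to_count_in("project-a", all_cells, mappings,
                         instance_list_per_project_cells=True) == ["cell1"]
```

The point of the bug is that `quota.py` hard-codes the per-project branch; mirroring this switch lets single-cell operators skip the costly scan.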
[Yahoo-eng-team] [Bug 1879964] Re: Invalid value for 'hw:mem_page_size' raises confusing error
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1879964 Title: Invalid value for 'hw:mem_page_size' raises confusing error Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Configure a flavor like so: openstack flavor create hugepage --ram 1024 --disk 10 --vcpus 1 openstack flavor set hugepage --property hw:mem_page_size=2M Attempt to boot an instance. It will fail with the following error message: Invalid memory page size '0' (HTTP 400) (Request-ID: req-338bf619-3a54-45c5-9c59-ad8c1d425e91) You wouldn't know from reading it, but this is because the property should read 'hw:mem_page_size=2MB' (note the extra 'B'). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879964/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
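The confusing message comes from the parsing of the extra spec value. A minimal, hypothetical sketch (not nova's actual parser; the function and unit-table names are invented) of validation that reports the offending input instead of a derived '0':

```python
import re

# Accepted unit suffixes and their multiplier to KiB. Note '2M' is not a
# valid suffix here -- only '2MB' is -- which is the trap the bug describes.
UNITS = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}

def parse_page_size(value):
    """Return the page size in KiB, or raise ValueError naming the input."""
    m = re.match(r"^(\d+)([A-Z]+)?$", value.upper())
    if not m or (m.group(2) and m.group(2) not in UNITS):
        # Report the original string, not a derived value such as '0'.
        raise ValueError("Invalid memory page size %r" % value)
    return int(m.group(1)) * UNITS[m.group(2) or "KB"]
```

With this shape, `parse_page_size("2MB")` yields 2048 KiB while `parse_page_size("2M")` fails with a message that actually names the bad input.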
[Yahoo-eng-team] [Bug 1882233] Re: Libvirt driver always reports 'memory_mb_used' of 0
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882233 Title: Libvirt driver always reports 'memory_mb_used' of 0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: The nova-compute service periodically logs a summary of the free RAM, disk and vCPUs as reported by the hypervisor. For example: Hypervisor/Node resource view: name=vtpm-f31.novalocal free_ram=7960MB free_disk=11.379043579101562GB free_vcpus=7 pci_devices=[{...}] On a recent deployment using the libvirt driver, it's observed that the 'free_ram' value never changes despite instances being created and destroyed. This is because the 'get_memory_mb_used' function in 'nova.virt.libvirt.host' always returns 0 unless the host platform, as reported by 'sys.platform', is either 'linux2' or 'linux3'. Since Python 3.3, the major version is no longer included in this return value because it was misleading [1]. This is low priority because the value only appears to be used for logging purposes and the values stored in e.g. the 'ComputeNode' object and reported to placement are calculated based on config options and number of instances on the node. We may wish to stop reporting this information instead. [1] https://stackoverflow.com/a/10429736/613428 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882233/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
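The portability idiom for this class of bug is a prefix match rather than an exact comparison, since it covers both the old and new return values. A minimal sketch:

```python
import sys

# Since Python 3.3, sys.platform is just 'linux' -- no major-version
# suffix -- so an exact comparison against 'linux2'/'linux3' silently
# never matches on Python 3 and the memory query is skipped.
def host_is_linux():
    # Prefix match covers 'linux2' (Python 2) and 'linux' (Python 3.3+).
    return sys.platform.startswith('linux')
```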
[Yahoo-eng-team] [Bug 1882821] Re: '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' are incompatible
** Also affects: nova/ussuri Importance: Undecided Status: New ** Changed in: nova/ussuri Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882821 Title: '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' are incompatible Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Per title, the '[libvirt] file_backed_memory' and '[DEFAULT] reserved_host_memory_mb' config options are incompatible. Not only does '[DEFAULT] reserved_host_memory_mb' not really make sense for file-backed memory (if you want to reserve "memory", configure a lower '[libvirt] file_backed_memory' value), but configuring a value for '[libvirt] file_backed_memory' that is lower than the value for '[DEFAULT] reserved_host_memory_mb', which currently defaults to 512MB, will break nova's resource reporting to placement: nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/f39bde61-6f73-4ccb-9488-6efb9689730f/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n Unable to update inventory for resource provider f39bde61-6f73-4ccb-9488-6efb9689730f: Invalid inventory for 'MEMORY_MB' on resource provider 'f39bde61-6f73-4ccb-9488-6efb9689730f'. The reserved value is greater than total. ", "code": "placement.undefined_code", "request_id": "req-977e43e7-1a7c-4309-96ec-49a75bdea58a"}]} Ideally we should error out if both values are configured; however, doing so would be a breaking change. Instead, we can warn if these are incompatible and then error out in a future release.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882821/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
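The proposed warning could look roughly like the following sketch (the function name and plumbing are illustrative, not nova's code; the real values would come from the two config options, both in MiB):

```python
import logging

LOG = logging.getLogger(__name__)

def check_memory_config(file_backed_memory_mb, reserved_host_memory_mb):
    """Warn when placement would reject the MEMORY_MB inventory.

    With file-backed memory, the file size *is* the total; a reserved
    value at or above it makes reserved > total and the inventory update
    fails. Returns False when the combination is broken.
    """
    if file_backed_memory_mb and reserved_host_memory_mb >= file_backed_memory_mb:
        LOG.warning(
            "[DEFAULT] reserved_host_memory_mb (%d MB) should be 0 when "
            "[libvirt] file_backed_memory is enabled; a value >= "
            "file_backed_memory (%d MB) breaks reporting to placement",
            reserved_host_memory_mb, file_backed_memory_mb)
        return False
    return True
```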
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1882919 Title: e1000e interface reported as unsupported Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF which nova does not explicitly support. There doesn't appear to be any reason not to support this, since libvirt, and specifically QEMU/KVM, support it. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1884214 Title: reserve disk usage for image cache fails on a fresh hypervisor Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement. [1] http://paste.openstack.org/show/794993/ This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1753676] Re: Live migration not working as Expected when Restarting nova-compute service while migration from source node
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1753676 Title: Live migration not working as Expected when Restarting nova-compute service while migration from source node Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Description === Environment: Ubuntu 16.04 Openstack Version: Pike I am trying to migrate a VM ( live migration ( block migration ) ) from one compute node to another compute node... Everything looks good unless I restart the nova-compute service: live migration keeps running underneath with the help of libvirt, but once the vm reaches the destination, the database is not updated properly. Steps to reproduce: === nova.conf ( libvirt setting on both compute nodes ) [libvirt] live_migration_bandwidth=1200 live_migration_downtime=100 live_migration_downtime_steps =3 live_migration_downtime_delay=10 live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE virt_type = kvm inject_password = False disk_cachemodes = network=writeback live_migration_uri = "qemu+tcp://nova@%s/system" live_migration_tunnelled = False block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC ( default openstack live migration configuration ( pre-copy with no tunneling ) ) Source vm root disk ( boot from volume with one ephemeral disk (160GB) ) Trying to migrate vm from compute1 to compute2, below is my source vm. 
| OS-EXT-SRV-ATTR:host | compute1 | | OS-EXT-SRV-ATTR:hostname | testcase1-all-ephemernal-boot-from-vol | | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 | | OS-EXT-SRV-ATTR:instance_name| instance-0153 1) nova live-migration --block-migrate compute2 [req-48a3df61-3974-46ac-8019-c4c4a0f8a8c8 4a8150eb246a4450829331e993f8c3fd f11a5d3631f14c4f879a2e7dddb96c06 - default default] pre_live_migration data is LibvirtLiveMigrateData(bdms=,block_migration=True,disk_available_mb=6900736,disk_over_commit=,filename='tmpW5ApOS',graphics_listen_addr_spice=x.x.x.x,graphics_listen_addr_vnc=127.0.0.1,image_type='default',instance_relative_path='504028fc-1381 -42ca-ad7c- def7f749a722',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=True,migration=,serial_listen_addr=None,serial_listen_ports=,supported_perf_events=,target_connect_addr=) pre_live_migration /openstack/venvs/nova-16.0.6/lib/python2.7/site- packages/nova/compute/manager.py:5437 Migration started, able to see the data and memory transfer ( using iftop ) Data transfer between compute nodes using iftop <= 4.94Gb 4.99Gb 5.01Gb Restarted Nova-compute service on source compute node ( where the vm is migrating) Live migration still it is going, once migration completes, below is my total data transfer ( using iftop ) TX: cum: 17.3MB peak: 2.50Mb rates: 11.1Kb 7.11Kb 463Kb RX:97.7GB 4.97Gb 3.82Kb 1.93Kb 1.87Gb TOTAL: 97.7GB 4.97Gb Once migration completes, from the destination compute node ( we can able to see the virsh domain running) root@compute2:~# virsh list --all IdName State 3 instance-0153 running From the nova-compute.log Instance has been moved to another host compute1(compute1). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 23808, u'DISK_GB': 180}}. 
_remove_deleted_instances_allocations /openstack/venvs/nova-16.0.6/lib/python2.7/site- packages/nova/compute/resource_tracker.py:123 Nova compute still showing 0 vcpus ( but 8 core vm was there ) Total usable vcpus: 56, total allocated vcpus: 0 _report_final_resource_view /openstack/venvs/nova-16.0.6/lib/python2.7 /site-packages/nova/compute/resource_tracker.py:792 nova show ( still nova db shows src hostname, db is
[Yahoo-eng-team] [Bug 1805767] Re: The new numa topology in the new flavor extra specs weren't parsed when resize
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1805767 Title: The new numa topology in the new flavor extra specs weren't parsed when resize Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Env: host with 2 numa nodes. flavor n2 request two instance numa nodes flavor n3 request three instance numa nodes Reproduce: Boot an instance with flavor n2, which scheduled to the host. Resize the instance with n3. The scheduler logs: Nov 28 18:27:16 jfz1r03h15 nova-scheduler[47260]: DEBUG nova.virt.hardware [None req-953d07bf-8ead-4f21-bd64-1ab12244eec1 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy=None,cpu_thread_policy=None,cpu_topology=,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None) on host_cell NUMACell(cpu_usage=1,cpuset=set([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]),id=0,memory=128835,memory_usage=256,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pinned_cpus=set([]),siblings=[set([43,7]),set([16,52]),set([2,38]),set([8,44]),set([50,14]),set([0,36]),set([51,15]),set([1,37]),set([10,46]),set([11,47]),set([42,6]),set([41,5]),set([9,45]),set([3,39]),set([48,12]),set([49,13]),set([17,53]),set([40,4])]) {{(pid=48606) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1019}} Nov 28 18:27:16 jfz1r03h15 nova-scheduler[47260]: DEBUG nova.virt.hardware [None req-953d07bf-8ead-4f21-bd64-1ab12244eec1 admin admin] Attempting to fit instance cell 
InstanceNUMACell(cpu_pinning_raw=None,cpu_policy=None,cpu_thread_policy=None,cpu_topology=,cpuset=set([1]),cpuset_reserved=None,id=1,memory=256,pagesize=None) on host_cell NUMACell(cpu_usage=1,cpuset=set([18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71]),id=1,memory=129009,memory_usage=256,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pinned_cpus=set([]),siblings=[set([59,23]),set([65,29]),set([18,54]),set([34,70]),set([24,60]),set([33,69]),set([58,22]),set([67,31]),set([66,30]),set([26,62]),set([35,71]),set([57,21]),set([25,61]),set([19,55]),set([64,28]),set([32,68]),set([27,63]),set([56,20])]) {{(pid=48606) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1019}} As above, the scheduler only see two instance numa nodes. It means the new flavor extra specs weren't parsed. The nova-compute log: Nov 28 18:27:27 jfz1r03h15 nova-scheduler[47260]: DEBUG oslo_service.periodic_task [None req-7aff4535-fe99-48b4-bab9-d206d35412ff None None] Running periodic task SchedulerManager._run_periodic_tasks {{(pid=48606) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:219}} Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 79, in wrapped Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server self.force_reraise() Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 69, in wrapped Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 187, in decorated_function Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance) Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Nov 28 18:27:18 jfz1r03h15 nova-compute[52929]: ERROR
[Yahoo-eng-team] [Bug 1843708] Re: Key-pair is not updated during the rebuild
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1843708 Title: Key-pair is not updated during the rebuild Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === When we want to rebuild an instance and change the keypair, we can specify it with: openstack --os-compute-api-version 2.54 server rebuild --image "Debian 10" --key-name key1 instance1 This comes from this implementation: https://review.opendev.org/#/c/379128/ https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/rebuild-keypair-reset.html But when rebuilding the instance, Cloud-Init will set the key in authorized_keys from http://169.254.169.254/openstack/latest/meta_data.json And this meta_data.json uses the keys from the instance_extra table But the keypair will be updated in the 'instances' table, not in the 'instance_extra' table. 
So the keypair is not updated inside the VM Maybe this is the function for saving the keypair, but the save() does nothing: https://opendev.org/openstack/nova/src/branch/master/nova/objects/instance.py#L714 Steps to reproduce == - Deploy a DevStack - Boot an instance with keypair key1 - Rebuild it with key2 - A nova show will show the key_name key2, but the keypairs object in the instance_extra table is not updated and you cannot connect to the instance with key2 Expected result === Connect to the VM with the new keypair added during the rebuild call Actual result = The keypair added during the rebuild call is not set in the VM Environment === I tested it on a Devstack from master and we see this behaviour. NOVA : commit 5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1843708/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1847054] Re: kolla-ansible CI: nova-compute-ironic reports errors in the ironic scenario
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1847054 Title: kolla-ansible CI: nova-compute-ironic reports errors in the ironic scenario Status in kolla-ansible: Invalid Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: /var/log/kolla/nova/nova-compute-ironic.log 2019-10-07 07:32:21.268 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:33:22.454 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:34:22.416 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:35:22.422 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:36:24.422 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 
2019-10-07 07:37:26.423 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:38:27.419 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:39:29.430 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:40:30.420 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. 2019-10-07 07:41:32.420 6 ERROR nova.compute.manager [req-9de2cbda-9d8f-4a0f-a5be-de74d26077a2 - - - - -] No compute node record for host primary-ironic: nova.exception_Remote.ComputeHostNotFound_Remote: Compute host primary-ironic could not be found. To manage notifications about this bug go to: https://bugs.launchpad.net/kolla-ansible/+bug/1847054/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1848308] Re: Impossible to set instance CPU policy to 'shared' through flavor image property
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1848308 Title: Impossible to set instance CPU policy to 'shared' through flavor image property Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: According to the content defined in doc https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#customizing-instance-cpu-pinning-policies, the instance CPU allocation policy can be explicitly set through the instance flavor property 'hw:cpu_policy', or its corresponding image property 'hw_cpu_policy'. One general rule to solve the conflict between these two properties, if I understand it correctly, is that if 'hw:cpu_policy' is not given any policy, it should take the policy from 'hw_cpu_policy'. But currently, if 'hw_cpu_policy' is set to 'shared' and no value is set in 'hw:cpu_policy', the logic in 'hardware.get_cpu_policy_constraint' reports that there is no CPU policy (through returning a value of None). In this case, it should return the "shared" policy. This issue was first reported in https://review.opendev.org/#/c/688603/ but without tracking with a bug ID. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1848308/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
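The intended precedence can be sketched as follows. This is an illustrative re-implementation, not the actual hardware.get_cpu_policy_constraint code; the conflict handling follows the documented rules (a 'dedicated' flavor policy always wins, and a 'shared' flavor policy with a 'dedicated' image policy is an error):

```python
def get_cpu_policy(flavor_policy, image_policy):
    """Resolve 'hw:cpu_policy' (flavor) against 'hw_cpu_policy' (image)."""
    if flavor_policy == 'dedicated':
        return 'dedicated'
    if flavor_policy == 'shared' and image_policy == 'dedicated':
        raise ValueError('Image and flavor CPU policies conflict')
    # Fall back to the image property -- including 'shared', the case the
    # bug describes, where None was returned instead.
    return flavor_policy or image_policy
```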
[Yahoo-eng-team] [Bug 1879500] Re: Unable to rescue using volume snapshot based images
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1879500 Title: Unable to rescue using volume snapshot based images Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === While working on bug #1876330 it was observed that attempts to rescue an instance with a volume snapshot image were permitted but would ultimately fail to boot the instance with file-based imagebackends *or* fail outright with the rbd imagebackend. This is due to these images being metadata containers containing no image data, thus resulting in Nova attempting to rescue with zero-length images. Steps to reproduce == * Launch a volume-backed instance * Snapshot the instance using the imageCreate API. * Attempt to rescue the instance using the created image. Expected result === The request is rejected as there is no support for rescuing using a volume snapshot based image. Actual result = The request is accepted and either fails to boot the instance or fails earlier due to the zero-length image. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == See bug #1876330. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879500/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
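The expected up-front rejection could look roughly like this sketch. The dict layout loosely mirrors a Glance image record, where a snapshot of a volume-backed instance carries a 'block_device_mapping' property and no image data; the helper name and exact fields are illustrative, not nova's actual validation code:

```python
def validate_rescue_image(image):
    """Reject volume-backed snapshot images before attempting a rescue.

    Such images are metadata-only (zero bytes of data plus a
    'block_device_mapping' property), so booting a rescue instance from
    them can never work.
    """
    if not image.get('size') and 'block_device_mapping' in image.get('properties', {}):
        raise ValueError(
            'Image %s is a volume-backed snapshot and cannot be used to '
            'rescue an instance' % image.get('id'))
```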
[Yahoo-eng-team] [Bug 1891547] Re: AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING'
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1891547 Title: AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING' Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === I7eb86edc130d186a66c04b229d46347ec5c0b625 introduced support for the libvirt hot unplug error code VIR_ERR_DEVICE_MISSING that was itself introduced in libvirt v4.1.0. The change did not, however, cover versions < v4.1.0, such as v4.0.0 installed in our bionic-based CI test envs, causing attribute errors when we attempt to reference it. Steps to reproduce == * Attempt to detach a busy or missing device from an instance with libvirt < v4.1.0 installed. Expected result === The correct error codes are referenced and checked to confirm what happened. Actual result = AttributeError: module 'libvirt' has no attribute 'VIR_ERR_DEVICE_MISSING' as the error code is not available prior to libvirt v4.1.0. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + * 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) 
N/A Logs & Configs == https://zuul.opendev.org/t/openstack/build/2d57acc8c90741e6ba5a6795195e3ffd/log/controller/logs/screen-n-cpu.txt?severity=4 Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server Traceback (most recent call last): Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_messaging/rpc/dispatcher.py", line 193, in _do_dispatch Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 78, in wrapped Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 
nova-compute[32162]: ERROR oslo_messaging.rpc.server self.force_reraise() Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb) Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise Aug 13 13:50:50.038271 ubuntu-bionic-inap-mtl01-0019255828 nova-compute[32162]: ERROR oslo_messaging.rpc.server raise value Aug 13 13:50:50.038271
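The conventional guard for a constant that only exists in newer bindings is a getattr() lookup with a default. In this sketch the libvirt module is stubbed so the example is self-contained and simulates bindings < v4.1.0; with python-libvirt installed you would simply import the real module:

```python
import types

# Stand-in for the libvirt module (simulates libvirt < v4.1.0, where the
# constant does not exist). VIR_ERR_NO_DOMAIN is just a sample attribute.
libvirt = types.SimpleNamespace(VIR_ERR_NO_DOMAIN=42)

# Guarded lookup: a direct attribute reference would raise AttributeError
# on older bindings, which is exactly the failure in this bug.
VIR_ERR_DEVICE_MISSING = getattr(libvirt, 'VIR_ERR_DEVICE_MISSING', None)

def is_device_missing(error_code):
    """True only when the bindings define the code and it matches."""
    return (VIR_ERR_DEVICE_MISSING is not None
            and error_code == VIR_ERR_DEVICE_MISSING)
```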
[Yahoo-eng-team] [Bug 1870357] Re: raw disk usage is not correctly reported during resource update
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1870357 Title: raw disk usage is not correctly reported during resource update Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description:

Description ===

disk_available_least (free disk for a new instance) does not seem to be calculated correctly when an instance uses raw disks (images_type=raw) and space preallocation is not enabled. This may lead placement/scheduler to make wrong decisions regarding space availability on hosts. When the total amount of over_committed_disk_size is evaluated on a host, it appears that in the raw disk case it is always set to 0.

Steps to reproduce ===

On a master devstack:

. devstack/openrc admin admin
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 381

# create an instance with 80GB of disk, qcow2 by default:
$ openstack server create --flavor m1.large --image cirros-0.4.0-x86_64-disk --nic net-id=private alex

# a few seconds later we can see the available disk has decreased by 80, all is fine:
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 301

# delete instance
$ openstack server delete xxx

# Now set images_type = raw in the [libvirt] section of /etc/nova/nova-cpu.conf
$ grep images_type /etc/nova/nova-cpu.conf
images_type = raw

# restart compute
$ sudo service devstack@n-cpu restart

# respawn the same instance; it will now be created with a raw disk
$ openstack server create --flavor m1.large --image cirros-0.4.0-x86_64-disk --nic net-id=private alex

# a few seconds later we can see the available disk has decreased by only 3GB, which is not correct:
$ openstack hypervisor show alex-devstack | grep available_least
| disk_available_least | 378

# only the allocated size in use is subtracted:
$ ls -lhs /opt/stack/data/nova/instances/31e46f53-6223-40c3-ad84-0f19d10b52be/disk
2.6G -rw-r--r-- 1 libvirt-qemu kvm 80G Apr 1 10:00 /opt/stack/data/nova/instances/31e46f53-6223-40c3-ad84-0f19d10b52be/disk

Expected result ===

over_committed_disk_size must also be calculated for raw disks (at least for non-preallocated ones).

Actual result =

over_committed_disk_size is set to 0 in all cases for raw disks.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1870357/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
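The over-commit accounting this report expects can be sketched in a few lines. This is only an illustration of the idea, not nova's actual implementation: for a (possibly sparse) raw file, the bytes actually allocated on disk (`st_blocks * 512`) are subtracted from the virtual size promised to the guest, and the difference is the over-committed amount.

```python
import os

def over_committed_size(path, virtual_size):
    """Illustrative sketch: over-committed bytes for one raw disk file."""
    # Bytes actually allocated on disk; sparse regions count as zero.
    allocated = os.stat(path).st_blocks * 512
    # Anything promised to the guest but not yet allocated is
    # over-committed; a fully preallocated file yields roughly 0.
    return max(0, virtual_size - allocated)
```

With the reporter's 80G raw file holding only 2.6G of data, this would report roughly 77G of over-commit instead of the 0 the buggy code produced.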
[Yahoo-eng-team] [Bug 1887946] Re: Unable to detach volume from instance when previously removed from the inactive config
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887946 Title: Unable to detach volume from instance when previously removed from the inactive config Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === $subject, can often be encountered when previous attempts to detach a volume have failed due to the device still being used within the guestOS. This initial attempt will remove the device from the inactive config but fail to remove it from the active config. Any subsequent attempt will then fail as the initial call continues to attempt to remove the device from both the inactive and live configs. Prior to libvirt v4.1.0 this raised either a VIR_ERR_INVALID_ARG or VIR_ERR_OPERATION_FAILED error code from libvirt that n-cpu would handle, retrying the detach against the live config. Since libvirt v4.1.0 however this now raises a VIR_ERR_DEVICE_MISSING error code. This is not handled by Nova resulting in no attempt being made to detach the device from the live config. 
Steps to reproduce ==

# Start with a volume attached as vdb (ignore the source ;))
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# Detach from the inactive config
$ sudo virsh detach-disk --config 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 vdb
Disk detached successfully

# Confirm the device is still listed in the live config
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# and removed from the persistent config
$ sudo virsh domblklist --inactive 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk

# Attempt to detach the volume
$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test

Expected result ===

The initial attempt to detach the device fails as the device isn't present in the inactive config, but we continue to ensure the device is removed from the live config.

Actual result =

n-cpu doesn't handle the initial failure as the raised libvirt error code isn't recognised.

Environment ===

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ b7161fe9b92f0045e97c300a80e58d32b6f49be1

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + KVM

2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A

3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs ==

$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test ; journalctl -u devstack@n-cpu -f
[..]
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: DEBUG oslo_concurrency.lockutils [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Lock "4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8" released by "nova.compute.manager.ComputeManager.detach_volume..do_detach_volume" :: held 0.141s {{(pid=190210) inner /usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py:371}}
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Exception during message handling: libvirt.libvirtError: device not found: no target device vdb
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server   File
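The missing handling boils down to a predicate over libvirt error codes. A minimal sketch of that idea follows; the numeric values are stand-ins so it runs without the python-libvirt bindings (with the real bindings you would use `libvirt.VIR_ERR_INVALID_ARG` etc.), and the fix described above amounts to treating VIR_ERR_DEVICE_MISSING the same way the pre-4.1.0 codes were treated, i.e. retrying the detach against the live config:

```python
# Stand-in values; use the constants from the real libvirt module in practice.
VIR_ERR_INVALID_ARG = 8
VIR_ERR_OPERATION_FAILED = 9
VIR_ERR_DEVICE_MISSING = 99

# Before the fix only the first two codes triggered a live-config retry;
# libvirt >= 4.1.0 raises DEVICE_MISSING instead, so it must be included.
RETRY_LIVE_DETACH_CODES = frozenset([
    VIR_ERR_INVALID_ARG,
    VIR_ERR_OPERATION_FAILED,
    VIR_ERR_DEVICE_MISSING,
])

def should_retry_live_detach(err_code):
    """True if a failed persistent detach should be retried live-only."""
    return err_code in RETRY_LIVE_DETACH_CODES
```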
[Yahoo-eng-team] [Bug 1889108] Re: failures during driver.pre_live_migration remove source attachments during rollback
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889108 Title: failures during driver.pre_live_migration remove source attachments during rollback Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Description === $subject, the initial rollback and removal of any destination volume attachments is then repeated for the source volume attachments, leaving the volumes connected on the host but listed as `available` in cinder. Steps to reproduce == Cause a failure during the call to driver.pre_live_migration with volumes attached. Expected result === Any volume attachments for the destination host are deleted during the rollback. Actual result = Both sets of volumes attachments for the destination *and* the source are removed. Environment === 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ eeeb964a5f65e6ac31dfb34b1256aaf95db5ba3a 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + KVM 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) 
N/A Logs & Configs == When live-migration fails with attached volume changed to active and still in nova https://bugzilla.redhat.com/show_bug.cgi?id=1860914 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1889108/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
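The intended rollback scope can be sketched as filtering attachments by host before deleting anything. This is a toy illustration of the principle, not nova's data model (the dict keys are invented for the example):

```python
def attachments_to_delete(attachments, dest_host):
    """Pick only the attachments created for the migration destination.

    On a pre_live_migration failure, rollback must delete only what was
    created on the destination; the source attachment has to survive so
    cinder keeps the volume 'in-use' rather than flipping it to
    'available' while it is still connected to the source host.
    """
    return [a for a in attachments if a['host'] == dest_host]
```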
[Yahoo-eng-team] [Bug 1889257] Re: Live migration of realtime instances is broken
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889257 Title: Live migration of realtime instances is broken Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: Attempting to live migrate an instance with realtime enabled fails on master (commit d4c857dfcb1). This appears to be a bug with the live migration of pinned instances feature introduced in Train.

# Steps to reproduce

Create a server using realtime attributes and then attempt to live migrate it. For example:

$ openstack flavor create --ram 1024 --disk 0 --vcpu 4 \
    --property 'hw:cpu_policy=dedicated' \
    --property 'hw:cpu_realtime=yes' \
    --property 'hw:cpu_realtime_mask=^0-1' \
    realtime

$ openstack server create --os-compute-api-version=2.latest \
    --flavor realtime --image cirros-0.5.1-x86_64-disk --nic none \
    --boot-from-volume 1 --wait \
    test.realtime

$ openstack server migrate --live-migration test.realtime

# Expected result

Instance should be live migrated.

# Actual result

The live migration never happens.
Looking at the logs we see the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/eventlet/hubs/hub.py", line 461, in fire_timers
    timer()
  File "/usr/local/lib/python3.6/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 175, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 221, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 670, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8966, in _live_migration_operation
    # is still ongoing, or failed
  File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8959, in _live_migration_operation
    # 2. src==running, dst==paused
  File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 658, in migrate
    destination, params=params, flags=flags)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 1745, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirt.libvirtError: vcpussched attributes 'vcpus' must not overlap

Looking further, we see there are issues with the XML we are generating for the destination. Compare what we have on the source before updating the XML for the destination:

DEBUG nova.virt.libvirt.migration [-] _update_numa_xml input xml= ... 4096 {{(pid=12600) _update_numa_xml /opt/stack/nova/nova/virt/libvirt/migration.py:97}

To what we have after the update:

DEBUG nova.virt.libvirt.migration [-] _update_numa_xml output xml= ... 4096 ... {{(pid=12600) _update_numa_xml /opt/stack/nova/nova/virt/libvirt/migration.py:131}}

The issue is the 'vcpusched' elements. We're assuming there is only one of these elements when updating the XML for the destination [1]. We have to figure out why there are multiple elements and how best to handle this (likely by deleting and recreating everything). I suspect the reason we didn't spot this is because libvirt is rewriting the XML on us. This is what nova is providing libvirt upon boot:

DEBUG
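One way to avoid the overlap, along the lines the report suggests ("deleting and recreating everything"), is to drop every existing <vcpusched> element and write a fresh set, instead of editing only the first one. A hedged sketch using the stdlib xml.etree (nova itself uses lxml; the element and attribute names follow the libvirt domain XML, but this is not nova's actual code):

```python
import xml.etree.ElementTree as ET

def rewrite_vcpusched(dom_xml, entries):
    """Replace all <cputune>/<vcpusched> elements with `entries`,
    a list of (vcpus, scheduler, priority) string tuples."""
    root = ET.fromstring(dom_xml)
    cputune = root.find('cputune')
    # libvirt may have split one element into several; remove them all
    # so a rewritten element cannot overlap a stale leftover.
    for el in cputune.findall('vcpusched'):
        cputune.remove(el)
    for vcpus, scheduler, priority in entries:
        ET.SubElement(cputune, 'vcpusched', vcpus=vcpus,
                      scheduler=scheduler, priority=priority)
    return ET.tostring(root, encoding='unicode')
```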
[Yahoo-eng-team] [Bug 1890428] Re: format_message() is specific to NovaException and should not be used for generic exceptions
** Changed in: nova/ussuri Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1890428 Title: format_message() is specific to NovaException and should not be used for generic exceptions Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Bug description: In [1] we used format_message() to print the exception info, but format_message() is specific to NovaException; we should not call it on generic exceptions, where simply printing the exception is enough. [1] https://review.opendev.org/#/c/631244/69/nova/compute/manager.py@2599 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1890428/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
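The fix amounts to logging the exception generically: str() works on every exception, while format_message() exists only on NovaException subclasses. A self-contained illustration (the NovaException class here is a minimal stand-in for nova.exception.NovaException, not the real one):

```python
class NovaException(Exception):
    """Minimal stand-in for nova.exception.NovaException."""

    def format_message(self):
        # The real implementation renders a msg_fmt template.
        return self.args[0] if self.args else 'An unknown exception occurred.'

def describe(exc):
    # Safe for NovaException *and* generic exceptions alike; calling
    # exc.format_message() here would raise AttributeError on, say,
    # a plain ValueError.
    return str(exc)
```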
[Yahoo-eng-team] [Bug 1866373] Re: URLS in os-keypairs 'links' body are incorrect
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866373 Title: URLS in os-keypairs 'links' body are incorrect Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Similar to https://bugs.launchpad.net/nova/+bug/1864428, the URLs in the 'links' element of the response are incorrect. They read '/keypairs', not '/os-keypairs'. From the current api-ref (2020-03-06):

{
    "keypairs": [
        {
            "keypair": {
                "fingerprint": "7e:eb:ab:24:ba:d1:e1:88:ae:9a:fb:66:53:df:d3:bd",
                "name": "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
                "type": "ssh",
                "public_key": "ssh-rsa B3NzaC1yc2EDAQABAAABAQCkF3MX59OrlBs3dH5CU7lNmvpbrgZxSpyGjlnE8Flkirnc/Up22lpjznoxqeoTAwTW034k7Dz6aYIrZGmQwe2TkE084yqvlj45Dkyoj95fW/sZacm0cZNuL69EObEGHdprfGJQajrpz22NQoCD8TFB8Wv+8om9NH9Le6s+WPe98WC77KLw8qgfQsbIey+JawPWl4O67ZdL5xrypuRjfIPWjgy/VH85IXg/Z/GONZ2nxHgSShMkwqSFECAC5L3PHB+0+/12M/iikdatFSVGjpuHvkLOs3oe7m6HlOfluSJ85BzLWBbvva93qkGmLg4ZAc8rPh2O+YIsBUHNLLMM/oQp Generated-by-Nova\n"
            }
        }
    ],
    "keypairs_links": [
        {
            "href": "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572/keypairs?limit=1&marker=keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
            "rel": "next"
        }
    ]
}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866373/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866380] Re: Ironic driver hash ring treats hostnames differing only by case as different hostnames
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866380 Title: Ironic driver hash ring treats hostnames differing only by case as different hostnames Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Recently we had a customer case where attempts to add new ironic nodes to an existing undercloud resulted in half of the nodes failing to be detected and added to nova. Ironic API returned all of the newly added nodes when called by the driver, but half of the nodes were not returned to the compute manager by the driver. There was only one nova-compute service managing all of the ironic nodes of the all-in-one typical undercloud deployment. After days of investigation and examination of a database dump from the customer, we noticed that at some point the customer had changed the hostname of the machine from something containing uppercase letters to the same name but all lowercase. The nova-compute service record had the mixed case name and the CONF.host (socket.gethostname()) had the lowercase name. The hash ring logic adds all of the nova-compute service hostnames plus CONF.host to hash ring, then the ironic driver reports only the nodes it owns by retrieving a service hostname from the ring based on a hash of each ironic node UUID. Because of the machine hostname change, the hash ring contained, for example: {'MachineHostName', 'machinehostname'} when it should have contained only one hostname. 
And because the hash ring contained two hostnames, the driver was able to retrieve only half of the nodes as nodes that it owned. So half of the new nodes were excluded and not added as new compute nodes. I propose adding some logging to the driver related to the hash ring to help with debugging in the future. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
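The failure mode is easy to demonstrate with a toy ring. Case-normalizing hostnames before they enter the ring collapses the duplicate member; the hashing below is purely illustrative, not ironic's actual hash-ring implementation:

```python
import hashlib

def ring_members(service_hosts, conf_host):
    """Candidate hash-ring members, case-normalized (sketch of the fix).

    Lower-casing before membership means 'MachineHostName' and
    'machinehostname' cannot coexist as two ring members that split
    ownership of the ironic nodes between them.
    """
    return sorted({h.lower() for h in service_hosts} | {conf_host.lower()})

def owner(members, node_uuid):
    # Toy stable mapping from an ironic node UUID to one ring member.
    digest = int(hashlib.md5(node_uuid.encode()).hexdigest(), 16)
    return members[digest % len(members)]
```

With normalization the ring has a single member, so that one nova-compute service owns every node, rather than silently claiming only about half of them.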
[Yahoo-eng-team] [Bug 1866937] Re: Requests to neutron API do not use retries
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866937 Title: Requests to neutron API do not use retries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We have a customer bug report downstream [1] where nova occasionally fails to carry out server actions requiring calls to the neutron API if haproxy happens to close a connection after an idle time of 10 seconds at nearly the same time as an incoming request attempts to re-use the connection while it is being torn down. Here is an excerpt from [1]:

The result of our investigation, the cause is as follows:

1. The neutron client in nova uses a connection pool (urllib3/requests) for http.
2. Sometimes, an http connection is reused for different requests.
3. The connection between the neutron client and haproxy is closed by haproxy when it has been idle for 10 seconds.
4. If reusing the connection from the client side and closing the connection from the haproxy side happen at almost the same time, the client gets an RST and ends with "bad status line".

To address this problem, we can add a new config option for the neutron client (similar to the existing retry config options we have for the cinder and glance clients) to be more resilient during such scenarios.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1788853

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866937/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
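The proposed option boils down to retrying idempotent HTTP calls that die on a torn-down keep-alive connection. A generic pure-Python sketch of that retry loop follows; the real fix wires a retry count into the keystoneauth session used by the neutron client, and the nova-specific option name and plumbing are deliberately omitted here:

```python
import time

def call_with_retries(func, retries=3, delay=0.0,
                      retry_on=(ConnectionError,)):
    """Call func(), retrying on transient connection errors.

    A server-side idle-timeout RST (as in the haproxy scenario above)
    surfaces as a connection error; retrying once on a fresh connection
    is usually enough, but up to `retries` attempts are allowed.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts; propagate the failure
            time.sleep(delay)
```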
[Yahoo-eng-team] [Bug 1867380] Re: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867380 Title: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Description === $subject, it appears the current check of using grep to find active n-cpu processes isn't enough and we actually need to wait for the services to report as UP before starting to run Tempest. In the following we can see Tempest starting at 2020-03-13 13:01:19.528 while n-cpu within the instance isn't marked as UP for another ~20 seconds: https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log /job-output.txt#6305 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/screen-n-cpu.txt#3825 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/subnode-2/screen-n-cpu.txt#3534 I've only seen this on stable/pike at present but it could potentially hit all branches with slow enough CI nodes. Steps to reproduce == Run nova-live-migration on slow CI nodes. Expected result === nova/tests/live_migration/hooks/ceph.sh waits until hosts are marked as UP before running Tempest. Actual result = nova/tests/live_migration/hooks/ceph.sh checks for running n-cpu processes and then immediately starts Tempest. Environment === 1. Exact version of OpenStack you are running. 
See the following list for all releases: http://docs.openstack.org/releases/ stable/pike, but it could be present on other branches with slow enough CI nodes. 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt / KVM. 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs ==

Mar 13 13:01:39.170201 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 74932102-3737-4f8f-9002-763b2d580c3a] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}
Mar 13 13:01:39.255008 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 042afab0-fbef-4506-84e2-1f54cb9d67ca] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}
Mar 13 13:01:39.322508 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: cc293f53-7428-4e66-9841-20cce219e24f] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1867380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe :
https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1774249] Re: update_available_resource will raise DiskNotFound after resize but before confirm
** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1774249 Title: update_available_resource will raise DiskNotFound after resize but before confirm Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Original reported in RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315 Tested on OSP12 (Pike), but appears to be still present on master. Should only occur if nova compute is configured to use local file instance storage. 
Create instance A on compute X
Resize instance A to compute Y
Domain is powered off
/var/lib/nova/instances/<uuid> renamed to <uuid>_resize on X
Domain is *not* undefined

On compute X:
update_available_resource runs as a periodic task
First action is to update self
rt calls driver.get_available_resource()
...calls _get_disk_over_committed_size_total
...iterates over all defined domains, including the ones whose disks we renamed
...fails because a referenced disk no longer exists

Results in errors in nova-compute.log:

2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     config, block_device_info)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     raise exception.DiskNotFound(location=path)
2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

And the resource tracker is no longer updated. We can find lots of these in the gate. Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly mitigates this, but doesn't because task_state is not set while the instance is awaiting confirm.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1831771] Re: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1831771 Title: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: This was originally reported in Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1668159 The 'UnexpectedDeletingTaskStateError' exception can be raised by something like aborting a large heat stack, where the instance hasn't finished setting up before the stack is aborted and the instances deleted. https://github.com/openstack/nova/blob/19.0.0/nova/db/sqlalchemy/api.py#L2864 We handle this in the compute manager and as part of that handling, we clean up the resource tracking of network interfaces. https://github.com/openstack/nova/blob/19.0.0/nova/compute/manager.py#L2034-L2040 However, we don't unplug these interfaces. This can result in things being left over on the host. We should attempt to unplug VIFs as part of this cleanup. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1831771/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
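The proposed cleanup is best-effort VIF unplugging during the build-abort path. A sketch of that idea follows; `unplug` stands in for the virt driver's unplug call and the return value is illustrative, not nova's actual interface:

```python
def cleanup_aborted_build(network_info, unplug):
    """Unplug every VIF plugged before the build was aborted.

    Cleanup must not mask the original UnexpectedDeletingTaskStateError,
    so failures are collected and reported rather than raised.
    """
    failed = []
    for vif in network_info:
        try:
            unplug(vif)
        except Exception:
            failed.append(vif)  # remember for logging; keep going
    return failed
```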
[Yahoo-eng-team] [Bug 1844929] Re: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844929 Title: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler Status in grenade: Invalid Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[18043]: WARNING nova.context [None req-1929039e-1517-4326-9700-738d4b570ba6 tempest-AttachInterfacesUnderV243Test-2009753731 tempest-AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

Looks like something is causing timeouts reaching cell1 during grenade runs.
The only errors I see in the rabbit logs are these for the uwsgi (API) servers:

=ERROR REPORT 22-Sep-2019::00:35:30 ===
closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-f0605979ed7d):
missed heartbeats from client, timeout: 60s

It looks like we don't have mysql logs in this grenade run; maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

All of these are failures in the check and gate queues. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performance/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding.

To manage notifications about this bug go to: https://bugs.launchpad.net/grenade/+bug/1844929/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1837199] Re: nova-manage Tracebeck on missing arg
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1837199 Title: nova-manage Tracebeck on missing arg Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Confirmed Status in OpenStack Compute (nova) stein series: Confirmed Status in OpenStack Compute (nova) train series: Fix Released Bug description:

# nova-manage cell_v2
An error has occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/oslo_config/cfg.py", line 3179, in __getattr__
    return getattr(self._conf._namespace, name)
AttributeError: '_Namespace' object has no attribute 'action_fn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/nova/nova/cmd/manage.py", line 2205, in main
    fn, fn_args, fn_kwargs = cmd_common.get_action_fn()
  File "/opt/stack/nova/nova/cmd/common.py", line 169, in get_action_fn
    fn = CONF.category.action_fn
  File "/usr/local/lib/python3.7/site-packages/oslo_config/cfg.py", line 3181, in __getattr__
    raise NoSuchOptError(name)
oslo_config.cfg.NoSuchOptError: no such option action_fn in group [DEFAULT]

# nova-manage cell_v2 help
usage: nova-manage cell_v2 [-h] {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance} ...
nova-manage cell_v2: error: argument action: invalid choice: 'help' (choose from 'create_cell', 'delete_cell', 'delete_host', 'discover_hosts', 'list_cells', 'list_hosts', 'map_cell0', 'map_cell_and_hosts', 'map_instances', 'simple_cell_setup', 'update_cell', 'verify_instance')

# nova-manage cell_v2 -h
usage: nova-manage cell_v2 [-h] {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance} ...

positional arguments:
  {create_cell,delete_cell,delete_host,discover_hosts,list_cells,list_hosts,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance}

optional arguments:
  -h, --help  show this help message and exit

python version: /usr/bin/python3 --version Python 3.7.3
nova version: $ git log -1 commit 78f9961d293e3b3e0ac62345b78abb1c9e2bd128 (HEAD -> master, origin/master, origin/HEAD)
oslo.config 6.11.0

Instead of printing a traceback, nova-manage should give the user a hint about the valid choices.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1837199/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
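The behaviour the reporter asks for is what `argparse` already does when the subcommand is marked required. A small sketch (it mimics nova-manage's CLI shape only loosely, with a subset of the actions): with `required=True` on the subparser, a bare `nova-manage cell_v2` exits with a usage hint instead of the `NoSuchOptError` traceback above.

```python
import argparse

# Hedged sketch: a required subparser makes argparse emit a usage hint
# ("the following arguments are required: action") instead of a traceback.
parser = argparse.ArgumentParser(prog="nova-manage cell_v2")
sub = parser.add_subparsers(dest="action", required=True)  # Python 3.7+
for action in ("create_cell", "delete_cell", "list_cells", "discover_hosts"):
    sub.add_parser(action)

try:
    parser.parse_args([])  # simulates "nova-manage cell_v2" with no action
    exited = False
except SystemExit:
    # argparse printed the usage hint to stderr and exited.
    exited = True
print("usage hint shown:", exited)
```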
[Yahoo-eng-team] [Bug 1852458] Re: "create" instance action not created when instance is buried in cell0
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1852458 Title: "create" instance action not created when instance is buried in cell0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: Triaged Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Before cell0 was introduced the API would create the "create" instance action for each instance in the nova cell database before casting off to conductor to do scheduling: https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1180

Note that conductor failed to "complete" the action with a failure event: https://github.com/openstack/nova/blob/mitaka-eol/nova/conductor/manager.py#L374

But at least the action was created. Since then, with cell0, if scheduling fails the instance is buried in the cell0 database but no instance action is created. To illustrate, I disabled the single nova-compute service on my devstack host and created a server which failed with NoValidHost:

$ openstack server show build-fail1 -f value -c fault
{u'message': u'No valid host was found. 
', u'code': 500, u'created': u'2019-11-13T15:57:13Z'}

When listing instance actions I expected to see a "create" action but there were none:

$ nova instance-action-list 008a7d52-dd83-4f52-a720-b3cfcc498259
+--------+------------+---------+------------+------------+
| Action | Request_ID | Message | Start_Time | Updated_At |
+--------+------------+---------+------------+------------+
+--------+------------+---------+------------+------------+

This is because the "create" action is only created when the instance is scheduled to a specific cell: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L1460

Solution: The ComputeTaskManager._bury_in_cell0 method should also create a "create" action in cell0 like it does for the instance BDMs and tags. This goes back to Ocata: https://review.opendev.org/#/c/319379/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1852458/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
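The proposed solution above can be sketched in miniature. This is a hedged illustration, not nova's actual `_bury_in_cell0`: the data structures are invented, and the only point is that burying an instance should also record a "create" action (with the fault message) the way the normal scheduling path does.

```python
# Hedged sketch: record a "create" instance action when burying in cell0.
# cell0_actions and bury_in_cell0 are illustrative stand-ins.

cell0_actions = []

def bury_in_cell0(instance_uuid, fault_message):
    # ... existing code creates the instance record, BDMs and tags in cell0 ...
    # Proposed addition: the "create" action, so instance-action-list is not empty.
    cell0_actions.append({
        "instance_uuid": instance_uuid,
        "action": "create",
        "message": fault_message,
    })

bury_in_cell0("008a7d52-dd83-4f52-a720-b3cfcc498259", "No valid host was found.")
print(cell0_actions[0]["action"])
```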
[Yahoo-eng-team] [Bug 1854126] Re: s390x: failed to live migrate VM
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1854126 Title: s390x: failed to live migrate VM Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We see the following logs when doing live migration on the s390x platform with KVM:

openstack server migrate --live kvm02 --block-migration d28caa4a-215b-44c8-bed0-e0e7faca07e5

Logs:
2019-10-10 12:03:25.710 19003 ERROR nova.virt.libvirt.driver [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] CPU doesn't have compatibility. XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult: libvirtError: XML error: Missing CPU model name
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] Exception during message handling: MigrationPreCheckError: Migration pre-check error: CPU doesn't have compatibility. 
XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return self.do_dispatch(endpoint, method, ctxt, args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in do_dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in exit
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1418, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in exit
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6262, in check_can_live_migrate_destination
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server
[Yahoo-eng-team] [Bug 1862633] Re: unshelve leak allocation if update port fails
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1862633 Title: unshelve leak allocation if update port fails Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: If updating the port binding during unshelve of an offloaded server fails, then nova leaks the placement allocation.

Steps to reproduce
==
1) boot a server with a neutron port
2) shelve and offload the server
3) disable the original host of the server to force scheduling during unshelve to select a different host. This is important as it triggers a non-empty port update during unshelve
4) unshelve the server and inject a network fault in the communication between nova and neutron. You can also try to simply shut down neutron-server at the right moment; "right" means just before the target compute tries to send the port update
5) observe that the unshelve fails and the server goes back to offloaded state, but the placement allocation on the target host remains.

Triage: the problem is caused by missing fault handling code in the compute manager[1]. The compute manager has proper error handling if the unshelve fails in the virt driver spawn call, but it does not handle failure if the neutron communication fails. The compute manager method simply logs and re-raises the neutron exceptions. This means that the exception is dropped, as the unshelve_instance compute RPC is a cast. 
[1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
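The missing error handling can be sketched in a self-contained way. This is hedged and illustrative, not nova's real code: `NeutronError`, `update_port_bindings`, and the `allocations` dict are stand-ins. The shape of the fix is a `try/except` around the neutron call that releases the freshly claimed placement allocation on the target host before re-raising, instead of leaking it.

```python
# Hedged sketch of the fix direction: release the target-host allocation
# when the port-binding update fails during unshelve.

class NeutronError(Exception):
    pass

allocations = {"target-host": {"DISK_GB": 5}}  # claimed during scheduling

def update_port_bindings(fail):
    if fail:
        raise NeutronError("neutron unreachable")

def unshelve(fail_network):
    try:
        update_port_bindings(fail_network)
    except NeutronError:
        # Proposed fix: clean up before re-raising so the server that goes
        # back to SHELVED_OFFLOADED does not leak placement resources.
        allocations.pop("target-host", None)
        raise

try:
    unshelve(fail_network=True)
except NeutronError:
    pass
print(allocations)
```

Without the `except` branch, the allocation dict would still hold the target-host entry after the failure, which is exactly the leak described in the triage.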
[Yahoo-eng-team] [Bug 1856925] Re: Nova compute service exception that performs cold migration virtual machine stuck in resize state.
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1856925 Title: Nova compute service exception that performs cold migration virtual machine stuck in resize state. Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description:

Description: If the nova-compute service is in an exceptional state, such as being down, an instance gets stuck in the resize state during cold migration and cannot be evacuated. The command request to the nova API is still issued, and server_status and Task State are changed, but the compute service cannot receive the request, so the server State remains in the resize state. When nova-compute is restarted, the server State becomes ERROR. It is recommended to add validation to prevent instances from entering inoperable states. This can also happen with commands such as stop/rebuild/reboot.

Environment:
1. openstack-Q; nova -version: 9.1.1
2. hypervisor: Libvirt + KVM
3. One control node, two compute nodes.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1856925/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1860990] Re: RBD image backend tries to flatten images even if they are already flat
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1860990 Title: RBD image backend tries to flatten images even if they are already flat Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: When the [DEFAULT]show_multiple_locations option is not set in glance, and both glance and nova use ceph as their backend, with properly configured accesses, nova will fail with the following exception:

2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [req-8021fd76-d5ab-4a9b-bd17-f5eb4d4faf62 0e96a04f360644818632b7e46fe8d3e7 ac01daacc7424a40b8b464a163902dcb - default default] [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Instance failed to spawn: rbd.InvalidArgument: [errno 22] error flattening b'fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6_disk'
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Traceback (most recent call last):
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/compute/manager.py", line 5757, in _unshelve_instance
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     block_device_info=block_device_info)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3457, in spawn
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     block_device_info=block_device_info)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3832, in _create_image
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     fallback_from_host)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3923, in _create_and_inject_local_root
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     instance, size, fallback_from_host)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9267, in _try_fetch_image_cache
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     image.flatten()
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 983, in flatten
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     self.driver.flatten(self.rbd_name, pool=self.driver.pool)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 290, in flatten
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     vol.flatten()
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]     rv = execute(f, *args, **kwargs)
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6]
[Yahoo-eng-team] [Bug 1864428] Re: Hypervisors collection_name affects pagination query
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1864428 Title: Hypervisors collection_name affects pagination query Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack SDK: New Bug description: In nova, the hypervisor view builder's _collection_name is 'hypervisors':

```
# nova.api.openstack.compute.views.hypervisors.ViewBuilder
class ViewBuilder(common.ViewBuilder):
    _collection_name = "hypervisors"

    def get_links(self, request, hypervisors, detail=False):
        coll_name = (self._collection_name + '/detail' if detail
                     else self._collection_name)
        return self._get_collection_links(request, hypervisors,
                                          coll_name, 'id')
```

So when we do a paginated query via openstacksdk, we get a response like this:

```
{u'hypervisors': [{u'status': u'enabled', u'state': u'up', u'id': u'53fb5bdc-f9a4-4fc4-a4be-8eb33cd236b1', u'hypervisor_hostname': u'gd02-compute-11e115e64e19'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'a4db6ea8-2a91-45e7-a4b4-cb26c2dbc514', u'hypervisor_hostname': u'gd02-compute-11e115e64e11'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'd92c5452-ea75-4f58-8e0b-b4a6823850d8', u'hypervisor_hostname': u'gd02-compute-11e115e64e12'}],
 u'hypervisors_links': [{u'href': u'http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=d92c5452-ea75-4f58-8e0b-b4a6823850d8', u'rel': u'next'}]}
```

And openstacksdk uses the wrong hypervisors_links to query the next page, and gets an error:

```
Traceback (most recent call last):
  File "p_hypervisors-admin.py", line 48, in
    do_operation(limit, marker)
  File "p_hypervisors-admin.py", line 38, in do_operation
    srvs = [i for i in info]
  File "/usr/lib/python2.7/site-packages/openstack/resource.py", line 898, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python2.7/site-packages/openstack/exceptions.py", line 212, in raise_from_response
    http_status=http_status, request_id=request_id
openstack.exceptions.NotFoundException: NotFoundException: 404: Client Error for url: http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=ea9d857a-6328-47a8-abde-dc38972f4ca2, Not Found
```

The right URI should be `/v2.1/os-hypervisors`.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1864428/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
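The problem above reduces to which collection name goes into the pagination link. A minimal sketch (the `next_link` helper is illustrative, not nova's `_get_collection_links`): the link must be built with the real route prefix `os-hypervisors`, not the view builder's short collection name, or the SDK's next-page fetch 404s.

```python
# Hedged sketch: build a next-page link with the correct route prefix.

def next_link(base, collection_name, limit, marker):
    return "%s/%s?limit=%d&marker=%s" % (base, collection_name, limit, marker)

base = "http://nova-api.cty.os:11010/v2.1"
marker = "d92c5452-ea75-4f58-8e0b-b4a6823850d8"

buggy = next_link(base, "hypervisors", 3, marker)     # no such route: 404
fixed = next_link(base, "os-hypervisors", 3, marker)  # real API route
print(fixed)
```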
[Yahoo-eng-team] [Bug 1863605] Re: live migration with vpmem will go to error in Train
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1863605 Title: live migration with vpmem will go to error in Train Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: We introduced vpmem support in the Train release, including create/resize/cold migration, but not live migration (with libvirt/qemu). Since live migration essentially relies on a libvirt XML, for vpmem there will be backend files configured in the XML. If we live migrate an instance with vpmem under the Train release, we may get two unexpected results: 1. If the dest host has the same vpmem backend files as those used by the instance on the source host, the live migration will succeed but the vpmems consumed on the dest host will not be tracked. 2. If the dest host doesn't have those vpmems, the live migration will fail. We need to reject live migration with vpmem in the nova conductor when doing the pre-check, and backport this to the T release. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1863605/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
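The conductor pre-check proposed above can be sketched as a guard over the instance's resource classes. This is hedged and illustrative: the `CUSTOM_PMEM` prefix follows nova's convention for vpmem resource classes, but `check_live_migratable` itself is an invented stand-in, not the real conductor code.

```python
# Hedged sketch: reject live migration when the resource request
# includes virtual persistent memory (vpmem).

class MigrationPreCheckError(Exception):
    pass

def check_live_migratable(resource_classes):
    if any(rc.startswith("CUSTOM_PMEM") for rc in resource_classes):
        raise MigrationPreCheckError(
            "Migration pre-check error: live migration with vPMEM "
            "is not supported")

check_live_migratable(["VCPU", "MEMORY_MB", "DISK_GB"])  # passes silently
try:
    check_live_migratable(["VCPU", "CUSTOM_PMEM_NAMESPACE_4GB"])
    rejected = False
except MigrationPreCheckError:
    rejected = True
print("vpmem migration rejected:", rejected)
```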
[Yahoo-eng-team] [Bug 1862633] Re: unshelve leak allocation if update port fails
** Also affects: nova/pike Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/pike Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1862633 Title: unshelve leak allocation if update port fails Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: If updating the port binding during unshelve of an offloaded server fails, then nova leaks the placement allocation.

Steps to reproduce
==
1) boot a server with a neutron port
2) shelve and offload the server
3) disable the original host of the server to force scheduling during unshelve to select a different host. This is important as it triggers a non-empty port update during unshelve
4) unshelve the server and inject a network fault in the communication between nova and neutron. You can also try to simply shut down neutron-server at the right moment; "right" means just before the target compute tries to send the port update
5) observe that the unshelve fails and the server goes back to offloaded state, but the placement allocation on the target host remains.

Triage: the problem is caused by missing fault handling code in the compute manager[1]. The compute manager has proper error handling if the unshelve fails in the virt driver spawn call, but it does not handle failure if the neutron communication fails. The compute manager method simply logs and re-raises the neutron exceptions. This means that the exception is dropped, as the unshelve_instance compute RPC is a cast.

[1] https://github.com/openstack/nova/blob/1fcd74730d343b7cee12a0a50ea537dc4ff87f65/nova/compute/manager.py#L6473

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1862633/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866373] Re: URLS in os-keypairs 'links' body are incorrect
** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866373 Title: URLS in os-keypairs 'links' body are incorrect Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Bug description: Similar to https://bugs.launchpad.net/nova/+bug/1864428, the URLs in the 'links' element of the response are incorrect. They read '/keypairs', not '/os-keypairs'. From the current api-ref (2020-03-06):

{
    "keypairs": [
        {
            "keypair": {
                "fingerprint": "7e:eb:ab:24:ba:d1:e1:88:ae:9a:fb:66:53:df:d3:bd",
                "name": "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
                "type": "ssh",
                "public_key": "ssh-rsa B3NzaC1yc2EDAQABAAABAQCkF3MX59OrlBs3dH5CU7lNmvpbrgZxSpyGjlnE8Flkirnc/Up22lpjznoxqeoTAwTW034k7Dz6aYIrZGmQwe2TkE084yqvlj45Dkyoj95fW/sZacm0cZNuL69EObEGHdprfGJQajrpz22NQoCD8TFB8Wv+8om9NH9Le6s+WPe98WC77KLw8qgfQsbIey+JawPWl4O67ZdL5xrypuRjfIPWjgy/VH85IXg/Z/GONZ2nxHgSShMkwqSFECAC5L3PHB+0+/12M/iikdatFSVGjpuHvkLOs3oe7m6HlOfluSJ85BzLWBbvva93qkGmLg4ZAc8rPh2O+YIsBUHNLLMM/oQp Generated-by-Nova\n"
            }
        }
    ],
    "keypairs_links": [
        {
            "href": "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572/keypairs?limit=1&marker=keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3",
            "rel": "next"
        }
    ]
}

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866373/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : 
yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
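The fix has the same shape as the os-hypervisors bug it references: build the pagination link with the `os-keypairs` route prefix. The helper below is illustrative only, not nova's actual view-builder code.

```python
# Hedged sketch: pagination links for keypairs must use the real
# "os-keypairs" route, not the short collection name "keypairs".

def keypairs_next_link(base, limit, marker, collection="os-keypairs"):
    return "%s/%s?limit=%d&marker=%s" % (base, collection, limit, marker)

link = keypairs_next_link(
    "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572",
    1, "keypair-5d935425-31d5-48a7-a0f1-e76e9813f2c3")
print(link)
```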
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===
The nova-compute service keeps a local image cache of glance images used for nova servers, to avoid multiple downloads of the same image from glance. The disk usage of this cache is not counted as local disk usage in nova and is not reported to placement as used DISK_GB. This leads to disk over-allocation. Also, the size of the cache cannot be limited by nova configuration, so the deployer cannot reserve disk space for it with the reserved_host_disk_mb config option.

Steps to reproduce
==
* Set up a single-node devstack
* Create and upload an image with a not-too-small physical size, e.g. an image with a 1G physical size
* Check the current disk usage of the host OS and configure reserved_host_disk_mb in nova-cpu.conf accordingly
* Boot two servers from that image with a flavor such as d1 (disk=5G)
* Nova will download the glance image once to the local cache, which results in 1GB of disk usage
* Nova will create two root file systems, one for each VM. These disks initially have a minimal physical size but a 5G virtual size.
* At this point nova has allocated 5G + 5G of DISK_GB in placement, but due to the image in the cache the total disk usage of the two VMs + cache can be 5G + 5G + 1G, if both VMs overwrite and fill the contents of their own disks.

Expected result
===
Option A) Nova maintains a DISK_GB allocation in placement for the images in its cache. This way the expected DISK_GB allocation in placement is 5G + 5G + 1G at the end.
Option B) Nova provides a config option to limit the maximum size of the image cache, so the deployer can include the maximum image cache size in reserved_host_disk_mb when dimensioning the disk space of the compute.

Actual result
=
Only 5G + 5G was allocated from placement, so disk space is over-allocated by the image cache.

Environment
===
Devstack from recent master
stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"
libvirt driver with file based image backend

Logs & Configs
==
http://paste.openstack.org/show/793388/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions

-- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
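The over-allocation in this report is simple arithmetic; the snippet below just restates the reproduction's figures (two 5G flavors plus a 1G cached image) to make the gap explicit. All variable names are illustrative, not nova code.

```python
# Figures from the reproduction above; names are illustrative, not nova's.
FLAVOR_DISK_GB = 5   # flavor d1 root disk
NUM_SERVERS = 2
IMAGE_CACHE_GB = 1   # physical size of the cached glance image

# What nova reports to placement as used DISK_GB:
placement_allocation_gb = FLAVOR_DISK_GB * NUM_SERVERS

# Worst-case real usage once both guests fully fill their root disks;
# the cached image still sits on the same disk:
worst_case_usage_gb = FLAVOR_DISK_GB * NUM_SERVERS + IMAGE_CACHE_GB

# The cache's contribution is invisible to placement:
over_allocation_gb = worst_case_usage_gb - placement_allocation_gb
print(placement_allocation_gb, worst_case_usage_gb, over_allocation_gb)  # 10 11 1
```

Both proposed options close this gap, one by accounting for the cache in placement, the other by bounding it so it can be covered by reserved_host_disk_mb.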
[Yahoo-eng-team] [Bug 1824858] Re: nova instance remnant left behind after cold migration completes
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1824858

Title: nova instance remnant left behind after cold migration completes

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in StarlingX: Fix Released

Bug description:

Brief Description
-
After cold migration to a new worker node, instance remnants are left behind.

Severity
standard

Steps to Reproduce
--
worker nodes compute-1 and compute-2 have label remote-storage enabled
1. Launch instance on compute-1
2. cold migrate to compute-2
3. confirm cold migration to complete

Expected Behavior
--
Migration to compute-2 and cleanup of files on compute-1

Actual Behavior
At 16:35:24 cold migration for instance a416ead6-a17f-4bb9-9a96-3134b426b069 completed to compute-2 but the following path is left behind on compute-1:
compute-1:/var/lib/nova/instances/a416ead6-a17f-4bb9-9a96-3134b426b069
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base locks a416ead6-a17f-4bb9-9a96-3134b426b069_resize compute_nodes lost+found
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base compute_nodes locks lost+found
compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069 _base compute_nodes locks lost+found
2019-04-15T16:35:24.646749 clear 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:24.482575 log 700.168 Cold-Migrate-Confirm complete for
instance tenant2-migration_test-1 enabled on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:16.815223 log 700.163 Cold-Migrate-Confirm issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:10.030068 clear 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.971414 set 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.970212 log 700.162 Cold-Migrate complete for instance tenant2-migration_test-1 now enabled on host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637687 set 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637636 log 700.158 Cold-Migrate inprogress for instance tenant2-migration_test-1 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.478442 log 700.157 Cold-Migrate issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:20.181155 log 700.101 Instance tenant2-migration_test-1 is enabled on host compute-1
tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical see nova-compute.log (compute-1) compute-1 nova-compute log [instance: a416ead6-a17f-4bb9-9a96-3134b426b069 claimed and spawned here on compute-1] {"log":"2019-04-15 16:34:04,617.617 60908 INFO nova.compute.claims [req-f1195bbb-d5b0-4a75-a598-ff287d247643 3fd3229d3e6248cf9b5411b2ecec86e9 7f1d42233341428a918855614770e676 - default default]
[Yahoo-eng-team] [Bug 1834659] Re: Volume not removed on instance deletion
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1834659

Title: Volume not removed on instance deletion

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released

Bug description:

Description
===
When we deploy a non-ephemeral instance (i.e. creating a new volume), indicate "YES" in "Delete Volume on Instance delete", then delete the instance, and the volume driver's terminate connection call in cinder takes too long to return, the volume is not removed. The volume status remains "In-use" and "Attached to None on /dev/vda". For example:

abcfa1db-1748-4f04-9a29-128cf22efcc5 - 130GiB In-use - Attached to None on /dev/vda

Steps to reproduce
==
Please refer to this bug comment #2 below

Expected result
===
Volume gets removed

Actual result
=
Volume remains attached

Environment
===
Issue was initially reported downstream against the Newton release (see comment #1 below). The customer was using the hitachi volume driver: volume_driver = cinder.volume.drivers.hitachi.hbsd.hbsd_fc.HBSDFCDriver
As a note, the hitachi drivers are unsupported as of Pike (see cinder commit 595c8d3f8523a9612ccc64ff4147eab993493892).
Issue was reproduced in a devstack environment running the Stein release. The volume driver used was lvm (the devstack default).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1834659/+subscriptions
[Yahoo-eng-team] [Bug 1864428] Re: Hypervisors collection_name affects pagination query
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/rocky Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1864428

Title: Hypervisors collection_name affects pagination query

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack SDK: New

Bug description:

In nova, the hypervisor view builder's _collection_name is 'hypervisors':

```
# nova.api.openstack.compute.views.hypervisors.ViewBuilder
class ViewBuilder(common.ViewBuilder):
    _collection_name = "hypervisors"

    def get_links(self, request, hypervisors, detail=False):
        coll_name = (self._collection_name + '/detail' if detail
                     else self._collection_name)
        return self._get_collection_links(request, hypervisors,
                                          coll_name, 'id')
```

So when we do a paginated query via openstacksdk, we get a response like this:

```
{u'hypervisors': [{u'status': u'enabled', u'state': u'up', u'id': u'53fb5bdc-f9a4-4fc4-a4be-8eb33cd236b1', u'hypervisor_hostname': u'gd02-compute-11e115e64e19'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'a4db6ea8-2a91-45e7-a4b4-cb26c2dbc514', u'hypervisor_hostname': u'gd02-compute-11e115e64e11'},
                  {u'status': u'enabled', u'state': u'up', u'id': u'd92c5452-ea75-4f58-8e0b-b4a6823850d8', u'hypervisor_hostname': u'gd02-compute-11e115e64e12'}],
 u'hypervisors_links': [{u'href': u'http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=d92c5452-ea75-4f58-8e0b-b4a6823850d8', u'rel': u'next'}]}
```

And openstacksdk uses the wrong hypervisors_links to query the next page, and gets an error:

```
Traceback (most recent call last):
  File "p_hypervisors-admin.py", line 48, in <module>
    do_operation(limit, marker)
  File "p_hypervisors-admin.py", line 38, in do_operation
    srvs = [i for i in info]
  File "/usr/lib/python2.7/site-packages/openstack/resource.py", line 898, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python2.7/site-packages/openstack/exceptions.py", line 212, in raise_from_response
    http_status=http_status, request_id=request_id
openstack.exceptions.NotFoundException: NotFoundException: 404: Client Error for url: http://nova-api.cty.os:11010/v2.1/hypervisors?limit=3&marker=ea9d857a-6328-47a8-abde-dc38972f4ca2, Not Found
```

The right URI should be `/v2.1/os-hypervisors`.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1864428/+subscriptions
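The broken link can be reproduced without nova at all: the view builder joins its collection name onto the API prefix, but the route that is actually registered is os-hypervisors. A minimal sketch follows; build_next_link is an illustrative stand-in for nova's _get_collection_links, not nova code.

```python
def build_next_link(prefix, collection_name, limit, marker):
    # Mimics how a view builder composes the 'next' href from its
    # collection name; stand-in code, not nova's implementation.
    return f"{prefix}/{collection_name}?limit={limit}&marker={marker}"

prefix = "http://nova-api.cty.os:11010/v2.1"
marker = "d92c5452-ea75-4f58-8e0b-b4a6823850d8"

# With _collection_name = "hypervisors" the link points at a route that
# is not registered, so following it returns a 404:
broken = build_next_link(prefix, "hypervisors", 3, marker)

# The registered route is "os-hypervisors", so this is the link the SDK
# could actually follow:
working = build_next_link(prefix, "os-hypervisors", 3, marker)

print(broken)
print(working)
```

This is why the fix is confined to the collection name the view builder uses, while the SDK's behaviour of trusting the server-provided `next` link is correct as-is.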
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1869050

Title: migration of anti-affinity server fails due to stale scheduler instance info

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: Invalid
Status in OpenStack Compute (nova) queens series: Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===

Steps to reproduce
==
Have a deployment with 3 compute nodes
* make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
* create a server group with the anti-affinity policy
* boot server1 into the group
* boot server2 into the group
* migrate server2
* confirm the migration
* boot server3

Make sure that between the last two steps there was no periodic _sync_scheduler_instance_info run on the compute that hosted server2 before the migration. This can be done by performing the last two steps right after each other, without waiting too much, as the interval of that periodic task (scheduler_instance_sync_interval) defaults to 120 sec.

Expected result
===
server3 is booted on the host server2 was moved away from

Actual result
=
server3 cannot be booted (NoValidHost)

Triage
==
The confirm resize call on the source compute does not tell the scheduler that the instance has been removed from that host. This makes the scheduler instance info stale, causing the subsequent scheduling error.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The function which counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

This code can be very heavy on a database which contains a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single-cell environment, it is just extra CPU usage with exactly the same outcome.

The [api]/instance_list_per_project_cells option was introduced to work around this: https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

However, the quota code does not implement it, which means quota counting can take a big toll on the database server. We should ideally mirror the same behaviour in the quota code.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions
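In outline, "mirror the same behaviour" means gating the expensive per-project cell-mapping lookup on the same option, as nova.compute.instance_list already does. The function below is an illustrative stand-in for that decision, not actual nova code or the merged fix.

```python
def cells_to_count_in(per_project_cells, all_cells, cells_for_project):
    """Pick which cells quota counting should query.

    per_project_cells stands in for CONF.api.instance_list_per_project_cells;
    cells_for_project stands in for the heavy lookup that scans
    instance_mappings to find the 1-2 cells hosting the project.
    """
    if per_project_cells:
        # Targeted path: worth the instance_mappings scan when there are
        # many cells and the project only lives in a few of them.
        return cells_for_project()
    # Default path: just query every cell. In a single-cell deployment
    # this gives the same result without the expensive scan.
    return all_cells
```

With the default (False), a single-cell deployment skips the millions-of-rows scan entirely; operators with many cells can opt in to the targeted lookup.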
[Yahoo-eng-team] [Bug 1867075] Re: Arm64: Instance with Configure Drive attach volume failed
** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867075

Title: Arm64: Instance with Configure Drive attach volume failed

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released

Bug description:

Arm64. Image: cirros-0.5.1
hw_cdrom_bus='scsi', hw_disk_bus='scsi', hw_machine_type='virt', hw_rng_model='virtio', hw_scsi_model='virtio-scsi', os_command_line='console=ttyAMA0'

Boot a VM.
Create a volume: openstack volume create --size 1 test
Attach: openstack server add volume cirros-test test

Error:
DEBUG nova.virt.libvirt.guest [None req-8dfbf677-50bb-42be-869f-52c9ac638d59 admin admin] attach device xml: b9abb789-1c55-4210-ab5c-78b0e3619405
libvirtError: Requested operation is not valid: Domain already contains a disk with that address
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] Traceback (most recent call last):
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/block_device.py", line 599, in _volume_attach
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] device_type=self['device_type'], encryption=encryption)
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info)
ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self.force_reraise() ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] six.reraise(self.type_, self.value, self.tb) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] raise value ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 293, in attach_device ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self._domain.attachDeviceFlags(device_xml, flags=flags) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] result = proxy_call(self._autowrap, f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] rv = execute(f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute ERROR
[Yahoo-eng-team] [Bug 1854126] Re: s390x: failed to live migrate VM
** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1854126

Title: s390x: failed to live migrate VM

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed

Bug description:

See the following logs when doing live migration on the s390x platform with KVM:

openstack server migrate --live kvm02 --block-migration d28caa4a-215b-44c8-bed0-e0e7faca07e5

Logs:
2019-10-10 12:03:25.710 19003 ERROR nova.virt.libvirt.driver [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] CPU doesn't have compatibility. XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult: libvirtError: XML error: Missing CPU model name
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server [req-83d11ac0-3414-489e-8ad2-bfd0078e059f 44cdcb0bbe9e40fc91c043533d4dcbac 4067c50d412549c29b2deb58ec400ea1 - default default] Exception during message handling: MigrationPreCheckError: Migration pre-check error: CPU doesn't have compatibility.
XML error: Missing CPU model name Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1418, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2019-10-10 12:03:25.748 19003 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2019-10-10 12:03:25.748 19003 ERROR
[Yahoo-eng-team] [Bug 1844929] Re: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1844929

Title: grenade jobs failing due to "Timed out waiting for response from cell" in scheduler

Status in grenade: Invalid
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed

Bug description:

Seen here: https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368

Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[18043]: WARNING nova.context [None req-1929039e-1517-4326-9700-738d4b570ba6 tempest-AttachInterfacesUnderV243Test-2009753731 tempest-AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90

Looks like something is causing timeouts reaching cell1 during grenade runs.
The only errors I see in the rabbit logs are these for the uwsgi (API) servers:

=ERROR REPORT==== 22-Sep-2019::00:35:30 ===
closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-f0605979ed7d):
missed heartbeats from client, timeout: 60s

It looks like we don't have mysql logs in this grenade run; maybe we need a fix like this somewhere for grenade: https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc

logstash shows 1101 hits in the last 7 days, since Sept 17 actually: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22=7d

check and gate queues, all failures. It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula. I wonder if there is a performance/timing issue if those nodes are slower and we aren't waiting for something during the grenade upgrade before proceeding.

To manage notifications about this bug go to: https://bugs.launchpad.net/grenade/+bug/1844929/+subscriptions
[Yahoo-eng-team] [Bug 1860990] Re: RBD image backend tries to flatten images even if they are already flat
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1860990 Title: RBD image backend tries to flatten images even if they are already flat Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: When [DEFAULT]show_multiple_locations option is not set in glance, and both glance and nova use ceph as their backend, with properly configured accesses, nova will fail with the following exception: 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [req-8021fd76-d5ab-4a9b-bd17-f5eb4d4faf62 0e96a04f360644818632b7e46fe8d3e7 ac01daacc7424a40b8b464a163902dcb - default default] [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Instance failed to spawn: rbd.InvalidArgument: [errno 22] error flattening b'fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6_disk' 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] Traceback (most recent call last): 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/compute/manager.py", line 5757, in _unshelve_instance 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] block_device_info=block_device_info) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3457, in spawn 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: 
fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] block_device_info=block_device_info) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3832, in _create_image 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] fallback_from_host) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3923, in _create_and_inject_local_root 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] instance, size, fallback_from_host) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 9267, in _try_fetch_image_cache 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] image.flatten() 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/imagebackend.py", line 983, in flatten 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] self.driver.flatten(self.rbd_name, pool=self.driver.pool) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 290, in flatten 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] vol.flatten() 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File 
"/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] result = proxy_call(self._autowrap, f, *args, **kwargs) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] rv = execute(f, *args, **kwargs) 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance: fa9e4118-1bb1-4d52-a2e1-9f61b0e20dc6] File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute 2020-01-23 14:36:43.617 8647 ERROR nova.compute.manager [instance:
[Yahoo-eng-team] [Bug 1859766] Re: functional tests intermittently fail with "ReadOnlyFieldError: Cannot modify readonly field uuid"
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1859766

Title: functional tests intermittently fail with "ReadOnlyFieldError: Cannot modify readonly field uuid"

Status in OpenStack Compute (nova): Invalid
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released

Bug description:

On stable/stein and stable/rocky multiple functional tests fail randomly with the following stack trace[1]:

Traceback (most recent call last):
  File "nova/compute/manager.py", line 2322, in _build_and_run_instance
    with self.rt.instance_claim(context, instance, node, limits):
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
    return f(*args, **kwargs)
  File "nova/compute/resource_tracker.py", line 235, in instance_claim
    self._update(elevated, cn)
  File "nova/compute/resource_tracker.py", line 1034, in _update
    self.old_resources[nodename] = old_compute
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "nova/compute/resource_tracker.py", line 1028, in _update
    compute_node.save()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
    return fn(self, *args, **kwargs)
  File "nova/objects/compute_node.py", line 341, in save
    self._from_db_object(self._context, self, db_compute)
  File "nova/objects/compute_node.py", line 214, in _from_db_object
setattr(compute, key, value) File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 77, in setter raise exception.ReadOnlyFieldError(field=name) ReadOnlyFieldError: Cannot modify readonly field uuid logstash signature: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ReadOnlyFieldError%3A%20Cannot%20modify%20readonly%20field%20uuid%5C%22 [1] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_234/702181/1/check/nova-tox-functional/2341192/testr_results.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1859766/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1866380] Re: Ironic driver hash ring treats hostnames differing only by case as different hostnames
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866380 Title: Ironic driver hash ring treats hostnames differing only by case as different hostnames Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: In Progress Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Recently we had a customer case where attempts to add new ironic nodes to an existing undercloud resulted in half of the nodes failing to be detected and added to nova. Ironic API returned all of the newly added nodes when called by the driver, but half of the nodes were not returned to the compute manager by the driver. There was only one nova-compute service managing all of the ironic nodes of the all-in-one typical undercloud deployment. After days of investigation and examination of a database dump from the customer, we noticed that at some point the customer had changed the hostname of the machine from something containing uppercase letters to the same name but all lowercase. The nova-compute service record had the mixed case name and the CONF.host (socket.gethostname()) had the lowercase name. The hash ring logic adds all of the nova-compute service hostnames plus CONF.host to hash ring, then the ironic driver reports only the nodes it owns by retrieving a service hostname from the ring based on a hash of each ironic node UUID. Because of the machine hostname change, the hash ring contained, for example: {'MachineHostName', 'machinehostname'} when it should have contained only one hostname. 
And because the hash ring contained two hostnames, the driver was able to retrieve only half of the nodes as nodes that it owned. So half of the new nodes were excluded and not added as new compute nodes. I propose adding some logging to the driver related to the hash ring to help with debugging in the future. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
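The split described above can be demonstrated with a toy stand-in for the hash ring (the real driver uses a consistent hash ring library; the rendezvous-style `owner` function below is purely illustrative of how a stale mixed-case member claims roughly half the nodes):

```python
import hashlib

def owner(node_uuid, hosts):
    # Rendezvous-style stand-in for the driver's hash ring: the host with
    # the highest hash(host + node) "owns" the node.
    return max(sorted(hosts),
               key=lambda h: hashlib.md5((h + node_uuid).encode()).hexdigest())

nodes = ["ironic-node-%03d" % i for i in range(100)]

# Stale service record plus current CONF.host, differing only by case:
ring = {"MachineHostName", "machinehostname"}
owned = [n for n in nodes if owner(n, ring) == "machinehostname"]

# After normalizing case there is a single member, so the one
# nova-compute service owns every node again:
ring_fixed = {h.lower() for h in ring}
owned_fixed = [n for n in nodes if owner(n, ring_fixed) == "machinehostname"]
```

With the mixed-case ring, `owned` covers only part of `nodes`; the remainder hash to the stale entry and are silently skipped, matching the symptom in the report.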
[Yahoo-eng-team] [Bug 1866937] Re: Requests to neutron API do not use retries
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1866937 Title: Requests to neutron API do not use retries Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: We have a customer bug report downstream [1] where nova occasionally fails to carry out server actions requiring calls to neutron API if haproxy happens to close a connection after an idle time of 10 seconds at nearly the same time as an incoming request that attempts to re-use the connection while it is being torn down. Here is an excerpt from [1]: The result of our investigation, the cause is as follows. 1. neutron-client in nova uses a connection pool ( urllib3/requests ) for http. 2. Sometimes, an http connection is reused for different requests. 3. The connection between neutron-client and haproxy is closed by haproxy when it is idle for 10 seconds. 4. If reusing the connection from the client side and closing the connection from the haproxy side happen at almost the same time, the client gets an RST and ends with "bad status line". To address this problem, we can add a new config option for neutron client (similar to the existing config options we have for cinder client and glance client retries) to be more resilient during such scenarios. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1788853 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1866937/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
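The proposed remedy (a retry count for the neutron client, mirroring the existing cinder/glance options) boils down to retrying the call when the pooled connection turns out to be dead. A minimal sketch, not the actual nova code; `flaky_list_ports` is a made-up function simulating a connection that haproxy closed just before reuse:

```python
import time

def call_with_retries(call, retries=3, delay=0.0,
                      retriable=(ConnectionResetError,)):
    # Retry the call when the reused connection was torn down under us
    # (the RST / "bad status line" case from the report).
    for attempt in range(retries + 1):
        try:
            return call()
        except retriable:
            if attempt == retries:
                raise
            time.sleep(delay)

attempts = {"n": 0}

def flaky_list_ports():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return ["port-1", "port-2"]

result = call_with_retries(flaky_list_ports)
```

Note that such a retry is only safe for idempotent requests, which is why it makes sense as an opt-in config option rather than a blanket behaviour.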
[Yahoo-eng-team] [Bug 1867380] Re: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867380 Title: nova-live-migration and nova-grenade-multinode fail due to n-cpu restarting slowly after being reconfigured for ceph Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Description === $subject, it appears the current check of using grep to find active n-cpu processes isn't enough and we actually need to wait for the services to report as UP before starting to run Tempest. In the following we can see Tempest starting at 2020-03-13 13:01:19.528 while n-cpu within the instance isn't marked as UP for another ~20 seconds: https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/job-output.txt#6305 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/screen-n-cpu.txt#3825 https://zuul.opendev.org/t/openstack/build/5c213f869f324b69a423a983034d4539/log/logs/subnode-2/screen-n-cpu.txt#3534 I've only seen this on stable/pike at present but it could potentially hit all branches with slow enough CI nodes. Steps to reproduce == Run nova-live-migration on slow CI nodes. Expected result === nova/tests/live_migration/hooks/ceph.sh waits until hosts are marked as UP before running Tempest. Actual result = nova/tests/live_migration/hooks/ceph.sh checks for running n-cpu processes and then immediately starts Tempest. Environment === 1. Exact version of OpenStack you are running. 
See the following list for all releases: http://docs.openstack.org/releases/ stable/pike but it could be present on other branches with slow enough CI nodes. 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? Libvirt / KVM. 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs == Mar 13 13:01:39.170201 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 74932102-3737-4f8f-9002-763b2d580c3a] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} Mar 13 13:01:39.255008 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: 042afab0-fbef-4506-84e2-1f54cb9d67ca] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} Mar 13 13:01:39.322508 ubuntu-xenial-rax-iad-0015199005 nova-compute[30153]: DEBUG nova.compute.manager [None req-beafe617-34df-4bec-9ff6-4a0b7bebb15f None None] [instance: cc293f53-7428-4e66-9841-20cce219e24f] Instance spawn was interrupted before instance_claim, setting instance to ERROR state {{(pid=30153) _error_out_instances_whose_build_was_interrupted /opt/stack/new/nova/nova/compute/manager.py:1323}} To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1867380/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : 
https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
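The expected behaviour above (wait for services to report UP rather than just grep for processes) amounts to a polling loop. The actual hook is a shell script; this is a Python sketch of the idea, with the hypothetical `list_services` callable standing in for what `openstack compute service list` would return:

```python
import time

def wait_for_computes_up(list_services, timeout=60, interval=0.01):
    # Poll until every nova-compute service reports state 'up',
    # instead of merely checking that the process exists.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(s["state"] == "up" for s in list_services()):
            return
        time.sleep(interval)
    raise TimeoutError("compute services not up within %ss" % timeout)

# Fake service listing: down on the first poll, up afterwards.
polls = {"n": 0}

def fake_list_services():
    polls["n"] += 1
    state = "down" if polls["n"] == 1 else "up"
    return [{"binary": "nova-compute", "state": state}]

wait_for_computes_up(fake_list_services, timeout=5)
```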
[Yahoo-eng-team] [Bug 1774249] Re: update_available_resource will raise DiskNotFound after resize but before confirm
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1774249 Title: update_available_resource will raise DiskNotFound after resize but before confirm Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Bug description: Originally reported in RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315 Tested on OSP12 (Pike), but appears to be still present on master. Should only occur if nova compute is configured to use local file instance storage. Create instance A on compute X Resize instance A to compute Y Domain is powered off /var/lib/nova/instances/ renamed to _resize on X Domain is *not* undefined On compute X: update_available_resource runs as a periodic task First action is to update self rt calls driver.get_available_resource() ...calls _get_disk_over_committed_size_total ...iterates over all defined domains, including the ones whose disks we renamed ...fails because a referenced disk no longer exists Results in errors in nova-compute.log: 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last): 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node 2018-05-30 02:17:08.647 1 ERROR 
nova.compute.manager rt.update_available_resource(context, nodename) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total() 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager config, block_device_info) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager dk_size = disk_api.get_allocated_disk_size(path) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager return images.qemu_img_info(path).disk_size 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager raise exception.DiskNotFound(location=path) 2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk And resource tracker is no longer updated. We can find lots of these in the gate. 
Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly mitigates this, but doesn't because task_state is not set while the instance is awaiting confirm. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
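A hedged illustration of making the accounting tolerant of disks that disappear mid-scan (toy function names, not the real libvirt driver code, which lives in `_get_disk_over_committed_size_total`; as the note above says, a complete fix must also consider task_state):

```python
def disk_over_committed_total(disk_paths, allocated_size):
    # Sum allocated disk sizes across all defined domains, skipping disks
    # that vanished underneath us (e.g. renamed to *_resize while the
    # instance awaits resize confirmation).
    total = 0
    for path in disk_paths:
        try:
            total += allocated_size(path)
        except FileNotFoundError:
            continue
    return total

# Fake filesystem: two disks present, one renamed away by a resize.
sizes = {"/var/lib/nova/instances/a/disk": 10,
         "/var/lib/nova/instances/b/disk": 20}

def fake_allocated_size(path):
    if path not in sizes:
        raise FileNotFoundError(path)
    return sizes[path]

paths = list(sizes) + ["/var/lib/nova/instances/gone/disk"]
total = disk_over_committed_total(paths, fake_allocated_size)
```

Skipping the missing disk keeps the periodic task (and thus the resource tracker) alive instead of aborting the whole update.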
[Yahoo-eng-team] [Bug 1834659] Re: Volume not removed on instance deletion
** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed ** Changed in: nova/train Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1834659 Title: Volume not removed on instance deletion Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Description === When we deploy a non-ephemeral instance (i.e. creating a new volume), indicate "YES" in "Delete Volume on Instance delete", then delete the instance, and the volume driver's terminate_connection call in cinder takes too long to return, the volume is not removed. The volume status remains as "In-use" and "Attached to None on /dev/vda". For example: abcfa1db-1748-4f04-9a29-128cf22efcc5 - 130GiB In-use - Attached to None on /dev/vda Steps to reproduce == Please refer to this bug comment #2 below Expected result === Volume gets removed Actual result = Volume remains attached Environment === Issue was initially reported downstream against the Newton release (see comment #1 below). 
Customer was using the hitachi volume driver: volume_driver = cinder.volume.drivers.hitachi.hbsd.hbsd_fc.HBSDFCDriver As a note, the hitachi drivers are unsupported as of Pike (see cinder commit 595c8d3f8523a9612ccc64ff4147eab993493892). Issue was reproduced in a devstack environment running the Stein release. Volume driver used was lvm (devstack default) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1834659/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1831771] Re: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host
** Also affects: nova/queens Importance: Undecided Status: New ** Also affects: nova/rocky Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Released ** Changed in: nova/rocky Status: New => Fix Committed ** Changed in: nova/queens Status: New => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1831771 Title: UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: This was originally reported in Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1668159 The 'UnexpectedDeletingTaskStateError' exception can be raised by something like aborting a large heat stack, where the instance hasn't finished setting up before the stack is aborted and the instances deleted. https://github.com/openstack/nova/blob/19.0.0/nova/db/sqlalchemy/api.py#L2864 We handle this in the compute manager and as part of that handling, we clean up the resource tracking of network interfaces. https://github.com/openstack/nova/blob/19.0.0/nova/compute/manager.py#L2034-L2040 However, we don't unplug these interfaces. This can result in things being left over on the host. We should attempt to unplug VIFs as part of this cleanup. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1831771/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1550919 Title: [Libvirt]Evacuate fail may cause disk image be deleted Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: I checked the latest source of nova on the master branch; this problem still exists. When we are doing evacuate, eventually _do_rebuild_instance will be called. As rebuild is not implemented in the libvirt driver, in fact _rebuild_default_impl is called.

    try:
        with instance.mutated_migration_context():
            self.driver.rebuild(**kwargs)
    except NotImplementedError:
        # NOTE(rpodolyaka): driver doesn't provide specialized version
        # of rebuild, fall back to the default implementation
        self._rebuild_default_impl(**kwargs)

_rebuild_default_impl will call self.driver.spawn to boot up the instance, and spawn will in turn call _create_domain_and_network. When VirtualInterfaceCreateException or Timeout happens, self.cleanup will be called.
    except exception.VirtualInterfaceCreateException:
        # Neutron reported failure and we didn't swallow it, so
        # bail here
        with excutils.save_and_reraise_exception():
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
    except eventlet.timeout.Timeout:
        # We never heard from Neutron
        LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                     'instance %(uuid)s'), {'uuid': instance.uuid},
                 instance=instance)
        if CONF.vif_plugging_is_fatal:
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
            raise exception.VirtualInterfaceCreateException()

Because the default value for the parameter destroy_disks is True:

    def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True, migrate_data=None, destroy_vifs=True):

So if an error occurs during evacuate while waiting for neutron's event, the instance's disk files will be deleted unexpectedly. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
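One direction a fix could take (a toy model, not the actual upstream patch) is to have the failure paths opt out of the dangerous default and pass destroy_disks=False explicitly; all class and method names below are illustrative:

```python
class FakeDriver:
    """Toy driver modelling the report's dangerous cleanup() default."""

    def __init__(self):
        self.disks = {"instance-1": ["disk", "disk.config"]}

    def cleanup(self, instance, destroy_disks=True):
        # Mirrors the report: disks are destroyed unless the caller opts out.
        if destroy_disks:
            self.disks.pop(instance, None)

    def spawn(self, instance, vif_plug_ok):
        if not vif_plug_ok:
            # On a vif-plugging failure during evacuate, keep the disks so a
            # retry of the evacuation still has the instance's data.
            self.cleanup(instance, destroy_disks=False)
            raise RuntimeError("vif plugging failed")

driver = FakeDriver()
try:
    driver.spawn("instance-1", vif_plug_ok=False)
except RuntimeError:
    pass
```

In the sketch, the failed spawn still raises, but the instance directory survives for a retry.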
[Yahoo-eng-team] [Bug 1868033] Re: Booting instance with pci_device fails during rocky->stein live upgrade
** Also affects: nova/ussuri Importance: Undecided Status: New ** Also affects: nova/stein Importance: Undecided Status: New ** Also affects: nova/train Importance: Undecided Status: New ** Changed in: nova/train Status: New => Fix Released ** Changed in: nova/ussuri Status: New => Fix Committed ** Changed in: nova/stein Status: New => Fix Committed ** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1868033 Title: Booting instance with pci_device fails during rocky->stein live upgrade Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Environment: Stein nova-conductor having set upgrade_levels to rocky Rocky nova-compute Boot an instance with a flavour that has a pci_device Error: Failed to publish message to topic 'nova': maximum recursion depth exceeded: RuntimeError: maximum recursion depth exceeded Tracked this down to it continually trying to backport the InstancePCIRequests: It gets as arguments: objinst={u'nova_object.version': u'1.1', u'nova_object.name': u'InstancePCIRequests', u'nova_object.data': {u'instance_uuid': u'08212b12-8fa8-42d9-8d3e-52ed60a64135', u'requests': [{u'nova_object.version': u'1.3', u'nova_object.name': u'InstancePCIRequest', u'nova_object.data': {u'count': 1, u'is_new': False, u'numa_policy': None, u'request_id': None, u'requester_id': None, u'alias_name': u'V100-32G', u'spec': [{u'vendor_id': u'10de', u'product_id': u'1db6'}]}, u'nova_object.namespace': u'nova'}]}, u'nova_object.namespace': u'nova'}, object_versions={u'InstancePCIRequests': '1.1', 'InstancePCIRequest': '1.2'} It fails because it doesn't backport the individual InstancePCIRequest inside 
the InstancePCIRequests object and so keeps trying. The error it shows is: IncompatibleObjectVersion: Version 1.3 of InstancePCIRequest is not supported, supported version is 1.2 I have fixed this in our setup by altering obj_make_compatible to downgrade the individual requests to version 1.2, which seems to work and all is good. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1868033/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
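The workaround described amounts to making the parent's obj_make_compatible also downgrade each embedded request primitive. A toy model of that recursion over plain dicts, with a made-up per-version field table (the real objects track field history in nova's versioned-object machinery, and the field names here are illustrative):

```python
# Illustrative: fields added to the child object at each version.
CHILD_FIELDS_SINCE = {"1.3": ("requester_id",)}

def downgrade_child(child, target):
    # Strip fields introduced after `target` and stamp the lower version.
    data = dict(child["nova_object.data"])
    versions = ["1.1", "1.2", "1.3"]
    for v in versions[versions.index(target) + 1:]:
        for field in CHILD_FIELDS_SINCE.get(v, ()):
            data.pop(field, None)
    return dict(child, **{"nova_object.version": target,
                          "nova_object.data": data})

def make_compatible(parent, child_target):
    # The missing step from the bug: backport every embedded
    # InstancePCIRequest primitive, not just the parent object.
    data = dict(parent["nova_object.data"])
    data["requests"] = [downgrade_child(r, child_target)
                        for r in data["requests"]]
    return dict(parent, **{"nova_object.data": data})

parent = {"nova_object.version": "1.1",
          "nova_object.data": {"requests": [
              {"nova_object.version": "1.3",
               "nova_object.data": {"count": 1, "requester_id": None}}]}}

out = make_compatible(parent, "1.2")
```

Without the child downgrade, the conductor keeps re-sending the same too-new child primitive, which is what produced the recursion in the report.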
[Yahoo-eng-team] [Bug 1867075] Re: Arm64: Instance with Configure Drive attach volume failed failed
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1867075 Title: Arm64: Instance with Configure Drive attach volume failed failed Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Arm64. Image: cirros-0.5.1 hw_cdrom_bus='scsi', hw_disk_bus='scsi', hw_machine_type='virt', hw_rng_model='virtio', hw_scsi_model='virtio-scsi', os_command_line=''console=ttyAMA0'' Boot a vm. Create a volume: openstack volume create --size 1 test Attach: openstack server add volume cirros-test test Error: DEBUG nova.virt.libvirt.guest [None req-8dfbf677-50bb-42be-869f-52c9ac638d59 admin admin] attach device xml: b9abb789-1c55-4210-ab5c-78b0e3619405 ror: Requested operation is not valid: Domain already contains a disk with that address ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] Traceback (most recent call last): ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/block_device.py", line 599, in _volume_attach ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] device_type=self['device_type'], encryption=encryption) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self.force_reraise() ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] six.reraise(self.type_, self.value, self.tb) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] raise value ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1731, in attach_volume ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] conf = self._get_volume_config(connection_info, disk_info) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 293, in attach_device ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] self._domain.attachDeviceFlags(device_xml, flags=flags) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] result = proxy_call(self._autowrap, f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] rv = execute(f, *args, **kwargs) ERROR nova.virt.block_device [instance: 22bdc0a6-1c0c-43fa-8c64-66735b6a6cb6] File 
"/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute ERROR
[Yahoo-eng-team] [Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1869050 Title: migration of anti-affinity server fails due to stale scheduler instance info Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) pike series: Invalid Status in OpenStack Compute (nova) queens series: Invalid Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Description === Steps to reproduce == Have a deployment with 3 compute nodes * make sure that the deployment is configured with tracks_instance_changes=True (True is the default) * create a server group with anti-affinity policy * boot server1 into the group * boot server2 into the group * migrate server2 * confirm the migration * boot server3 Make sure that between the last two steps there was no periodic _sync_scheduler_instance_info running on the compute that hosted server2 before the migration. This can be done by performing the last two steps one after the other without waiting too long, as the interval of that periodic task (scheduler_instance_sync_interval) defaults to 120 sec. Expected result === server3 is booted on the host that server2 was moved away from Actual result = server3 cannot be booted (NoValidHost) Triage == The confirm resize call on the source compute does not tell the scheduler that the instance has been removed from this host. This makes the scheduler instance info stale, causing the subsequent scheduling error. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
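The triage can be modelled with a tiny in-memory version of the scheduler's per-host instance map (toy code, not nova's actual HostState tracking): if confirming the migration never removes the instance from the source host's entry, the anti-affinity check rejects every host.

```python
host_instances = {"compute-1": {"server1"},
                  "compute-2": {"server2"},   # stale: server2 already left
                  "compute-3": {"server2"}}   # actual location after migrate

group = {"server1", "server2"}   # anti-affinity group members

def hosts_passing_anti_affinity():
    # A host passes only if it hosts no member of the group.
    return [h for h, insts in sorted(host_instances.items())
            if not insts & group]

no_hosts = hosts_passing_anti_affinity()   # stale info -> NoValidHost

def confirm_migration(instance, source):
    # The missing step: tell the scheduler the instance left the source.
    host_instances[source].discard(instance)

confirm_migration("server2", "compute-2")
ok_hosts = hosts_passing_anti_affinity()   # server3 now has a home
```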
[Yahoo-eng-team] [Bug 1852458] Re: "create" instance action not created when instance is buried in cell0
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1852458 Title: "create" instance action not created when instance is buried in cell0 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Triaged Status in OpenStack Compute (nova) pike series: Triaged Status in OpenStack Compute (nova) queens series: Triaged Status in OpenStack Compute (nova) rocky series: Triaged Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Committed Bug description: Before cell0 was introduced the API would create the "create" instance action for each instance in the nova cell database before casting off to conductor to do scheduling: https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1180 Note that conductor failed to "complete" the action with a failure event: https://github.com/openstack/nova/blob/mitaka-eol/nova/conductor/manager.py#L374 But at least the action was created. Since then, with cell0, if scheduling fails the instance is buried in the cell0 database but no instance action is created. To illustrate, I disabled the single nova-compute service on my devstack host and created a server which failed with NoValidHost: $ openstack server show build-fail1 -f value -c fault {u'message': u'No valid host was found. 
', u'code': 500, u'created': u'2019-11-13T15:57:13Z'} When listing instance actions I expected to see a "create" action but there were none: $ nova instance-action-list 008a7d52-dd83-4f52-a720-b3cfcc498259 +++-+++ | Action | Request_ID | Message | Start_Time | Updated_At | +++-+++ +++-+++ This is because the "create" action is only created when the instance is scheduled to a specific cell: https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L1460 Solution: The ComputeTaskManager._bury_in_cell0 method should also create a "create" action in cell0 like it does for the instance BDMs and tags. This goes back to Ocata: https://review.opendev.org/#/c/319379/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1852458/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
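The proposed fix can be sketched as follows (an illustrative model only, not nova's actual _bury_in_cell0; the dict-backed "database" stands in for the cell0 DB): when scheduling fails, the burial path should record the "create" action alongside the instance, BDMs and tags, so instance-action-list is never empty.

```python
# Illustrative model of burying a failed instance in cell0 while still
# recording the "create" instance action; not nova's actual code.

def bury_in_cell0(cell0_db, instance, bdms, tags):
    """Persist the failed instance and its related records in cell0."""
    cell0_db.setdefault('instances', []).append(instance)
    cell0_db.setdefault('bdms', []).extend(bdms)
    cell0_db.setdefault('tags', []).extend(tags)
    # The fix: also start the "create" action in cell0 so the API can
    # report when the boot was attempted and why it failed.
    cell0_db.setdefault('instance_actions', []).append(
        {'instance_uuid': instance['uuid'], 'action': 'create'})


db = {}
bury_in_cell0(db, {'uuid': 'abc-123'}, bdms=[], tags=[])
print(db['instance_actions'])
# [{'instance_uuid': 'abc-123', 'action': 'create'}]
```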
[Yahoo-eng-team] [Bug 1824858] Re: nova instance remnant left behind after cold migration completes
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1824858

Title: nova instance remnant left behind after cold migration completes

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in StarlingX: Fix Released

Bug description:

Brief Description
-----------------
After cold migration to a new worker node, instance remnants are left behind.

Severity
--------
Standard

Steps to Reproduce
------------------
Worker nodes compute-1 and compute-2 have the remote-storage label enabled.
1. Launch an instance on compute-1
2. Cold migrate it to compute-2
3. Confirm the cold migration

Expected Behavior
-----------------
Migration to compute-2 and cleanup of files on compute-1.

Actual Behavior
---------------
At 16:35:24 cold migration for instance a416ead6-a17f-4bb9-9a96-3134b426b069 completed to compute-2, but the following path is left behind on compute-1:

compute-1:/var/lib/nova/instances/a416ead6-a17f-4bb9-9a96-3134b426b069

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069         _base          locks
a416ead6-a17f-4bb9-9a96-3134b426b069_resize  compute_nodes  lost+found

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069  _base  compute_nodes  locks  lost+found

compute-1:/var/lib/nova/instances$ ls
a416ead6-a17f-4bb9-9a96-3134b426b069  _base  compute_nodes  locks  lost+found

Alarm/event log (newest first):

2019-04-15T16:35:24.646749 clear 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:24.482575 log 700.168 Cold-Migrate-Confirm complete for instance tenant2-migration_test-1 enabled on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:16.815223 log 700.163 Cold-Migrate-Confirm issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 on host compute-2 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:10.030068 clear 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.971414 set 700.010 Instance tenant2-migration_test-1 owned by tenant2 has been cold-migrated to host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:35:09.970212 log 700.162 Cold-Migrate complete for instance tenant2-migration_test-1 now enabled on host compute-2 waiting for confirmation tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637687 set 700.009 Instance tenant2-migration_test-1 owned by tenant2 is cold migrating from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.637636 log 700.158 Cold-Migrate inprogress for instance tenant2-migration_test-1 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:51.478442 log 700.157 Cold-Migrate issued by tenant2 against instance tenant2-migration_test-1 owned by tenant2 from host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical
2019-04-15T16:34:20.181155 log 700.101 Instance tenant2-migration_test-1 is enabled on host compute-1 tenant=7f1d4223-3341-428a-9188-55614770e676.instance=a416ead6-a17f-4bb9-9a96-3134b426b069 critical

See nova-compute.log (compute-1). Instance a416ead6-a17f-4bb9-9a96-3134b426b069 was claimed and spawned here on compute-1:

{"log":"2019-04-15 16:34:04,617.617 60908 INFO nova.compute.claims [req-f1195bbb-d5b0-4a75-a598-ff287d247643 3fd3229d3e6248cf9b5411b2ecec86e9 7f1d42233341428a918855614770e676 - default default]
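The cleanup expected after a confirmed cold migration can be sketched as follows (a simplified illustration, not nova's actual confirm-resize code; the directory layout mirrors the listing above, and real nova must additionally guard against shared storage and in-flight tasks): both the instance directory and its <uuid>_resize sibling should be removed from the source host.

```python
import os
import shutil
import tempfile

def cleanup_after_confirm(instances_path, uuid):
    """Remove an instance's leftover directories from the source host.

    Simplified sketch only: deletes the instance dir and the
    '<uuid>_resize' dir created during the cold migration.
    """
    for name in (uuid, uuid + '_resize'):
        path = os.path.join(instances_path, name)
        if os.path.isdir(path):
            shutil.rmtree(path)

# Demonstration against a throwaway directory tree.
root = tempfile.mkdtemp()
uuid = 'a416ead6-a17f-4bb9-9a96-3134b426b069'
for name in (uuid, uuid + '_resize', '_base', 'locks'):
    os.makedirs(os.path.join(root, name))

cleanup_after_confirm(root, uuid)
print(sorted(os.listdir(root)))   # ['_base', 'locks']
```

The bug is precisely that the first directory in that pair survives the confirm on compute-1.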
[Yahoo-eng-team] [Bug 1878024] Re: disk usage of the nova image cache is not counted as used disk space
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878024

Title: disk usage of the nova image cache is not counted as used disk space

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
The nova-compute service keeps a local cache of glance images used for nova servers, to avoid downloading the same image from glance multiple times. The disk usage of this cache is not counted as local disk usage in nova and is not reported to placement as used DISK_GB. This leads to disk over-allocation. Also, the size of the cache cannot be limited by nova configuration, so the deployer cannot reserve disk space for it with the reserved_host_disk_mb config option.

Steps to reproduce
==================
* Set up a single-node devstack.
* Create and upload an image with a not-too-small physical size, e.g. an image with 1G physical size.
* Check the current disk usage of the host OS and configure reserved_host_disk_mb in nova-cpu.conf accordingly.
* Boot two servers from that image with a flavor like d1 (disk=5G).
* Nova downloads the glance image once to the local cache, which results in 1GB of disk usage.
* Nova creates two root file systems, one for each VM. Those disks initially have minimal physical size but a 5G virtual size.
* At this point nova has allocated 5G + 5G of DISK_GB in placement, but due to the image in the cache the total disk usage of the two VMs plus the cache can reach 5G + 5G + 1G if both VMs overwrite and fill the content of their own disks.

Expected result
===============
Option A) Nova maintains a DISK_GB allocation in placement for the images in its cache. This way the expected DISK_GB allocation in placement is 5G + 5G + 1G at the end.

Option B) Nova provides a config option to limit the maximum size of the image cache, so the deployer can include the maximum image cache size in reserved_host_disk_mb when dimensioning the disk space of the compute.

Actual result
=============
Only 5G + 5G was allocated in placement, so disk space is over-allocated by the size of the image cache.

Environment
===========
Devstack from recent master:

stack@aio:/opt/stack/nova$ git log --oneline | head -n 1
4b62c90063 Merge "Remove stale nested backport from InstancePCIRequests"

libvirt driver with file-based image backend.

Logs & Configs
==============
http://paste.openstack.org/show/793388/

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878024/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
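Either option needs a way to measure the cache's physical footprint. A minimal sketch, assuming a flat _base directory of image files (this is not nova's implementation; it uses POSIX st_blocks so sparse cached images are counted by what they actually occupy on disk):

```python
import os
import tempfile

def cache_physical_size_bytes(cache_dir):
    """Sum the physical (allocated) size of files in an image cache dir.

    Sketch only: st_blocks is in 512-byte units per POSIX, so this
    reflects real on-disk usage rather than apparent (virtual) size.
    """
    total = 0
    for entry in os.scandir(cache_dir):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_blocks * 512
    return total

# Demonstration: one 1 MiB cached image in a throwaway cache dir.
cache = tempfile.mkdtemp()
with open(os.path.join(cache, 'image1'), 'wb') as f:
    f.write(b'\x01' * 1024 * 1024)

used = cache_physical_size_bytes(cache)
print(used > 0)   # True
```

A number like this is what could either be reported to placement (option A) or bounded by configuration and folded into reserved_host_disk_mb (option B).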
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
Unable to createImage/snapshot paused volume-backed instances.

Steps to reproduce
==================
- Pause a volume-backed instance
- Attempt to snapshot the instance using the createImage API

Expected result
===============
A snapshot image is successfully created, as is the case for paused instances that are not volume backed.

Actual result
=============
n-api returns the following error:

{'code': 409, 'message': "Cannot 'createImage' instance bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state paused"}

Environment
===========
1. Exact version of OpenStack you are running: master (see http://docs.openstack.org/releases/ for all releases)
2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? N/A
3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A
4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs
==============
As above.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
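The failure mode can be illustrated with a toy policy check (illustrative only; nova's real gate is a state-checking decorator on the API method, and the exact state sets below are assumptions): the volume-backed snapshot path rejected 'paused' even though the regular snapshot path allowed it.

```python
# Toy model of the createImage vm_state gate; not nova's real code,
# and the state sets here are assumptions for illustration.
ALLOWED_STATES = {'active', 'stopped', 'paused', 'suspended'}
ALLOWED_STATES_VOLUME_BACKED = {'active', 'stopped'}   # the bug: no 'paused'

def can_create_image(vm_state, volume_backed, fixed=False):
    """Would the createImage request pass the vm_state gate?"""
    if volume_backed and not fixed:
        return vm_state in ALLOWED_STATES_VOLUME_BACKED
    return vm_state in ALLOWED_STATES

print(can_create_image('paused', volume_backed=False))             # True
print(can_create_image('paused', volume_backed=True))              # False -> HTTP 409
print(can_create_image('paused', volume_backed=True, fixed=True))  # True
```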
[Yahoo-eng-team] [Bug 1878979] Re: Quota code does not respect [api]/instance_list_per_project_cells
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878979

Title: Quota code does not respect [api]/instance_list_per_project_cells

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Committed
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The function that counts resources using the legacy method involves getting a list of all cell mappings assigned to a specific project:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/quota.py#L1170-L1209

This code can be very heavy on a database containing a lot of instances (but not a lot of mappings), potentially scanning millions of rows to gather 1-2 cell mappings. In a single-cell environment it is just extra CPU usage with exactly the same outcome.

The [api]/instance_list_per_project_cells option was introduced to work around this:

https://github.com/openstack/nova/blob/575a91ff5be79ac35aef4b61d84c78c693693304/nova/compute/instance_list.py#L146-L153

However, the quota code does not honour it, which means quota counting can take a big toll on the database server. We should ideally mirror the same behaviour in the quota code.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878979/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1879964] Re: Invalid value for 'hw:mem_page_size' raises confusing error
** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/ussuri
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1879964

Title: Invalid value for 'hw:mem_page_size' raises confusing error

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Configure a flavor like so:

  openstack flavor create hugepage --ram 1024 --disk 10 --vcpus 1 test
  openstack flavor set hugepage --property hw:mem_page_size=2M test

Attempt to boot an instance. It will fail with the following error message:

  Invalid memory page size '0' (HTTP 400) (Request-ID: req-338bf619-3a54-45c5-9c59-ad8c1d425e91)

You wouldn't know from reading it, but this is because the property should read 'hw:mem_page_size=2MB' (note the extra 'B').

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1879964/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
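A sketch of the kind of parsing involved and the clearer error this bug asks for (simplified and assumed: the real implementation lives in nova's hardware utilities, and the unit table below is an assumption, not nova's exact accepted set): an unrecognised suffix like 'M' should produce an error that names the offending value and the accepted forms, rather than a mysterious size of '0'.

```python
import re

# Assumed unit table for illustration; nova's real accepted suffixes may differ.
_UNITS_TO_KIB = {'': 1, 'KB': 1, 'MB': 1024, 'GB': 1024 * 1024}
_KEYWORDS = {'small', 'large', 'any'}

def parse_mem_page_size(value):
    """Return the page size in KiB (or a keyword), with a helpful error."""
    if value in _KEYWORDS:
        return value
    match = re.fullmatch(r'(\d+)([A-Z]*)', value)
    if not match or match.group(2) not in _UNITS_TO_KIB:
        raise ValueError(
            "Invalid memory page size %r: expected 'small', 'large', 'any', "
            "or a number with an optional KB/MB/GB suffix (e.g. '2MB')" % value)
    return int(match.group(1)) * _UNITS_TO_KIB[match.group(2)]

print(parse_mem_page_size('2MB'))   # 2048
try:
    parse_mem_page_size('2M')       # the value from the bug report
except ValueError as exc:
    print(exc)
```

An error phrased this way would have pointed straight at the missing 'B'.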
[Yahoo-eng-team] [Bug 1882233] Re: Libvirt driver always reports 'memory_mb_used' of 0
** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/ussuri
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882233

Title: Libvirt driver always reports 'memory_mb_used' of 0

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

The nova-compute service periodically logs a summary of the free RAM, disk and vCPUs as reported by the hypervisor. For example:

  Hypervisor/Node resource view: name=vtpm-f31.novalocal free_ram=7960MB free_disk=11.379043579101562GB free_vcpus=7 pci_devices=[{...}]

On a recent deployment using the libvirt driver, it was observed that the 'free_ram' value never changes despite instances being created and destroyed. This is because the 'get_memory_mb_used' function in 'nova.virt.libvirt.host' always returns 0 unless the host platform, as reported by 'sys.platform', is either 'linux2' or 'linux3'. Since Python 3.3 the major version is no longer included in this value because it was misleading [1].

This is low priority because the value only appears to be used for logging purposes; the values stored in e.g. the 'ComputeNode' object and reported to placement are calculated from config options and the number of instances on the node. We may wish to stop reporting this information instead.
[1] https://stackoverflow.com/a/10429736/613428 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882233/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
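The broken check and the conventional fix can be shown directly (the function below is a simplified stand-in for the platform test inside 'get_memory_mb_used'):

```python
def is_linux(platform):
    """Platform check that survives the Python 3.3 change.

    On Python 2, sys.platform was 'linux2' or 'linux3'; on Python 3.3+
    it is just 'linux', so an exact-match check silently fails there.
    """
    return platform.startswith('linux')

# The buggy exact-match check vs the robust prefix check:
for value in ('linux2', 'linux3', 'linux', 'darwin'):
    buggy = value in ('linux2', 'linux3')
    print(value, buggy, is_linux(value))
```

Under Python 3, sys.platform is 'linux', so the exact-match version returns False and the function falls through to the 0 path described above.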
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1884214

Title: reserve disk usage for image cache fails on a fresh hypervisor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement.

[1] http://paste.openstack.org/show/794993/

This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
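The defensive fix can be sketched as follows (function and layout are illustrative, not nova's actual helper): on a fresh hypervisor, _base does not exist until the first image is downloaded, so a missing cache directory should count as an empty cache instead of raising and aborting the resource update.

```python
import os
import tempfile

def image_cache_size_bytes(base_dir):
    """Physical size of the image cache; a missing cache dir counts as 0.

    Sketch only: the key point is tolerating FileNotFoundError rather
    than letting it propagate into the resource-update path.
    """
    try:
        entries = list(os.scandir(base_dir))
    except FileNotFoundError:
        return 0
    return sum(e.stat(follow_symlinks=False).st_blocks * 512
               for e in entries if e.is_file(follow_symlinks=False))

fresh = os.path.join(tempfile.mkdtemp(), '_base')   # intentionally never created
print(image_cache_size_bytes(fresh))   # 0 instead of a stack trace
```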
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882919

Title: e1000e interface reported as unsupported

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF, which nova does not explicitly support. There doesn't appear to be any reason not to support it, since libvirt, and specifically QEMU/KVM, support it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1550919] Re: [Libvirt]Evacuate fail may cause disk image be deleted
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1550919

Title: [Libvirt]Evacuate fail may cause disk image be deleted

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

I checked the latest nova source on the master branch; this problem still exists.

When we do an evacuate, _do_rebuild_instance is eventually called. As rebuild is not implemented in the libvirt driver, _rebuild_default_impl is called instead:

    try:
        with instance.mutated_migration_context():
            self.driver.rebuild(**kwargs)
    except NotImplementedError:
        # NOTE(rpodolyaka): driver doesn't provide specialized version
        # of rebuild, fall back to the default implementation
        self._rebuild_default_impl(**kwargs)

_rebuild_default_impl calls self.driver.spawn to boot the instance, and spawn in turn calls _create_domain_and_network. When a VirtualInterfaceCreateException or a Timeout happens there, self.cleanup is called:

    except exception.VirtualInterfaceCreateException:
        # Neutron reported failure and we didn't swallow it, so
        # bail here
        with excutils.save_and_reraise_exception():
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
    except eventlet.timeout.Timeout:
        # We never heard from Neutron
        LOG.warn(_LW('Timeout waiting for vif plugging callback for '
                     'instance %(uuid)s'), {'uuid': instance.uuid},
                 instance=instance)
        if CONF.vif_plugging_is_fatal:
            if guest:
                guest.poweroff()
            self.cleanup(context, instance, network_info=network_info,
                         block_device_info=block_device_info)
            raise exception.VirtualInterfaceCreateException()

The default value of the destroy_disks parameter is True:

    def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True, migrate_data=None, destroy_vifs=True):

So if an error occurs during evacuate while waiting for neutron's event, the instance's disk files will be deleted unexpectedly.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1550919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
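The hazard reduces to a destructive default argument, sketched below (an illustrative toy model, not nova's real driver API): any error-path caller that does not pass destroy_disks=False inherits the destructive default, which is exactly what the evacuate failure path must avoid.

```python
# Toy model of the destructive-default hazard; not nova's real API.

def cleanup(instance, disks, destroy_disks=True):
    """Tear down a failed instance; destroys disks unless told otherwise."""
    if destroy_disks:
        disks.clear()   # in the real bug this deletes image files on disk

# Evacuation failure path: these disks may be the only surviving copy of
# the instance's data, so the error handler must opt out explicitly.
disks = ['root.qcow2']
cleanup('server1', disks, destroy_disks=False)   # the safe call
print(disks)   # ['root.qcow2'] - the disks survive the failed evacuate
```

Calling cleanup() here without the explicit flag would empty the list, mirroring the unexpected deletion described in the report.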
[Yahoo-eng-team] [Bug 1818798] Re: Should not skip volume_size check for bdm.image_id == image_ref case
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1818798

Title: Should not skip volume_size check for bdm.image_id == image_ref case

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Released

Bug description:

The volume size should be checked in the bdm.source_type=image, destination_type=volume case no matter what the image is, but in:

https://github.com/openstack/nova/blob/5a09c81af3b438ecbcf27fa653095ff55abb3ed4/nova/compute/api.py#L1452-L1453

we skip the check if bdm.image_id == image_ref. This was meant to skip only the _get_image() check, since the image has already been checked earlier, but it skipped the volume_size check too.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1818798/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
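The intended control flow can be sketched like this (simplified; function and field names are illustrative, not nova's): only the image lookup may be short-circuited for the boot image, never the size validation.

```python
# Toy model of the bdm validation split; not nova's real code.

def validate_bdm(bdm, boot_image_ref, image_catalog):
    """Validate an image->volume bdm; raises ValueError when too small."""
    if bdm['image_id'] == boot_image_ref:
        # Only the *lookup/validation of the image itself* is redundant
        # here (it was checked earlier) - the size check must still run.
        image = image_catalog[boot_image_ref]
    else:
        image = image_catalog[bdm['image_id']]
    if bdm['volume_size'] < image['min_disk']:
        raise ValueError('volume_size %d is smaller than image min_disk %d'
                         % (bdm['volume_size'], image['min_disk']))

catalog = {'img-1': {'min_disk': 5}}
try:
    validate_bdm({'image_id': 'img-1', 'volume_size': 1}, 'img-1', catalog)
except ValueError as exc:
    print(exc)   # rejected even though image_id == image_ref
```

The bug was equivalent to returning early in the first branch, before the size comparison.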
[Yahoo-eng-team] [Bug 1882919] Re: e1000e interface reported as unsupported
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1882919

Title: e1000e interface reported as unsupported

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Per this downstream bug [1], attempting to boot a Windows Server 2012 or 2016 image will fail because libosinfo is attempting to configure an e1000e VIF, which nova does not explicitly support. There doesn't appear to be any reason not to support it, since libvirt, and specifically QEMU/KVM, support it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1839808

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1882919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1878583] Re: Unable to createImage/snapshot paused volume backed instances
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1878583

Title: Unable to createImage/snapshot paused volume backed instances

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
Unable to createImage/snapshot paused volume-backed instances.

Steps to reproduce
==================
- Pause a volume-backed instance
- Attempt to snapshot the instance using the createImage API

Expected result
===============
A snapshot image is successfully created, as is the case for paused instances that are not volume backed.

Actual result
=============
n-api returns the following error:

{'code': 409, 'message': "Cannot 'createImage' instance bc5a7ae4-fca9-4d83-b1b8-5534f51a9404 while it is in vm_state paused"}

Environment
===========
1. Exact version of OpenStack you are running: master (see http://docs.openstack.org/releases/ for all releases)
2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? N/A
3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? N/A
4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A

Logs & Configs
==============
As above.
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1878583/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1843708] Re: Key-pair is not updated during the rebuild
** Changed in: nova/stein
   Status: Fix Committed => Fix Released

** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1843708

Title: Key-pair is not updated during the rebuild

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: Fix Committed
Status in OpenStack Compute (nova) rocky series: Fix Committed
Status in OpenStack Compute (nova) stein series: Fix Released
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

Description
===========
When we want to rebuild an instance and change the keypair, we can specify it with:

  openstack --os-compute-api-version 2.54 server rebuild --image "Debian 10" --key-name key1 instance1

This comes from this implementation:
https://review.opendev.org/#/c/379128/
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/rebuild-keypair-reset.html

But when rebuilding the instance, cloud-init sets the key in authorized_keys from http://169.254.169.254/openstack/latest/meta_data.json, and this meta_data.json uses the keys from the instance_extra table. The keypair is updated in the 'instances' table but not in the 'instance_extra' table, so the keypair is not updated inside the VM.

This may be the function responsible for saving the keypair, but the save() does nothing:
https://opendev.org/openstack/nova/src/branch/master/nova/objects/instance.py#L714

Steps to reproduce
==================
- Deploy a DevStack
- Boot an instance with keypair key1
- Rebuild it with key2
- 'nova show' will show key_name key2, but the keypairs object in the instance_extra table is not updated and you cannot connect to the instance with key2

Expected result
===============
Connect to the VM with the new keypair supplied during the rebuild call.

Actual result
=============
The keypair supplied during the rebuild call is not set in the VM.

Environment
===========
Tested on a DevStack from master; the behaviour is present there.
NOVA: commit 5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1843708/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
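The two-table split can be modelled with a toy sketch (dicts stand in for the 'instances' and 'instance_extra' tables; none of this is nova's real object code): the metadata service reads one record, rebuild updated only the other.

```python
# Toy model of the two tables involved; not nova's real objects.

instances = {'vm1': {'key_name': 'key1'}}
instance_extra = {'vm1': {'keypairs': [{'name': 'key1', 'public_key': 'AAAA-key1'}]}}

def rebuild(uuid, new_key_name, new_public_key):
    instances[uuid]['key_name'] = new_key_name
    # The fix: the metadata service reads instance_extra, so rebuild must
    # refresh it too - this is the write the buggy code path was missing.
    instance_extra[uuid]['keypairs'] = [
        {'name': new_key_name, 'public_key': new_public_key}]

def meta_data(uuid):
    """What cloud-init fetches from 169.254.169.254."""
    return {'keys': instance_extra[uuid]['keypairs']}

rebuild('vm1', 'key2', 'AAAA-key2')
print(meta_data('vm1'))   # now serves key2, so cloud-init installs it
```

Without the instance_extra write, 'nova show' reports key2 while the metadata (and hence the VM's authorized_keys) still carries key1, matching the symptoms above.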
[Yahoo-eng-team] [Bug 1825584] Re: eventlet monkey-patching breaks AMQP heartbeat on uWSGI
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825584

Title: eventlet monkey-patching breaks AMQP heartbeat on uWSGI

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: In Progress
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Released

Bug description:

Stein nova-api running under uWSGI presents an AMQP issue. The first API call that requires RPC creates an AMQP connection and successfully completes. Normally regular heartbeats would be sent from this point on, to maintain the connection. This is not happening. After a few minutes, the AMQP server (rabbitmq, in my case) notices that there have been no heartbeats and drops the connection. A later nova API call that requires RPC tries to use the old connection and throws a "connection reset by peer" exception, and the API call fails.

A mailing-list response suggests that this is affecting mod_wsgi also:
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html

I've discovered that this problem seems to be caused by eventlet monkey-patching, which was introduced in:
https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae

It was later rearranged in:
https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04

but this problem remains. If I comment out the import of nova.monkey_patch in nova/api/openstack/__init__.py the problem goes away. Seems that eventlet monkey-patching and uWSGI are not getting along for some reason...
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1825584/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
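One mitigation pattern for this class of problem can be sketched as follows (an assumption-laden illustration, not the fix nova actually merged): skip eventlet monkey-patching when the process is a uWSGI worker, which can be detected because the 'uwsgi' module is only importable inside uWSGI itself.

```python
def running_under_uwsgi():
    """Detect a uWSGI worker: the 'uwsgi' module exists only there."""
    try:
        import uwsgi  # noqa: F401 - provided by uWSGI itself, not installable
        return True
    except ImportError:
        return False

def maybe_monkey_patch():
    """Apply eventlet patching only for eventlet-based (non-WSGI) services.

    Sketch only: under uWSGI the native threads that drive things like
    AMQP heartbeats must not be replaced by eventlet greenthreads.
    """
    if running_under_uwsgi():
        return False
    import eventlet
    eventlet.monkey_patch()
    return True

print(running_under_uwsgi())   # False in a plain interpreter
```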
[Yahoo-eng-team] [Bug 1884214] Re: reserve disk usage for image cache fails on a fresh hypervisor
** Changed in: nova/train
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1884214

Title: reserve disk usage for image cache fails on a fresh hypervisor

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) stein series: Fix Committed
Status in OpenStack Compute (nova) train series: Fix Released
Status in OpenStack Compute (nova) ussuri series: Fix Committed

Bug description:

If the image cache _base directory does not exist on the hypervisor yet and [workarounds]/reserve_disk_resource_for_image_cache = True is set in the nova-compute config, then nova-compute logs a stack trace [1] and the resource state is not updated in placement.

[1] http://paste.openstack.org/show/794993/

This issue was reported originally in https://bugs.launchpad.net/nova/+bug/1878024 by MarkMielke (mark-mielke).

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1884214/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1887946] Re: Unable to detach volume from instance when previously removed from the inactive config
** Changed in: nova/stein Status: Fix Committed => Fix Released ** Changed in: nova/train Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1887946 Title: Unable to detach volume from instance when previously removed from the inactive config Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: New Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description: Description === $subject, can often be encountered when previous attempts to detach a volume have failed due to the device still being used within the guestOS. This initial attempt will remove the device from the inactive config but fail to remove it from the active config. Any subsequent attempt will then fail as the initial call continues to attempt to remove the device from both the inactive and live configs. Prior to libvirt v4.1.0 this raised either a VIR_ERR_INVALID_ARG or VIR_ERR_OPERATION_FAILED error code from libvirt that n-cpu would handle, retrying the detach against the live config. Since libvirt v4.1.0, however, this raises a VIR_ERR_DEVICE_MISSING error code. This is not handled by Nova, resulting in no attempt being made to detach the device from the live config. 
Steps to reproduce
==================

# Start with a volume attached as vdb (ignore the source ;))
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# Detach from the inactive config
$ sudo virsh detach-disk --config 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 vdb
Disk detached successfully

# Confirm the device is still listed on the live config
$ sudo virsh domblklist 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk
 vdb      iqn.2010-10.org.openstack:volume-37cc97fa-9776-4b31-8f3f-cb1f18ff1db6/0

# and removed from the persistent config
$ sudo virsh domblklist --inactive 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8
 Target   Source
 vda      /opt/stack/data/nova/instances/4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8/disk

# Attempt to detach the volume
$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test

Expected result
===============

The initial attempt to detach the device fails as the device isn't present in the inactive config, but we continue to ensure the device is removed from the live config.

Actual result
=============

n-cpu doesn't handle the initial failure as the raised libvirt error code isn't recognised.

Environment
===========

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/
   b7161fe9b92f0045e97c300a80e58d32b6f49be1

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
   libvirt + KVM

3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that?
   N/A

4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)
   N/A

Logs & Configs
==============

$ openstack server remove volume 4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8 test ; journalctl -u devstack@n-cpu -f
[..] 
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: DEBUG oslo_concurrency.lockutils [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Lock "4b1a0828-8dcc-4b73-a05e-5b50cb62c8f8" released by "nova.compute.manager.ComputeManager.detach_volume..do_detach_volume" :: held 0.141s {{(pid=190210) inner /usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py:371}}
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server [None req-16d62ef9-d492-4012-bb6d-37e5611ede50 admin admin] Exception during message handling: libvirt.libvirtError: device not found: no target device vdb
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 16 17:26:53 localhost.localdomain nova-compute[190210]:
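A sketch of the error handling this bug calls for — the constants and exception class below are stand-ins mirroring libvirt's names (this is not nova's actual patch, and the numeric values are illustrative): treat VIR_ERR_DEVICE_MISSING from the persistent-config detach the same way the pre-4.1.0 error codes were treated, and still attempt the live detach:

```python
# Stand-ins for libvirt's error codes and exception type; values are
# illustrative, not authoritative.
VIR_ERR_INVALID_ARG = 8
VIR_ERR_OPERATION_FAILED = 9
VIR_ERR_DEVICE_MISSING = 99   # raised by libvirt >= 4.1.0

class libvirtError(Exception):
    """Minimal stand-in for the real libvirt.libvirtError."""
    def __init__(self, code):
        super().__init__("libvirt error %d" % code)
        self._code = code
    def get_error_code(self):
        return self._code

def detach_device(detach_from_persistent, detach_from_live):
    """If the device is already gone from the persistent config,
    swallow the error and still try the live config, restoring the
    pre-4.1.0 retry behaviour for VIR_ERR_DEVICE_MISSING."""
    try:
        detach_from_persistent()
    except libvirtError as e:
        if e.get_error_code() not in (VIR_ERR_DEVICE_MISSING,
                                      VIR_ERR_INVALID_ARG,
                                      VIR_ERR_OPERATION_FAILED):
            raise
        # Device absent from the inactive config: fall through and
        # still detach from the live config.
    detach_from_live()

calls = []
def persistent_detach():
    # Simulates the reproducer: vdb already removed via `virsh
    # detach-disk --config`, so the persistent detach fails.
    raise libvirtError(VIR_ERR_DEVICE_MISSING)

detach_device(persistent_detach, lambda: calls.append("live"))
print(calls)  # → ['live']
```

With this handling, the `openstack server remove volume` call in the reproducer would still remove vdb from the running guest instead of erroring out.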
[Yahoo-eng-team] [Bug 1889108] Re: failures during driver.pre_live_migration remove source attachments during rollback
** Changed in: nova/stein Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1889108 Title: failures during driver.pre_live_migration remove source attachments during rollback Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Status in OpenStack Compute (nova) stein series: Fix Released Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Committed Bug description:

Description
===========

$subject: the initial rollback and removal of any destination volume attachments is then repeated for the source volume attachments, leaving the volumes connected on the host but listed as `available` in cinder.

Steps to reproduce
==================

Cause a failure during the call to driver.pre_live_migration with volumes attached.

Expected result
===============

Any volume attachments for the destination host are deleted during the rollback.

Actual result
=============

Both sets of volume attachments, for the destination *and* the source, are removed.

Environment
===========

1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/
   eeeb964a5f65e6ac31dfb34b1256aaf95db5ba3a

2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that?
   libvirt + KVM

3. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that?
   N/A

4. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...)
N/A

Logs & Configs
==============

See the downstream report "When live-migration fails with attached volume changed to active and still in nova": https://bugzilla.redhat.com/show_bug.cgi?id=1860914

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1889108/+subscriptions
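A minimal sketch of the corrected rollback described above (the data shapes and names are hypothetical, not nova's internal objects): delete only the attachments that were created for the destination host, leaving the source host's attachments untouched:

```python
def rollback_attachments(attachments, dest_host, delete):
    """Hypothetical sketch of the fix: during a pre_live_migration
    rollback, delete only the volume attachments created for the
    destination host; the source attachments must survive so the
    volumes stay `in-use` in cinder and connected to the instance."""
    for att in attachments:
        if att["host"] == dest_host:
            delete(att["id"])

deleted = []
attachments = [
    {"id": "a1", "host": "src"},  # original source attachment
    {"id": "a2", "host": "dst"},  # created for the (failed) migration
]
rollback_attachments(attachments, "dst", deleted.append)
print(deleted)  # → ['a2']
```

The buggy behaviour corresponds to deleting every entry regardless of host, which is what left the volumes `available` in cinder while still connected on the source hypervisor.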