[Yahoo-eng-team] [Bug 1910663] Re: pci device duplicate attach after instance evacuated
[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

** Changed in: nova
   Status: Incomplete => Expired

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1910663

Title:
  pci device duplicate attach after instance evacuated

Status in OpenStack Compute (nova):
  Expired

Bug description:
  My OpenStack version is openstack-nova-compute-22.0.1.

  After I evacuated an instance using:

    nova evacuate 837e283a-4288-44c1-b1e8-846fb0488c9c

  I found this instance's PCI device duplicated in the virsh XML; there
  is supposed to be only one PCI passthrough device. I then tried a few
  more times: every time, nova adds a new PCI passthrough device to this
  instance during the evacuation. The database finally looks like this:

  deleted|id |compute_node_id|address |vendor_id|dev_type|dev_id      |label          |status   |uuid
  0      |726|189            |:85:00.0|10de     |type-PCI|pci__85_00_0|label_10de_1e89|allocated|fb211d66-e245-4de5-baec-8686a0b3fb9b
  0      |747|195            |:03:00.0|10de     |type-PCI|pci__03_00_0|label_10de_1e89|allocated|fa0d92ff-273d-4a70-967d-9cf934b41f2c
  0      |828|216            |:02:00.0|10de     |type-PCI|pci__02_00_0|label_10de_1e89|allocated|ae20691f-a2bf-43d4-a630-d9cd5db2168b
  0      |915|237            |:03:00.0|10de     |type-PCI|pci__03_00_0|label_10de_1e89|allocated|6281ec51-decb-426c-b906-c788181bdd01

  So this instance now has 4 PCI passthrough devices from four different
  hosts.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1910663/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
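The symptom above (one instance ending up with allocated PCI devices on several compute nodes) can be checked for with a short script. This is only an illustrative sketch: the records below mirror the pasted rows, with a hypothetical instance_uuid field added since the paste does not show which instance each device is claimed by, and it is not nova's actual schema or cleanup logic.

```python
from collections import defaultdict

# Illustrative records mirroring the pci_devices rows quoted above.
# The instance_uuid field is an assumption for the sketch: the pasted
# query omits the column tying each allocated device to its instance.
DEVICES = [
    {"id": 726, "compute_node_id": 189, "status": "allocated",
     "instance_uuid": "837e283a-4288-44c1-b1e8-846fb0488c9c"},
    {"id": 747, "compute_node_id": 195, "status": "allocated",
     "instance_uuid": "837e283a-4288-44c1-b1e8-846fb0488c9c"},
    {"id": 828, "compute_node_id": 216, "status": "allocated",
     "instance_uuid": "837e283a-4288-44c1-b1e8-846fb0488c9c"},
    {"id": 915, "compute_node_id": 237, "status": "allocated",
     "instance_uuid": "837e283a-4288-44c1-b1e8-846fb0488c9c"},
]

def cross_node_allocations(devices):
    """Map each instance to the compute nodes holding its allocated devices.

    An instance should only hold passthrough devices on its current
    host, so any instance mapped to more than one node is suspect.
    """
    nodes = defaultdict(set)
    for dev in devices:
        if dev["status"] == "allocated":
            nodes[dev["instance_uuid"]].add(dev["compute_node_id"])
    return {uuid: n for uuid, n in nodes.items() if len(n) > 1}

print(cross_node_allocations(DEVICES))
```

Run against the four rows above, this flags the evacuated instance as holding devices on nodes 189, 195, 216, and 237, matching the reporter's observation.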
[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes
[Expired for neutron train because there has been no activity for 60 days.]

** Changed in: neutron/train
   Status: Incomplete => Expired

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1928299

Title:
  centos7 train vm live migration stops network on vm for some minutes

Status in neutron:
  Expired
Status in neutron train series:
  Expired

Bug description:
  Hello,
  I have upgraded my CentOS 7 OpenStack installation from Stein to
  Train. On Train I am facing an issue with live migration: when a VM
  is migrated from one KVM node to another, it stops responding to ping
  requests for some minutes.

  I had the same issue on Stein and resolved it with a workaround
  suggested by Sean Mooney, in which legacy port binding was used. On
  Train there seem to be no backported patches that solve the issue.

  I enabled the debug option on neutron. Here is the dhcp-agent.log
  from the exact time when the live migration started:
  http://paste.openstack.org/show/805325/

  Here is the openvswitch-agent log from the source KVM node:
  http://paste.openstack.org/show/805327/

  Here is the openvswitch-agent log from the destination KVM node:
  http://paste.openstack.org/show/805329/

  I am using the openvswitch mechanism driver and the iptables_hybrid
  firewall driver.

  Any help will be appreciated.
  Ignazio

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions
[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1928299

Title:
  centos7 train vm live migration stops network on vm for some minutes

Status in neutron:
  Expired
Status in neutron train series:
  Expired

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions
[Yahoo-eng-team] [Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails
Queens and Rocky are both in extended maintenance and have had the
proposed patches merged. Updating tasks to mark as fix released.

** Changed in: nova/rocky
   Status: New => Fix Released

** Changed in: nova/queens
   Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  Fix Released
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released
Status in nova source package in Hirsute:
  Fix Released

Bug description:
  When spawning an SR-IOV enabled instance on a newly deployed host,
  nova attempts to spawn it with a type-PF pci device. This fails with
  the stack trace below. After restarting the neutron-sriov-agent and
  nova-compute services on the compute node and spawning an SR-IOV
  instance again, a type-VF pci device is selected, and instance
  spawning succeeds.
  Stack trace:

  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-f
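The libvirtError in the trace is libvirt refusing an interface of type hostdev backed by a Physical Function: that interface type is supported only on SR-IOV Virtual Functions, which is why the instance spawns fine once a type-VF device is picked. A minimal sketch of the selection rule involved is below; the pool entries and addresses are assumptions for illustration, not nova's real PCI tracker code.

```python
def pick_vf(pool):
    """Return the first available type-VF device, never a type-PF.

    Illustrative only: in nova the PCI manager matches devices against
    the PCI request spec. The point shown here is simply that a type-PF
    device must be skipped when the guest interface will be hostdev,
    since libvirt supports that only for SR-IOV Virtual Functions.
    """
    for dev in pool:
        if dev["dev_type"] == "type-VF" and dev["status"] == "available":
            return dev
    return None

# Hypothetical device pool: one PF and one of its VFs.
pool = [
    {"address": "0000:3b:00.0", "dev_type": "type-PF", "status": "available"},
    {"address": "0000:3b:00.2", "dev_type": "type-VF", "status": "available"},
]
print(pick_vf(pool)["address"])
```

In the bug, the freshly deployed host's device pool was such that a type-PF entry was handed out until the agents were restarted, after which a type-VF was selected.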
[Yahoo-eng-team] [Bug 1936720] [NEW] new instance gets stuck indefinitely at build state with task_state none
Public bug reported:

Description
===========
The nova-compute service is up but does not work. New instances that
get scheduled on that compute node get stuck at the BUILD state with
task_state None, and they do not go to the ERROR state even after the
instance build timeout threshold is reached.

  (openstack) server show 9299bee1-633d-4233-9f2b-9a7d1871d51b
  +-------------------------------------+------------------------------------------------+
  | Field                               | Value                                          |
  +-------------------------------------+------------------------------------------------+
  | OS-DCF:diskConfig                   | AUTO                                           |
  | OS-EXT-AZ:availability_zone         | nova                                           |
  | OS-EXT-SRV-ATTR:host                | None                                           |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                           |
  | OS-EXT-SRV-ATTR:instance_name       | instance-bfb6                                  |
  | OS-EXT-STS:power_state              | NOSTATE                                        |
  | OS-EXT-STS:task_state               | None                                           |
  | OS-EXT-STS:vm_state                 | building                                       |
  | OS-SRV-USG:launched_at              | None                                           |
  | OS-SRV-USG:terminated_at            | None                                           |
  | accessIPv4                          |                                                |
  | accessIPv6                          |                                                |
  | addresses                           |                                                |
  | config_drive                        |                                                |
  | created                             | 2021-07-17T11:49:35Z                           |
  | flavor                              | i1.mini (75253a8f-eb7c-4473-9874-884a01a524a7) |
  | hostId                              |                                                |
  | id                                  | 9299bee1-633d-4233-9f2b-9a7d1871d51b           |
  | image                               |                                                |
  | key_name                            | Sia-KP                                         |
  | name                                | qwerty-17                                      |
  | progress                            | 0                                              |
  | project_id                          | c4a93f6c1c194bf78bd98ee0f4d51978               |
  | properties                          |                                                |
  | status                              | BUILD                                          |
  | updated                             | 2021-07-17T11:49:41Z                           |
  | user_id                             | 042131e0784b46218521eee7963022bf               |
  | volumes_attached                    |                                                |
  +-------------------------------------+------------------------------------------------+

I have two OpenStack setups (staging and production). This issue
happens on both of them, but randomly on different compute nodes. Both
setups are on the stable/ussuri release and were deployed using
openstack-ansible.

There were no errors in the nova logs. I enabled debug on the nova
services, and it caught my eye that on the corrupted compute node the
logs stopped some time before this problem occurs.

Compute service list while this issue happens
(CP-12 is the corrupted compute node)

  (openstack) compute service list
  +-----+----------------+---------------------------------------+----------+---------+-------+------------------------+
  | ID  | Binary         | Host                                  | Zone     | Status  | State | Updated At             |
  +-----+----------------+---------------------------------------+----------+---------+-------+------------------------+
  |   7 | nova-conductor | SHN-CN-61-nova-api-container-b11ef08e | internal | enabled | up    | 2021-07-17T14:23:45.00 |
  |  34 | nova-scheduler | SHN-CN-61-nova-api-container-b11ef08e | internal | enabled | up    | 2021-07-17T14:23:43.00 |
  |  85 | nova-conductor | SHN-CN-63-nova-api-container-e4f37374 | internal | enabled | up    | 2021-07-17T14:23:41.00 |
  |  91 | nova-conductor | SHN-CN-62-nova-api-container-71ffd912 | internal | enabled | up    | 2021-07-17T14:23:45.00 |
  | 109 | nova-scheduler | SHN-CN-63-nova-api-container-e4f37374 | internal | enabled | up    | 2021-07-17T14:23:41.00 |
  | 157 | nova-scheduler | SHN-CN-62-nova-api-container-71ffd912 | internal | enabled | up    | 2021-07-17T14:23:45.00 |
  | 199 | nova-compute   | SHN-CP-72                             | nova     | enabled | up    | 2021-07-17T14:23:41.00 |
  .
  .
  .
  | 232 | nova-comp
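The build-timeout behaviour the reporter expected (BUILD going to ERROR after the threshold) can be approximated externally with a short check like the one below. This is an illustrative monitoring sketch, not nova's internal build-timeout task: the field names follow the 'openstack server show' output quoted above, and the 30-minute timeout and the "now" timestamp are assumptions.

```python
from datetime import datetime, timedelta, timezone

def stuck_in_build(servers, now, timeout=timedelta(minutes=30)):
    """Return the names of servers still in BUILD past the timeout.

    Each server dict is assumed to carry the 'name', 'status', and
    'created' fields as shown by 'openstack server show'; in practice
    you would fetch these via the compute API.
    """
    stuck = []
    for srv in servers:
        created = datetime.strptime(
            srv["created"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if srv["status"] == "BUILD" and now - created > timeout:
            stuck.append(srv["name"])
    return stuck

# The instance from the report: created 11:49:35Z, still BUILD at 14:23.
servers = [{"name": "qwerty-17", "status": "BUILD",
            "created": "2021-07-17T11:49:35Z"}]
now = datetime(2021, 7, 17, 14, 23, 45, tzinfo=timezone.utc)
print(stuck_in_build(servers, now))
```

By this check the qwerty-17 instance, still in BUILD more than two and a half hours after creation, is well past any reasonable build timeout, which matches the reported symptom.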