[Yahoo-eng-team] [Bug 1910663] Re: pci device duplicate attach after instance evacuated

2021-07-17 Thread Launchpad Bug Tracker
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1910663

Title:
  pci device duplicate attach after instance evacuated

Status in OpenStack Compute (nova):
  Expired

Bug description:
  My OpenStack version is openstack-nova-compute-22.0.1.
  After I evacuated an instance with:

   nova evacuate 837e283a-4288-44c1-b1e8-846fb0488c9c

  I found this instance's PCI device duplicated in the virsh XML. (The
  XML snippet did not survive the list archive; it contained duplicated
  <hostdev mode='subsystem' type='pci'> passthrough entries.)

  There is supposed to be only one PCI passthrough device. I then tried
  more times; every time, nova attached a new PCI passthrough device to
  this instance during the evacuation. The database finally looks like
  this:

  deleted|id |compute_node_id|address |vendor_id|dev_type|dev_id      |label          |status   |uuid                                |
  -------|---|---------------|--------|---------|--------|------------|---------------|---------|------------------------------------|
        0|726|189            |:85:00.0|10de     |type-PCI|pci__85_00_0|label_10de_1e89|allocated|fb211d66-e245-4de5-baec-8686a0b3fb9b|
        0|747|195            |:03:00.0|10de     |type-PCI|pci__03_00_0|label_10de_1e89|allocated|fa0d92ff-273d-4a70-967d-9cf934b41f2c|
        0|828|216            |:02:00.0|10de     |type-PCI|pci__02_00_0|label_10de_1e89|allocated|ae20691f-a2bf-43d4-a630-d9cd5db2168b|
        0|915|237            |:03:00.0|10de     |type-PCI|pci__03_00_0|label_10de_1e89|allocated|6281ec51-decb-426c-b906-c788181bdd01|

  So this instance now has four PCI passthrough devices from four
  different hosts.
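
  For reference, a quick way to confirm the stale allocations (a sketch;
  the libvirt domain name and the direct mysql query are assumptions
  about the deployment, not taken from the report):

    # count <hostdev> PCI passthrough entries in the running guest's XML
    # (instance-0000abcd is a hypothetical domain name)
    virsh dumpxml instance-0000abcd | grep -c '<hostdev'

    # list PCI devices still marked allocated to the instance in the nova DB
    mysql nova -e "SELECT id, compute_node_id, address, status
                   FROM pci_devices
                   WHERE instance_uuid = '837e283a-4288-44c1-b1e8-846fb0488c9c'
                     AND deleted = 0;"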

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1910663/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes

2021-07-17 Thread Launchpad Bug Tracker
[Expired for neutron train because there has been no activity for 60
days.]

** Changed in: neutron/train
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1928299

Title:
  centos7 train vm live migration stops network on vm for some minutes

Status in neutron:
  Expired
Status in neutron train series:
  Expired

Bug description:
  Hello, I have upgraded my CentOS 7 OpenStack installation from Stein
  to Train.
  On Train I am facing an issue with live migration: when a VM is
  migrated from one KVM node to another, it stops responding to ping
  requests for some minutes.
  I had the same issue on Stein, and I resolved it with a workaround
  suggested by Sean Mooney in which legacy port binding was used.

  On Train there seem to be no backported patches that solve the issue.

  I enabled the debug option on neutron; here is the dhcp-agent.log from
  the exact time the live migration started:
  http://paste.openstack.org/show/805325/

  Here is the openvswitch-agent log from the source KVM node:

  http://paste.openstack.org/show/805327/

  Here is the openvswitch-agent log from the destination KVM node:

  http://paste.openstack.org/show/805329/

  I am using the openvswitch mechanism driver and the iptables_hybrid
  firewall driver.
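
  For reference, a quick check while the VM is unreachable (a sketch;
  <port-id> stands for the instance's neutron port and is not from the
  original report):

    # confirm which host the port is bound to and whether it is ACTIVE
    openstack port show <port-id> -c binding_host_id -c status

  A binding that still points at the source node after the migration
  completes would implicate the port-binding activation path.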

  Any help will be appreciated.
  Ignazio

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions




[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes

2021-07-17 Thread Launchpad Bug Tracker
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1928299

Title:
  centos7 train vm live migration stops network on vm for some minutes

Status in neutron:
  Expired
Status in neutron train series:
  Expired


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions




[Yahoo-eng-team] [Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails

2021-07-17 Thread Billy Olsen
Queens and Rocky are both in extended maintenance and have had the
proposed patches merged. Updating tasks to mark them as Fix Released.

** Changed in: nova/rocky
   Status: New => Fix Released

** Changed in: nova/queens
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  Fix Released
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released
Status in nova source package in Hirsute:
  Fix Released

Bug description:
  When spawning an SR-IOV enabled instance on a newly deployed host,
  nova attempts to spawn it with a type-PF PCI device. This fails with
  the stack trace below.

  After restarting the neutron-sriov-agent and nova-compute services on
  the compute node and spawning an SR-IOV instance again, a type-VF PCI
  device is selected and instance spawning succeeds.
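
  For reference, one way to see which device types nova has recorded
  for the host (a sketch; querying the database directly and the
  <compute_node_id> placeholder are assumptions, not from the report):

    # a type-PF entry handed out for an SR-IOV port is what libvirt
    # then rejects for hostdev interfaces
    mysql nova -e "SELECT address, dev_type, status
                   FROM pci_devices
                   WHERE compute_node_id = <compute_node_id>
                     AND deleted = 0;"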

  Stack trace:
  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-f

[Yahoo-eng-team] [Bug 1936720] [NEW] new instance gets stuck indefinitely at build state with task_state none

2021-07-17 Thread Siavash Sardari
Public bug reported:

Description
===========
The nova-compute service is up but does not actually work.

New instances that get scheduled on that compute node become stuck in
the BUILD state with task_state None, and they do not go to the ERROR
state even after the instance build timeout threshold is reached.

(openstack) server show 9299bee1-633d-4233-9f2b-9a7d1871d51b
+-------------------------------------+------------------------------------------------+
| Field                               | Value                                          |
+-------------------------------------+------------------------------------------------+
| OS-DCF:diskConfig                   | AUTO                                           |
| OS-EXT-AZ:availability_zone         | nova                                           |
| OS-EXT-SRV-ATTR:host                | None                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                           |
| OS-EXT-SRV-ATTR:instance_name       | instance-bfb6                                  |
| OS-EXT-STS:power_state              | NOSTATE                                        |
| OS-EXT-STS:task_state               | None                                           |
| OS-EXT-STS:vm_state                 | building                                       |
| OS-SRV-USG:launched_at              | None                                           |
| OS-SRV-USG:terminated_at            | None                                           |
| accessIPv4                          |                                                |
| accessIPv6                          |                                                |
| addresses                           |                                                |
| config_drive                        |                                                |
| created                             | 2021-07-17T11:49:35Z                           |
| flavor                              | i1.mini (75253a8f-eb7c-4473-9874-884a01a524a7) |
| hostId                              |                                                |
| id                                  | 9299bee1-633d-4233-9f2b-9a7d1871d51b           |
| image                               |                                                |
| key_name                            | Sia-KP                                         |
| name                                | qwerty-17                                      |
| progress                            | 0                                              |
| project_id                          | c4a93f6c1c194bf78bd98ee0f4d51978               |
| properties                          |                                                |
| status                              | BUILD                                          |
| updated                             | 2021-07-17T11:49:41Z                           |
| user_id                             | 042131e0784b46218521eee7963022bf               |
| volumes_attached                    |                                                |
+-------------------------------------+------------------------------------------------+


I have two OpenStack setups (staging and production). This issue
happens on both of them, but randomly on different compute nodes. Both
setups are on the stable/ussuri release and were deployed with
openstack-ansible.

There were no errors in the nova logs. I enabled debug on the nova
services, and it caught my eye that on the corrupted compute node the
logs stopped some time before the problem occurred.
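
For reference, the hang can be narrowed down by checking when the node
last logged anything (a sketch; the log path is an assumption, and
openstack-ansible deployments may log to journald instead):

  # on the suspect compute node: timestamp of the last nova-compute log line
  tail -n 1 /var/log/nova/nova-compute.log

  # journald variant
  journalctl -u nova-compute -n 1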

Compute service list while this issue is happening (CP-12 is the
corrupted compute node):

(openstack) compute service list
+-----+----------------+---------------------------------------+----------+---------+-------+----------------------------+
|  ID | Binary         | Host                                  | Zone     | Status  | State | Updated At                 |
+-----+----------------+---------------------------------------+----------+---------+-------+----------------------------+
|   7 | nova-conductor | SHN-CN-61-nova-api-container-b11ef08e | internal | enabled | up    | 2021-07-17T14:23:45.00     |
|  34 | nova-scheduler | SHN-CN-61-nova-api-container-b11ef08e | internal | enabled | up    | 2021-07-17T14:23:43.00     |
|  85 | nova-conductor | SHN-CN-63-nova-api-container-e4f37374 | internal | enabled | up    | 2021-07-17T14:23:41.00     |
|  91 | nova-conductor | SHN-CN-62-nova-api-container-71ffd912 | internal | enabled | up    | 2021-07-17T14:23:45.00     |
| 109 | nova-scheduler | SHN-CN-63-nova-api-container-e4f37374 | internal | enabled | up    | 2021-07-17T14:23:41.00     |
| 157 | nova-scheduler | SHN-CN-62-nova-api-container-71ffd912 | internal | enabled | up    | 2021-07-17T14:23:45.00     |
| 199 | nova-compute   | SHN-CP-72                             | nova     | enabled | up    | 2021-07-17T14:23:41.00     |
.
.
.
| 232 | nova-comp