[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-09-01 Thread Corey Bryant
This bug was fixed in the package nova - 2:17.0.13-0ubuntu3~cloud0
---

 nova (2:17.0.13-0ubuntu3~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 nova (2:17.0.13-0ubuntu3) bionic; urgency=medium
 .
   * Force refresh instance info_cache during heal (LP: #1751923):
     - d/p/0001-Force-refresh-instance-info_cache-during-heal.patch
     - d/p/0002-remove-deprecated-test_list_vifs_neutron_notimplemented.patch


** Changed in: cloud-archive/queens
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1751923

Title:
  [SRU]_heal_instance_info_cache periodic task bases on port list from
  nova db, not from neutron server

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1751923/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-09-01 Thread Corey Bryant
This bug was fixed in the package nova - 2:18.3.0-0ubuntu1~cloud3
---

 nova (2:18.3.0-0ubuntu1~cloud3) bionic-rocky; urgency=medium
 .
   * Force refresh instance info_cache during heal (LP: #1751923):
     - d/p/0001-Force-refresh-instance-info_cache-during-heal.patch


** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-09-01 Thread Edward Hope-Morley
Verified rocky-proposed using [Test Plan] with output as follows:

# apt-cache policy nova-common
nova-common:
  Installed: 2:18.3.0-0ubuntu1~cloud3
  Candidate: 2:18.3.0-0ubuntu1~cloud3
  Version table:
 *** 2:18.3.0-0ubuntu1~cloud3 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/rocky/main amd64 Packages
        100 /var/lib/dpkg/status
     2:17.0.13-0ubuntu3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     2:17.0.10-0ubuntu2.1 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     2:17.0.1-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

I also tested by manually deleting the network_info for a vm then
waiting for the periodic task to run -
https://pastebin.ubuntu.com/p/7gmZQsvC8H/
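
For readers without access to the pastebin, the manual check can be modelled roughly as follows. This is only a simplified sketch of the behaviour described in this bug, not nova's code: `NEUTRON_PORTS`, `heal_from_cache` and `heal_force_refresh` are hypothetical names, and the port IDs are made up.

```python
from typing import Dict, List

# Hypothetical stand-in for neutron: the ports actually bound to each instance.
NEUTRON_PORTS: Dict[str, List[str]] = {
    "vm-1": ["port-a", "port-b", "port-c", "port-d"],
}


def heal_from_cache(cache: Dict[str, List[str]], instance: str) -> List[str]:
    """Model of the old behaviour: only ports already in the cache are refreshed."""
    known = cache.get(instance, [])
    attached = set(NEUTRON_PORTS.get(instance, []))
    # A port missing from the cache is never rediscovered on this path.
    cache[instance] = [port for port in known if port in attached]
    return cache[instance]


def heal_force_refresh(cache: Dict[str, List[str]], instance: str) -> List[str]:
    """Model of the fixed behaviour: the port list is rebuilt from neutron."""
    cache[instance] = list(NEUTRON_PORTS.get(instance, []))
    return cache[instance]


emptied = {"vm-1": []}  # network_info deleted by hand
print(heal_from_cache(dict(emptied), "vm-1"))     # []
print(heal_force_refresh(dict(emptied), "vm-1"))  # ['port-a', 'port-b', 'port-c', 'port-d']
```

With the fixed package the periodic task is expected to behave like the second function, so the deleted network_info comes back after one run.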

** Tags removed: verification-rocky-needed
** Tags added: verification-rocky-done


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-08-30 Thread Launchpad Bug Tracker
This bug was fixed in the package nova - 2:17.0.13-0ubuntu3

---
nova (2:17.0.13-0ubuntu3) bionic; urgency=medium

  * Force refresh instance info_cache during heal (LP: #1751923):
    - d/p/0001-Force-refresh-instance-info_cache-during-heal.patch
    - d/p/0002-remove-deprecated-test_list_vifs_neutron_notimplemented.patch

 -- Jorge Niedbalski  Mon, 17 May 2021 14:25:43 -0400

** Changed in: nova (Ubuntu Bionic)
   Status: Fix Committed => Fix Released


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-08-27 Thread Jorge Niedbalski
Hello,

I've verified that this problem doesn't reproduce with the package
contained in proposed.

1) Deployed this bundle of bionic-queens

2) Upgraded to the following version:


root@juju-51d6ad-1751923-6:/home/ubuntu# dpkg -l | grep nova
ii  nova-api-os-compute  2:17.0.13-0ubuntu3  all  OpenStack Compute - OpenStack Compute API frontend
ii  nova-common          2:17.0.13-0ubuntu3  all  OpenStack Compute - common files
ii  nova-conductor       2:17.0.13-0ubuntu3  all  OpenStack Compute - conductor service
ii  nova-placement-api   2:17.0.13-0ubuntu3  all  OpenStack Compute - placement API frontend
ii  nova-scheduler       2:17.0.13-0ubuntu3  all  OpenStack Compute - virtual machine scheduler
ii  python-nova          2:17.0.13-0ubuntu3  all  OpenStack Compute Python libraries


root@juju-51d6ad-1751923-7:/home/ubuntu# dpkg -l | grep nova
ii  nova-api-metadata     2:17.0.13-0ubuntu3  all  OpenStack Compute - metadata API frontend
ii  nova-common           2:17.0.13-0ubuntu3  all  OpenStack Compute - common files
ii  nova-compute          2:17.0.13-0ubuntu3  all  OpenStack Compute - compute node base
ii  nova-compute-kvm      2:17.0.13-0ubuntu3  all  OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt  2:17.0.13-0ubuntu3  all  OpenStack Compute - compute node libvirt support
ii  python-nova           2:17.0.13-0ubuntu3  all  OpenStack Compute Python libraries
ii  python-novaclient     2:9.1.1-0ubuntu1    all  client library for OpenStack Compute API - Python 2.7
ii  python3-novaclient    2:9.1.1-0ubuntu1    all  client library for OpenStack Compute API - 3.x


root@juju-51d6ad-1751923-6:/home/ubuntu# systemctl status nova*|grep -i active
   Active: active (running) since Fri 2021-08-27 22:02:25 UTC; 1h 7min ago
   Active: active (running) since Fri 2021-08-27 22:02:12 UTC; 1h 8min ago
   Active: active (running) since Fri 2021-08-27 22:02:25 UTC; 1h 7min ago


3) Created a server with 4 private ports, 1 public one.

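ubuntu@niedbalski-bastion:~/stsstack-bundles/openstack$ openstack server list
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+
| ID                                   | Name          | Status | Networks                                                                      | Image  | Flavor    |
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+
| 5843e6b5-e1a7-4208-9f19-1d051c032afb | cirros-232302 | ACTIVE | private=192.168.21.22, 192.168.21.6, 192.168.21.10, 192.168.21.13, 10.5.150.1 | cirros | m1.cirros |
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+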

ubuntu@niedbalski-bastion:~/stsstack-bundles/openstack$ nova interface-list 5843e6b5-e1a7-4208-9f19-1d051c032afb
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| Port State | Port ID                              | Net ID                               | IP addresses  | MAC Addr          |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| ACTIVE     | 1680b164-14d7-4d6e-b085-94292ece82cf | 8d91e266-0925-4c29-8039-0d71862df4fc | 192.168.21.13 | fa:16:3e:cf:f8:c8 |
| ACTIVE     | 5865a40e-36fa-4cf9-bd40-85a1e78031f5 | 8d91e266-0925-4c29-8039-0d71862df4fc | 192.168.21.6  | fa:16:3e:eb:73:b1 |
| ACTIVE     | 5f400107-d9eb-4a1b-a37b-3bd034d8f995 | 8d91e266-0925-4c29-8039-0d71862df4fc | 192.168.21.10 | fa:16:3e:95:9a:78 |
| ACTIVE     | b11d1c8e-d42a-41e0-a7ad-e34a7a93d020 | 8d91e266-0925-4c29-8039-0d71862df4fc | 192.168.21.22 | fa:16:3e:a3:45:45 |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
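
The comparison between the two listings can be written down as a small check. This is a sketch only: `missing_from_cache` is a hypothetical helper, and it assumes (as the bug description says) that the server listing reflects nova's cached view while `nova interface-list` reflects the ports themselves.

```python
from typing import Iterable, Set


def missing_from_cache(cached_ips: Iterable[str], port_ips: Iterable[str]) -> Set[str]:
    """Return port addresses that the port listing reports but the cached view lacks."""
    return set(port_ips) - set(cached_ips)


# Addresses from `openstack server list` (includes the extra 10.5.150.1 address).
cached = ["192.168.21.22", "192.168.21.6", "192.168.21.10", "192.168.21.13", "10.5.150.1"]
# Addresses from `nova interface-list`, one per port.
ports = ["192.168.21.13", "192.168.21.6", "192.168.21.10", "192.168.21.22"]

print(missing_from_cache(cached, ports))      # set()
print(missing_from_cache(cached[1:], ports))  # {'192.168.21.22'}
```

An empty result means the cache covers every port; a non-empty one is the symptom this bug describes.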

4) I can see the 4 tap devices.

root@juju-51d6ad-1751923-7:/home/ubuntu# virsh dumpxml instance-0001|grep -i tap
  
  
  
  


5) I modified the instance info caches removing one of the interfaces.

Database changed
mysql>  update instance_info_caches set 

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-07-28 Thread Jorge Niedbalski
I am in the process of verifying the bionic/rocky/queens releases.


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-07-22 Thread Mathew Hodson
** Changed in: nova (Ubuntu)
   Importance: Undecided => Medium

** Changed in: nova (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: nova (Ubuntu Disco)
   Importance: Undecided => Medium


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-07-05 Thread Łukasz Zemczak
I'm a bit worried about the aforementioned regression potential here,
but I'll accept it seeing that this was accepted by the OpenStack team.
I'd prefer SRUs to be as safe as possible, offering fallback
functionality in case the system is old. I assume this would require the
additional commits to be cherry-picked?

Anyway, let's proceed for now.

** Changed in: nova (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-bionic


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-28 Thread Corey Bryant
** Description changed:

  [Impact]
  
  * During periodic task _heal_instance_info_cache the instance_info_caches are
    not updated using instance port_ids taken from neutron, but from nova db.
  * This causes existing VMs to lose their network interfaces after reboot.
  
  [Test Plan]
  
  * This bug is reproducible on Bionic/Queens clouds.
  
  1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
  2) Run the following script: https://paste.ubuntu.com/p/c4VDkqyR2z/
  3) If the script finishes with "Port not found" , the bug is still present.
  
  [Where problems could occur]
  
  Instances created prior to the OpenStack Newton release that have more
  than one interface will not have associated information in the
  virtual_interfaces table that is required to repopulate the cache with
  interfaces in the same order they were attached prior. In the unlikely
  event that this occurs and you are using OpenStack release Queens or
  Rocky, it will be necessary to manually populate this table.
  OpenStack Stein has a patch that adds support for generating this data.
  Since, as things stand, the guest will be unable to identify its network
  information at all in the event the cache gets purged, and given the
  hopefully low risk that a VM was created prior to Newton, we hope the
  potential for this regression is very low.
  
+ [Discussion]
+ SRU team, please review the most recent version of nova
+ 2:17.0.13-0ubuntu3 in the unapproved queue. The older version can be rejected.
+ 
  --
  
  Description
  ===
  
  During periodic task _heal_instance_info_cache the
  instance_info_caches are not updated using instance port_ids taken
  from neutron, but from nova db.
  
  Sometimes, perhaps because of some race condition, it's possible to
  lose some ports from instance_info_caches. Periodic task
  _heal_instance_info_cache should clean this up (add missing records),
  but in fact it's not working this way.
  
  How it looks now?
  =
  
  _heal_instance_info_cache during crontask:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525
  
  is using network_api to get instance_nw_info (instance_info_caches):
  
-   try:
-   # Call to network API to get instance info.. this will
-   # force an update to the instance's info_cache
-   self.network_api.get_instance_nw_info(context, instance)
+   try:
+   # Call to network API to get instance info.. this will
+   # force an update to the instance's info_cache
+   self.network_api.get_instance_nw_info(context, instance)
  
  self.network_api.get_instance_nw_info() is listed below:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377
  
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356
  
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
  
- networks, port_ids = self._gather_port_ids_and_networks(
-   context, instance, networks, port_ids, client)
+ networks, port_ids = self._gather_port_ids_and_networks(
+   context, instance, networks, port_ids, client)
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393
  
  As we can see, _gather_port_ids_and_networks() takes the port list
  from the DB:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176
  
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clean the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
  
  When the interface is missing and there is no port configured on the compute
  host (for example after compute reboot), the interface is not added to the
  instance and, from neutron's point of view, the port state is DOWN.
  
  When the interface is missing in the cache and we hard reboot the instance,
  it's not added as a tap interface in the XML file, so we don't have the
  network on the host.
  
  Steps to reproduce
  ==
  1. Spawn devstack
  2. Spawn VM inside devstack with multiple ports (for example also from 2 different networks)
  3. Update the DB row, drop one interface from interfaces_list
  4. Hard-Reboot the instance
  5. See that nova list shows instance without one address, but nova interface-list shows all 

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-28 Thread Corey Bryant
New nova packages including this fix have been uploaded to rocky-staging and 
the bionic unapproved queue:
https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/rocky-staging/+packages?field.name_filter=nova
https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=nova

** Changed in: nova (Ubuntu Bionic)
   Status: Confirmed => Triaged

** Changed in: nova (Ubuntu Bionic)
   Status: Triaged => In Progress


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-28 Thread Corey Bryant
Thanks Jorge. Let's patch rocky as well for upgrade purposes.

** Changed in: cloud-archive/rocky
   Status: Won't Fix => In Progress


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-28 Thread Edward Hope-Morley
Restored the bug description to its original format and updated SRU
info.

** Description changed:

  [Impact]
  
  * During periodic task _heal_instance_info_cache the instance_info_caches are
    not updated using instance port_ids taken from neutron, but from nova db.
  * This causes existing VMs to lose their network interfaces after reboot.
  
  [Test Plan]
  
  * This bug is reproducible on Bionic/Queens clouds.
  
  1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
  2) Run the following script: https://paste.ubuntu.com/p/c4VDkqyR2z/
  3) If the script finishes with "Port not found" , the bug is still present.
  
  [Where problems could occur]
  
- ** No specific regression potential has been identified.
- ** Check the other info section ***
- 
- [Other Info]
+ Instances created prior to the OpenStack Newton release that have more
+ than one interface will not have associated information in the
+ virtual_interfaces table that is required to repopulate the cache with
+ interfaces in the same order they were attached prior. In the unlikely
+ event that this occurs and you are using OpenStack release Queens or
+ Rocky, it will be necessary to manually populate this table.
+ OpenStack Stein has a patch that adds support for generating this data.
+ Since, as things stand, the guest will be unable to identify its network
+ information at all in the event the cache gets purged, and given the
+ hopefully low risk that a VM was created prior to Newton, we hope the
+ potential for this regression is very low.
+ 
+ --
+ 
+ Description
+ ===
+ 
+ During periodic task _heal_instance_info_cache the
+ instance_info_caches are not updated using instance port_ids taken
+ from neutron, but from nova db.
+ 
+ Sometimes, perhaps because of some race condition, it's possible to
+ lose some ports from instance_info_caches. Periodic task
+ _heal_instance_info_cache should clean this up (add missing records),
+ but in fact it's not working this way.
  
  How it looks now?
  =
  
  _heal_instance_info_cache during crontask:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525
  
  is using network_api to get instance_nw_info (instance_info_caches):
  
-             try:
-                 # Call to network API to get instance info.. this will
-                 # force an update to the instance's info_cache
-                 self.network_api.get_instance_nw_info(context, instance)
+   try:
+   # Call to network API to get instance info.. this will
+   # force an update to the instance's info_cache
+   self.network_api.get_instance_nw_info(context, instance)
  
  self.network_api.get_instance_nw_info() is listed below:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377
  
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356
  
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
  
-       networks, port_ids = self._gather_port_ids_and_networks(
-                 context, instance, networks, port_ids, client)
+ networks, port_ids = self._gather_port_ids_and_networks(
+   context, instance, networks, port_ids, client)
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393
  
- As we can see, _gather_port_ids_and_networks() takes the port list from
- the DB:
+ As we can see, _gather_port_ids_and_networks() takes the port list
+ from the DB:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176
  
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clean the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
  
- When the interface is missing and there is no port configured on compute
- host (for example after compute reboot) - interface is not added to
- instance and from neutron point of 

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-28 Thread Edward Hope-Morley
@coreycb I think we have everything we need to proceed with this SRU
now. Since Queens is the oldest release currently supported on Ubuntu,
and support for populating the vif attach ordering required to rebuild
the cache has been available since Newton, I think the risk of anyone
being impacted is very small. VMs created prior to Newton would need the
patch [1] and eventually [2] backported from Stein, but I don't see them
as essential, and given the impact of not having this fix asap I think it
supersedes those, which we can handle separately.

[1] https://github.com/openstack/nova/commit/3534471c578eda6236e79f43153788c4725a5634
[2] https://bugs.launchpad.net/nova/+bug/1825034
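
To illustrate why the attach ordering matters when the cache is rebuilt, here is a minimal sketch. It is not the code from [1]; `order_ports` is a hypothetical helper, and `vif_order` stands in for whatever attach order the virtual_interfaces table records.

```python
from typing import List, Sequence


def order_ports(neutron_ports: Sequence[str], vif_order: Sequence[str]) -> List[str]:
    """Order neutron's ports by recorded attach order; ports without a record go last."""
    rank = {port_id: index for index, port_id in enumerate(vif_order)}
    # sorted() is stable, so ports without a record keep neutron's relative order.
    return sorted(neutron_ports, key=lambda port: rank.get(port, len(rank)))


# Instance with attach-order records: the original order is restored.
print(order_ports(["p3", "p1", "p2"], ["p1", "p2", "p3"]))  # ['p1', 'p2', 'p3']
# Instance with no records (the pre-Newton case): order falls back to neutron's.
print(order_ports(["p3", "p1", "p2"], []))                  # ['p3', 'p1', 'p2']
```

In the second case the rebuilt cache still contains every port, but the interface order inside the guest may differ from the original attach order, which is the regression risk discussed above.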


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-06-01 Thread Jorge Niedbalski
@corey is there anything specific you need from my end to get this SRU reviewed?


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-20 Thread Edward Hope-Morley
Since Queens is populating the virtual_interfaces table as standard I
think we should proceed with this SRU -
https://pastebin.ubuntu.com/p/BdCPsVKGk5/ - since it will provide a
clean fix for Queens clouds.


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-17 Thread Jorge Niedbalski
** Description changed:

  [Impact]
  
  * During periodic task _heal_instance_info_cache the instance_info_caches are
    not updated using instance port_ids taken from neutron, but from nova db.
  * This causes existing VMs to lose their network interfaces after reboot.
  
  [Test Plan]
  
  * This bug is reproducible on Bionic/Queens clouds.
  
  1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
- 2) Run the following script: https://paste.ubuntu.com/p/DrFcDXZGSt/
+ 2) Run the following script: https://paste.ubuntu.com/p/c4VDkqyR2z/
  3) If the script finishes with "Port not found" , the bug is still present.
  
  [Where problems could occur]
  
  ** No specific regression potential has been identified.
  ** Check the other info section ***
- 
  
  [Other Info]
  
  How it looks now?
  =
  
  _heal_instance_info_cache during crontask:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525
  
  is using network_api to get instance_nw_info (instance_info_caches):
  
              try:
                  # Call to network API to get instance info.. this will
                  # force an update to the instance's info_cache
                  self.network_api.get_instance_nw_info(context, instance)
  
  self.network_api.get_instance_nw_info() is listed below:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377
  
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356
  
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
  
        networks, port_ids = self._gather_port_ids_and_networks(
                  context, instance, networks, port_ids, client)
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393
  
  As we can see, _gather_port_ids_and_networks() takes the port list from
  the DB:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176
  
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clean the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
  
  When the interface is missing and there is no port configured on the compute
  host (for example after compute reboot), the interface is not added to the
  instance and, from neutron's point of view, the port state is DOWN.
  
  When the interface is missing in the cache and we hard reboot the instance,
  it's not added as a tap interface in the XML file, so we don't have the
  network on the host.
  
  Steps to reproduce
  ==
  1. Spawn devstack
  2. Spawn VM inside devstack with multiple ports (for example also from 2 different networks)
  3. Update the DB row, drop one interface from interfaces_list
  4. Hard-Reboot the instance
  5. See that nova list shows instance without one address, but nova interface-list shows all addresses
  https://launchpad.net/~niedbalski/+archive/ubuntu/lp1751923/+packages
  6. See that one port is missing in instance xml files
  7. In theory the _heal_instance_info_cache should fix this, but it relies on memory, not on the fresh list of instance ports taken from neutron.
  
  Reproduced Example
  ==
  1. Spawn VM with 1 private network port
  nova boot --flavor m1.small --image cirros-0.3.5-x86_64-disk --nic net-name=private test-2
  2. Attach ports to have 2 private and 2 public interfaces
  nova list:
  | a64ed18d-9868-4bf0-90d3-d710d278922d | test-2 | ACTIVE | -  | Running | public=2001:db8::e, 172.24.4.15, 2001:db8::c, 172.24.4.16; private=fdda:5d77:e18e:0:f816:3eff:fee8:, 10.0.0.3, fdda:5d77:e18e:0:f816:3eff:fe53:231c, 10.0.0.5 |
  
  So we see 4 ports:
  stack@mjozefcz-devstack-ptg:~$ nova interface-list a64ed18d-9868-4bf0-90d3-d710d278922d
  
++--+--+---+---+
  | Port State | Port ID  | Net ID  
 | 

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-17 Thread Jorge Niedbalski
** Patch added: "lp1751923_bionic.debdiff"
   
https://bugs.launchpad.net/nova/+bug/1751923/+attachment/5498309/+files/lp1751923_bionic.debdiff

** Description changed:

  [Impact]
  
  * During periodic task _heal_instance_info_cache the instance_info_caches are
    not updated using instance port_ids taken from neutron, but from nova db.
  * This causes existing VMs to lose their network interfaces after reboot.
  
  [Test Plan]
  
  * This bug is reproducible on Bionic/Queens clouds.
  
  1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
  2) Run the following script: https://paste.ubuntu.com/p/DrFcDXZGSt/
  3) If the script finishes with "Port not found" , the bug is still present.
  
  [Where problems could occur]
  
+ ** No specific regression potential has been identified.
  ** Check the other info section ***
  
  
  [Other Info]
  
  How it looks now?
  =
  
  _heal_instance_info_cache during crontask:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525
  
  is using network_api to get instance_nw_info (instance_info_caches):
  
              try:
                  # Call to network API to get instance info.. this will
                  # force an update to the instance's info_cache
                  self.network_api.get_instance_nw_info(context, instance)
  
  self.network_api.get_instance_nw_info() is listed below:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377
  
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356
  
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
  
        networks, port_ids = self._gather_port_ids_and_networks(
                  context, instance, networks, port_ids, client)
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393
  
  As we can see, _gather_port_ids_and_networks() takes the port list from
  the DB:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176
  
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clear the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
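
A minimal sketch of the difference, assuming a toy in-memory stand-in for the neutron server. `FakeNeutron`, `heal_from_cache` and `heal_from_neutron` are hypothetical names used only for illustration; they are not nova or neutron APIs, and the real code paths are the ones linked above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNeutron:
    """Hypothetical stand-in for neutron: maps port ID -> device_id (instance UUID)."""
    ports: dict = field(default_factory=dict)

    def list_ports(self, device_id):
        # Return the IDs of all ports bound to the given instance.
        return sorted(pid for pid, dev in self.ports.items() if dev == device_id)


def heal_from_cache(cached_port_ids, neutron, instance_uuid):
    """Behaviour described in this bug: only ports already in the cache survive."""
    live = set(neutron.list_ports(instance_uuid))
    return [pid for pid in cached_port_ids if pid in live]


def heal_from_neutron(cached_port_ids, neutron, instance_uuid):
    """Forced refresh: neutron is the source of truth, so lost ports come back."""
    live = neutron.list_ports(instance_uuid)
    # Keep the cached ordering first, then append ports the cache had lost.
    known = [pid for pid in cached_port_ids if pid in live]
    return known + [pid for pid in live if pid not in known]


neutron = FakeNeutron({"port-a": "vm-1", "port-b": "vm-1", "port-c": "vm-2"})
stale_cache = ["port-a"]  # "port-b" was lost from the cache

print(heal_from_cache(stale_cache, neutron, "vm-1"))    # ['port-a']
print(heal_from_neutron(stale_cache, neutron, "vm-1"))  # ['port-a', 'port-b']
```

With the cache as the only input, a port that has dropped out of it can never reappear; querying by device_id is what lets the periodic task add it back.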
  
  When the interface is missing and there is no port configured on the compute
  host (for example after a compute reboot), the interface is not added to the
  instance, and from neutron's point of view the port state is DOWN.
  
  When the interface is missing from the cache and we hard-reboot the instance,
  it's not added as a tap interface in the XML file, so we don't have the
  network on the host.
  
  Steps to reproduce
  ==
  1. Spawn devstack
  2. Spawn VM inside devstack with multiple ports (for example also from 2 
different networks)
  3. Update the DB row, drop one interface from interfaces_list
  4. Hard-Reboot the instance
- 5. See that nova list shows instance without one address, but nova 
interface-list shows all addresses
+ 5. See that nova list shows instance without one address, but nova 
interface-list shows all 
addresseshttps://launchpad.net/~niedbalski/+archive/ubuntu/lp1751923/+packages
  6. See that one port is missing in instance xml files
  7. In theory _heal_instance_info_cache should fix these things, but it relies
     on memory, not on the fresh list of instance ports taken from neutron.
  
  Reproduced Example
  ==
  1. Spawn VM with 1 private network port
  nova boot --flavor m1.small --image cirros-0.3.5-x86_64-disk --nic 
net-name=private  test-2
  2. Attach ports to have 2 private and 2 public interfaces
  nova list:
  | a64ed18d-9868-4bf0-90d3-d710d278922d | test-2 | ACTIVE | -  | 
Running | public=2001:db8::e, 172.24.4.15, 2001:db8::c, 172.24.4.16; 
private=fdda:5d77:e18e:0:f816:3eff:fee8:, 10.0.0.3, 
fdda:5d77:e18e:0:f816:3eff:fe53:231c, 10.0.0.5 |
  
  So we see 4 ports:
  stack@mjozefcz-devstack-ptg:~$ nova interface-list 
a64ed18d-9868-4bf0-90d3-d710d278922d
  

[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-17 Thread Jorge Niedbalski
Hello,

I've prepared a PPA for testing the proposed patch on B/Queens
https://launchpad.net/~niedbalski/+archive/ubuntu/lp1751923/+packages

Attached is the debdiff for bionic.


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-17 Thread Jorge Niedbalski
** Changed in: cloud-archive/rocky
   Status: In Progress => Won't Fix


[Bug 1751923] Re: [SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

2021-05-15 Thread Jorge Niedbalski
** Description changed:

- Description
- ===
+ [Impact]
  
- During periodic task _heal_instance_info_cache the instance_info_caches
- are not updated using instance port_ids taken from neutron, but from
- nova db.
+ * During periodic task _heal_instance_info_cache the instance_info_caches are
+   not updated using instance port_ids taken from neutron, but from nova db.
+ * This causes existing VMs to lose their network interfaces after reboot.
  
- Sometimes, perhaps because of some race-condition, its possible to lose
- some ports from instance_info_caches. Periodic task
- _heal_instance_info_cache should clean this up (add missing records),
- but in fact it's not working this way.
+ [Test Plan]
+ 
+ * This bug is reproducible on Bionic/Queens clouds.
+ 
+ 1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
+ 2) Run the following script: https://paste.ubuntu.com/p/DrFcDXZGSt/
+ 3) If the script finishes with "Port not found", the bug is still present.
+ 
+ [Where problems could occur]
+ 
+ ** Check the other info section ***
+ 
+ 
+ [Other Info]
  
  How it looks now?
  =
  
  _heal_instance_info_cache during crontask:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525
  
  is using network_api to get instance_nw_info (instance_info_caches):
  
- try:
- # Call to network API to get instance info.. this will
- # force an update to the instance's info_cache
- self.network_api.get_instance_nw_info(context, instance)
+             try:
+                 # Call to network API to get instance info.. this will
+                 # force an update to the instance's info_cache
+                 self.network_api.get_instance_nw_info(context, instance)
  
  self.network_api.get_instance_nw_info() is listed below:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377
  
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356
  
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
  
-   networks, port_ids = self._gather_port_ids_and_networks(
- context, instance, networks, port_ids, client)
+       networks, port_ids = self._gather_port_ids_and_networks(
+                 context, instance, networks, port_ids, client)
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393
  
  As we can see, _gather_port_ids_and_networks() takes the port list from
  the DB:
  
  
https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176
  
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clear the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
  
  When the interface is missing and there is no port configured on the compute
  host (for example after a compute reboot), the interface is not added to the
  instance, and from neutron's point of view the port state is DOWN.
  
  When the interface is missing from the cache and we hard-reboot the instance,
  it's not added as a tap interface in the XML file, so we don't have the
  network on the host.
  
  Steps to reproduce
  ==
  1. Spawn devstack
  2. Spawn VM inside devstack with multiple ports (for example also from 2 
different networks)
  3. Update the DB row, drop one interface from interfaces_list
  4. Hard-Reboot the instance
  5. See that nova list shows instance without one address, but nova 
interface-list shows all addresses
  6. See that one port is missing in instance xml files
  7. In theory _heal_instance_info_cache should fix these things, but it relies
     on memory, not on the fresh list of instance ports taken from neutron.
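
Step 5 can be checked mechanically by comparing the two views. The sketch below is only an illustration: `missing_from_cache` is a hypothetical helper, the address lists are typed in by hand rather than parsed from real CLI output, and the choice of 10.0.0.5 as the dropped address is an assumption.

```python
def missing_from_cache(cached_addresses, interface_addresses):
    """Return addresses seen via interface-list but absent from the cached view."""
    cached = set(cached_addresses)
    return [addr for addr in interface_addresses if addr not in cached]


# Addresses as `nova list` would show them (built from the info cache);
# 10.0.0.5 is assumed to be the interface dropped in step 3.
nova_list_view = ["10.0.0.3", "172.24.4.15", "172.24.4.16"]

# Addresses as `nova interface-list` would show them (taken from neutron).
interface_list_view = ["10.0.0.3", "10.0.0.5", "172.24.4.15", "172.24.4.16"]

missing = missing_from_cache(nova_list_view, interface_list_view)
print(missing)  # ['10.0.0.5']
```

A non-empty result means the cache has lost at least one port that neutron still binds to the instance.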
  
  Reproduced Example
  ==
  1. Spawn VM with 1 private network port
  nova boot --flavor m1.small --image cirros-0.3.5-x86_64-disk --nic 
net-name=private  test-2
  2. Attach ports to have 2 private and 2 public interfaces
  nova list:
  | a64ed18d-9868-4bf0-90d3-d710d278922d |