[Yahoo-eng-team] [Bug 1973136] Re: glance-multistore-cinder-import is failing consistently

2022-05-13 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/glance/+/841548
Committed: 
https://opendev.org/openstack/glance/commit/d7fa7a0321ea5a56ec130aa0bd346749459ccaf2
Submitter: "Zuul (22348)"
Branch: master

commit d7fa7a0321ea5a56ec130aa0bd346749459ccaf2
Author: whoami-rajat 
Date:   Thu May 12 12:24:06 2022 +0530

Disable import workflow in glance cinder jobs

Recently, the glance-multistore-cinder-import job started failing.
As per the RCA done here[1], the reason is that glance uses the
import workflow to create images, which is an asynchronous operation.
With the glance cinder configuration, a number of external (cinder)
API calls are made, such as volume create, attachment create,
attachment update and attachment delete, which take time to process,
so the image does not become available within the time devstack
expects, hence the failure.
Disabling the import workflow makes images be created synchronously,
which should let the glance cinder jobs pass.
To disable the import workflow, we inherit from
tempest-integrated-storage instead of
tempest-integrated-storage-import (which has the import plugin enabled).

[1] 
https://review.opendev.org/c/openstack/glance/+/841278/1#message-456096e48b28e5b866deb8bf53e9258ee08219a0

Closes-Bug: 1973136
Change-Id: I524dfeb05c078773aa77020d4a6a9991a7eb75c2
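
For illustration only (not devstack or tempest code), here is a minimal
polling sketch of the kind of "wait until the image is active" step that
times out when image creation goes through the asynchronous import flow;
the cloud name and the commented-out image id are assumptions for the
example:

```
# Minimal sketch (not devstack/tempest code): with the import workflow the
# image only becomes "active" after the cinder volume/attachment calls
# finish, so a fixed wait like this can expire.
import time

import openstack  # assumes a clouds.yaml entry named "devstack"


def wait_for_image_active(conn, image_id, timeout=300, interval=5):
    """Poll the image status until it is active or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        image = conn.image.get_image(image_id)
        if image.status == 'active':
            return image
        time.sleep(interval)
    raise TimeoutError(f"image {image_id} not active after {timeout}s")


if __name__ == '__main__':
    conn = openstack.connect(cloud='devstack')
    # wait_for_image_active(conn, 'some-image-id')  # hypothetical image id
```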


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1973136

Title:
  glance-multistore-cinder-import is failing consistently

Status in Glance:
  Fix Released

Bug description:
  glance-multistore-cinder-import and glance-multistore-cinder-import-fips
  (non-voting) jobs are failing consistently on the glance gate with the
  following error:

  2022-05-11 07:50:33.918925 | controller | ++ lib/tempest:configure_tempest:181:   echo 'Found no valid images to use!'

  https://zuul.opendev.org/t/openstack/build/1838a5d0284e42ec81270cc8a33a1b8f

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1973136/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1880828] Re: New instance is always in "spawning" status

2022-05-13 Thread Billy Olsen
Marking charm tasks as invalid on this particular bug as these aren't
related to the charms and were chased down to other components.

** Changed in: charm-nova-compute
   Status: New => Invalid

** Changed in: openstack-bundles
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1880828

Title:
  New instance is always in "spawning" status

Status in OpenStack Nova Compute Charm:
  Invalid
Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Bundles:
  Invalid

Bug description:
  bundle: openstack-base-bionic-train 
https://github.com/openstack-charmers/openstack-bundles/blob/master/development/openstack-base-bionic-train/bundle.yaml
  hardware: 2 d05 and 2 d06 (the log of the compute node is from one of the 
d06. Please note they are arm64 arch.)

  When trying to create new instances on the deployed openstack, the
  instance is always in the status of "spawning"

  [Steps to Reproduce]
  1. Deploy with the above bundle and hardware by following the instructions
     at https://jaas.ai/openstack-base/bundle/67
  2. Wait about 1.5 until the deployment is ready. "Ready" means every unit
     shows its message as "ready", e.g. https://paste.ubuntu.com/p/k48YVnPyVZ/
  3. Follow the instructions at https://jaas.ai/openstack-base/bundle/67 up to
     the "openstack server create" step to create a new instance. This step is
     also summarized in detail in this gist:
     https://gist.github.com/tai271828/b0c00a611e703046dd52da12a66226b0#file-02-basic-test-just-deployed-sh

  [Expected Behavior]
  An instance is created a few seconds later

  [Actual Behavior]
  The status of the instance is always (> 20 minutes) "spawning"

  [Additional Information]

  1. [workaround] Use `ps aux | grep qemu-img` to check whether a qemu-img
  image conversion process exists. The process should complete within
  ~20 sec. If the process has existed for more than 1 minute, use
  `pkill -f qemu-img` to terminate it and re-create the instance
  (a script version of this workaround is sketched after this list).

  The image conversion process looks like this:

  ```
  qemu-img convert -t none -O raw -f qcow2 /var/lib/nova/instances/_base/9b8156fbecaa194804a637226c8ffded93a57489.part /var/lib/nova/instances/_base/9b8156fbecaa194804a637226c8ffded93a57489.converted
  ```

  2. On further investigation, this is actually two coupled issues:
     1) nova should time out the instance process (comment #21), and
     2) qemu does not terminate the image conversion process successfully
     (comment #20).
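
  A rough script version of the workaround in item 1 above (my own
  illustration, not tooling from this report): it looks for qemu-img convert
  processes that have been running longer than a threshold and sends them
  SIGTERM, like `pkill -f qemu-img` would. The threshold value is an
  assumption.

```
# Sketch of the workaround: terminate qemu-img convert processes running
# longer than a threshold, using `ps -eo pid,etimes,args`
# (etimes = elapsed seconds since the process started).
import os
import signal
import subprocess

THRESHOLD_SECONDS = 60  # conversion normally finishes in ~20s per the report


def kill_stuck_qemu_img(threshold=THRESHOLD_SECONDS):
    out = subprocess.check_output(['ps', '-eo', 'pid,etimes,args'], text=True)
    for line in out.splitlines()[1:]:
        pid, elapsed, args = line.strip().split(None, 2)
        if 'qemu-img convert' in args and int(elapsed) > threshold:
            print(f"terminating stuck qemu-img pid {pid} ({elapsed}s)")
            os.kill(int(pid), signal.SIGTERM)


if __name__ == '__main__':
    kill_stuck_qemu_img()
```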

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1880828/+subscriptions




[Yahoo-eng-team] [Bug 1972278] Re: ovn-octavia-provider oslo config options colliding with neutron ones

2022-05-13 Thread Fernando Royo
** Changed in: neutron
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1972278

Title:
  ovn-octavia-provider oslo config options colliding with neutron ones

Status in neutron:
  Fix Released

Bug description:
  Some jobs in zuul are reporting this error:

  Failed to import test module: ovn_octavia_provider.tests.functional.test_integration
  Traceback (most recent call last):
    File "/usr/lib/python3.8/unittest/loader.py", line 436, in _find_test_path
      module = self._get_module_from_name(name)
    File "/usr/lib/python3.8/unittest/loader.py", line 377, in _get_module_from_name
      __import__(name)
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/ovn_octavia_provider/tests/functional/test_integration.py", line 18, in <module>
      from ovn_octavia_provider.tests.functional import base as ovn_base
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/ovn_octavia_provider/tests/functional/base.py", line 31, in <module>
      from neutron.tests.functional import base
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/.tox/dsvm-functional/lib/python3.8/site-packages/neutron/tests/functional/base.py", line 40, in <module>
      from neutron.conf.plugins.ml2.drivers.ovn import ovn_conf
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/.tox/dsvm-functional/lib/python3.8/site-packages/neutron/conf/plugins/ml2/drivers/ovn/ovn_conf.py", line 212, in <module>
      cfg.CONF.register_opts(ovn_opts, group='ovn')
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/.tox/dsvm-functional/lib/python3.8/site-packages/oslo_config/cfg.py", line 2077, in __inner
      ...
      if _is_opt_registered(self._opts, opt):
    File "/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/.tox/dsvm-functional/lib/python3.8/site-packages/oslo_config/cfg.py", line 356, in _is_opt_registered
      raise DuplicateOptError(opt.name)
  oslo_config.cfg.DuplicateOptError: duplicate option: ovn_nb_connection

  Basically, the OVN Octavia provider registers its opts as soon as its
  modules (driver, agent or helper) are imported, so when the tests run
  setUp they hit a DuplicateOptError, because the tests are based on
  TestOVNFunctionalBase from Neutron, where the same options are already
  registered. The error does not appear in a running environment, since
  neutron and ovn-octavia-provider (octavia) run in separate processes,
  but in the zuul jobs they collide.
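
  For reference, a minimal standalone sketch (my own, not code from either
  project) of the failure mode: registering two non-identical opts with the
  same name in the same group of one ConfigOpts object raises
  DuplicateOptError, which is essentially what happens when both projects
  register their "ovn" options inside a single test process:

```
# Registering the identical opt twice is a no-op, but registering two
# *different* opts with the same name/group raises DuplicateOptError.
from oslo_config import cfg

conf = cfg.ConfigOpts()

neutron_variant = cfg.StrOpt('ovn_nb_connection',
                             help='OVN NB connection (neutron variant)')
provider_variant = cfg.StrOpt('ovn_nb_connection',
                              help='OVN NB connection (provider variant)')

conf.register_opts([neutron_variant], group='ovn')
try:
    conf.register_opts([provider_variant], group='ovn')
except cfg.DuplicateOptError as exc:
    print(f"collision, as in the functional tests: {exc}")
```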

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1972278/+subscriptions




[Yahoo-eng-team] [Bug 1973349] [NEW] Slow queries after upgrade to Xena

2022-05-13 Thread Dmitriy Rabotyagov
Public bug reported:

After upgrading to Xena we started noticing slow queries being recorded in
the MySQL slow log. Most of them include the following subquery:
SELECT DISTINCT ports.id AS ports_id FROM ports, networks WHERE
ports.project_id = '' OR ports.network_id = networks.id AND
networks.project_id = ''.

So for example, when issuing `openstack project list` this subquery appears 
several times:
```
SELECT allowedaddresspairs.port_id AS allowedaddresspairs_port_id,
       allowedaddresspairs.mac_address AS allowedaddresspairs_mac_address,
       allowedaddresspairs.ip_address AS allowedaddresspairs_ip_address,
       anon_1.ports_id AS anon_1_ports_id
FROM (SELECT DISTINCT ports.id AS ports_id
      FROM ports, networks
      WHERE ports.project_id = '' OR ports.network_id = networks.id
            AND networks.project_id = '') AS anon_1
INNER JOIN allowedaddresspairs ON anon_1.ports_id = allowedaddresspairs.port_id

SELECT extradhcpopts.id AS extradhcpopts_id,
       extradhcpopts.port_id AS extradhcpopts_port_id,
       extradhcpopts.opt_name AS extradhcpopts_opt_name,
       extradhcpopts.opt_value AS extradhcpopts_opt_value,
       extradhcpopts.ip_version AS extradhcpopts_ip_version,
       anon_1.ports_id AS anon_1_ports_id
FROM (SELECT DISTINCT ports.id AS ports_id
      FROM ports, networks
      WHERE ports.project_id = '' OR ports.network_id = networks.id
            AND networks.project_id = '') AS anon_1
INNER JOIN extradhcpopts ON anon_1.ports_id = extradhcpopts.port_id

SELECT ipallocations.port_id AS ipallocations_port_id,
       ipallocations.ip_address AS ipallocations_ip_address,
       ipallocations.subnet_id AS ipallocations_subnet_id,
       ipallocations.network_id AS ipallocations_network_id,
       anon_1.ports_id AS anon_1_ports_id
FROM (SELECT DISTINCT ports.id AS ports_id
      FROM ports, networks
      WHERE ports.project_id = '' OR ports.network_id = networks.id
            AND networks.project_id = '') AS anon_1
INNER JOIN ipallocations ON anon_1.ports_id = ipallocations.port_id
ORDER BY ipallocations.ip_address, ipallocations.subnet_id
```
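
For illustration, here is a small SQLAlchemy Core sketch (my own construction
with hypothetical table definitions, not Neutron's actual model or query code)
that produces the same shape of statement; referencing the second table only
inside the OR condition is what makes SQLAlchemy emit the implicit
"FROM ports, networks" cross join seen above:

```
# Reproduces the shape of the slow subquery with SQLAlchemy Core.
from sqlalchemy import Column, MetaData, String, Table, select

metadata = MetaData()
ports = Table(
    'ports', metadata,
    Column('id', String, primary_key=True),
    Column('project_id', String),
    Column('network_id', String))
networks = Table(
    'networks', metadata,
    Column('id', String, primary_key=True),
    Column('project_id', String))

project_id = 'some-project-id'  # placeholder value
stmt = (
    select(ports.c.id.label('ports_id'))
    .distinct()
    .where(
        (ports.c.project_id == project_id)
        | ((ports.c.network_id == networks.c.id)
           & (networks.c.project_id == project_id))))

# Because "networks" only appears in the WHERE clause, it is added to the
# FROM list without a JOIN condition:
#   SELECT DISTINCT ports.id AS ports_id FROM ports, networks WHERE ...
print(stmt)
```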


Another interesting thing is the difference in execution time between an
admin and a non-admin call:

(openstack) dmitriy@6BT6XT2:~$ . Documents/openrc/admin.rc
(openstack) dmitriy@6BT6XT2:~$ time openstack port list --project  | wc -l
2142

real    0m5,401s
user    0m1,565s
sys     0m0,086s
(openstack) dmitriy@6BT6XT2:~$ . Documents/openrc/.rc
(openstack) dmitriy@6BT6XT2:~$ time openstack port list | wc -l
2142

real    2m38,101s
user    0m1,626s
sys     0m0,083s
(openstack) dmitriy@6BT6XT2:~$


Environment:
Neutron SHA: 97180b01837638bd0476c28bdda2340eccd649af
Backend: ovs
OS: Ubuntu 20.04
Mariadb: 10.6.5
SQLalchemy: 1.4.23
Backend: openvswitch
Plugins: router vpnaas metering 
neutron_dynamic_routing.services.bgp.bgp_plugin.BgpPlugin

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973349

Title:
  Slow queries after upgrade to Xena

Status in neutron:
  New

Bug description:
  After upgrading to Xena we started noticing slow queries being recorded in
  the MySQL slow log. Most of them include the following subquery:
  SELECT DISTINCT ports.id AS ports_id FROM ports, networks WHERE
  ports.project_id = '' OR ports.network_id = networks.id AND
  networks.project_id = ''.

  So for example, when issuing `openstack project list` this subquery appears 
several times:
  ```
  SELECT allowedaddresspairs.port_id AS allowedaddresspairs_port_id, 
allowedaddresspairs.mac_address AS allowedaddresspairs_mac_address, 
allowedaddresspairs.ip_address AS allowedaddresspairs_ip_address, 
anon_1.ports_id AS anon_1_ports_id \nFROM (SELECT DISTINCT ports.id AS ports_id 
\nFROM ports, networks \nWHERE ports.project_id = '' OR 
ports.network_id = networks.id AND networks.project_id = '') AS anon_1 
INNER JOIN allowedaddresspairs ON anon_1.ports_id = allowedaddresspairs.port_id

  SELECT extradhcpopts.id AS extradhcpopts_id, extradhcpopts.port_id AS
  extradhcpopts_port_id, extradhcpopts.opt_name AS
  extradhcpopts_opt_name, extradhcpopts.opt_value AS
  extradhcpopts_opt_value, extradhcpopts.ip_version AS
  extradhcpopts_ip_version, anon_1.ports_id AS anon_1_ports_id \nFROM
  (SELECT DISTINCT ports.id AS ports_id \nFROM ports, networks \nWHERE
  ports.project_id = '' OR ports.network_id = networks.id AND
  networks.project_id = '') AS anon_1 INNER JOIN extradhcpopts
  ON anon_1.ports_id = extradhcpopts.port_id0.000

  SELECT ipallocations.port_id AS ipallocations_port_id, 
ipallocations.ip_address AS ipallocations_ip_address, ipallocations.subnet_id 
AS ipallocations_subnet_id, ipallocations.network_id AS 
ipallocations_network_id, anon_1.ports_id AS anon_1_ports_id \nFROM (SELECT 
DISTINCT ports.id AS ports_id \nFROM ports, networks \nWHERE ports.project_id = 
'' OR ports.network_id = networks.id AND networks.project_id = 
'') AS anon_1 INNER JOIN ipallocations ON anon_1.ports_id = 

[Yahoo-eng-team] [Bug 1973347] [NEW] OVN revision_number infinite update loop

2022-05-13 Thread Renat Nurgaliyev
Public bug reported:

After the change described in
https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/393966.html was
merged and released in stable OVN 22.03, it is possible to end up in an
endless loop of revision_number updates in the external_ids of ports and
router_ports. We have confirmed the bug in Ussuri and Yoga. When the
problem happens, the Neutron log looks like this:

2022-05-13 09:30:56.318 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4815
2022-05-13 09:30:56.366 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:56.467 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4815
2022-05-13 09:30:56.880 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:56.984 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4816
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.058 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.159 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4816
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:57.524 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:57.627 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4817
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.675 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.765 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4817

(full version here: https://pastebin.com/raw/NLP1b6Qm).

In our lab environment we have confirmed that the problem is gone after
the mentioned change is rolled back.
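
Not part of the report, just a small helper sketch one could use to spot this
pattern in a neutron-server log: it counts "Successfully bumped revision
number" lines per resource UUID, so a resource stuck in the loop shows up with
an abnormally high count (the log format match is an assumption based on the
excerpt above):

```
# Count revision-number bumps per (uuid, type) in a neutron-server log.
import collections
import re
import sys

BUMP_RE = re.compile(
    r'Successfully bumped revision number for resource '
    r'(?P<uuid>[0-9a-f-]+) \(type: (?P<rtype>\w+)\) to (?P<rev>\d+)')


def count_bumps(lines):
    counts = collections.Counter()
    for line in lines:
        match = BUMP_RE.search(line)
        if match:
            counts[(match['uuid'], match['rtype'])] += 1
    return counts


if __name__ == '__main__':
    with open(sys.argv[1]) as logfile:
        for key, count in count_bumps(logfile).most_common(10):
            print(count, key)
```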

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973347

Title:
  OVN revision_number infinite update loop

Status in neutron:
  New

Bug description:
  After the change described in
  https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/393966.html
  was merged and released in stable OVN 22.03, there is a possibility to
  create an endless loop of revision_number update in external_ids of
  ports and router_ports. We have confirmed the bug in Ussuri and Yoga.
  When the problem happens, the Neutron log would look like this:

  2022-05-13 09:30:56.318 25 ... Successfully bumped revision number for 
resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4815
  2022-05-13 09:30:56.366 25 ... Running txn n=1 command(idx=0): 
CheckRevisionNumberCommand(...)
  2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=1): 
SetLSwitchPortCommand(...)
  2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=2): 
PgDelPortCommand(...)
  2022-05-13 09:30:56.467 25 ... Successfully bumped revision number for 
resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4815
  2022-05-13 09:30:56.880 25 ... Running txn n=1 command(idx=0): 
CheckRevisionNumberCommand(...)
  2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=1): 
UpdateLRouterPortCommand(...)
  2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=2): 
SetLRouterPortInLSwitchPortCommand(...)
  2022-05-13 09:30:56.984 25 ... Successfully bumped revision number for 
resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4816
  2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=0): 
CheckRevisionNumberCommand(...)
  2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=1): 
SetLSwitchPortCommand(...)
  2022-05-13 09:30:57.058 25 ... Running txn n=1 command(idx=2): 
PgDelPortCommand(...)
  2022-05-13 09:30:57.159 25 ... Successfully bumped revision number for 
resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4816
  2022-05-13 

[Yahoo-eng-team] [Bug 1973276] Re: OVN port loses its virtual type after port update

2022-05-13 Thread Rodolfo Alonso
The problem in the python reproducer is the VIP device_owner. The VIP
must not have one. Once it is removed from the python reproducer code,
the VIP never loses its type "virtual".
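
For clarity, a minimal self-contained variant of the reproducer's VIP port
creation with device_owner left unset, per the analysis above (a sketch, not
a verified Octavia fix):

```
# Same VIP port as in the reproducer below, but without device_owner, so
# OVN keeps treating the port as "type: virtual" across updates.
import openstack

conn = openstack.connect(cloud="devstack-admin-demo")
network = conn.network.find_network("public")

vip_port = conn.network.create_port(
    name="lb-vip",
    network_id=network.id,
    device_id="lb-1",
    is_admin_state_up=False)
print(vip_port.id)
```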

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973276

Title:
  OVN port loses its virtual type after port update

Status in neutron:
  Invalid

Bug description:
  Bug found in Octavia (master)

  Octavia creates at least 2 ports for each load balancer:
  - the VIP port, it is down, it keeps/stores the IP address of the LB
  - the VRRP port, plugged into a VM, it has the VIP address in the 
allowed-address list (and the VIP address is configured on the interface in the 
VM)

  When sending an ARP request for the VIP address, the VRRP port should
  reply with its mac-address.

  In OVN the VIP port is marked as "type: virtual".

  But when the VIP port is updated, it loses its "type: virtual" status
  and that breaks the ARP resolution (OVN replies to the ARP request by
  sending the mac-address of the VIP port, which is not used/down).

  Quick reproducer that simulates the Octavia behavior:

  
  ===

  import subprocess
  import time

  import openstack

  conn = openstack.connect(cloud="devstack-admin-demo")

  network = conn.network.find_network("public")

  sg = conn.network.find_security_group('sg')
  if not sg:
      sg = conn.network.create_security_group(name='sg')

  vip_port = conn.network.create_port(
      name="lb-vip",
      network_id=network.id,
      device_id="lb-1",
      device_owner="me",
      is_admin_state_up=False)

  vip_address = [
      fixed_ip['ip_address']
      for fixed_ip in vip_port.fixed_ips
      if '.' in fixed_ip['ip_address']][0]

  vrrp_port = conn.network.create_port(
      name="lb-vrrp",
      device_id="vrrp",
      device_owner="vm",
      network_id=network.id)
  vrrp_port = conn.network.update_port(
      vrrp_port,
      allowed_address_pairs=[
          {"ip_address": vip_address,
           "mac_address": vrrp_port.mac_address}])

  time.sleep(1)

  output = subprocess.check_output(
      f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
      shell=True)
  output = output.decode('utf-8')

  if 'type: virtual' in output:
      print("Port is virtual, this is ok.")
      print(output)

  conn.network.update_port(
      vip_port,
      security_group_ids=[sg.id])

  time.sleep(1)

  output = subprocess.check_output(
      f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
      shell=True)
  output = output.decode('utf-8')

  if 'type: virtual' not in output:
      print("Port is not virtual, this is an issue.")
      print(output)

  ===

  
  In my env (devstack master on c9s):
  $ python3 /mnt/host/virtual_port_issue.py
  Port is virtual, this is ok.
  port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
  type: virtual
  addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]

  Port is not virtual, this is an issue.
  port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
  addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]
  port 8ec36278-82b1-436b-bc5e-ea03ef22192f

  
  In Octavia, the "type: virtual" setting _sometimes_ comes back after other
  updates of the ports, but in some cases the LB is unreachable.

  (and "ovn-nbctl lsp-set-type  virtual" fixes the LB)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1973276/+subscriptions




[Yahoo-eng-team] [Bug 1973276] [NEW] OVN port loses its virtual type after port update

2022-05-13 Thread Gregory Thiemonge
Public bug reported:

Bug found in Octavia (master)

Octavia creates at least 2 ports for each load balancer:
- the VIP port, it is down, it keeps/stores the IP address of the LB
- the VRRP port, plugged into a VM, it has the VIP address in the 
allowed-address list (and the VIP address is configured on the interface in the 
VM)

When sending an ARP request for the VIP address, the VRRP port should
reply with its mac-address.

In OVN the VIP port is marked as "type: virtual".

But when the VIP port is updated, it loses its "type: virtual" status
and that breaks the ARP resolution (OVN replies to the ARP request by
sending the mac-address of the VIP port, which is not used/down).

Quick reproducer that simulates the Octavia behavior:


===

import subprocess
import time

import openstack

conn = openstack.connect(cloud="devstack-admin-demo")

network = conn.network.find_network("public")

sg = conn.network.find_security_group('sg')
if not sg:
    sg = conn.network.create_security_group(name='sg')

vip_port = conn.network.create_port(
    name="lb-vip",
    network_id=network.id,
    device_id="lb-1",
    device_owner="me",
    is_admin_state_up=False)

vip_address = [
    fixed_ip['ip_address']
    for fixed_ip in vip_port.fixed_ips
    if '.' in fixed_ip['ip_address']][0]

vrrp_port = conn.network.create_port(
    name="lb-vrrp",
    device_id="vrrp",
    device_owner="vm",
    network_id=network.id)
vrrp_port = conn.network.update_port(
    vrrp_port,
    allowed_address_pairs=[
        {"ip_address": vip_address,
         "mac_address": vrrp_port.mac_address}])

time.sleep(1)

output = subprocess.check_output(
    f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
    shell=True)
output = output.decode('utf-8')

if 'type: virtual' in output:
    print("Port is virtual, this is ok.")
    print(output)

conn.network.update_port(
    vip_port,
    security_group_ids=[sg.id])

time.sleep(1)

output = subprocess.check_output(
    f"sudo ovn-nbctl show | grep -A2 'port {vip_port.id}'",
    shell=True)
output = output.decode('utf-8')

if 'type: virtual' not in output:
    print("Port is not virtual, this is an issue.")
    print(output)

===


In my env (devstack master on c9s):
$ python3 /mnt/host/virtual_port_issue.py
Port is virtual, this is ok.
port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
type: virtual
addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]

Port is not virtual, this is an issue.
port e0fe2894-e306-42d9-8c5e-6e77b77659e2 (aka lb-vip)
addresses: ["fa:16:3e:93:00:8f 172.24.4.111 2001:db8::178"]
port 8ec36278-82b1-436b-bc5e-ea03ef22192f


In Octavia, the "type: virtual" setting _sometimes_ comes back after other
updates of the ports, but in some cases the LB is unreachable.

(and "ovn-nbctl lsp-set-type  virtual" fixes the LB)

** Affects: neutron
 Importance: High
 Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1973276

Title:
  OVN port loses its virtual type after port update

Status in neutron:
  Confirmed

Bug description:
  Bug found in Octavia (master)

  Octavia creates at least 2 ports for each load balancer:
  - the VIP port, it is down, it keeps/stores the IP address of the LB
  - the VRRP port, plugged into a VM, it has the VIP address in the 
allowed-address list (and the VIP address is configured on the interface in the 
VM)

  When sending an ARP request for the VIP address, the VRRP port should
  reply with its mac-address.

  In OVN the VIP port is marked as "type: virtual".

  But when the VIP port is updated, it loses its "port: virtual" status
  and that breaks the ARP resolution (OVN replies to the ARP request by
  sending the mac-address of the VIP port - which is not used/down).

  Quick reproducer that simulates the Octavia behavior:

  
  ===

  import subprocess
  import time

  import openstack

  conn = openstack.connect(cloud="devstack-admin-demo")

  network = conn.network.find_network("public")

  sg = conn.network.find_security_group('sg')
  if not sg:
      sg = conn.network.create_security_group(name='sg')

  vip_port = conn.network.create_port(
      name="lb-vip",
      network_id=network.id,
      device_id="lb-1",
      device_owner="me",
      is_admin_state_up=False)

  vip_address = [
      fixed_ip['ip_address']
      for fixed_ip in vip_port.fixed_ips
      if '.' in fixed_ip['ip_address']][0]

  vrrp_port = conn.network.create_port(
      name="lb-vrrp",
      device_id="vrrp",
      device_owner="vm",
      network_id=network.id)
  vrrp_port = conn.network.update_port(
      vrrp_port,
      allowed_address_pairs=[
          {"ip_address": vip_address,
           "mac_address": vrrp_port.mac_address}])

  time.sleep(1)

  output = subprocess.check_output(