[Yahoo-eng-team] [Bug 1917409] [NEW] neutron-l3-agents won't become active

2021-03-01 Thread Brad Marshall
Public bug reported:

We have an Ubuntu Ussuri cloud deployed on Ubuntu 20.04 using the juju
charms from the 20.08 bundle (planning to upgrade soon).

The problem that is occurring is that all l3 agents for routers using a
particular external network show up with their ha_state in standby.
I've tried removing and re-adding them, and the state never goes to
active.

$ neutron l3-agent-list-hosting-router bradm-router
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+-------------+----------------+-------+----------+
| id                                   | host        | admin_state_up | alive | ha_state |
+--------------------------------------+-------------+----------------+-------+----------+
| 09ae92c9-ae8f-4209-b1a8-d593cc6d6602 | oschv1.maas | True           | :-)   | standby  |
| 4d9fe934-b1f8-4c2b-83ea-04971f827209 | oschv2.maas | True           | :-)   | standby  |
| 70b8b60e-7fbd-4b3a-80a3-90875ca72ce6 | oschv4.maas | True           | :-)   | standby  |
+--------------------------------------+-------------+----------------+-------+----------+
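
For reference, the equivalent query with the openstack CLI (assuming a
reasonably recent python-openstackclient; --long adds the HA State
column) would be:

$ openstack network agent list --router bradm-router --long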

This generates a stack trace:

2021-03-01 02:59:47.344 3675486 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get'
Traceback (most recent call last):

  File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
    res = self.dispatcher.dispatch(message)

  File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)

  File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch
    result = func(ctxt, **new_args)

  File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
    setattr(e, '_RETRY_EXCEEDED', True)

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
    return f(*args, **kwargs)

  File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
    ectxt.value = e.inner_exc

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
    LOG.debug("Retry wrapper got retriable exception: %s", e)

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
    return f(*dup_args, **dup_kwargs)

  File "/usr/lib/python3/dist-packages/neutron/api/rpc/handlers/l3_rpc.py", line 306, in get_agent_gateway_port
    agent_port = self.l3plugin.create_fip_agent_gw_port_if_not_exists(

  File "/usr/lib/python3/dist-packages/neutron/db/l3_dvr_db.py", line 1101, in create_fip_agent_gw_port_if_not_exists
    self._populate_mtu_and_subnets_for_ports(context, [agent_port])

  File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1772, in _populate_mtu_and_subnets_for_ports
    network_ids = [p['network_id']

  File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1772, in <listcomp>
    network_ids = [p['network_id']

  File "/usr/lib/python3/dist-packages/neutron/db/l3_db.py", line 1720, in _each_port_having_fixed_ips
    fixed_ips = port.get('fixed_ips', [])
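
A minimal sketch of the failure mode in the bottom frames (illustrative
only, not the neutron code): the trace is consistent with
create_fip_agent_gw_port_if_not_exists() handing a None agent_port to
_populate_mtu_and_subnets_for_ports(), whose helper assumes a port dict.

    # Stand-in for _each_port_having_fixed_ips(); a None entry in the
    # ports list reproduces the same AttributeError as in the trace.
    def each_port_having_fixed_ips(ports):
        for port in ports:
            fixed_ips = port.get('fixed_ips', [])  # fails if port is None
            if fixed_ips:
                yield port

    list(each_port_having_fixed_ips([None]))
    # AttributeError: 'NoneType' object has no attribute 'get'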

This system was running successfully after deployment and was left
running for a while; when it was revisited, it was in this state. I've
been unable to successfully debug what caused it to be in this state.

Versions:
Ubuntu 20.04
Juju charms 20.08
OpenStack Ussuri
Environment: Clustered services using containers on converged hypervisors

$ dpkg-query -W neutron-common
neutron-common  2:16.2.0-0ubuntu2

Please let me know if there is any further information I can provide to
help work out what is happening here.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.

[Yahoo-eng-team] [Bug 1917393] [NEW] [L3][Port forwarding] admin state DOWN/UP router will lose all pf-floating-ips and nat rules

2021-03-01 Thread LIU Yulong
Public bug reported:

The cache needs to be cleaned when a router goes down; otherwise the
port forwarding extension will skip processing all objects because the
cache still hits.
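
A minimal sketch of the idea, assuming a dict-like cache keyed by
router id (the names are illustrative, not the actual neutron
agent-extension API):

    # On a router admin-state-down event, drop the cached port
    # forwarding state so a later admin-state-up reprocesses all
    # floating IPs and NAT rules instead of being skipped on a
    # stale cache hit.
    class PortForwardingCache(object):
        def __init__(self):
            self._by_router = {}

        def put(self, router_id, state):
            self._by_router[router_id] = state

        def get(self, router_id):
            return self._by_router.get(router_id)

        def clear_router(self, router_id):
            self._by_router.pop(router_id, None)

    def on_router_admin_down(cache, router_id):
        cache.clear_router(router_id)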

** Affects: neutron
 Importance: High
 Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1917393

Title:
  [L3][Port forwarding] admin state DOWN/UP router will lose all pf-
  floating-ips and nat rules

Status in neutron:
  Confirmed

Bug description:
  The cache needs to be cleaned when a router goes down; otherwise the
  port forwarding extension will skip processing all objects because
  the cache still hits.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1917393/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1917370] [NEW] [functional] ovn maintenance worker isn't mocked in functional tests

2021-03-01 Thread Slawek Kaplonski
Public bug reported:

In most of the functional tests there is no need to run the
MaintenanceThread from the OVN mech driver. It causes a lot of error
logs in the job output and may also cause occasional failures.
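
A minimal sketch of the kind of mock that could suppress the worker in
a functional test's setUp() (the class path is assumed from the
ussuri/victoria-era tree; treat it as illustrative, not the final
patch):

    import fixtures

    # Prevent the OVN maintenance thread from starting, so its
    # periodic tasks never run (and never log errors) during the test.
    self.useFixture(fixtures.MockPatch(
        'neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance.'
        'MaintenanceThread.start'))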

** Affects: neutron
 Importance: High
 Assignee: Slawek Kaplonski (slaweq)
 Status: Confirmed


** Tags: functional-tests ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1917370

Title:
  [functional] ovn maintenance worker isn't mocked in functional tests

Status in neutron:
  Confirmed

Bug description:
  In most of the functional tests there is no need to run the
  MaintenanceThread from the OVN mech driver. It causes a lot of error
  logs in the job output and may also cause occasional failures.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1917370/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1832021] Re: Checksum drop of metadata traffic on isolated networks with DPDK

2021-03-01 Thread Erlon R. Cruz
** Description changed:

+ [Impact]
+ 
  When an isolated network uses provider networks for tenants (meaning
  without virtual routers: DVR or a network node), metadata access
  occurs in the qdhcp ip netns rather than the qrouter netns.
  
  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True
  
  VMs on the provider tenant network are unable to access metadata, as
  packets are dropped due to bad checksums.
  
- When we added the following in the qdhcp netns, VMs regained access to
- metadata:
+ [Test Plan]
  
-  iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill
+ 1. Create an OpenStack deployment with DPDK options enabled and
+ 'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A
+ sample, simple 3 node bundle can be found here[1].
  
- It seems this setting was recently removed from the qrouter netns [0]
- but it never existed in the qdhcp to begin with.
+ 2. Create an external flat network and subnet:
  
- [0] https://review.opendev.org/#/c/654645/
+ openstack network show dpdk_net || \
+   openstack network create --provider-network-type flat \
+   --provider-physical-network physnet1 dpdk_net \
+   --external
  
- Related LP Bug #1831935
- See https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1831935/comments/10
+ openstack subnet show dpdk_net || \
+ openstack subnet create \
+ --allocation-pool start=10.230.58.100,end=10.230.58.200 \
+ --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
+ --dns-nameserver 10.230.56.2 \
+ --ip-version 4 --network dpdk_net dpdk_subnet
+ 
+ 
+ 3. Create an instance attached to that network. The instance must
+ have a flavor that uses huge pages.
+ 
+ openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk
+ openstack flavor set m1.dpdk --property hw:mem_page_size=large
+ 
+ openstack server create --wait --image xenial --flavor m1.dpdk \
+ --key-name testkey --network dpdk_net i1
+ 
+ 4. Log into the instance host and check the instance console. The
+ instance will hang during boot and show the following message:
+ 
+ 2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional
+ path http://169.254.169.254/openstack/2015-10-15/user_data due to:
+ HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out.
+ (read timeout=10.0)
+ 
+ 5. Apply the fix in all computes, restart the DHCP agents in all
+ computes and create the instance again.
+ 
+ 6. No errors should be shown and the instance quickly boots.
+ 
+ 
+ [Where problems could occur]
+ 
+ * This change only takes effect if datapath_type and ovs_use_veth are
+ set. Those settings are mostly used for DPDK environments. The core of
+ the fix is to toggle off checksum offload done by the DHCP namespace
+ interfaces. This has the drawback of adding some overhead to packet
+ processing for DHCP traffic, but given DHCP does not move much data,
+ this should be a minor problem.
+ 
+ * Future changes to the syntax of the ethtool command could cause
+ regressions.
+ 
+ 
+ [Other Info]
+ 
+  * None
+ 
+ 
+ [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79
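
As an illustration of the workaround space, assuming standard ethtool
syntax (the namespace and interface names are placeholders; the actual
tap device inside the qdhcp namespace differs per network):

$ ip netns exec qdhcp-<network-uuid> ethtool -K <ns-interface> tx off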

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021

Title:
  Checksum drop of metadata traffic on isolated networks with DPDK

Status in OpenStack neutron-openvswitch charm:
  Fix Released
Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released

Bug description:
  [Impact]

  When an isolated network uses provider networks for tenants (meaning
  without virtual routers: DVR or a network node), metadata access
  occurs in the qdhcp ip netns rather than the qrouter netns.

  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True

  VMs on the provider tenant network are unable to access metadata, as
  packets are dropped due to bad checksums.

  [Test Plan]

  1. Create an OpenStack deployment with DPDK options enabled and
  'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A
  sample, simple 3 node bundle can be found here[1].

  2. Create an external flat network and subnet:

  openstack network show dpdk_net || \
openstack network create --provider-network-type flat \
 --provider-physical-network physnet1 dpdk_net \
 --external

  openstack subnet show dpdk_net || \
  openstack subnet create \
  --allocation-pool start=10.230.58.100,end=10.230.58.200 \
  --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
  --dns-nameserver 10.230.56.2 \
  

[Yahoo-eng-team] [Bug 1735724] Re: Metadata iptables rules never inserted upon exception on router creation

2021-03-01 Thread Jeremy Stanley
Thanks for digging into the report. Based on your analysis, the VMT has
no plans to issue an advisory, since none of our supported releases is
considered vulnerable to this any longer. If new information is brought
to light which indicates there is still a means to exploit this flaw in
more recent releases, we're happy to reconsider the decision at that
time.

** Changed in: ossa
   Status: Incomplete => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1735724

Title:
  Metadata iptables rules never inserted upon exception on router
  creation

Status in neutron:
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  We've been debugging some issues seen lately [0] and found out that
  there's a bug in the l3 agent when creating routers (or during the
  initial sync). Jakub Libosvar and I spent some time recreating the
  issue and this is what we got:

  Especially since we bumped to ovsdbapp 0.8.0, we've seen some jobs
  failing due to errors when authenticating to a VM using a public key
  (PK). The TCP connection to the SSH port was successfully established
  but the authentication failed. After debugging further, we found out
  that the metadata rules in the qrouter namespace which redirect
  traffic to haproxy (which replaced the old neutron-ns-metadata-proxy)
  were missing, so VMs weren't fetching metadata (hence, no public key).

  These rules are installed by the metadata driver after a router is
  created [1], on the AFTER_CREATE notification. They will also get
  created during the initial sync of the l3 agent (since the router is
  still unknown to the agent) [2]. Here, if we don't know the router
  yet, we'll call _process_added_router(), and if it's a known router
  we'll call _process_updated_router().
  After our tests, we've seen that iptables rules are never restored if
  we simulate an Exception inside ri.process() at [3], even though the
  router is scheduled for resync [4]. The reason this happens is that
  we've already added the router to our router info [5], so even though
  ri.process() fails at L481 and the router is scheduled for resync,
  next time _process_updated_router() will get called instead of
  _process_added_router(), thus not pushing the notification into the
  metadata driver to install the iptables rules, and they never get
  installed.

  In conclusion, if an error occurs during _process_added_router() we
  might end up losing metadata forever, until we restart the agent and
  this call succeeds. Worse, we will be forwarding metadata requests
  via br-ex, which could lead to security issues (i.e. wrong metadata
  could be injected from the outside, or the metadata server running in
  the underlying cloud may respond).

  With ovsdbapp 0.9.0 we're minimizing this, because if a port fails to
  be added to br-int, ovsdbapp will enqueue the transaction instead of
  throwing an Exception, but I guess there could still be other
  exceptions that reproduce this scenario outside of ovsdbapp, so we
  need to fix it in Neutron.
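
  A minimal sketch of the dispatch described above (a simplified
  paraphrase of the agent code referenced at [2], not a verbatim copy):

      # Once the router id is in router_info, every resync takes the
      # "updated" branch, so the AFTER_CREATE notification that installs
      # the metadata iptables rules never fires again.
      def process_router(agent, router):
          if router['id'] not in agent.router_info:
              agent._process_added_router(router)    # registers, then may raise
          else:
              agent._process_updated_router(router)  # resyncs land here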

  Thanks
  Daniel Alvarez

  ---

  [0] https://bugs.launchpad.net/tripleo/+bug/1731063
  [1] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/metadata/driver.py#L288
  [2] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L472
  [3] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L481
  [4] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L565
  [5] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L478

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1735724/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails

2021-03-01 Thread Chris MacNaughton
This bug was fixed in the package nova - 2:20.5.0-0ubuntu1~cloud0
---

 nova (2:20.5.0-0ubuntu1~cloud0) bionic-train; urgency=medium
 .
   * New stable point release for OpenStack Train (LP: #1915787).
   * d/p/lp1892361.patch: Removed after change landed upstream.
 .
 nova (2:20.4.1-0ubuntu1~cloud1) bionic-train; urgency=medium
 .
   * d/p/lp1892361.patch: Update pci stat pools based on PCI device
     changes (LP: #1892361).


** Changed in: cloud-archive/train
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  New
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  New
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released
Status in nova source package in Hirsute:
  Fix Released

Bug description:
  When spawning an SR-IOV enabled instance on a newly deployed host,
  nova attempts to spawn it with a type-PF PCI device. This fails with
  the stack trace below.

  After restarting neutron-sriov-agent and nova-compute services on the
  compute node and spawning an SR-IOV instance again, a type-VF pci
  device is selected, and instance spawning succeeds.
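
  As a quick way to confirm which device type was handed out on the
  host (an illustrative helper, not nova code; the sysfs layout assumed
  is the standard Linux one):

      import os

      def is_virtual_function(pci_addr):
          # A VF exposes a "physfn" symlink back to its parent PF,
          # while a PF has "virtfn0", "virtfn1", ... links instead.
          return os.path.exists('/sys/bus/pci/devices/%s/physfn' % pci_addr)

      print(is_virtual_function('0000:05:00.2'))  # example VF address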

  Stack trace:
  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise

[Yahoo-eng-team] [Bug 1735724] Re: Metadata iptables rules never inserted upon exception on router creation

2021-03-01 Thread Slawek Kaplonski
I was trying to reproduce that issue today and I couldn't.
Looking at the code, it seems to me that after Brian's change [1] those rules
are now added to the iptables_manager during creation of the router_info
instance, so it happens well before ri.process() is actually called. If there
is any issue in that constructor, no namespace will be created for the router
at all.

[1] https://review.openstack.org/524406

** Changed in: neutron
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1735724

Title:
  Metadata iptables rules never inserted upon exception on router
  creation

Status in neutron:
  Fix Released
Status in OpenStack Security Advisory:
  Incomplete

Bug description:
  We've been debugging some issues seen lately [0] and found out that
  there's a bug in the l3 agent when creating routers (or during the
  initial sync). Jakub Libosvar and I spent some time recreating the
  issue and this is what we got:

  Especially since we bumped to ovsdbapp 0.8.0, we've seen some jobs
  failing due to errors when authenticating to a VM using a public key
  (PK). The TCP connection to the SSH port was successfully established
  but the authentication failed. After debugging further, we found out
  that the metadata rules in the qrouter namespace which redirect
  traffic to haproxy (which replaced the old neutron-ns-metadata-proxy)
  were missing, so VMs weren't fetching metadata (hence, no public key).

  These rules are installed by the metadata driver after a router is
  created [1], on the AFTER_CREATE notification. They will also get
  created during the initial sync of the l3 agent (since the router is
  still unknown to the agent) [2]. Here, if we don't know the router
  yet, we'll call _process_added_router(), and if it's a known router
  we'll call _process_updated_router().
  After our tests, we've seen that iptables rules are never restored if
  we simulate an Exception inside ri.process() at [3], even though the
  router is scheduled for resync [4]. The reason this happens is that
  we've already added the router to our router info [5], so even though
  ri.process() fails at L481 and the router is scheduled for resync,
  next time _process_updated_router() will get called instead of
  _process_added_router(), thus not pushing the notification into the
  metadata driver to install the iptables rules, and they never get
  installed.

  In conclusion, if an error occurs during _process_added_router() we
  might end up losing metadata forever, until we restart the agent and
  this call succeeds. Worse, we will be forwarding metadata requests
  via br-ex, which could lead to security issues (i.e. wrong metadata
  could be injected from the outside, or the metadata server running in
  the underlying cloud may respond).

  With ovsdbapp 0.9.0 we're minimizing this, because if a port fails to
  be added to br-int, ovsdbapp will enqueue the transaction instead of
  throwing an Exception, but I guess there could still be other
  exceptions that reproduce this scenario outside of ovsdbapp, so we
  need to fix it in Neutron.

  Thanks
  Daniel Alvarez

  ---

  [0] https://bugs.launchpad.net/tripleo/+bug/1731063
  [1] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/metadata/driver.py#L288
  [2] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L472
  [3] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L481
  [4] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L565
  [5] 
https://github.com/openstack/neutron/blob/02fa049c5f5a38a276bec6e55c68ac19cd08c59f/neutron/agent/l3/agent.py#L478

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1735724/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1904399] Re: [OVN] Inconsistent "flooding to unregistered" IGMP configuration

2021-03-01 Thread Corey Bryant
This bug was fixed in the package neutron - 2:16.2.0-0ubuntu3~cloud0
---

 neutron (2:16.2.0-0ubuntu3~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:16.2.0-0ubuntu3) focal; urgency=medium
 .
   * d/p/ovn-fix-inconsistent-igmp-configuration.patch: Cherry-picked from
 upstream stable/ussuri to ensure flooding of unregistered multicast
 packets to all ports is disabled (LP: #1904399).


** Changed in: cloud-archive/ussuri
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1904399

Title:
  [OVN] Inconsistent "flooding to unregistered" IGMP configuration

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  ML2/OVN reuses the same "[ovs]/igmp_snooping_enable" configuration
  option from ML2/OVS, which says [0]:

  "Setting this option to True will also enable Open vSwitch mcast-
  snooping-disable-flood-unregistered flag. This option will disable
  flooding of unregistered multicast packets to all ports."

  But that's not true for ML2/OVN; in fact, the opposite is the case,
  because ML2/OVN has flooding to unregistered VMs enabled by default.

  To keep this configuration option consistent between both drivers,
  ML2/OVN needs to disable the "mcast_flood_unregistered" configuration
  in the other_config column of the Logical Switch table when
  igmp_snooping_enable is True.

  [0]
  
https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/agent/ovs_conf.py#L36-L47
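
  For illustration, the end state the fix enforces on each Logical
  Switch can be reproduced with standard OVN tooling (the switch name
  is a placeholder):

  $ ovn-nbctl set Logical_Switch <neutron-net-uuid> \
        other_config:mcast_snoop="true" \
        other_config:mcast_flood_unregistered="false"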

  [Impact]

  See above.

  [Test Case]

  Run the following and expect success:
  root@f1:~# sudo apt install python3-neutron
  root@f1:/usr/lib/python3/dist-packages# python3 -m unittest 
neutron.tests.unit.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestDBInconsistenciesPeriodics.test_check_for_igmp_snoop_support

  I would also like to get test feedback from Canonical bootstack as
  they are hitting this issue.

  [Regression Potential]
  This is a very minimal and targeted change that always hard codes 
MCAST_FLOOD_UNREGISTERED to 'false'.

  In assessing regression potential for changes like this, one that
  comes to mind is potential of a type error when setting
  MCAST_FLOOD_UNREGISTERED. Upon visual inspection of this code fix, a
  type error would be impossible, as what was once set to a 'true' or
  'false' value is now set to 'false'.

  Another thought is whether MCAST_FLOOD_UNREGISTERED has any use if
  MCAST_SNOOP is set to false, but that is not the case according to
  upstream OVN documentation which states: mcast_flood_unregistered:
  optional string, either true or false Determines whether unregistered
  multicast traffic should be flooded or not. Only applicable if
  other_config:mcast_snoop is enabled.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1904399/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1904399] Re: [OVN] Inconsistent "flooding to unregistered" IGMP configuration

2021-03-01 Thread Corey Bryant
This bug was fixed in the package neutron - 2:17.0.0-0ubuntu3~cloud0
---

 neutron (2:17.0.0-0ubuntu3~cloud0) focal-victoria; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:17.0.0-0ubuntu3) groovy; urgency=medium
 .
   * d/p/ovn-fix-inconsistent-igmp-configuration.patch: Cherry-picked from
 upstream stable/victoria to ensure flooding of unregistered multicast
 packets to all ports is disabled (LP: #1904399).


** Changed in: cloud-archive/victoria
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1904399

Title:
  [OVN] Inconsistent "flooding to unregistered" IGMP configuration

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  ML2/OVN reuses the same "[ovs]/igmp_snooping_enable" configuration
  option from ML2/OVS, which says [0]:

  "Setting this option to True will also enable Open vSwitch mcast-
  snooping-disable-flood-unregistered flag. This option will disable
  flooding of unregistered multicast packets to all ports."

  But that's not true for ML2/OVN; in fact, the opposite is the case,
  because ML2/OVN has flooding to unregistered VMs enabled by default.

  To keep this configuration option consistent between both drivers,
  ML2/OVN needs to disable the "mcast_flood_unregistered" configuration
  in the other_config column of the Logical Switch table when
  igmp_snooping_enable is True.

  [0]
  
https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/agent/ovs_conf.py#L36-L47

  [Impact]

  See above.

  [Test Case]

  Run the following and expect success:
  root@f1:~# sudo apt install python3-neutron
  root@f1:/usr/lib/python3/dist-packages# python3 -m unittest 
neutron.tests.unit.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestDBInconsistenciesPeriodics.test_check_for_igmp_snoop_support

  I would also like to get test feedback from Canonical bootstack as
  they are hitting this issue.

  [Regression Potential]
  This is a very minimal and targeted change that always hard codes 
MCAST_FLOOD_UNREGISTERED to 'false'.

  In assessing regression potential for changes like this, one that
  comes to mind is potential of a type error when setting
  MCAST_FLOOD_UNREGISTERED. Upon visual inspection of this code fix, a
  type error would be impossible, as what was once set to a 'true' or
  'false' value is now set to 'false'.

  Another thought is whether MCAST_FLOOD_UNREGISTERED has any use if
  MCAST_SNOOP is set to false, but that is not the case according to
  upstream OVN documentation which states: mcast_flood_unregistered:
  optional string, either true or false Determines whether unregistered
  multicast traffic should be flooded or not. Only applicable if
  other_config:mcast_snoop is enabled.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1904399/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1917322] [NEW] cloudinit.net.get_devicelist includes "bonding_masters" if present

2021-03-01 Thread Dan Watkins
Public bug reported:

$ ls -l /sys/class/net/
total 0
lrwxrwxrwx 1 root root    0 Feb 26 21:51 bond0 -> ../../devices/virtual/net/bond0
-rw-r--r-- 1 root root 4096 Feb 26 21:51 bonding_masters
lrwxrwxrwx 1 root root    0 Feb 26 21:51 enp5s0 -> ../../devices/pci0000:00/0000:00:01.4/0000:05:00.0/virtio12/net/enp5s0
lrwxrwxrwx 1 root root    0 Feb 26 21:51 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-br -> ../../devices/virtual/net/ovs-br
lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-br.100 -> ../../devices/virtual/net/ovs-br.100
lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-system -> ../../devices/virtual/net/ovs-system

$ python3 -c "from cloudinit.net import get_devicelist; print(get_devicelist())"
['bonding_masters', 'enp5s0', 'bond0', 'ovs-system', 'ovs-br.100', 'lo', 'ovs-br']
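
A minimal sketch of the filtering idea (illustrative only, not the
actual cloud-init fix): real devices under /sys/class/net are symlinks
into the device tree, while "bonding_masters" is a plain file exposed
by the bonding module, so filtering on symlinks excludes it.

    import os

    def list_net_devices(sysfs='/sys/class/net'):
        # Keep only symlinked entries, i.e. actual network devices.
        return [name for name in os.listdir(sysfs)
                if os.path.islink(os.path.join(sysfs, name))]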

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1917322

Title:
  cloudinit.net.get_devicelist includes "bonding_masters" if present

Status in cloud-init:
  New

Bug description:
  $ ls -l /sys/class/net/
  total 0
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 bond0 -> ../../devices/virtual/net/bond0
  -rw-r--r-- 1 root root 4096 Feb 26 21:51 bonding_masters
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 enp5s0 -> ../../devices/pci0000:00/0000:00:01.4/0000:05:00.0/virtio12/net/enp5s0
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 lo -> ../../devices/virtual/net/lo
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-br -> ../../devices/virtual/net/ovs-br
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-br.100 -> ../../devices/virtual/net/ovs-br.100
  lrwxrwxrwx 1 root root    0 Feb 26 21:51 ovs-system -> ../../devices/virtual/net/ovs-system

  $ python3 -c "from cloudinit.net import get_devicelist; print(get_devicelist())"
  ['bonding_masters', 'enp5s0', 'bond0', 'ovs-system', 'ovs-br.100', 'lo', 'ovs-br']

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1917322/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1904399] Re: [OVN] Inconsistent "flooding to unregistered" IGMP configuration

2021-03-01 Thread Launchpad Bug Tracker
This bug was fixed in the package neutron - 2:17.0.0-0ubuntu3

---
neutron (2:17.0.0-0ubuntu3) groovy; urgency=medium

  * d/p/ovn-fix-inconsistent-igmp-configuration.patch: Cherry-picked from
upstream stable/victoria to ensure flooding of unregistered multicast
packets to all ports is disabled (LP: #1904399).

 -- Corey Bryant   Mon, 08 Feb 2021 12:25:46 -0500

** Changed in: neutron (Ubuntu Groovy)
   Status: Fix Committed => Fix Released

** Changed in: neutron (Ubuntu Focal)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1904399

Title:
  [OVN] Inconsistent "flooding to unregistered" IGMP configuration

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in Ubuntu Cloud Archive victoria series:
  Fix Committed
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  ML2/OVN reuses the same "[ovs]/igmp_snooping_enable" configuration
  option from ML2/OVS, which says [0]:

  "Setting this option to True will also enable Open vSwitch mcast-
  snooping-disable-flood-unregistered flag. This option will disable
  flooding of unregistered multicast packets to all ports."

  But that's not true for ML2/OVN; in fact, the opposite is the case,
  because ML2/OVN has flooding to unregistered VMs enabled by default.

  To keep this configuration option consistent between both drivers,
  ML2/OVN needs to disable the "mcast_flood_unregistered" configuration
  in the other_config column of the Logical Switch table when
  igmp_snooping_enable is True.

  [0]
  
https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/agent/ovs_conf.py#L36-L47

  [Impact]

  See above.

  [Test Case]

  Run the following and expect success:
  root@f1:~# sudo apt install python3-neutron
  root@f1:/usr/lib/python3/dist-packages# python3 -m unittest 
neutron.tests.unit.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestDBInconsistenciesPeriodics.test_check_for_igmp_snoop_support

  I would also like to get test feedback from Canonical bootstack as
  they are hitting this issue.

  [Regression Potential]
  This is a very minimal and targeted change that always hard codes 
MCAST_FLOOD_UNREGISTERED to 'false'.

  In assessing regression potential for changes like this, one that
  comes to mind is potential of a type error when setting
  MCAST_FLOOD_UNREGISTERED. Upon visual inspection of this code fix, a
  type error would be impossible, as what was once set to a 'true' or
  'false' value is now set to 'false'.

  Another thought is whether MCAST_FLOOD_UNREGISTERED has any use if
  MCAST_SNOOP is set to false, but that is not the case according to
  upstream OVN documentation which states: mcast_flood_unregistered:
  optional string, either true or false Determines whether unregistered
  multicast traffic should be flooded or not. Only applicable if
  other_config:mcast_snoop is enabled.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1904399/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp