** Description changed:
[Impact]
Netlink responses from the kernel can be larger than 16k bytes (newer
kernels can return 32k). The pyroute2 library uses a default buffer size
of 16k and fails to read the data when the kernel's response overflows
it.
One example of where users encounter this is booting OpenStack instances
with SRIOV when there are more than 32 VFs, as seen in the original
problem description (included below).
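To illustrate the failure mode, here is a minimal Python sketch. It is not pyroute2's actual code. It builds a single RTM_NEWLINK-style netlink message with one attribute per VF, simulates reading it into a fixed-size buffer, and decodes it the way a naive parser would, trusting the nlmsg_len in the header. Only the nlmsghdr and rtattr header layouts follow the Linux netlink headers; the per-VF attribute type and 500-byte payload are hypothetical values chosen so that 32 VFs just fit in 16k, not the kernel's real VF layout.

```python
import struct

# struct nlmsghdr (<linux/netlink.h>): len, type, flags, seq, pid -> 16 bytes
NLMSGHDR = struct.Struct("=IHHII")
# struct rtattr (<linux/rtnetlink.h>): rta_len, rta_type -> 4 bytes
RTATTR = struct.Struct("=HH")

RTM_NEWLINK = 16
# Hypothetical values for illustration only; NOT the kernel's real VF layout.
HYPOTHETICAL_VF_ATTR_TYPE = 1
HYPOTHETICAL_VF_PAYLOAD = 500  # bytes of payload per VF attribute


def align4(n: int) -> int:
    """Round up to the 4-byte alignment netlink uses for attributes."""
    return (n + 3) & ~3


def build_link_message(num_vfs: int) -> bytes:
    """Build one RTM_NEWLINK-style message carrying one attribute per VF."""
    attrs = bytearray()
    for vf in range(num_vfs):
        payload = vf.to_bytes(4, "little") + bytes(HYPOTHETICAL_VF_PAYLOAD - 4)
        rta_len = RTATTR.size + len(payload)
        attrs += RTATTR.pack(rta_len, HYPOTHETICAL_VF_ATTR_TYPE) + payload
        attrs += bytes(align4(rta_len) - rta_len)  # alignment padding
    nlmsg_len = NLMSGHDR.size + len(attrs)
    return NLMSGHDR.pack(nlmsg_len, RTM_NEWLINK, 0, 1, 0) + bytes(attrs)


def recv_into_fixed_buffer(message: bytes, bufsize: int) -> bytes:
    """Simulate reading a datagram into a fixed-size buffer: the excess is lost."""
    return message[:bufsize]


def parse_vf_attrs(buf: bytes) -> list:
    """Walk the attributes, trusting nlmsg_len as a naive decoder would."""
    nlmsg_len, _type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, 0)
    offset = NLMSGHDR.size
    vfs = []
    while offset < nlmsg_len:
        # Raises struct.error once offset runs past the end of a truncated buf.
        rta_len, _rta_type = RTATTR.unpack_from(buf, offset)
        vfs.append(buf[offset + RTATTR.size:offset + rta_len])
        offset += align4(rta_len)
    return vfs
```

With 64 VFs the simulated message is 32272 bytes: a 32k buffer decodes all 64 attributes, while a 16k buffer leaves the decoder trying to unpack an attribute header past the end of the data, which raises struct.error. That is the same class of unpack_from failure as in the original report.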
[Test Case]
Use an SRIOV-capable card and enable more than 32 VFs on a modern
kernel. Then attempt to launch an instance using OpenStack as follows:
1. Create an example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
--provider-physical-network sriovfabric \
--provider-segment 300 \
--provider-network-type vlan \
test-sriov
$ openstack subnet create --network test-sriov \
--no-dhcp \
--gateway none \
--subnet-range 192.168.1.0/24 test-sriov
2. Create a port on a virtual function and boot an instance with it:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
--network test-sriov \
--vnic-type direct \
sriov-vf1
$ openstack server create \
--image bionic-kvm \
--flavor m1.small \
--network ext-net-300 \
--port sriov-vf1 \
--key-name ubuntu-keypair \
--availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
sriov-vf1
3. The instance stalls in the BUILD state (virsh list shows a paused VM)
and then drops to ERROR.
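Before running the test case, it can help to confirm how many VFs the PF actually has enabled. The Python sketch below reads the standard sysfs attributes sriov_numvfs and sriov_totalvfs under /sys/class/net/<iface>/device. The helper names are hypothetical, ens3f0 is only the interface used in the environment described in the original report, and writing sriov_numvfs requires root and a driver with SR-IOV support.

```python
from pathlib import Path


def _device_dir(iface: str, sysfs_root: str) -> Path:
    """Path to the PCI device directory behind a network interface."""
    return Path(sysfs_root) / "class" / "net" / iface / "device"


def read_vf_counts(iface: str, sysfs_root: str = "/sys") -> tuple:
    """Return (enabled VFs, maximum VFs) for a PF from its sysfs attributes."""
    device = _device_dir(iface, sysfs_root)
    numvfs = int((device / "sriov_numvfs").read_text().strip())
    totalvfs = int((device / "sriov_totalvfs").read_text().strip())
    return numvfs, totalvfs


def set_vf_count(iface: str, count: int, sysfs_root: str = "/sys") -> None:
    """Enable `count` VFs (needs root).

    Writes 0 first because the kernel refuses to change a non-zero VF
    count directly to a different non-zero value.
    """
    path = _device_dir(iface, sysfs_root) / "sriov_numvfs"
    path.write_text("0")
    path.write_text(str(count))


def meets_test_precondition(iface: str, threshold: int = 32,
                            sysfs_root: str = "/sys") -> bool:
    """True if more than `threshold` VFs are enabled, as the test case needs."""
    numvfs, _total = read_vf_counts(iface, sysfs_root)
    return numvfs > threshold
```

For example, read_vf_counts("ens3f0") on the compute node should report more than 32 enabled VFs before step 1. Note that resetting sriov_numvfs to 0 tears down all existing VFs, so set_vf_count should not be run on a host with live SRIOV instances.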
[Where problems could occur]
Problems may affect existing customers already using OpenStack to
schedule SRIOV instances and would show up as failures to build
instances. Increasing the default buffer size also increases the memory
usage of the nova processes. On tightly specified systems with little
memory allocated to the host, this could eat further into the available
margin and push memory usage over the limit.
-
- [Previous Description]
-
- # Problem Description
- Attempting to boot an instance with an SR-IOV interface fails. The instance stays in the BUILD stage for about 1 minute and then turns to the ERROR state. The Neutron agent log shows:
-
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info()
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info():
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs()
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2])
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
- 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin!
- 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37']
-
- # Environment
- Openstack USSURI + OVN
- ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40
- neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0
- CIS hardened system.
- aa profile set to disable, AppArmor profiles teardown applied.
- neutron-sriov-agent reports UP in openstack network agent list.
-
- charm configuration:
- charm: ovn-chassis
- settings:
- bridge-interface-mappings:
- value: br-data:bond1
- debug:
- value: false
- dpdk-bond-config:
- value: :balance-tcp:active:fast
- dpdk-bond-mappings:
- dpdk-driver:
- dpdk-socket-cores:
- value: 1
- dpdk-socket-memory:
- value: 1024
- enable-dpdk:
- value: false
- enable-hardware-offload:
- value: false
- enable-sriov:
- value: true
- new-units-paused:
- value: false
- openstack-metadata-workers:
- value: 2
- ovn-bridge-mappings:
- value: dcfabric:br-data sriovfabric:br-data
- sriov-device-mappings:
- value: sriovfabric:ens3f0 sriovfabric:ens3f1
- sriov-numvfs:
- value: ens3f0:64 ens3f0:64
-
- Agent config:
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini
- ###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- ###############################################################################
- [securitygroup]
- firewall_driver = neutron.agent.firewall.NoopFirewallDriver
-
- [sriov_nic]
- physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1
- exclude_devices =
-
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf
- ###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- ###############################################################################
- [DEFAULT]
- debug = False
- host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com
- core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
-
- # This template must be included under the [DEFAULT] section
-
- transport_url =
- rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
-
- [oslo_messaging_notifications]
- driver = messagingv2
- # This template must be included under the [DEFAULT] section
-
- transport_url =
- rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
-
- topics = notifications
- [AGENT]
- root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
-
- # STEPS TO REPRODUCE
- - apply environment config as above
- - create networking and the instance
- Create example network:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack network create \
- --provider-physical-network sriovfabric \
- --provider-segment 300 \
- --provider-network-type vlan \
- test-sriov
-
- $ openstack subnet create --network test-sriov \
- --no-dhcp \
- --gateway none \
- --subnet-range 192.168.1.0/24 test-sriov
-
- Create a port on a virtual function:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack port create \
- --network test-sriov \
- --vnic-type direct \
- sriov-vf1
-
- $ openstack server create \
- --image bionic-kvm \
- --flavor m1.small \
- --network ext-net-300 \
- --port sriov-vf1 \
- --key-name ubuntu-keypair \
- --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
- sriov-vf1
-
- - the instance stalls in the BUILD state (virsh list shows a paused VM) and
- then drops to ERROR
** Information type changed from Private to Public
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904730
Title:
neutron-agent-sriov fails to create port
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs