[Yahoo-eng-team] [Bug 1982373] Re: nova/neutron ignore and overwrite custom device owner fields
Since it's nova logic to update the port, I guess the bug should be filed against the nova project. @akkaris, what do you think? Also, about the last statement "the port is actually bound now to the instance" - I can't see this from the "openstack server list" output, am I missing something?

** Changed in: neutron
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1982373

Title:
  nova/neutron ignore and overwrite custom device owner fields

Status in neutron:
  Opinion

Bug description:
  nova/neutron ignore custom device owner fields when the device_id
  matches a nova server. The fact that the device_owner field is set to
  something other than nova is completely ignored.

  Sequence of command line actions:

  ~~~
  [stack@standalone ~]$ openstack server list
  +--------------------------------------+-----------------------------+--------+-------------------------------------+-------+----------+
  | ID                                   | Name                        | Status | Networks                            | Image | Flavor   |
  +--------------------------------------+-----------------------------+--------+-------------------------------------+-------+----------+
  | 382c107f-a082-4e9b-8adb-2ba45323c479 | ostest-lq27s-worker-0-cz6gw | ACTIVE | ostest-lq27s-openshift=10.196.2.215 | rhcos | m1.large |
  | 985a609a-1fdd-4f48-b996-9311883c33a2 | ostest-lq27s-worker-0-5vcxf | ACTIVE | ostest-lq27s-openshift=10.196.2.151 | rhcos | m1.large |
  +--------------------------------------+-----------------------------+--------+-------------------------------------+-------+----------+
  ~~~

  ~~~
  # openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 \
      --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner \
      --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.200 TestPort
  (...)
  | id | 697f4773-7fe7-4d1b-9804-8fbb003b1194
  (...)
  # openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 \
      --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner \
      --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.201 TestPort2
  (...)
  | id | bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  (...)
  ~~~

  Now, run this in a terminal:

  ~~~
  while true ; do sleep 10 ; date ; openstack port show 697f4773-7fe7-4d1b-9804-8fbb003b1194 | grep device_owner; done
  Wed Jul 20 14:21:26 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:21:38 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:21:51 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:03 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:15 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:28 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:40 UTC 2022
  | device_owner | TestOwner |
  (...)
  ~~~

  In another terminal, delete and recreate the second port:

  ~~~
  [stack@standalone ~]$ openstack port delete bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  [stack@standalone ~]$ openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 \
      --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner \
      --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.201 TestPort2
  (...)
  | id | bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  (...)
  ~~~

  Check in the terminal that's running the while loop:

  ~~~
  Wed Jul 20 14:22:53 UTC 2022
  | device_owner | TestOwner |
  ~~~
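For illustration, the behaviour the reporter expects can be sketched as a guard in the port-update path. This is a hypothetical helper, not nova's or neutron's actual code; the only assumption borrowed from reality is that nova-owned ports normally carry a `compute:`-prefixed device_owner:

```python
# Illustrative sketch (hypothetical function, not nova code): the reported
# behaviour is equivalent to unconditionally resetting device_owner whenever
# device_id matches a server, instead of preserving a custom value.
def resolve_device_owner(current_owner, managed_by_nova):
    """Return the device_owner a port should keep after a nova update."""
    if current_owner and not current_owner.startswith("compute:"):
        # Custom owner such as "TestOwner": the fix the reporter asks for
        # would leave it alone.
        return current_owner
    if managed_by_nova:
        return "compute:nova"
    return current_owner or ""
```

With this logic, a port created with `--device-owner TestOwner` would keep its owner across nova-driven updates instead of being stomped on.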
[Yahoo-eng-team] [Bug 1969615] Re: OVS: flow loop is created with openvswitch version 2.16
So flows look the same for both 2.15 and 2.16 (no surprise here), just that in the 2.16 case this weird ofport 7 appears out of nowhere according to the vswitchd log, and in fact there's no such ofport on the bridge. Also, flow counters are zero in the 2.16 case:

  cookie=0xb722108b439955c3, duration=81.938s, table=0, n_packets=0, n_bytes=0, idle_age=81, priority=0 actions=resubmit(,60)

while for 2.15 we see packets:

  cookie=0xb722108b439955c3, duration=631.481s, table=0, n_packets=35, n_bytes=2870, idle_age=20, priority=0 actions=resubmit(,60)

Not sure it's a neutron issue; probably the openvswitch folks could point out some ways to debug it.

** Changed in: neutron
   Status: Confirmed => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1969615

Title:
  OVS: flow loop is created with openvswitch version 2.16

Status in neutron:
  Opinion

Bug description:
  * Summary
  neutron-openvswitch-agent is causing a flow loop when using
  Openvswitch version 2.16.

  * High level description
  Running Neutron in the Xena release using the openvswitch plugin
  causes a flow loop when using openvswitch version 2.16. This does not
  occur when deploying openvswitch version 2.15.

  * Pre-conditions
  Ansible-Kolla based deployment using "source: ubuntu" in the
  stable/xena release. neutron_plugin_agent: "openvswitch". Deploying a
  3 node cluster with basic OpenStack services.

  * Version:
    ** OpenStack version: Xena
    ** Linux distro: kolla-ansible stable/xena, Ubuntu 20.04.4 LTS

  * Step-by-step
  1. Deploy OpenStack using kolla-ansible from the stable/xena branch
  2. Create a project network/subnet for Octavia
  3. Create Octavia health-manager ports in Neutron for the 3 control nodes
  4. Create the ports on each control node as ovs bridge ports
  5. Assign IP addresses to the o-hm0 interfaces on all 3 nodes
  6. Try to ping one node from another node

  ubuntu@ctl1:~$ openstack network show lb-mgmt
  +---------------------------+--------------------------------------+
  | Field                     | Value                                |
  +---------------------------+--------------------------------------+
  | admin_state_up            | UP                                   |
  | availability_zone_hints   |                                      |
  | availability_zones        | nova                                 |
  | created_at                | 2022-04-20T10:36:26Z                 |
  | description               |                                      |
  | dns_domain                | None                                 |
  | id                        | c0c1b3ec-a6c3-4145-b94a-6c7fa4d7a740 |
  | ipv4_address_scope        | None                                 |
  | ipv6_address_scope        | None                                 |
  | is_default                | None                                 |
  | is_vlan_transparent       | None                                 |
  | mtu                       | 1450                                 |
  | name                      | lb-mgmt                              |
  | port_security_enabled     | True                                 |
  | project_id                | 6cbb86e577a042499529110f6a1e8603     |
  | provider:network_type     | vxlan                                |
  | provider:physical_network | None                                 |
  | provider:segmentation_id  | 577                                  |
  | qos_policy_id             | None                                 |
  | revision_number           | 2                                    |
  | router:external           | Internal                             |
  | segments                  | None                                 |
  | shared                    | False                                |
  | status                    | ACTIVE                               |
  | subnets                   | bf004f5a-4cae-4277-a3f4-a4cf787033cb |
  | tags                      |                                      |
  | updated_at                | 2022-04-20T10:36:28Z                 |
  +---------------------------+--------------------------------------+

  ubuntu@ctl1:~$ openstack subnet show lb-mgmt
  +----------------------+--------------------------------------+
  | Field                | Value                                |
  +----------------------+--------------------------------------+
  | allocation_pools     | 172.16.1.1-172.16.255.254            |
  | cidr                 | 172.16.0.0/16                        |
  | created_at           | 2022-04-20T10:36:28Z                 |
  | description          |                                      |
  | dns_nameservers      |                                      |
  | dns_publish_fixed_ip | None                                 |
  | enable_dhcp          | True                                 |
  | gateway_ip           | 172.16.0.1                           |
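The counter comparison in the comment above can be automated with a small parser. This is a sketch that only assumes the standard `ovs-ofctl dump-flows` line format shown in the comment:

```python
import re

def flow_packets(flow_line):
    """Extract the n_packets counter from an 'ovs-ofctl dump-flows' line.

    Returns None when the line carries no counter, so callers can skip
    headers or unrelated output.
    """
    m = re.search(r"n_packets=(\d+)", flow_line)
    return int(m.group(1)) if m else None

# The two flow entries quoted in the comment above:
line_216 = ("cookie=0xb722108b439955c3, duration=81.938s, table=0, "
            "n_packets=0, n_bytes=0, idle_age=81, priority=0 actions=resubmit(,60)")
line_215 = ("cookie=0xb722108b439955c3, duration=631.481s, table=0, "
            "n_packets=35, n_bytes=2870, idle_age=20, priority=0 actions=resubmit(,60)")

print(flow_packets(line_216))  # 0  -> the flow never matched on 2.16
print(flow_packets(line_215))  # 35 -> traffic flowing on 2.15
```

Zero counters on an otherwise identical flow table are exactly what points the debugging away from the flows themselves and toward the datapath/ofport handling.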
[Yahoo-eng-team] [Bug 1961173] [NEW] [fullstack] test_vm_is_accessible_by_local_ip fails sometimes
Public bug reported:

Happens for the (with_conntrack_rules) scenario.

Examples:
- https://b4c71a9e78e49e1ca534-33cd363c3f72485dda255154bdda0fc8.ssl.cf1.rackcdn.com/829247/2/check/neutron-fullstack-with-uwsgi/cdc875c/testr_results.html
- https://1c11d883c451b6b39e08-76fe6537709af1be557ea31f3d630d58.ssl.cf5.rackcdn.com/829022/3/check/neutron-fullstack-with-uwsgi/0243e12/testr_results.html

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_local_ip.py", line 111, in test_vm_is_accessible_by_local_ip
    vms.ping_all()
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py", line 46, in ping_all
    vm_1.block_until_ping(vm_2.ip)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/machine_fixtures.py", line 67, in block_until_ping
    utils.wait_until_true(
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 722, in wait_until_true
    raise exception
neutron.tests.common.machine_fixtures.FakeMachineException: No ICMP reply obtained from IP address 10.0.0.38

The test fails even before Local IP creation - on the initial VMs
connectivity check.

** Affects: neutron
   Importance: High
   Assignee: Oleg Bondarev (obondarev)
   Status: Confirmed

** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1961173

Title:
  [fullstack] test_vm_is_accessible_by_local_ip fails sometimes

Status in neutron:
  Confirmed
[Yahoo-eng-team] [Bug 1958627] Re: Incomplete ARP entries on L3 gw namespace
Seems more related to the neutron-dynamic-routing project.

** Tags added: l3-bgp

** Changed in: neutron
   Importance: Undecided => Medium

** Changed in: neutron
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1958627

Title:
  Incomplete ARP entries on L3 gw namespace

Status in neutron:
  Opinion

Bug description:
  Setup information: legacy L3 router + BGP + public addresses.

  I see a lot of unnecessary ARP request traffic to all instances in the
  same network.

  Check the bgp speaker advertised routes (fake addresses):

  openstack bgp speaker list advertised routes 3e533042-729a-4782-8b8d-x
  +---------------+------------+
  | Destination   | Nexthop    |
  +---------------+------------+
  | 99.99.99.0/24 | 99.99.99.1 |
  +---------------+------------+

  Check the L3 gw namespace ARP table (large count of incomplete ARP
  entries):

  ip netns exec qrouter-b524c5fc-dc91-41cb--ceded936xxx arp -ne
  Address        HWtype  HWaddress          Flags Mask  Iface
  99.99.99.92    ether   (incomplete)       C           qg-6a574a15-db
  99.99.99.96    ether   fa:16:3e:c6:85:28  C           qr-7a4cfad1-f7
  99.99.99.97    ether   fa:16:3e:b4:3d:28  C           qr-7a4cfad1-f7
  99.99.99.90    ether   fa:16:3e:83:1f:4b  C           qr-7a4cfad1-f7
  99.99.99.91            (incomplete)                   qr-7a4cfad1-f7
  99.99.99.98    ether   (incomplete)       C           qr-7a4cfad1-f7
  99.99.99.99    ether   fa:16:3e:c6:e6:fd  C           qr-7a4cfad1-f7
  99.99.99.94    ether   fa:16:3e:dc:34:74  C           qr-7a4cfad1-f7
  99.99.99.95            (incomplete)                   qr-7a4cfad1-f7
  99.99.99.92    ether   fa:16:3e:51:af:ef  C           qr-7a4cfad1-f7
  99.99.99.93            (incomplete)                   qr-7a4cfad1-f7
  99.99.99.98            (incomplete)                   qr-7a4cfad1-f7
  99.99.99.96            (incomplete)                   qr-7a4cfad1-f7
  .

  Neutron adds all subnet IPs and tries arping them all the time.

  Wallaby release.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1958627/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
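To quantify how bad the churn is, the incomplete entries can be counted straight from the `arp -ne` output. A rough sketch, assuming the GNU net-tools column layout shown in the report:

```python
def count_incomplete(arp_output):
    """Count '(incomplete)' neighbour entries in 'arp -ne' output.

    The header line never contains the marker, so a plain substring
    check per line is sufficient for this format.
    """
    return sum(1 for line in arp_output.splitlines()
               if "(incomplete)" in line)

# Abbreviated sample in the same shape as the report's namespace dump:
sample = """Address      HWtype  HWaddress          Flags Mask  Iface
99.99.99.92  ether   (incomplete)       C           qg-6a574a15-db
99.99.99.96  ether   fa:16:3e:c6:85:28  C           qr-7a4cfad1-f7
99.99.99.91          (incomplete)                   qr-7a4cfad1-f7"""

print(count_incomplete(sample))  # 2
```

Run periodically inside the qrouter namespace, a counter like this makes it easy to show the incomplete-entry population growing as the speaker re-adds subnet IPs.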
[Yahoo-eng-team] [Bug 1958128] Re: Neutron l3 agent keeps restarting (Ubuntu)
Marking "Invalid" for neutron based on Brian's last comment.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1958128

Title:
  Neutron l3 agent keeps restarting (Ubuntu)

Status in neutron:
  Invalid

Bug description:
  After following the neutron install guide, when trying to create a
  floating IP, the request succeeded, but the floating IP never became
  reachable. Looking at the neutron-l3-agent status, I could see that it
  was restarting every 2 seconds, failing with an exception of 'file not
  found /etc/neutron/fwaas_driver.ini'.

  As a temporary fix, I touched the file to create an empty one, and the
  service started without any errors and the floating IP started
  working.

  My configuration is exactly the one provided in the install guide, I
  didn't change anything. Maybe the documentation should contain a step
  to avoid this issue?

  - [ ] This doc is inaccurate in this way: __
  - [x] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including
        example: input and output.

  If you have a troubleshooting or support issue, use the following
  resources:
  - The mailing list: https://lists.openstack.org
  - IRC: 'openstack' channel on OFTC

  ---
  Release: 19.1.1.dev10 on 2019-08-21 16:09:09
  SHA: d202e323d7f03edc56add8e83aeb9cddbbbce895
  Source: https://opendev.org/openstack/neutron/src/doc/source/install/controller-install-ubuntu.rst
  URL: https://docs.openstack.org/neutron/xena/install/controller-install-ubuntu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1958128/+subscriptions
[Yahoo-eng-team] [Bug 1948676] Re: rpc response timeout for agent report_state is not possible
Did you investigate "This has the side effect that if a rabbitmq or neutron-server is restarted all agents that are currently reporting there will hang for a long time until report_state times out"? Is this expected behavior from the messaging side?

** Changed in: neutron
   Status: In Progress => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1948676

Title:
  rpc response timeout for agent report_state is not possible

Status in neutron:
  Opinion

Bug description:
  When hosting a large number of routers and/or networks, the RPC calls
  from the agents can take a long time, which requires us to increase
  rpc_response_timeout from the default of 60 seconds to a higher value
  so the agents do not time out.

  This has the side effect that if a rabbitmq or neutron-server is
  restarted, all agents that are currently reporting there will hang for
  a long time until report_state times out; during this time
  neutron-server has not received any reports, causing it to mark the
  agents as down. When the call times out and the agents try again, the
  reporting will succeed, but a full sync will be triggered for all
  agents that were previously marked dead. This in itself can cause a
  very high load on the control plane. Consider the case where a
  configuration change is deployed using tooling to all neutron-server
  nodes, which are then restarted: all agents will be marked dead, and
  when they either 1) come back after rpc_response_timeout is reached
  and try again or 2) are restarted manually, all of them will do a full
  sync.

  We should have a configuration option that applies only to the RPC
  timeout for the report_state call from agents, because that could be
  lowered to stay within the bounds of the agent not being seen as down.
  The old behavior can be kept by simply falling back to
  rpc_response_timeout by default instead of introducing a new default
  in this override.
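Concretely, the report's proposal could look like this in neutron.conf. Note that `report_state_rpc_timeout` is the hypothetical option being asked for here; it does not exist in neutron today, only `rpc_response_timeout` does:

```ini
[DEFAULT]
# Existing knob, raised site-wide to accommodate slow full syncs:
rpc_response_timeout = 600

# Hypothetical option the report proposes (NOT a real neutron option):
# a shorter timeout used only for the agents' report_state call, kept
# below agent_down_time so a hung report fails fast and is retried
# before neutron-server marks the agent as down.
# report_state_rpc_timeout = 60
```

The design point is that report_state is a tiny, frequent call, so it can afford a much tighter timeout than the heavyweight sync RPCs that motivated raising `rpc_response_timeout` in the first place.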
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1948676/+subscriptions
[Yahoo-eng-team] [Bug 1939125] Re: Incorect Auto schedule new network segments notification listner
** Also affects: neutron/stein
   Importance: Undecided
   Status: New

** Also affects: neutron/queens
   Importance: Undecided
   Status: New

** Also affects: neutron/rocky
   Importance: Undecided
   Status: New

** Changed in: neutron/queens
   Status: New => Triaged

** Changed in: neutron/rocky
   Status: New => Triaged

** Changed in: neutron/stein
   Status: New => Triaged

** Changed in: neutron/queens
   Importance: Undecided => Medium

** Changed in: neutron/stein
   Importance: Undecided => Medium

** Changed in: neutron/rocky
   Importance: Undecided => Medium

** Changed in: neutron
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1939125

Title:
  Incorect Auto schedule new network segments notification listner

Status in neutron:
  New
Status in neutron queens series:
  Triaged
Status in neutron rocky series:
  Triaged
Status in neutron stein series:
  Triaged

Bug description:
  auto_schedule_new_network_segments(), added in
  Ic9e64aa4ecdc3d56f00c26204ad931b810db7599, uses the new payload
  notification listener in old stable branches of Neutron that still use
  the old notify syntax. The following branches are affected:
  stable/stein, stable/rocky, stable/queens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1939125/+subscriptions
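The mismatch can be shown in plain Python. This is an illustration only, not the actual neutron-lib callback code: a listener written for the new single-payload signature fails when invoked by an old kwargs-style notify:

```python
# Illustration of the incompatibility (hypothetical function names):
# stable/queens..stein branches dispatch callbacks with loose kwargs.
def notify_old_style(listener, resource, event, trigger, **kwargs):
    listener(resource, event, trigger, **kwargs)

# A listener written against the new-style API expects one payload object.
def payload_listener(resource, event, trigger, payload):
    return payload

# Calling the payload-style listener through the old-style dispatcher
# raises TypeError: unexpected keyword argument(s), no 'payload' given.
# notify_old_style(payload_listener, "segment", "after_create", None,
#                  segment="seg-1")
```

This is why backporting a payload-style listener to a branch that still notifies with kwargs breaks at dispatch time rather than at import time.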
[Yahoo-eng-team] [Bug 1938788] Re: Validate if fixed_ip given for port isn't the same as subnet's gateway_ip
** Changed in: neutron
   Status: In Progress => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938788

Title:
  Validate if fixed_ip given for port isn't the same as subnet's
  gateway_ip

Status in neutron:
  Opinion

Bug description:
  Currently, when a new port is created with a fixed_ip given, neutron
  does not validate that the fixed_ip address isn't the same as the
  subnet's gateway IP. That may cause problems, e.g.:

  $ openstack subnet show 
  | allocation_pools | 10.0.0.2-10.0.0.254
  | cidr             | 10.0.0.0/24
  | enable_dhcp      | True
  ...
  | gateway_ip       | 10.0.0.1

  $ nova boot --flavor test --image test --nic net-id=,v4-fixed-ip=10.0.0.1 test-vm1

  The instance will be created successfully, but after that network
  communication issues can occur because of the gateway IP conflict. So
  Neutron should forbid creation of a port with the gateway's IP address
  if it is not a router's port (i.e. device_owner isn't set to one of
  the router device owners).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938788/+subscriptions
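The proposed check might be sketched as follows. The semantics are assumed from the report (reject a fixed IP equal to the gateway unless the port is a router port); the function name is made up for the example, and the two owner strings are the usual neutron router device_owner values:

```python
import ipaddress

# device_owner values neutron uses for router ports (the exemption the
# report proposes):
ROUTER_DEVICE_OWNERS = ("network:router_interface",
                        "network:router_gateway")

def validate_fixed_ip(fixed_ip, gateway_ip, device_owner=""):
    """Raise ValueError when a non-router port requests the gateway IP."""
    if (ipaddress.ip_address(fixed_ip) == ipaddress.ip_address(gateway_ip)
            and device_owner not in ROUTER_DEVICE_OWNERS):
        raise ValueError(
            f"{fixed_ip} is the subnet gateway and the port is not a router port")
```

With the report's example, `validate_fixed_ip("10.0.0.1", "10.0.0.1")` would reject the boot request up front instead of letting the VM come up with a conflicting address.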
[Yahoo-eng-team] [Bug 1938913] Re: Install and configure compute node in Neutron
From the log it's absolutely impossible to figure out what's wrong.
Anyway, it's definitely not a Neutron issue.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938913

Title:
  Install and configure compute node in Neutron

Status in neutron:
  Invalid

Bug description:
  After following the steps to install and configure neutron on the
  compute node, the nova service is not starting:

  sudo service nova-compute status
  ● nova-compute.service - OpenStack Compute
       Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
       Active: failed (Result: exit-code) since Wed 2021-08-04 12:04:06 -03; 32s ago
      Process: 71167 ExecStart=/etc/init.d/nova-compute systemd-start (code=exited, status=1/FAILURE)
     Main PID: 71167 (code=exited, status=1/FAILURE)

  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Scheduled restart job, restart coun>
  ago 04 12:04:06 openstack-compute systemd[1]: Stopped OpenStack Compute.
  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Start request repeated too quickly.
  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Failed with result 'exit-code'.
  ago 04 12:04:06 openstack-compute systemd[1]: Failed to start OpenStack Compute.

  Here is the log:

  2021-08-04 12:04:05.528 71167 INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs
  2021-08-04 12:04:05.573 71167 CRITICAL nova [req-558c7b67-0fd7-4430-882b-e1a398d4ec4c - - - - -] Unhandled error: TypeError: argument of type 'NoneType' is not iterable
  2021-08-04 12:04:05.573 71167 ERROR nova Traceback (most recent call last):
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/bin/nova-compute", line 10, in <module>
  2021-08-04 12:04:05.573 71167 ERROR nova     sys.exit(main())
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/cmd/compute.py", line 58, in main
  2021-08-04 12:04:05.573 71167 ERROR nova     server = service.Service.create(binary='nova-compute',
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 252, in create
  2021-08-04 12:04:05.573 71167 ERROR nova     service_obj = cls(host, binary, topic, manager,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 115, in __init__
  2021-08-04 12:04:05.573 71167 ERROR nova     conductor_api.wait_until_ready(context.get_admin_context())
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/conductor/api.py", line 67, in wait_until_ready
  2021-08-04 12:04:05.573 71167 ERROR nova     self.base_rpcapi.ping(context, '1.21 GigaWatts',
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/baserpc.py", line 58, in ping
  2021-08-04 12:04:05.573 71167 ERROR nova     return cctxt.call(context, 'ping', arg=arg_p)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
  2021-08-04 12:04:05.573 71167 ERROR nova     self.transport._send(self.target, msg_ctxt, msg,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
  2021-08-04 12:04:05.573 71167 ERROR nova     return self._driver.send(target, ctxt, message,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
  2021-08-04 12:04:05.573 71167 ERROR nova     return self._send(target, ctxt, message, wait_for_reply, timeout,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 626, in _send
  2021-08-04 12:04:05.573 71167 ERROR nova     msg.update({'_reply_q': self._get_reply_q()})
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 607, in _get_reply_q
  2021-08-04 12:04:05.573 71167 ERROR nova     conn = self._get_connection(rpc_common.PURPOSE_LISTEN)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 597, in _get_connection
  2021-08-04 12:04:05.573 71167 ERROR nova     return rpc_common.ConnectionContext(self._connection_pool,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/common.py", line 425, in __init__
  2021-08-04 12:04:05.573 71167 ERROR nova     self.connection = connection_pool.create(purpose)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/pool.py", line 146, in create
  2021-08-04 12:04:05.573 71167 ERROR nova     return
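For what it's worth, a frequent cause of this particular `TypeError` at nova-compute startup is a missing or malformed `transport_url` in nova.conf, so the RPC driver gets `None` where it expects a URL string. That diagnosis is an assumption here, not something the log confirms. The install guide's expected setting looks like:

```ini
[DEFAULT]
# Replace RABBIT_PASS and controller with the values used in your deployment.
transport_url = rabbit://openstack:RABBIT_PASS@controller:5672/
```

Checking that this line survived the configuration step would be the first thing to verify before looking anywhere else.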
[Yahoo-eng-team] [Bug 1938826] Re: Install and configure controller node in Neutron
Looks like your config file is missing required config values; this is
an issue with the installer. Please file a bug against the installer
project.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938826

Title:
  Install and configure controller node in Neutron

Status in neutron:
  Invalid

Bug description:
  When restarting the neutron-linuxbridge-agent service, I am facing a
  problem:

  sudo service neutron-linuxbridge-agent status
  ● neutron-linuxbridge-agent.service - Openstack Neutron Linux Bridge Agent
       Loaded: loaded (/lib/systemd/system/neutron-linuxbridge-agent.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-08-03 16:04:01 -03; 79ms ago
      Process: 377517 ExecStartPre=/bin/mkdir -p /var/lock/neutron /var/log/neutron /var/lib/neutron (code=exited, status=0/SUCCESS)
      Process: 377518 ExecStartPre=/bin/chown neutron:neutron /var/lock/neutron /var/log/neutron /var/lib/neutron (code=exited, status=0/SUCCESS)
      Process: 377519 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
      Process: 377520 ExecStart=/etc/init.d/neutron-linuxbridge-agent systemd-start (code=exited, status=0/SUCCESS)
     Main PID: 377520 (code=exited, status=0/SUCCESS)

  ago 03 16:03:59 openstack-controller systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
  ago 03 16:03:59 openstack-controller systemd[1]: Started Openstack Neutron Linux Bridge Agent.
  ago 03 16:04:00 openstack-controller sudo[377533]: neutron : TTY=unknown ; PWD=/var/lib/neutron ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file />
  ago 03 16:04:00 openstack-controller sudo[377533]: pam_unix(sudo:session): session opened for user root by (uid=0)
  ago 03 16:04:00 openstack-controller sudo[377533]: pam_unix(sudo:session): session closed for user root
  ago 03 16:04:01 openstack-controller systemd[1]: neutron-linuxbridge-agent.service: Succeeded.

  Same problem for the neutron-dhcp-agent service:

  sudo service neutron-dhcp-agent status
  ● neutron-dhcp-agent.service - OpenStack Neutron DHCP agent
       Loaded: loaded (/lib/systemd/system/neutron-dhcp-agent.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-08-03 16:22:18 -03; 6s ago
         Docs: man:neutron-dhcp-agent(1)
      Process: 384411 ExecStart=/etc/init.d/neutron-dhcp-agent systemd-start (code=exited, status=0/SUCCESS)
     Main PID: 384411 (code=exited, status=0/SUCCESS)

  ago 03 16:22:17 openstack-controller systemd[1]: Started OpenStack Neutron DHCP agent.
  ago 03 16:22:18 openstack-controller systemd[1]: neutron-dhcp-agent.service: Succeeded.

  Below is the log (/var/log/neutron/neutron-linuxbridge-agent.log):

  2021-08-02 15:27:07.843 56737 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-02 15:27:09.601 56805 INFO neutron.common.config [-] Logging enabled!
  2021-08-02 15:27:09.601 56805 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-02 15:27:09.601 56805 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {}
  2021-08-02 15:27:09.601 56805 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
  2021-08-02 15:27:09.602 56805 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-02 15:27:11.321 56832 INFO neutron.common.config [-] Logging enabled!
  2021-08-02 15:27:11.322 56832 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-02 15:27:11.322 56832 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {}
  2021-08-02 15:27:11.322 56832 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
  2021-08-02 15:27:11.322 56832 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-03 15:48:49.813 372237 INFO neutron.common.config [-] Logging enabled!
  2021-08-03 15:48:49.814 372237 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-03 15:48:49.814 372237 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'provider': 'enp0s31f6'}
  2021-08-03 15:48:49.814 372237 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
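The repeated "Tunneling cannot be enabled without the local_ip" error points at the fix: the agent exits at startup when `[vxlan] local_ip` in linuxbridge_agent.ini is unset or not bound on the host. A sketch of the relevant section along the lines of the install guide (the placeholder is yours to replace; treat the exact values as deployment-specific):

```ini
[vxlan]
enable_vxlan = true
# Must be an IP address actually configured on this host's overlay
# (management/tunnel) interface; the agent refuses to start otherwise,
# as the log above shows.
local_ip = OVERLAY_INTERFACE_IP_ADDRESS
l2_population = true
```

After the 2021-08-03 restart the interface mappings were picked up (`{'provider': 'enp0s31f6'}`), which is consistent with the config file only being partially filled in by the installer.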
[Yahoo-eng-team] [Bug 1938685] [NEW] ofctl timeouts lead to dvr-ha-multinode-full failures
Public bug reported: Recently neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job start failing often. Usual test fail is: "Details: (ServersTestJSON:setUpClass) Server 74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: BUILD. Current task state: spawning." Looking at logs I see that the reason is ofctl timeout (300 sec) that causes OVS agent to not process new port(s) in time: Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}} Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request 
version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg ...
I am not sure why ofctl times out, but in any case the default of 300 seconds seems too long. ** Affects: neutron Importance: High Status: Triaged ** Tags: gate-failure
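The traceback above shows the agent wrapping each OpenFlow request in a timeout and converting its expiry into a RuntimeError. Below is a minimal sketch of that request/timeout pattern, using threads instead of the agent's eventlet machinery; `send_msg` and `OFCTL_TIMEOUT` are hypothetical names, and 10 s is only an illustration of a tighter bound than the 300 s default criticized in the report.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical default; the 300 s default is what this bug calls too long.
OFCTL_TIMEOUT = 10

def send_msg(send, msg, timeout=OFCTL_TIMEOUT):
    """Run a blocking OpenFlow send, raising RuntimeError on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(send, msg)
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Mirror the agent's behavior: surface the stuck request as an error.
        raise RuntimeError("ofctl request %s timed out" % msg)
    finally:
        pool.shutdown(wait=False)
```

The point of the sketch is only that the timeout bound, not the caller, decides how long a stuck request can hold up port processing.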
[Yahoo-eng-team] [Bug 1933234] [NEW] [Fullstack] TestLegacyL3Agent.test_mtu_update fails sometimes
Public bug reported: Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func return f(self, *args, **kwargs) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py", line 322, in test_mtu_update common_utils.wait_until_true(lambda: ri_dev.link.mtu == mtu) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 707, in wait_until_true raise WaitTimeout(_("Timed out after %d seconds") % timeout) neutron.common.utils.WaitTimeout: Timed out after 60 seconds example: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c93/674044/13/check/neutron-fullstack-with-uwsgi/c9334b7/testr_results.html So the router interface device MTU is not updated after a network MTU update. ** Affects: neutron Importance: Undecided Status: New ** Tags: gate-failure -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1933234 Title: [Fullstack] TestLegacyL3Agent.test_mtu_update fails sometimes Status in neutron: New Bug description: Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func return f(self, *args, **kwargs) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py", line 322, in test_mtu_update common_utils.wait_until_true(lambda: ri_dev.link.mtu == mtu) File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 707, in wait_until_true raise WaitTimeout(_("Timed out after %d seconds") % timeout) neutron.common.utils.WaitTimeout: Timed out after 60 seconds example: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c93/674044/13/check/neutron-fullstack-with-uwsgi/c9334b7/testr_results.html So the router interface device MTU is not updated after a network MTU update. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1933234/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
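The failing check is `common_utils.wait_until_true(lambda: ri_dev.link.mtu == mtu)`. A minimal polling helper with the same timeout semantics can be sketched as follows; the names mirror the traceback, but this is an illustration, not neutron's actual implementation.

```python
import time

class WaitTimeout(Exception):
    """Raised when the predicate does not become true within the timeout."""

def wait_until_true(predicate, timeout=60, sleep=1):
    """Poll predicate until it returns True, or raise WaitTimeout."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)
```

In the fullstack test, the predicate compares the router interface's link MTU against the updated network MTU; the bug is that the predicate simply never becomes true.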
[Yahoo-eng-team] [Bug 1930401] Re: Fullstack l3 agent tests failing due to timeout waiting until port is active
** Also affects: oslo.privsep Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1930401 Title: Fullstack l3 agent tests failing due to timeout waiting until port is active Status in neutron: Confirmed Status in oslo.privsep: New Bug description: Many fullstack L3 agent related tests are failing recently, and the common thing for many of them is that they fail while waiting for the port status to become ACTIVE. Like e.g.: https://9cec50bd524f94a2df4c-c6273b9a7cf594e42eb2c4e7f818.ssl.cf5.rackcdn.com/791365/6/check/neutron-fullstack-with-uwsgi/6fc0704/testr_results.html https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_73b/793141/2/check/neutron-fullstack-with-uwsgi/73b08ae/testr_results.html https://b87ba208d44b7f1356ad-f27c11edabee52a7804784593cf2712d.ssl.cf5.rackcdn.com/791365/5/check/neutron-fullstack-with-uwsgi/634ccb1/testr_results.html https://dd43e0f9601da5e2e650-51b18fcc89837fbadd0245724df9c686.ssl.cf1.rackcdn.com/791365/6/check/neutron-fullstack-with-uwsgi/5413cd9/testr_results.html https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8d0/791365/5/check/neutron-fullstack-with-uwsgi/8d024fb/testr_results.html https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_188/791365/5/check/neutron-fullstack-with-uwsgi/188aa48/testr_results.html https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9a3/792998/2/check/neutron-fullstack-with-uwsgi/9a3b5a2/testr_results.html To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1930401/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help :
https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes
** Also affects: neutron/train Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1928299 Title: centos7 train vm live migration stops network on vm for some minutes Status in neutron: New Status in neutron train series: New Bug description: Hello, I have upgraded my centos 7 openstack installation from Stein to Train. On Train I am facing an issue with live migration: when a vm is migrated from one kvm node to another, it stops responding to ping requests for some minutes. I had the same issue on Stein and I resolved it with a workaround suggested by Sean Mooney where legacy port binding was used. On Train it seems there are no backported patches to solve the issue. I enabled the debug option on neutron and here is the dhcp-agent.log from the exact time when the live migration started: http://paste.openstack.org/show/805325/ Here is the openvswitch-agent log from the source kvm node: http://paste.openstack.org/show/805327/ Here is the openvswitch agent log from the destination kvm node: http://paste.openstack.org/show/805329/ I am using the openvswitch mechanism driver and the iptables_hybrid firewall driver. Any help will be appreciated. Ignazio To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1923470] [NEW] test_security_group_recreated_on_port_update fails in CI
Public bug reported: The neutron-tempest-plugin-api job started failing on test_security_group_recreated_on_port_update: Traceback (most recent call last): File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py", line 43, in test_security_group_recreated_on_port_update self.assertIn('default', names) File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 421, in assertIn self.assertThat(haystack, Contains(needle), message) File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 502, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: 'default' not in [] The culprit seems to be patch https://review.opendev.org/c/openstack/neutron/+/777605. ** Affects: neutron Importance: Critical Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: gate-failure -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1923470 Title: test_security_group_recreated_on_port_update fails in CI Status in neutron: New Bug description: The neutron-tempest-plugin-api job started failing on test_security_group_recreated_on_port_update: Traceback (most recent call last): File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py", line 43, in test_security_group_recreated_on_port_update self.assertIn('default', names) File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 421, in assertIn self.assertThat(haystack, Contains(needle), message) File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py", line 502, in assertThat raise mismatch_error testtools.matchers._impl.MismatchError: 'default' not in [] The culprit seems to be patch https://review.opendev.org/c/openstack/neutron/+/777605.
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1923470/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1923161] [NEW] DHCP notification could be optimized
Public bug reported: DHCP notification is done after each create/update/delete for network, subnet and port [1]. This notification currently has to retrieve the network from the DB each time, which is quite a heavy DB request and hence affects the performance of port and subnet CRUD [2]. Two proposals: do not fetch the network when it is not needed; pass the network dict from the plugin. [1] https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L111-L120 [2] https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L200 ** Affects: neutron Importance: Wishlist Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: loadimpact ** Changed in: neutron Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1923161 Title: DHCP notification could be optimized Status in neutron: In Progress Bug description: DHCP notification is done after each create/update/delete for network, subnet and port [1]. This notification currently has to retrieve the network from the DB each time, which is quite a heavy DB request and hence affects the performance of port and subnet CRUD [2].
Two proposals: do not fetch the network when it is not needed; pass the network dict from the plugin. [1] https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L111-L120 [2] https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L200 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1923161/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
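The second proposal above, passing an already-loaded network dict instead of re-fetching it, can be sketched as follows. `FakePlugin` and `notify_dhcp_agents` are hypothetical stand-ins for illustration, not neutron's real classes.

```python
class FakePlugin:
    """Stand-in for the core plugin; counts the expensive network fetches."""

    def __init__(self):
        self.get_network_calls = 0

    def get_network(self, context, network_id):
        self.get_network_calls += 1
        return {"id": network_id, "name": "net"}

def notify_dhcp_agents(plugin, context, payload, network=None):
    """Build a DHCP notification, reusing a caller-supplied network dict."""
    if network is None:
        # Fall back to the heavy DB fetch only when the caller has nothing.
        network = plugin.get_network(context, payload["network_id"])
    return {"payload": payload, "network": network}
```

Callers that just created or updated the network already hold its dict, so the fetch can be skipped entirely on those paths.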
[Yahoo-eng-team] [Bug 1917866] [NEW] No need to fetch whole network object on port create
Public bug reported: DB plugin checks for network existence during port create: https://github.com/openstack/neutron/blob/cb64e3a19fdddb3eac593114a482c9dd69be68d5/neutron/db/db_base_plugin_v2.py#L1422 There is no need to fetch the whole net object (which leads to several heavy DB requests according to OSProfiler stats) when we only need to check network existence. ** Affects: neutron Importance: Wishlist Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: db loadimpact ** Changed in: neutron Assignee: (unassigned) => Oleg Bondarev (obondarev) ** Changed in: neutron Importance: Undecided => Wishlist ** Tags added: loadimpact -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1917866 Title: No need to fetch whole network object on port create Status in neutron: New Bug description: DB plugin checks for network existence during port create: https://github.com/openstack/neutron/blob/cb64e3a19fdddb3eac593114a482c9dd69be68d5/neutron/db/db_base_plugin_v2.py#L1422 There is no need to fetch the whole net object (which leads to several heavy DB requests according to OSProfiler stats) when we only need to check network existence. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1917866/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
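The idea of checking existence without loading the whole object can be illustrated with plain SQL: select a constant, not the row. The `networks` table below is a made-up miniature schema for the sketch, not neutron's real one.

```python
import sqlite3

def network_exists(conn, network_id):
    """Existence check that selects a constant instead of the full row."""
    row = conn.execute(
        "SELECT 1 FROM networks WHERE id = ? LIMIT 1", (network_id,)
    ).fetchone()
    return row is not None

# Miniature illustrative schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE networks (id TEXT PRIMARY KEY, name TEXT, mtu INTEGER)")
conn.execute("INSERT INTO networks VALUES ('net-1', 'private', 1450)")
```

A `SELECT 1 ... LIMIT 1` can be satisfied from the primary-key index alone, while loading the full object drags in every column and, in an ORM, any eagerly-loaded relationships.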
[Yahoo-eng-team] [Bug 1916618] Re: Neutron unit test for QoS driver fails in master branch
Please make sure you have the latest neutron-lib version (2.9.0) installed in your env; this should fix the test. ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1916618 Title: Neutron unit test for QoS driver fails in master branch Status in neutron: Invalid Bug description: When running the unit tests on my own machine, 1 test related to QoS fails. However this does not seem to be happening on the Zuul gates. It also happens on other people's workstations. Step-by-step reproduction: - Clone neutron repo - Run Python 3.8 unit tests with Tox. You can use: $ tox -e py38 neutron.tests.unit.services.qos.drivers.test_manager.TestQoSDriversRulesValidations.test_validate_rule_for_network OUTPUT: neutron.tests.unit.services.qos.drivers.test_manager.TestQoSDriversRulesValidations.test_validate_rule_for_network -- Captured traceback: ~~~ Traceback (most recent call last): File "/home/elvira/neutron/neutron/tests/base.py", line 182, in func return f(self, *args, **kwargs) File "/home/elvira/neutron/neutron/tests/unit/services/qos/drivers/test_manager.py", line 141, in test_validate_rule_for_network self.assertTrue(driver_manager.validate_rule_for_network( File "/home/elvira/neutron/neutron/services/qos/drivers/manager.py", line 160, in validate_rule_for_network driver.validate_rule_for_network(context, rule, AttributeError: 'QoSDriver' object has no attribute 'validate_rule_for_network' To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1916618/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1916428] Re: dibbler tool for dhcpv6 is concluded
It's not an actual bug in Neutron, but the topic is worth a discussion. ** Changed in: neutron Status: New => Opinion ** Tags added: ipv6 -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1916428 Title: dibbler tool for dhcpv6 is concluded Status in neutron: Opinion Bug description: Hi team, according to the latest announcement at https://github.com/tomaszmrugalski/dibbler, it seems the project has been concluded for lack of maintainers, and I also found that the said tool is used as the default IPv6 DHCP implementation. The author suggests https://gitlab.isc.org/isc-projects/kea . Is there any plan for this in the Neutron team? Thanks very much To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1916428/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1905726] Re: Qos plugin performs too many queries
** Changed in: neutron Status: In Progress => Fix Released ** Changed in: neutron Milestone: None => wallaby-3 ** Changed in: neutron Status: Fix Released => Fix Committed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1905726 Title: Qos plugin performs too many queries Status in neutron: Fix Committed Bug description: Whenever retrieving the port list while having the QoS plugin enabled, Neutron performs about 10 DB queries per port, most of them being QoS related: http://paste.openstack.org/raw/800461/ For 1000 ports, we end up with 10 000 sequential DB queries. A simple "neutron port-list" or "nova list" command will exceed 1 minute, which is likely to hit timeouts. This seems to be the problem: https://github.com/openstack/neutron/blob/17.0.0/neutron/db/db_base_plugin_v2.py#L1566-L1570 For each of the retrieved ports, the plugins are then supposed to provide additional details, so for each port we get a certain number of extra queries. One idea would be to add a flag such as 'detailed' or 'include_extensions' to 'get_ports' and then propagate it to '_make_port_dict' through the 'process_extensions' parameter. Another idea would be to let the plugins extend the query but that might be less feasible. Worth mentioning that there were a couple of commits meant to reduce the number of queries but it's still excessive: https://review.opendev.org/c/openstack/neutron/+/667998 https://review.opendev.org/c/openstack/neutron/+/667981/ To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1905726/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
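The usual fix for the N+1 pattern described above is one bulk query keyed by all port IDs instead of one query per port. A sketch under an assumed miniature schema (`qos_port_bindings` is a hypothetical table for illustration, not neutron's actual one):

```python
import sqlite3

# Miniature illustrative schema: one QoS policy binding per port.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qos_port_bindings (port_id TEXT, policy_id TEXT)")
conn.executemany("INSERT INTO qos_port_bindings VALUES (?, ?)",
                 [("p1", "gold"), ("p2", "silver")])

def qos_policy_by_port(conn, port_ids):
    """Resolve QoS policies for all ports with one IN query, not one per port."""
    if not port_ids:
        return {}
    placeholders = ", ".join("?" for _ in port_ids)
    rows = conn.execute(
        "SELECT port_id, policy_id FROM qos_port_bindings"
        " WHERE port_id IN (%s)" % placeholders, list(port_ids)).fetchall()
    return dict(rows)
```

With this shape, listing 1000 ports costs one QoS query instead of 1000, which is exactly the kind of reduction the bug asks for.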
[Yahoo-eng-team] [Bug 1915271] [NEW] test_create_router_set_gateway_with_fixed_ip fails in dvr-ha job
/opt/stack/tempest/tempest/lib/services/network/networks_client.py", line 52, in delete_network return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 331, in delete return self.request('DELETE', url, extra_headers, headers, body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 704, in request self._error_checker(resp, resp_body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 825, in _error_checker raise exceptions.Conflict(resp_body, resp=resp) tempest.lib.exceptions.Conflict: Conflict with state of target resource Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on network c06258a3-d817-4ffd-b9c6-1c20eaedd688. There are one or more ports still in use on the network.', 'detail': ''} }}} Traceback (most recent call last): File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 109, in wrapper return func(*func_args, **func_kwargs) File "/opt/stack/tempest/tempest/api/network/admin/test_routers.py", line 254, in test_create_router_set_gateway_with_fixed_ip self.admin_routers_client.delete_router(router['id']) File "/opt/stack/tempest/tempest/lib/services/network/routers_client.py", line 52, in delete_router return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 331, in delete return self.request('DELETE', url, extra_headers, headers, body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 704, in request self._error_checker(resp, resp_body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 880, in _error_checker raise exceptions.ServerFault(resp_body, resp=resp, tempest.lib.exceptions.ServerFault: Got server fault Details: Request 
Failed: internal server error while processing your request. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: Confirmed ** Tags: gate-failure l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1915271 Title: test_create_router_set_gateway_with_fixed_ip fails in dvr-ha job Status in neutron: Confirmed Bug description: test_create_router_set_gateway_with_fixed_ip periodically fails in neutron-tempest-dvr-ha-multinode-full. Failures started to happen after the new engine facade switch in the L3 code. The failure is due to the router failing to be deleted (neutron returns 500); see the first line in the traceback below. Traceback: 2021-01-28 13:36:26,139 81948 INFO [tempest.lib.common.rest_client] Request (RoutersAdminTest:test_create_router_set_gateway_with_fixed_ip): 500 DELETE https://10.209.160.170:9696/v2.0/routers/3f337e0e-3bed-44f7-a8f9-e43c01787445 10.779s 2021-01-28 13:36:26,139 81948 DEBUG[tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''} Body: None Response - Headers: {'date': 'Thu, 28 Jan 2021 13:36:26 GMT', 'server': 'Apache/2.4.41 (Ubuntu)', 'content-type': 'application/json', 'content-length': '150', 'x-openstack-request-id': 'req-2392442f-a3b0-4c79-a79c-c765c7cab834', 'connection': 'close', 'status': '500', 'content-location': 'https://10.209.160.170:9696/v2.0/routers/3f337e0e-3bed-44f7-a8f9-e43c01787445'} Body: b'{"NeutronError": {"type": "HTTPInternalServerError", "message": "Request Failed: internal server error while processing your request.", "detail": ""}}' 2021-01-28 13:36:26,305 81948 INFO [tempest.lib.common.rest_client] Request (RoutersAdminTest:_run_cleanups): 409 DELETE https://10.209.160.170:9696/v2.0/subnets/812a9855-15a2-4a8e-b246-c6ea68cdadcd 0.165s 2021-01-28 13:36:26,306 81948 DEBUG[tempest.lib.common.rest_client] Request -
Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''} Body: None Response - Headers: {'date': 'Thu, 28 Jan 2021 13:36:26 GMT', 'server': 'Apache/2.4.41 (Ubuntu)', 'content-type': 'application/json', 'content-length': '204', 'x-openstack-request-id': 'req-9eb0af8b-9154-489b-94b8-93d3da458c05', 'connection': 'close', 'status': '409', 'content-location': 'https://10.209.160.170:9696/v2.0/subnets/812a9855-15a2-4a8e-b246-c6ea68cdadcd'} Body: b'{"NeutronError": {"type": "SubnetInUse", "message": "Unable to
[Yahoo-eng-team] [Bug 1905552] Re: neutron-fwaas netlink conntrack driver would catch error while conntrack rules protocol is 'unknown'
Neutron-fwaas development is stopped: https://review.opendev.org/c/openstack/governance/+/735828/ ** Changed in: neutron Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1905552 Title: neutron-fwaas netlink conntrack driver would catch error while conntrack rules protocol is 'unknown' Status in neutron: Won't Fix Bug description: 2020-11-25 11:07:32.606 127 DEBUG oslo_concurrency.lockutils [req-ab14782d-80b1-43f6-8d1b-2874531aca5e - 9d40b483f885496896d81c487f420438 - - -] Releasing semaphore "iptables-qrouter-9e18395d-961d-46b3-a0e9-4c6a94c32baf" lock /var/lib/kolla/venv/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 [req-ab14782d-80b1-43f6-8d1b-2874531aca5e - 9d40b483f885496896d81c487f420438 - - -] Failed to update firewall: daedc38a-04ee-4818-b7a6-3d8311d7fc30: KeyError: 'unknown' 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 Traceback (most recent call last): 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/iptables_fwaas_v2.py", line 144, in update_firewall_group 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 apply_list, self.pre_firewall, firewall) 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/iptables_fwaas_v2.py", line 327, in _remove_conntrack_updated_firewall 
2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 ipt_mgr.namespace) 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/netlink_conntrack.py", line 41, in delete_entries 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 entries = nl_lib.list_entries(namespace) 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 return self.channel.remote_call(name, args, kwargs) 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 raise exc_type(*result[2]) 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 KeyError: 'unknown' 2020-11-25 11:07:32.609 127 ERROR neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2 This error appears when neutron-fwaas v2 is configured with the netlink_conntrack driver in fwaas_agent.ini:
vim /etc/kolla/neutron-l3-agent/fwaas_driver.ini
[fwaas]
enabled = True
agent_version = v2
driver = iptables_v2
conntrack_driver = netlink_conntrack
And the conntrack list has 'unknown' rules, example below: unknown 2 597 src=169.254.192.2 dst=224.0.0.22
[UNREPLIED] src=224.0.0.22 dst=169.254.192.2 mark=0 use=1 unknown 112 598 src=169.254.192.2 dst=224.0.0.18 [UNREPLIED] src=224.0.0.18 dst=169.254.192.2 mark=0 use=1 This may interrupt the conntrack refresh when firewall rules are updated. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1905552/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
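A defensive way to handle such entries is to look the protocol up with `.get()` and skip unmapped values instead of indexing straight into the map, which is what raises the `KeyError: 'unknown'` above. This is a simplified sketch: the protocol map and entry format are assumptions for illustration, not the netlink driver's real structures.

```python
# Hypothetical protocol map; real conntrack output can report protocols
# (e.g. VRRP or IGMP entries) that show up as 'unknown'.
CONNTRACK_PROTO_NUM = {"tcp": 6, "udp": 17, "icmp": 1}

def parse_conntrack_entries(raw_entries):
    """Skip entries with unmapped protocols instead of raising KeyError."""
    parsed = []
    for entry in raw_entries:
        proto = CONNTRACK_PROTO_NUM.get(entry["protoname"])
        if proto is None:
            continue  # e.g. the 'unknown' VRRP entries from this bug report
        parsed.append((proto, entry["src"], entry["dst"]))
    return parsed
```

Entries the driver cannot classify are not ones it could delete meaningfully anyway, so skipping them keeps the rest of the conntrack refresh working.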
[Yahoo-eng-team] [Bug 1905538] [NEW] Some OVS bridges may lack OpenFlow10 protocol
Public bug reported: After commit https://review.opendev.org/c/openstack/neutron/+/371455 OVSAgentBridge.setup_controllers() no longer sets the OpenFlow10 protocol for the bridge; instead it was moved to ovs_lib.OVSBridge.create(). However some (custom) OVS bridges could be created by nova/os-vif when plugging a VM interface. For such bridges neutron does not call create(), only setup_controllers() - as a result such bridges support only OpenFlow13 and the ovs-ofctl command fails: 2020-11-24T20:18:38Z|1|vconn|WARN|unix:/var/run/openvswitch/br01711489f-fe.24081.mgmt: version negotiation failed (we support version 0x01, peer supports version 0x04) ovs-ofctl: br01711489f-fe: failed to connect to socket (Broken pipe) Fix: return the setting of OpenFlow10 (along with OpenFlow13) to setup_controllers(). It doesn't hurt even if the bridge already has OpenFlow10 in its supported protocols. ** Affects: neutron Importance: Low Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: ovs ovs-lib -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1905538 Title: Some OVS bridges may lack OpenFlow10 protocol Status in neutron: New Bug description: After commit https://review.opendev.org/c/openstack/neutron/+/371455 OVSAgentBridge.setup_controllers() no longer sets the OpenFlow10 protocol for the bridge; instead it was moved to ovs_lib.OVSBridge.create(). However some (custom) OVS bridges could be created by nova/os-vif when plugging a VM interface.
For such bridges neutron does not call create(), only setup_controllers() - as a result such bridges support only OpenFlow13 and the ovs-ofctl command fails: 2020-11-24T20:18:38Z|1|vconn|WARN|unix:/var/run/openvswitch/br01711489f-fe.24081.mgmt: version negotiation failed (we support version 0x01, peer supports version 0x04) ovs-ofctl: br01711489f-fe: failed to connect to socket (Broken pipe) Fix: return the setting of OpenFlow10 (along with OpenFlow13) to setup_controllers(). It doesn't hurt even if the bridge already has OpenFlow10 in its supported protocols. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1905538/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
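The proposed fix is idempotent: re-adding OpenFlow10 is harmless when it is already present. The merge logic can be sketched as below; `merged_protocols` is a hypothetical helper for illustration, not ovs_lib's API.

```python
def merged_protocols(current, required=("OpenFlow10", "OpenFlow13")):
    """Return the bridge's protocol list plus any required ones it lacks."""
    merged = list(current)
    for proto in required:
        if proto not in merged:
            merged.append(proto)  # append only what is missing
    return merged
```

The merged list would then be written back to the bridge's `protocols` column (e.g. via `ovs-vsctl set bridge <br> protocols=...`), so a bridge created by nova/os-vif ends up speaking both versions regardless of which code path created it.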
[Yahoo-eng-team] [Bug 1905392] Re: xanax online cod overnight
** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1905392 Title: xanax online cod overnight Status in neutron: Invalid Bug description: xanax online cod overnight To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1905392/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1905268] [NEW] port list performance for trunks can be optimized
Public bug reported: Use case: many trunk ports, each with many subports. Problem: port listing takes a long time. Reason: for each port the trunk extension adds a DB call to retrieve subport MAC addresses. Solution: retrieve subport info once, when the full list of trunk ports (and hence the full list of subport IDs) is available. ** Affects: neutron Importance: Wishlist Assignee: Oleg Bondarev (obondarev) Status: Confirmed ** Tags: loadimpact -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1905268 Title: port list performance for trunks can be optimized Status in neutron: Confirmed Bug description: Use case: many trunk ports, each with many subports. Problem: port listing takes a long time. Reason: for each port the trunk extension adds a DB call to retrieve subport MAC addresses. Solution: retrieve subport info once, when the full list of trunk ports (and hence the full list of subport IDs) is available. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1905268/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
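The proposed solution, one bulk lookup for all subport IDs instead of one DB call per trunk port, can be sketched as follows. Names are hypothetical: `fetch_macs` stands in for the single bulk DB query.

```python
def extend_ports_with_subport_macs(trunk_ports, fetch_macs):
    """Gather every subport ID first, then resolve MACs with one bulk call."""
    all_ids = [sp for port in trunk_ports for sp in port["subport_ids"]]
    mac_by_id = fetch_macs(all_ids)  # single DB round-trip for all trunks
    for port in trunk_ports:
        port["subport_macs"] = [mac_by_id[sp] for sp in port["subport_ids"]]
    return trunk_ports
```

The per-port DB call becomes a dictionary lookup, so the cost of the extension stops scaling with the number of trunk ports in the listing.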
[Yahoo-eng-team] [Bug 1902998] [NEW] tempest test_create_router_set_gateway_with_fixed_ip often fails with DVR
Public bug reported: test_create_router_set_gateway_with_fixed_ip often fails in neutron- tempest-dvr-ha-multinode-full: traceback-1: {{{ Traceback (most recent call last): File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc return func(*args, **kwargs) File "/opt/stack/tempest/tempest/lib/services/network/networks_client.py", line 52, in delete_network return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in delete return self.request('DELETE', url, extra_headers, headers, body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in request self._error_checker(resp, resp_body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in _error_checker raise exceptions.Conflict(resp_body, resp=resp) tempest.lib.exceptions.Conflict: Conflict with state of target resource Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on network 40a63562-61e7-41b0-82c8-e076b8463584. 
There are one or more ports still in use on the network.', 'detail': ''} }}} Traceback (most recent call last): File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc return func(*args, **kwargs) File "/opt/stack/tempest/tempest/lib/services/network/subnets_client.py", line 52, in delete_subnet return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in delete return self.request('DELETE', url, extra_headers, headers, body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in request self._error_checker(resp, resp_body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in _error_checker raise exceptions.Conflict(resp_body, resp=resp) tempest.lib.exceptions.Conflict: Conflict with state of target resource Details: {'type': 'SubnetInUse', 'message': 'Unable to complete operation on subnet a1110e0b-d7c8-4830-b1df-e526b632aab9: One or more ports have an IP allocation from this subnet.', 'detail': ''} ** Affects: neutron Importance: Medium Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1902998 Title: tempest test_create_router_set_gateway_with_fixed_ip often fails with DVR Status in neutron: In Progress Bug description: test_create_router_set_gateway_with_fixed_ip often fails in neutron- tempest-dvr-ha-multinode-full: traceback-1: {{{ Traceback (most recent call last): File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc return func(*args, **kwargs) File "/opt/stack/tempest/tempest/lib/services/network/networks_client.py", line 52, in delete_network return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in delete return self.request('DELETE', url, extra_headers, headers, body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in request self._error_checker(resp, resp_body) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in _error_checker raise exceptions.Conflict(resp_body, resp=resp) tempest.lib.exceptions.Conflict: Conflict with state of target resource Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on network 40a63562-61e7-41b0-82c8-e076b8463584. There are one or more ports still in use on the network.', 'detail': ''} }}} Traceback (most recent call last): File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc return func(*args, **kwargs) File "/opt/stack/tempest/tempest/lib/services/network/subnets_client.py", line 52, in delete_subnet return self.delete_resource(uri) File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in delete_resource resp, body = self.delete(req_uri) File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in delete return self.request('D
[Yahoo-eng-team] [Bug 1862315] [NEW] Sometimes VMs can't get IP when spawned concurrently
Public bug reported: Version: Stein Scenario description: Rally creates 60 VMs with 6 threads. Each thread: - creates a VM - pings it - if successful ping, tries to reach the VM via ssh and execute a command. It tries to do that during 2 minutes. - if successful ssh - deletes the VM For some VMs ping fails. Console log shows that VM failed to get IP from DHCP. tcpdump on corresponding DHCP port shows VM's DHCP requests, but dnsmasq does not reply. From dnsmasq logs: Feb 6 00:15:43 dnsmasq[4175]: read /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 addresses Feb 6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host So it must be something wrong with neutron-dhcp-agent network cache. From neutron-dhcp-agent log: 2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been scheduled _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276 2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function clear wrapper /var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102 2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync (da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP cache is out of sync'] _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293 so the agent is aware of invalid cache for the net, but for unknown reason actual net resync happens only in 8 minutes: 2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp -- You 
received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1862315 Title: Sometimes VMs can't get IP when spawned concurrently Status in neutron: New Bug description: Version: Stein Scenario description: Rally creates 60 VMs with 6 threads. Each thread: - creates a VM - pings it - if successful ping, tries to reach the VM via ssh and execute a command. It tries to do that during 2 minutes. - if successful ssh - deletes the VM For some VMs ping fails. Console log shows that VM failed to get IP from DHCP. tcpdump on corresponding DHCP port shows VM's DHCP requests, but dnsmasq does not reply. From dnsmasq logs: Feb 6 00:15:43 dnsmasq[4175]: read /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 addresses Feb 6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host So it must be something wrong with neutron-dhcp-agent network cache. 
From neutron-dhcp-agent log: 2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been scheduled _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276 2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function clear wrapper /var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102 2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync (da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP cache is out of sync'] _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293 so the agent is aware of invalid cache for the net, but for unknown reason actual net resync happens only in 8 minutes: 2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req- f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1862315/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
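The log lines above reflect a dirty-set resync pattern: networks needing resync are queued, and a periodic helper later drains the queue. A simplified sketch of that pattern (illustrative only, not neutron's actual agent code) shows why a delayed or throttled drain leaves a network out of sync for minutes:

```python
# network_id -> list of reasons why it needs a resync
needs_resync = {}

def schedule_resync(reason, network_id):
    """Mark a network dirty; the actual sync happens later."""
    needs_resync.setdefault(network_id, []).append(reason)

def periodic_resync_helper(resync_fn):
    """Drain the dirty set, calling resync_fn once per dirty network.

    In the bug above, this drain ran ~8 minutes after scheduling,
    leaving dnsmasq with a stale host file in the meantime.
    """
    dirty = dict(needs_resync)
    needs_resync.clear()
    for net_id, reasons in dirty.items():
        resync_fn(net_id, reasons)

synced = []
schedule_resync("Duplicate IP addresses found, DHCP cache is out of sync",
                "da73026e-09b9-4f8d-bbdd-84d89c2487b2")
periodic_resync_helper(lambda net, why: synced.append(net))
```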
[Yahoo-eng-team] [Bug 1860521] [NEW] L2 pop notifications are not reliable
Public bug reported: Problem: lack of connectivity (e.g. vxlan tunnels, OVS flows) between nodes/VMs in L2 segment due to partial RabbitMQ unavailability, RPC message loss or agent failure on applying fdb entry updates. Why: currently FDB entries are sent by neutron server to L2 agents one-way (no feedback), thus agent has no way to detect if all required tunnels/flows are built. On the other hand server has no way to detect if all sent FDB entries were delivered and required flows were applied. In case some messages are lost - only agent restart fixes possible issues. Way to address: new synchronization mechanism on L2 agent side, which will periodically request net topology from server and match it to actual config applied on the node, with applying missing parts. Option 2: move from RPC fanouts and casts to RPC calls which guarantee message delivery. Concerns: scalability, increased load on neutron server. ** Affects: neutron Importance: Undecided Status: New ** Tags: rfe -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1860521 Title: L2 pop notifications are not reliable Status in neutron: New Bug description: Problem: lack of connectivity (e.g. vxlan tunnels, OVS flows) between nodes/VMs in L2 segment due to partial RabbitMQ unavailability, RPC message loss or agent failure on applying fdb entry updates. Why: currently FDB entries are sent by neutron server to L2 agents one-way (no feedback), thus agent has no way to detect if all required tunnels/flows are built. On the other hand server has no way to detect if all sent FDB entries were delivered and required flows were applied. In case some messages are lost - only agent restart fixes possible issues. Way to address: new synchronization mechanism on L2 agent side, which will periodically request net topology from server and match it to actual config applied on the node, with applying missing parts. 
Option 2: move from RPC fanouts and casts to RPC calls which guarantee message delivery. Concerns: scalability, increased load on neutron server. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1860521/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
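The first proposed mechanism is essentially periodic state reconciliation. A minimal sketch of the diff step (illustrative FDB-entry tuples, not neutron's real data model): the agent fetches the desired topology from the server, compares it with what it has applied locally, and installs/removes the difference.

```python
def reconcile_fdb(desired, applied):
    """Diff desired vs applied FDB state; return (to_add, to_remove)."""
    desired, applied = set(desired), set(applied)
    return desired - applied, applied - desired

# Entries as (tunnel_port, mac, remote_ip) tuples -- example data only.
desired = {("vxlan-10", "fa:16:3e:aa:bb:01", "192.0.2.10"),
           ("vxlan-10", "fa:16:3e:aa:bb:02", "192.0.2.11")}
applied = {("vxlan-10", "fa:16:3e:aa:bb:01", "192.0.2.10")}

# A lost RPC cast shows up as a missing entry, repaired on the next cycle.
to_add, to_remove = reconcile_fdb(desired, applied)
```

Running this periodically converges the agent to the server's view even after lost fanout messages, at the cost of the extra server load the report flags as a concern.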
[Yahoo-eng-team] [Bug 1850639] Re: FloatingIP list bad performance
Ok, so it's not related to sqlalchemy, as I expected it's an issue with neutron DB object, fixed in Rocky: https://review.opendev.org/#/c/565358/ ** Changed in: neutron Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1850639 Title: FloatingIP list bad performance Status in neutron: Invalid Bug description: Faced on stable/queens but applicable to master too. On quite a heavy loaded environment it was noticed that simple floatingip list command takes significant time (~1200 fips) while for example port list is always faster (>7000 ports). If enable sqlalchemy debug logs there can be seen lots of: 2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry [req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] set 'memoized_setups' on path 'EntityRegistry( (,))' to '{}' set /usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry. py:63 - which basically eats all the time of a request. As a test I commented 'dns' field in FloatingIP DB object definition and response time reduced from 14 to 1 second. DNS extension is not configured on the environment and no external DNS is used. Also I don't see this field used anywhere in neutron. Interestingly Port DB object has 'dns' field either (with corresponding portdnses table in DB, all the same as done for floatingips), however DB object is not used when listing ports. The proposal would be to remove 'dns' field from FloatingIP OVO as not used, until we find performance bottleneck. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1850639/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1851609] [NEW] Add an option for graceful l3 agent shutdown
Public bug reported: If KillMode in the systemd config of a neutron l3 agent service is set to 'process' - it will not kill child processes on main service stop - this is useful when we don't want data-plane downtime on agent stop/restart due to keepalived exit. However in some cases graceful cleanup on l3 agent shutdown is needed - like with a containerised control plane: when kubernetes kills an l3-agent pod, it automatically kills its children (keepalived processes) in a non-graceful way, so that keepalived does not clear VIPs. This leads to a situation where the same VIP is present on different nodes, and hence to long downtime. The proposal is to add a new l3 agent config option so that the agent handles stop (SIGTERM) by deleting all routers. For HA routers it results in graceful keepalived shutdown. ** Affects: neutron Importance: Medium Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ha l3-ipam-dhcp -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1851609 Title: Add an option for graceful l3 agent shutdown Status in neutron: New Bug description: If KillMode in the systemd config of a neutron l3 agent service is set to 'process' - it will not kill child processes on main service stop - this is useful when we don't want data-plane downtime on agent stop/restart due to keepalived exit. However in some cases graceful cleanup on l3 agent shutdown is needed - like with a containerised control plane: when kubernetes kills an l3-agent pod, it automatically kills its children (keepalived processes) in a non-graceful way, so that keepalived does not clear VIPs. This leads to a situation where the same VIP is present on different nodes, and hence to long downtime. The proposal is to add a new l3 agent config option so that the agent handles stop (SIGTERM) by deleting all routers. For HA routers it results in graceful keepalived shutdown. 
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1851609/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
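A hedged sketch of the proposed behaviour (class, attribute, and option names here are hypothetical, not the merged implementation): on SIGTERM the agent walks its router list and tears each router down, which for HA routers lets keepalived exit cleanly and release its VIPs.

```python
import signal

class L3AgentSketch:
    """Toy model of an l3 agent with opt-in cleanup on shutdown."""

    def __init__(self, cleanup_on_shutdown):
        # hypothetical config option gating the new behaviour
        self.cleanup_on_shutdown = cleanup_on_shutdown
        self.routers = {"router-1": "active", "router-2": "standby"}

    def _delete_router(self, router_id):
        # the real agent would stop keepalived and remove the namespace,
        # so VIPs are withdrawn before the process dies
        self.routers.pop(router_id)

    def handle_sigterm(self, signum=signal.SIGTERM, frame=None):
        if self.cleanup_on_shutdown:
            for router_id in list(self.routers):
                self._delete_router(router_id)

# In a real agent this would be wired up via signal.signal(...); here we
# just invoke the handler directly.
agent = L3AgentSketch(cleanup_on_shutdown=True)
agent.handle_sigterm()
```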
[Yahoo-eng-team] [Bug 1850639] [NEW] FloatingIP list bad performance
Public bug reported: Faced on stable/queens but applicable to master too. On a quite heavily loaded environment it was noticed that a simple floatingip list command takes significant time (~1200 FIPs) while, for example, port list is always faster (>7000 ports). If sqlalchemy debug logs are enabled, lots of the following can be seen: 2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry [req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] set 'memoized_setups' on path 'EntityRegistry( (,))' to '{}' set /usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.py:63 - which basically eats all the time of a request. As a test I commented out the 'dns' field in the FloatingIP DB object definition and the response time dropped from 14 seconds to 1 second. The DNS extension is not configured on the environment and no external DNS is used. Also I don't see this field used anywhere in neutron. Interestingly, the Port DB object has a 'dns' field as well (with a corresponding portdnses table in the DB, all the same as done for floatingips), however the DB object is not used when listing ports. The proposal would be to remove the unused 'dns' field from the FloatingIP OVO until we find the performance bottleneck. ** Affects: neutron Importance: High Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1850639 Title: FloatingIP list bad performance Status in neutron: New Bug description: Faced on stable/queens but applicable to master too. On a quite heavily loaded environment it was noticed that a simple floatingip list command takes significant time (~1200 FIPs) while, for example, port list is always faster (>7000 ports). 
If sqlalchemy debug logs are enabled, lots of the following can be seen: 2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry [req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] set 'memoized_setups' on path 'EntityRegistry( (,))' to '{}' set /usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.py:63 - which basically eats all the time of a request. As a test I commented out the 'dns' field in the FloatingIP DB object definition and the response time dropped from 14 seconds to 1 second. The DNS extension is not configured on the environment and no external DNS is used. Also I don't see this field used anywhere in neutron. Interestingly, the Port DB object has a 'dns' field as well (with a corresponding portdnses table in the DB, all the same as done for floatingips), however the DB object is not used when listing ports. The proposal would be to remove the unused 'dns' field from the FloatingIP OVO until we find the performance bottleneck. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1850639/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
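The cost pattern described above can be modelled in plain Python (not sqlalchemy; names are illustrative): if every FloatingIP row loaded triggers extra per-row work to populate a field nobody reads, list time grows by that constant factor per row, which disappears once the field is dropped.

```python
class QueryCounter:
    """Counts the hypothetical per-row work done to populate 'dns'."""
    def __init__(self):
        self.calls = 0

    def fetch_dns(self, fip_id):
        self.calls += 1
        return None  # DNS extension not configured: always empty anyway

def list_fips(fip_ids, counter, load_dns):
    """List floating IPs, optionally loading the unused 'dns' field."""
    result = []
    for fip_id in fip_ids:
        row = {"id": fip_id}
        if load_dns:  # the field proposed for removal
            row["dns"] = counter.fetch_dns(fip_id)
        result.append(row)
    return result

counter = QueryCounter()
list_fips(range(1200), counter, load_dns=True)   # ~1200 FIPs, as in the bug
with_dns_calls = counter.calls

counter = QueryCounter()
list_fips(range(1200), counter, load_dns=False)
without_dns_calls = counter.calls
```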
[Yahoo-eng-team] [Bug 1849098] [NEW] ovs agent is stuck with OVSFWTagNotFound when dealing with unbound port
thon3.6/site-packages/neutron/agent/securitygroups_rpc.py", line 125, in decorated_function 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent *args, **kwargs) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/securitygroups_rpc.py", line 133, in prepare_devices_filter 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self._apply_port_filter(device_ids) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/securitygroups_rpc.py", line 164, in _apply_port_filter 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.firewall.prepare_port_filter(device) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 555, in prepare_port_filter 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent of_port = self.get_or_create_ofport(port) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 532, in get_or_create_ofport 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_vlan_id = self._get_port_vlan_tag(ovs_port.port_name) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 516, in _get_port_vlan_tag 2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return get_tag_from_other_config(self.int_br.br, port_name) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 84, in get_tag_from_other_config 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_name=port_name, other_config=other_config) 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent neutron.agent.linux.openvswitch_firewall.exceptions.OVSFWTagNotFound: Cannot get tag for port o-hm0 from its other_config: {} 2019-10-17 11:32:21.909 135 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Agent out of sync with plugin! this happens in each agent cycle so agent can't do anything. Need to handle OVSFWTagNotFound in prepare_port_filter() like was done for update_port_filter in https://review.opendev.org/#/c/630910/ ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1849098 Title: ovs agent is stuck with OVSFWTagNotFound when dealing with unbound port Status in neutron: In Progress Bug description: neutron-openvswitch-agent meets unbound port: 2019-10-17 11:32:21.868 135 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req- aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Device ef34215f-e099-4fd0-935f-c9a42951d166 not defined on plugin or binding failed Later when applying firewall rules: 2019-10-17 11:32:21.901 135 INFO neutron.agent.securitygroups_rpc [req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Preparing filters for devices {'ef34215f-e099-4fd0-935f-c9a42951d166', 'e9c97cf0-1a5e-4d77-b57b-0ba474d12e29', 'fff1bb24-6423-4486-87c4-1fe17c552cca', '2e20f9ee-bcb5-445c-b31f-d70d276d45c9', '03a60047-cb07-42a4-8b49-619d5982a9bd', 'a452cea2-deaf-4411-bbae-ce83870cbad4', '79b03e5c-9be0-4808-9784-cb4878c3dbd5', '9b971e75-3c1b-463d-88cf-3f298105fa6e'} 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Error while processing VIF ports: neutron.agent.linux.openvswitch_firewall.exceptions.OVSFWTagNotFound: Cannot get tag for port o-hm0 from its other_config: {} 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last): 2019-10-17 11:32:21.906 135 ERROR neutron.plugins.ml2.drivers.openvswitch.ag
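The proposed fix direction, per the report, mirrors what https://review.opendev.org/#/c/630910/ did for update_port_filter: catch the tag-lookup failure for an unbound port instead of letting it abort the whole agent cycle. A simplified sketch (not the actual firewall driver code):

```python
class OVSFWTagNotFound(Exception):
    pass

def get_or_create_ofport(port, tags):
    """Stand-in for the VLAN-tag lookup that fails for unbound ports."""
    if port not in tags:
        raise OVSFWTagNotFound(
            "Cannot get tag for port %s from its other_config: {}" % port)
    return tags[port]

def prepare_port_filter(ports, tags, log):
    """Prepare filters, skipping ports whose tag cannot be resolved."""
    prepared = []
    for port in ports:
        try:
            prepared.append((port, get_or_create_ofport(port, tags)))
        except OVSFWTagNotFound:
            # skip the unbound port instead of failing every agent cycle
            log.append("port %s has no tag, skipping" % port)
    return prepared

log = []
result = prepare_port_filter(["tap1", "o-hm0"], {"tap1": 5}, log)
```

With the exception handled per-port, one unbound port like o-hm0 no longer pushes the agent into an endless "out of sync" loop.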
[Yahoo-eng-team] [Bug 1839252] [NEW] Connectivity issues due to skb marks on the encapsulating packet
Public bug reported: Looks like by default OVS tunnels inherit skb marks from tunneled packets. As a result Neutron IPTables marks set in qrouter namespace are inherited by VXLAN encapsulating packets. These marks may conflict with marks used by underlying networking (like Calico) and lead to VXLAN tunneled packets being dropped. The proposal is to set 'egress_pkt_mark = 0' explicitly for tunnel ports. The option was added in OVS 2.8.0 (https://www.openvswitch.org/releases/NEWS-2.8.0.txt) ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: ovs ovs-lib -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1839252 Title: Connectivity issues due to skb marks on the encapsulating packet Status in neutron: In Progress Bug description: Looks like by default OVS tunnels inherit skb marks from tunneled packets. As a result Neutron IPTables marks set in qrouter namespace are inherited by VXLAN encapsulating packets. These marks may conflict with marks used by underlying networking (like Calico) and lead to VXLAN tunneled packets being dropped. The proposal is to set 'egress_pkt_mark = 0' explicitly for tunnel ports. The option was added in OVS 2.8.0 (https://www.openvswitch.org/releases/NEWS-2.8.0.txt) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1839252/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
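Operationally, the fix amounts to setting the `options:egress_pkt_mark` interface option on each tunnel port. This sketch only builds the corresponding ovs-vsctl command without executing it; the port name is an example:

```python
def egress_pkt_mark_cmd(port_name, mark=0):
    """ovs-vsctl command clearing the skb mark on encapsulating packets.

    Setting egress_pkt_mark=0 (available since OVS 2.8.0) stops the
    tunnel from inheriting IPTables marks set in the qrouter namespace.
    """
    return ["ovs-vsctl", "set", "Interface", port_name,
            "options:egress_pkt_mark=%d" % mark]

cmd = egress_pkt_mark_cmd("vxlan-0a000001")
```

In neutron itself this would go through ovs_lib rather than a raw command, but the resulting OVSDB change is the same.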
[Yahoo-eng-team] [Bug 1836023] [NEW] OVS agent "hangs" while processing trusted ports
Public bug reported: Queens, ovsdb native interface. On a loaded gtw node hosting > 1000 ports, when restarting neutron-openvswitch-agent, at some moment the agent stops sending state reports and doing any logging for a significant time, depending on the number of ports. In our case the gtw node hosts > 1400 ports and the agent hangs for ~100 seconds. Thus if the configured agent_down_time is less than 100 seconds, the neutron server sees the agent as down and starts rescheduling resources. After the agent stops hanging, it sees itself as "revived" and starts a new full sync. This loop is almost endless. Debugging showed the culprit is process_trusted_ports: https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655 - this function does not yield control to other greenthreads and blocks until all trusted ports are processed. Since on gateway nodes almost all ports are "trusted" (router and dhcp ports) process_trusted_ports may take significant time. The proposal would be to add greenlet.sleep(0) inside the loop in process_trusted_ports - that fixed the issue on our environment. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: ovs-fw -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1836023 Title: OVS agent "hangs" while processing trusted ports Status in neutron: In Progress Bug description: Queens, ovsdb native interface. On a loaded gtw node hosting > 1000 ports, when restarting neutron-openvswitch-agent, at some moment the agent stops sending state reports and doing any logging for a significant time, depending on the number of ports. In our case the gtw node hosts > 1400 ports and the agent hangs for ~100 seconds. Thus if the configured agent_down_time is less than 100 seconds, the neutron server sees the agent as down and starts rescheduling resources. After the agent stops hanging, it sees itself as "revived" and starts a new full sync. 
This loop is almost endless. Debugging showed the culprit is process_trusted_ports: https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655 - this function does not yield control to other greenthreads and blocks until all trusted ports are processed. Since on gateway nodes almost all ports are "trusted" (router and dhcp ports) process_trusted_ports may take significant time. The proposal would be to add greenlet.sleep(0) inside the loop in process_trusted_ports - that fixed the issue on our environment. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1836023/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
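The shape of the fix is a cooperative yield inside the long loop. This sketch uses time.sleep(0) in place of the green sleep so it stays dependency-free; the actual proposal yields with greenlet/eventlet sleep(0), which hands control to the state-report greenthread between ports.

```python
import time

def process_trusted_ports(ports, process_one, yield_every=1):
    """Process trusted ports, yielding periodically so heartbeat
    threads (state reports, logging) are not starved."""
    processed = 0
    for port in ports:
        process_one(port)
        processed += 1
        if processed % yield_every == 0:
            time.sleep(0)  # cooperative yield point added by the fix
    return processed

count = process_trusted_ports(["qr-1", "qg-2", "tap-3"], lambda p: None)
```

Without the yield, a single pass over 1400+ ports monopolizes the greenthread for the full ~100 seconds described above.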
[Yahoo-eng-team] [Bug 1835731] [NEW] Neutron server error: failed to update port DOWN
Public bug reported:

Before adding extra logging:

2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down

After adding logging:

2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down: 'NoneType' object has no attribute 'started_at': AttributeError: 'NoneType' object has no attribute 'started_at'
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc Traceback (most recent call last):
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 367, in update_device_list
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     **kwargs)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 233, in update_device_down
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     n_const.PORT_STATUS_DOWN, host)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 319, in notify_l2pop_port_wiring
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     agent_restarted = l2pop_driver.obj.agent_restarted(port_context)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in agent_restarted
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     if l2pop_db.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time:
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 51, in get_agent_uptime
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     return timeutils.delta_seconds(agent.started_at,
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc AttributeError: 'NoneType' object has no attribute 'started_at'

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1835731

Title:
  Neutron server error: failed to update port DOWN

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1835731/+subscriptions
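The trace shows `get_agent_uptime()` dereferencing `agent.started_at` after the agent lookup returned `None`. A minimal sketch of the defensive direction a fix could take (the function body and the zero-uptime fallback are illustrative assumptions, not the actual neutron patch):

```python
from datetime import datetime, timezone

def get_agent_uptime(agent):
    # The traceback shows `agent` can be None when no matching agent
    # row was found; guard before touching `started_at`.
    if agent is None or agent.started_at is None:
        return 0.0  # unknown agent: report zero uptime instead of crashing
    delta = datetime.now(timezone.utc) - agent.started_at
    return delta.total_seconds()
```

With a zero uptime, the `agent_restarted` comparison against `agent_boot_time` treats the unknown agent as freshly restarted rather than raising.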
[Yahoo-eng-team] [Bug 1831622] [NEW] SRIOV: agent may not register VFs
Public bug reported:

When a VM is instantiated with a PF-PT (direct-physical) port, the Neutron SR-IOV agent removes the respective embedded switch device instance from the switch manager. After the VM releases the PF, the associated device (/sys/class/net/<pf>) appears immediately, but the initialization of its VFs and the creation of the appropriate sysfs entries (/sys/class/net/<pf>/device/virtfn<#vf>) may even take more than a second, depending on the platform and the NIC's kernel driver capabilities.

The Neutron SR-IOV agent eagerly tries to discover and register NIC devices that are not blacklisted and not yet known, by creating the respective embedded switch instances and enumerating the available VFs underneath them. However, when this happens at an early stage, while the sysfs entries for the VFs are not yet present because the PF has just been released, a port-less embedded switch is created to represent that device. As a consequence, port updates that target VFs supposed to belong to an incorrectly registered embedded device won't be treated properly by the agent, causing a VM instantiation timeout.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: In Progress

** Tags: sriov-pci-pt

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1831622

Title:
  SRIOV: agent may not register VFs

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1831622/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
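The waiting behaviour the report implies can be sketched as a poll over the PF's `virtfn*` sysfs links before registering the embedded switch. The function name, timeout values, and the `sysfs_root` parameter are hypothetical, not the agent's real code:

```python
import glob
import os
import time

def wait_for_vfs(pf_name, expected_vfs=1, timeout=5.0, interval=0.1,
                 sysfs_root="/sys/class/net"):
    """Poll until the PF's device/virtfn<N> entries show up.

    Returns the discovered virtfn paths, or [] on timeout, so the
    caller can avoid registering a port-less embedded switch.
    """
    pattern = os.path.join(sysfs_root, pf_name, "device", "virtfn*")
    deadline = time.monotonic() + timeout
    while True:
        vfs = sorted(glob.glob(pattern))
        if len(vfs) >= expected_vfs:
            return vfs
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)
```

A bounded poll like this tolerates the sub-second to multi-second VF re-initialization window described above without blocking the agent loop indefinitely.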
[Yahoo-eng-team] [Bug 1830383] [NEW] SRIOV: MAC address in use error
Public bug reported:

When using a direct-physical port, the port inherits the physical device's MAC address on binding. When the VM is deleted later, the MAC address stays. If we then try to spawn a VM with another direct-physical port, we get: "Neutron error: MAC address 0c:c4:7a:de:ae:19 is already in use on network None.: MacAddressInUseClient: Unable to complete operation for network 42915db3-4e46-4150-af9d-86d0c59d765f. The mac address 0c:c4:7a:de:ae:19 is in use."

The proposal is to reset the port's MAC address when unbinding.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: In Progress

** Tags: sriov-pci-pt

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830383

Title:
  SRIOV: MAC address in use error

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1830383/+subscriptions
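The proposed fix can be sketched as regenerating a MAC from a base prefix at unbind time, so the inherited hardware MAC is freed for the next binding. The `fa:16:3e` base, the dict-shaped port, and the function name are assumptions for illustration, not neutron's actual implementation:

```python
import random

def reset_port_mac_on_unbind(port, base_mac="fa:16:3e:00:00:00"):
    # Replace the hardware MAC the port inherited from the PF with a
    # freshly generated one under the base prefix, so the physical
    # device's MAC no longer collides on the next direct-physical bind.
    prefix = base_mac.split(":")[:3]
    suffix = ["%02x" % random.randint(0, 255) for _ in range(3)]
    port["mac_address"] = ":".join(prefix + suffix)
    return port
```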
[Yahoo-eng-team] [Bug 1825521] [NEW] Bulk IPv6 subnet create: update port called within a transaction
m': item})
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 706, in _create_bulk_ml2
    result, mech_context = obj_creator(context, item)
  File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1048, in _create_subnet_db
    self._create_subnet_postcommit(context, result, net_db, ipam_sub)
  File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 163, in wrapped
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 716, in _create_subnet_postcommit
    self.update_port(context, port_id, port_info)
  File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 673, in inner
    "transaction.") % f)
RuntimeError: Method cannot be called within a transaction.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1825521

Title:
  Bulk IPv6 subnet create: update port called within a transaction

Status in neutron:
  New

Bug description:
  When bulk creating auto address IPv6 subnets, port update happens within a transaction:

  2019-03-28 15:48:50.894 2377 ERROR neutron.pecan_wsgi.hooks.translation [req-e84aba73-3fc5-4b3f-bf41-a7e762af4bdf 166b7ed45cd6404e884ba63f89e88bf9 2ce6b2792eee4dc88639a3575f1ac7f0 - default default] POST failed.: RuntimeError: Method cannot be called within a transaction.

  Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 678, in __call__
      self.invoke_controller(controller, args, kwargs, state)
    File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 569, in invoke_controller
      result = controller(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 93, in wrapped
      setattr(e, '_RETRY_EXCEEDED', True)
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 89, in wrapped
      return f(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 150, in wrapper
      ectxt.value = e.inner_exc
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 138, in wrapper
      return f(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 128, in wrapped
      LOG.debug("Retry wrapper got retriable exception: %s", e)
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 124, in wrapped
      return f(*dup_args, **dup_kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/utils.py", line 76, in wrapped
      return f(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 159, in post
      return self.create(resources)
    File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 177, in create
      return {key: creator(*creator_args, **creator_kwargs)}
    File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 674, in inner
      return f(self, context, *args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 163, in wrapped
      return method(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 93, in wrapped
      setattr(e, '_RETRY_EXCEEDED', True)
    File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/dist-packages/oslo_utils/
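The error comes from calling `update_port` while the bulk-create transaction is still open. A common fix direction is to queue the port updates and flush them only after the transaction commits; the class and method names below are a hypothetical sketch of that pattern, not the neutron code:

```python
class BulkSubnetCreator:
    """Queue port updates made during a bulk subnet create and flush
    them only once the surrounding DB transaction has finished."""

    def __init__(self):
        self._pending_port_updates = []

    def create_subnets_bulk(self, subnet_ids, update_port):
        results = []
        # --- inside the DB transaction: only queue, never call out ---
        for subnet_id in subnet_ids:
            results.append({"id": subnet_id, "ip_version": 6})
            self._pending_port_updates.append(
                ("port-for-%s" % subnet_id, {"fixed_ips": ["auto"]}))
        # --- transaction committed: calling update_port is now safe ---
        pending, self._pending_port_updates = self._pending_port_updates, []
        for port_id, port_info in pending:
            update_port(port_id, port_info)
        return results
```

The key property is that no RPC-triggering call happens between the queueing loop and commit, which is exactly what the "Method cannot be called within a transaction" guard enforces.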
[Yahoo-eng-team] [Bug 1824299] [NEW] Race condition during init may lead to neutron server malfunction
Public bug reported:

Release: Queens, with quite a lot of advanced services enabled:

"service_plugins = neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,metering,lbaasv2,neutron.services.qos.qos_plugin.QoSPlugin,trunk,networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin,bgpvpn"

Neutron server fails to start with repeating sqlalchemy errors "Class 'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPNPortAssociationRoute' is not mapped" or "Class 'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPN' is not mapped". The errors happen while handling state reports from agents. So if we stop all neutron agents, start the server, wait for server initialization, and only then start the agents, everything is fine.

It also appears that if 'bgpvpn' is placed closer to the beginning of the "service_plugins" config option:

"service_plugins = neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,bgpvpn,metering,lbaasv2,neutron.services.qos.qos_plugin.QoSPlugin,trunk,networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin"

no errors happen, even without stopping the neutron agents during server restart.
Full log near error trace:

2019-04-09 13:42:27,194.194 15604 INFO sqlalchemy.orm.mapper.Mapper [req-0392737c-e6c8-472f-8685-91190f882862 - - - - -] (BGPVPNPortAssociation|bgpvpn_port_associations) initialize prop routes
2019-04-09 13:42:27,197.197 15604 INFO sqlalchemy.orm.mapper.Mapper [req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] (BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) _configure_property(port_association, RelationshipProperty)
2019-04-09 13:42:27,197.197 15604 INFO sqlalchemy.orm.mapper.Mapper [req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] (BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) _configure_property(bgpvpn, RelationshipProperty)
2019-04-09 13:42:27,198.198 15604 INFO sqlalchemy.orm.mapper.Mapper [req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] (BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) _configure_property(id, Column)
2019-04-09 13:42:27,199.199 15604 INFO sqlalchemy.orm.mapper.Mapper [req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] (BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) _configure_property(port_association_id, Column)
2019-04-09 13:42:27,200.200 15604 INFO sqlalchemy.orm.mapper.Mapper [req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] (BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) _configure_property(type, Column)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server [req-0392737c-e6c8-472f-8685-91190f882862 - - - - -] Exception during message handling: UnmappedClassError: Class 'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPNPortAssociationRoute' is not mapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 161, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     return method(*args, **kwargs)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 91, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     setattr(e, '_RETRY_EXCEEDED', True)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     self.force_reraise()
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 87, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server
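The symptom is classic lazy-initialization racing: agent state reports arrive over RPC while the bgpvpn models are still being mapped. One general workaround shape is to finish heavyweight initialization exactly once, under a lock, before dispatching any RPC work. This is a generic sketch of that pattern with hypothetical names, not the neutron/networking-bgpvpn code:

```python
import threading

class PluginRegistry:
    """Run all initializers once, under a lock, before any RPC
    (e.g. an agent state report) is allowed to touch the models."""

    def __init__(self, initializers):
        self._initializers = initializers
        self._ready = threading.Event()
        self._lock = threading.Lock()

    def ensure_initialized(self):
        if self._ready.is_set():
            return
        with self._lock:
            # double-checked so concurrent RPC threads initialize once
            if not self._ready.is_set():
                for init in self._initializers:
                    init()  # e.g. sqlalchemy's configure_mappers()
                self._ready.set()

    def handle_state_report(self, report, handler):
        self.ensure_initialized()  # never dispatch against unmapped classes
        return handler(report)
```

This also explains the observed plugin-ordering sensitivity: moving 'bgpvpn' earlier in service_plugins just happens to finish its mapper configuration before the first agent report lands.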
[Yahoo-eng-team] [Bug 1821753] [NEW] openvswitch agent ofctl request errors: 'timed out' and 'Datapath Invalid'
Public bug reported:

Release: Queens, ovsdb_interface=native, of_request_timeout = 30

As the number of OVS ports on the node grows, the following errors start to occur (starting at ~1200 ports):

ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-db47426c-1719-43dd-8ecf-4fb4bdcbc316 - - - - -] ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowMod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionSetField(tunnel_id=725), OFPActionOutput(len=16,max_len=0,port=1793,type=0), OFPActionOutput(len=16,max_len=0,port=2,type=0)],type=4)],match=OFPMatch(oxm_fields={'vlan_vid': 4175}),out_group=0,out_port=0,priority=1,table_id=22) error Datapath Invalid 64183592930369: InvalidDatapath: Datapath Invalid

or

ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-632b8ede-1234-4682-afe0-3aefb615b121 - - - - -] ofctl request version=0x4,msg_type=0xe,msg_len=0x78,xid=0x73c67c07,OFPFlowMod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionSetField(tunnel_id=666), OFPActionOutput(len=16,max_len=0,port=2,type=0)],len=48,type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:4a:79:ce', 'vlan_vid': 6107}),out_group=0,out_port=0,priority=2,table_id=20) timed out: Timeout: 30 seconds

with corresponding errors in ovs-vswitchd logs:

|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
|rconn|ERR|br-floating<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
|rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting

Setting inactivity_probe to a greater value helps:

#ovs-vsctl set controller br-int inactivity_probe=3
#ovs-vsctl set controller br-tun inactivity_probe=3
#ovs-vsctl set controller br-floating inactivity_probe=3

Should neutron allow setting inactivity_probe for controllers? Should it correspond to the of_request_timeout value?

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1821753

Title:
  openvswitch agent ofctl request errors: 'timed out' and 'Datapath Invalid'

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1821753/+subscriptions
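The report's closing question is whether the controller inactivity_probe should track of_request_timeout. One way to picture the coupling is a helper that derives a probe interval at least as long as the request timeout plus a margin. This is purely a hypothetical illustration of the relationship, not an existing neutron option (note OVS stores inactivity_probe in milliseconds while of_request_timeout is in seconds):

```python
def controller_inactivity_probe_ms(of_request_timeout_s, margin_s=5):
    # Probe no more aggressively than the agent is willing to wait for
    # an ofctl reply, so a busy-but-alive connection is not dropped.
    return (of_request_timeout_s + margin_s) * 1000
```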
[Yahoo-eng-team] [Bug 1817306] [NEW] Failed gARP for floating IP in l3 agent logs
Public bug reported:

Release: Pike.

When a DVR router with centralized floating IPs is rescheduled from a down l3 agent to another one, the following traces are seen on the destination agent:

2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 10.13.250.14 on qg-af0de258-a8 in namespace snat-afe70a67-a007-4bcf-93ac-099aad63411c: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address : ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib Traceback (most recent call last):
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 1097, in _arping
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib     ip_wrapper.netns.execute(arping_cmd, extra_ok_codes=[1])
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 916, in execute
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib     log_fail_as_error=log_fail_as_error, **kwargs)
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 151, in execute
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib     raise ProcessExecutionError(msg, returncode=returncode)
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address

Earlier in the logs the following is seen:

2019-02-22 07:52:14,894.894 9528 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'snat-afe70a67-a007-4bcf-93ac-099aad63411c', 'ip', '-4', 'addr', 'del', '10.13.250.14/32', 'dev', 'qg-af0de258-a8'] execute_rootwrap_daemon /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:108
2019-02-22 07:52:14,922.922 9528 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'snat-afe70a67-a007-4bcf-93ac-099aad63411c', 'conntrack', '-D', '-d', '10.13.250.14'] execute_rootwrap_daemon /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:108

So the centralized floating IP is deleted from the gw device for some reason.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1817306

Title:
  Failed gARP for floating IP in l3 agent logs

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1817306/+subscriptions
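The "Cannot assign requested address" bind failure happens because arping is attempted for an address that was already removed from the gateway device. A defensive check can be sketched as follows (the function shape and callback-style `arping` argument are illustrative, not neutron's `ip_lib` API):

```python
def send_garp_if_configured(ip, device_addrs, arping):
    # Skip the gratuitous ARP when the floating IP is no longer
    # configured on the device; arping would fail to bind to it.
    if ip not in device_addrs:
        return False
    arping(ip)
    return True
```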
[Yahoo-eng-team] [Bug 1808136] [NEW] l2 pop doesn't always provide the whole list of fdb entries on OVS restart
Public bug reported:

Bug https://bugs.launchpad.net/neutron/+bug/1804842 was fixed, but there is still a race condition which leads to the same issue.

Success scenario:
- OVS is restarted
- agent start_flag set to True
- report state done
- ports updated, server sends fdb entries as it sees the agent as restarted

Fail scenario:
- OVS is restarted
- agent start_flag set to True
- ports updated, server doesn't send fdb entries as the report state with the start flag has not yet come
- report state done
- no fdb entries on agent (since no update port messages to server within config.agent_boot_time sec)

The proposal is to force a state report right after setting start_flag. No side effects expected.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: New

** Tags: l2-pop

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1808136

Title:
  l2 pop doesn't always provide the whole list of fdb entries on OVS restart

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1808136/+subscriptions
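The proposal above can be sketched as pushing a state report immediately when the restart is detected, instead of letting the next periodic report carry the flag. Class and method names are a hypothetical simplification of the OVS agent's reporting loop:

```python
class AgentStateReporter:
    """Force a state report as soon as start_flag is set, closing the
    window in which port updates race ahead of the restart signal."""

    def __init__(self, rpc_report):
        self._rpc_report = rpc_report
        self.start_flag = False

    def on_ovs_restart(self):
        self.start_flag = True
        self._report_state()  # forced: don't wait for the periodic task

    def periodic_report(self):
        self._report_state()

    def _report_state(self):
        self._rpc_report({"start_flag": self.start_flag})
        self.start_flag = False  # the flag is delivered only once
```

With the forced report, the server learns about the restart before any port-update RPC arrives, so it sends the full fdb entry list.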
[Yahoo-eng-team] [Bug 1800157] [NEW] privsep: lack of capabilities on kernel 4.15
Public bug reported:

l3 and dhcp agents are not functioning on kernel 4.15 due to privsep errors:

2018-10-25 09:10:38,747.747 24060 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/l3_agent.ini', '--config-file', '/etc/neutron/fwaas_driver.ini', '--config-file', '/etc/neutron/neutron.conf', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpS5k5y2/privsep.sock']
2018-10-25 09:10:39,361.361 24060 WARNING oslo.privsep.daemon [-] privsep log: Error in sys.excepthook:
2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep log: Traceback (most recent call last):
2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/dist-packages/oslo_log/log.py", line 193, in logging_excepthook
2018-10-25 09:10:39,364.364 24060 WARNING oslo.privsep.daemon [-] privsep log:     getLogger(product_name).critical('Unhandled error', **extra)
2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 1481, in critical
2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep log:     self.logger.critical(msg, *args, **kwargs)
2018-10-25 09:10:39,366.366 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 1212, in critical
2018-10-25 09:10:39,366.366 24060 WARNING oslo.privsep.daemon [-] privsep log:     self._log(CRITICAL, msg, args, **kwargs)
2018-10-25 09:10:39,367.367 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 1286, in _log
2018-10-25 09:10:39,367.367 24060 WARNING oslo.privsep.daemon [-] privsep log:     self.handle(record)
2018-10-25 09:10:39,368.368 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 1296, in handle
2018-10-25 09:10:39,368.368 24060 WARNING oslo.privsep.daemon [-] privsep log:     self.callHandlers(record)
2018-10-25 09:10:39,369.369 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
2018-10-25 09:10:39,370.370 24060 WARNING oslo.privsep.daemon [-] privsep log:     hdlr.handle(record)
2018-10-25 09:10:39,370.370 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/__init__.py", line 759, in handle
2018-10-25 09:10:39,371.371 24060 WARNING oslo.privsep.daemon [-] privsep log:     self.emit(record)
2018-10-25 09:10:39,371.371 24060 WARNING oslo.privsep.daemon [-] privsep log:   File "/usr/lib/python2.7/logging/handlers.py", line 414, in emit
2018-10-25 09:10:39,372.372 24060 WARNING oslo.privsep.daemon [-] privsep log:     sres = os.stat(self.baseFilename)
2018-10-25 09:10:39,372.372 24060 WARNING oslo.privsep.daemon [-] privsep log: OSError: [Errno 13] Permission denied: '/var/log/neutron/neutron.log'
...
24060 ERROR neutron.agent.l3.agent FailedToDropPrivileges: Privsep daemon failed to start

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: In Progress

** Tags: l3-dvr-backlog l3-ipam-dhcp

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1800157

Title:
  privsep: lack of capabilities on kernel 4.15

Status in neutron:
  In Progress
[Yahoo-eng-team] [Bug 1799178] [NEW] l2 pop doesn't always provide the whole list of fdb entries on agent restart
Public bug reported:

The whole list of fdb entries is provided to the agent when a port from a new network appears, or when the agent is restarted. Currently agent restart is detected via the agent_boot_time option, 180 sec by default. In fact, boot time differs depending on port count and may easily exceed 180 secs on gateway nodes of some loaded clusters. Changing the boot time in config works, but honestly this is not an ideal solution. There should be a smarter way to detect agent restart (like the agent itself sending a flag in its state report).

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799178

Title:
  l2 pop doesn't always provide the whole list of fdb entries on agent restart

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799178/+subscriptions
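The two detection schemes contrasted above can be sketched side by side: the explicit start flag the report proposes versus the uptime-threshold heuristic it criticizes. Function and parameter names are illustrative, not the l2pop driver's actual API:

```python
def agent_restarted(agent_state, uptime_s, agent_boot_time_s=180):
    # Preferred: an explicit signal sent by the agent in its state report.
    if agent_state.get("start_flag"):
        return True
    # Legacy heuristic: assume "restarted" while uptime is below the
    # configured boot window; breaks when startup exceeds the window.
    return uptime_s < agent_boot_time_s
```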
[Yahoo-eng-team] [Bug 1789846] [NEW] l2_pop flows missing when spawning VMs at a high rate
Public bug reported:

version: Pike, DVR enabled, 28 compute nodes
scenario: spawn 140 VMs concurrently, with pre-created neutron ports
issue: on some compute nodes VMs cannot get an IP address; no reply to dhcp broadcasts. All new VMs spawned on the same compute node in the same network have this problem.

It appears that the flood table on the affected compute nodes is missing outputs to the dhcp nodes:

cookie=0x679aebcfbb8dc9a2, duration=296.991s, table=22, n_packets=2, n_bytes=220, priority=1,dl_vlan=4 actions=strip_vlan,load:0x14->NXM_NX_TUN_ID[],output:"vxlan-ac1ef47a",output:"vxlan-ac1ef480",output:"vxlan-ac1ef46d",output:"vxlan-ac1ef477",...

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: l2-pop l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1789846

Title:
  l2_pop flows missing when spawning VMs at a high rate

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1789846/+subscriptions
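Diagnosing this comes down to listing which tunnel ports a table-22 flood entry actually replicates to (from `ovs-ofctl dump-flows` output like the line above) and checking that the dhcp node's tunnel is among them. A small hypothetical helper to parse the `actions=` string:

```python
def flood_outputs(flow_actions):
    """Extract the port names a flood flow's output: actions point at.

    flow_actions: the actions=... string of one OVS flow entry, e.g.
    'strip_vlan,load:0x14->NXM_NX_TUN_ID[],output:"vxlan-ac1ef47a",...'
    """
    return [a.split(":", 1)[1].strip('"')
            for a in flow_actions.split(",")
            if a.startswith("output:")]
```

If the vxlan port toward the network node is absent from the returned list, broadcasts (including DHCP discover) never leave the compute node, matching the symptom reported here.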
[Yahoo-eng-team] [Bug 1783728] [NEW] DVR: router interface creation failure (load testing)
Public bug reported:

Release: Pike.

Under load (10 parallel threads) several requests for router interface creation failed. No error logs seen, just several DB retries for StaleDataError on the standardattributes table.

Server response:

INFO neutron.api.v2.resource [req-...] add_router_interface failed (client error): The server could not comply with the request since it is either malformed or otherwise incorrect.
INFO neutron.wsgi [req-...] 125.22.218.43,172.30.121.16 "PUT /v2.0/routers/d63811e1-9e54-467b-a4d3-8c407e38ccf2/add_router_interface HTTP/1.1" status: 400 len: 365 time: 4.040

Analysis to follow.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783728

Title:
  DVR: router interface creation failure (load testing)

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1783728/+subscriptions
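The "DB retries for StaleDataError" mentioned above refer to the pattern where an optimistic-lock failure is caught and the operation is re-run. A self-contained sketch of that pattern (the exception class here is a stand-in; the real code uses SQLAlchemy's StaleDataError via oslo.db's retry decorator):

```python
class StaleDataError(Exception):
    """Stand-in for sqlalchemy.orm.exc.StaleDataError."""


def retry_on_stale(func, max_retries=10):
    """Re-run func when optimistic locking loses a concurrent race.

    Gives up after max_retries attempts and lets the last error
    propagate, which is when the caller sees a failed request.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except StaleDataError:
            if attempt == max_retries - 1:
                raise
            # Real code would apply jittered backoff between attempts.
```

Under heavy parallel load the retry budget itself can be exhausted, which is consistent with the 400 responses observed during this load test.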
[Yahoo-eng-team] [Bug 1447227] Re: Connecting two or more distributed routers to a subnet doesn't work properly
Works for me on Mitaka and on master, following the steps from John's comment #6. Just added a host route on a subnet connected to 2 DVR routers instead of manually adding a static route on the VM. Marking as invalid.

** Changed in: neutron
       Status: Incomplete => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447227

Title:
  Connecting two or more distributed routers to a subnet doesn't work properly

Status in neutron:
  Invalid

Bug description:
  DVR code currently assumes that only one router may be attached to a subnet, but this is not the case. OVS flows, for example, will not work correctly for E/W traffic as incoming traffic is always assumed to be coming from one of the two routers. The simple solution is to block the attachment of a distributed router to a subnet already attached to another distributed router.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447227/+subscriptions
[Yahoo-eng-team] [Bug 1447227] Re: Connecting two or more distributed routers to a subnet doesn't work properly
Shouldn't this case be handled by specifying the proper host routes for such a subnet (one connected to several routers)?

** Changed in: neutron
       Status: Confirmed => Opinion

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447227

Title:
  Connecting two or more distributed routers to a subnet doesn't work properly

Status in neutron:
  Opinion

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447227/+subscriptions
[Yahoo-eng-team] [Bug 1702635] [NEW] SR-IOV: sometimes a port may hang in BUILD state
Public bug reported:

Scenario:
 1) vfio-pci driver is used for VFs
 2) 2 ports are created in neutron with binding type 'direct'
 3) VMs are spawned and deleted on 2 compute nodes using pre-created ports
 4) one neutron port may be bound to different compute nodes at different moments
 5) for some reason (probably a bug, but the current bug is not about it) vfio-pci is not properly handling VF reset after VM deletion, and to the sriov agent it looks like some port's MAC is still mapped to some PCI slot even though the port is not bound to the node
 6) the sriov agent requests port info from the server with get_devices_details_list() but doesn't specify 'host' in the parameters
 7) in this case the neutron server sets the port to BUILD, though it may be bound to another host:

    def _get_new_status(self, host, port_context):
        port = port_context.current
        if not host or host == port_context.host:
            new_status = (n_const.PORT_STATUS_BUILD if port['admin_state_up']
                          else n_const.PORT_STATUS_DOWN)
            if port['status'] != new_status:
                return new_status

 8) after processing, the agent notifies the server with update_device_list() and this time specifies the 'host' parameter
 9) the server detects the mismatch between the port's host and the agent's host and doesn't update the status of the port
 10) the port stays in BUILD state

A simple fix would be to specify the host at step 6 - in this case the neutron server won't set the port's status to BUILD because of the host mismatch.

** Affects: neutron
     Importance: Medium
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: sriov-pci-pt
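The check at step 7 can be exercised in isolation. This is a simplified, self-contained paraphrase of the `_get_new_status` snippet above (parameter names flattened for illustration); it shows why passing the agent's host at step 6 prevents the bogus BUILD transition for a port bound elsewhere:

```python
PORT_STATUS_BUILD = "BUILD"
PORT_STATUS_DOWN = "DOWN"


def get_new_status(request_host, port_bound_host, admin_state_up,
                   current_status):
    """Return the status the server would set, or None for no change.

    Without a host in the request (request_host falsy) the server
    assumes the caller owns the port and moves it to BUILD/DOWN even
    when the port is actually bound to another host.
    """
    if not request_host or request_host == port_bound_host:
        new_status = (PORT_STATUS_BUILD if admin_state_up
                      else PORT_STATUS_DOWN)
        if current_status != new_status:
            return new_status
    return None
```

With no host in the request an ACTIVE port bound to another node is still pushed to BUILD; once the requesting host is supplied, the host mismatch short-circuits the whole branch and the port is left alone.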
[Yahoo-eng-team] [Bug 1678104] [NEW] DHCP may not work on a dualstack network
Public bug reported:

There might be a race between ipv6 auto-address subnet creation and network dhcp port creation. The neutron server adds an ipv6 address to a dhcp port in two cases:

1) the network already has an ipv6 subnet by the time the dhcp agent requests dhcp port creation - in this case the agent includes both subnets in the requested IPs of the port and both get allocated;

2) the ipv6 subnet is created after the network already has a dhcp port - the ipv6 IP then gets allocated on the dhcp port as part of subnet creation on the server side;

The bug is with the third case:

3) the ipv6 subnet and the dhcp port are created at the same time: no ipv6 IP is requested for the dhcp port by the dhcp agent, and no ipv6 address is added to the dhcp port as part of subnet creation.

In this case the dhcp agent tries to reprocess the network after subnet/port creation and updates IPs on the dhcp port:

2017-03-30 05:12:38.990 29848 DEBUG neutron.api.rpc.handlers.dhcp_rpc [req-bcd62396-0e9a-4f39-8bf7-e56f0588805c - - - - -] Update dhcp port {u'port': {u'network_id': u'29d6752b-027a-4eb9-aa73-711eff1b58ca', 'binding:host_id': u'node-2.test.domain.local', u'fixed_ips': [{u'subnet_id': u'3f81f975-5718-4bdc-878c-614f22b1b783', u'ip_address': u'192.168.100.2'}, {u'subnet_id': u'8363ac60-c30d-43dc-a1d1-3d39820602fd'}]}, 'id': u'2e3a5343-a995-498a-85d2-db686d119fab'}

The server ignores ipv6 auto-address subnets in this request. So the agent says:

2017-03-24 01:51:41.661 28865 DEBUG neutron.agent.linux.dhcp [req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Requested DHCP port with IPs on subnets set([u'f22021f4-c876-4d94-be93-d24ccb6b0e31', u'869f4abf-c440-4b65-a9be-074776fadaf1']) but only got IPs on subnets set([u'869f4abf-c440-4b65-a9be-074776fadaf1'])
...
2017-03-24 01:51:41.775 28865 DEBUG neutron.agent.dhcp.agent [req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Error configuring DHCP port, scheduling resync: Subnet on port 798cefef-c4c1-482e-bbc0-acea52e6490d does not match the requested subnet f22021f4-c876-4d94-be93-d24ccb6b0e31. call_driver /usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py:124

The DHCP agent keeps retrying; dhcp doesn't work.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: New

** Tags: l3-ipam-dhcp

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1678104

Title:
  DHCP may not work on a dualstack network

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1678104/+subscriptions
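The mismatch the agent logs ("Requested DHCP port with IPs on subnets X but only got IPs on subnets Y") boils down to a set difference; a hypothetical helper makes the resync trigger explicit:

```python
def missing_dhcp_subnets(requested_subnet_ids, allocated_subnet_ids):
    """Subnets the agent asked an address on but got none.

    A non-empty result is exactly the condition under which the agent
    logs the mismatch above and schedules a resync.
    """
    return sorted(set(requested_subnet_ids) - set(allocated_subnet_ids))
```

In the third case described above, the ipv6 auto-address subnet stays in this difference on every pass, so the agent loops on resync indefinitely.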
[Yahoo-eng-team] [Bug 1666549] Re: Infinite router update in neutron L3 agent (HA)
** Project changed: neutron => mos
** Tags added: area-neutron

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1666549

Title:
  Infinite router update in neutron L3 agent (HA)

Status in Mirantis OpenStack:
  New

Bug description:
  After a fresh deployment of the environment and a run of ostf tests (or rally), the neutron l3 agent logs on the nodes fill (a timestamp every .003 second) with traces like these: http://paste.openstack.org/show/599851/ - which will bring the cluster down when the log partition fills up.

  Environment: Fuel 9.0 upgraded to 9.2, fresh install; 3 controllers/kafka + 3 computes + 4 storage ceph-osd + 1 LMA nodes

  neutron agents 8.3.0:
  neutron-dhcp-agent         2:8.3.0-1~u14.04+mos30  all  OpenStack virtual network service - DHCP agent
  neutron-l3-agent           2:8.3.0-1~u14.04+mos30  all  OpenStack virtual network service - l3 agent
  neutron-lbaasv2-agent      2:8.3.0-2~u14.04+mos1   all  Neutron is a virtual network service for Openstack - LBaaSv2 agent
  neutron-metadata-agent     2:8.3.0-1~u14.04+mos30  all  OpenStack virtual network service - metadata agent
  neutron-openvswitch-agent  2:8.3.0-1~u14.04+mos30  all  OpenStack virtual network service - Open vSwitch agent

  Steps to reproduce:
  1. Deploy openstack with Fuel 9.2
  2. Create a rally venv and run scenario NeutronNetworks.create_and_delete_routers (concurrency 100 and times 100, or more)
  3. /var/log/neutron/l3-agent.log is full of these traces.

To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1666549/+subscriptions
[Yahoo-eng-team] [Bug 1660305] [NEW] DVR multinode job fails over 20 tests
Public bug reported:

Example: http://logs.openstack.org/39/426339/2/check/gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv/b31bdd2/logs/testr_results.html.gz

Mostly connectivity failures: cannot ssh to instance, floating IP not ACTIVE, etc.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: gate-failure l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1660305

Title:
  DVR multinode job fails over 20 tests

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1660305/+subscriptions
[Yahoo-eng-team] [Bug 1657476] [NEW] Metadata agent fails to serve requests in python 3
Public bug reported:

From http://logs.openstack.org/09/421209/7/experimental/gate-tempest-dsvm-nova-py35-ubuntu-xenial/2dda79b/logs/screen-q-meta.txt.gz:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
    func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 719, in process_request
    proto.__init__(sock, address, self)
  File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 409, in __init__
    server)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 379, in handle_one_request
    self.environ = self.get_environ()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 593, in get_environ
    env['REMOTE_ADDR'] = self.client_address[0]
IndexError: index out of range

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: py34

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1657476

Title:
  Metadata agent fails to serve requests in python 3

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1657476/+subscriptions
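The root cause: the metadata agent serves WSGI over an AF_UNIX socket, and in Python 3 the peer address returned by accept() on a unix socket is an empty string, so `client_address[0]` raises IndexError. One possible guard (a sketch, not necessarily the fix that landed in neutron; the `"<local>"` placeholder is an assumption):

```python
def remote_addr(client_address):
    """REMOTE_ADDR value that tolerates AF_UNIX peers.

    For TCP, client_address is a ('host', port) tuple; for a unix
    socket Python 3 hands back '' instead, and indexing it raises
    IndexError, so substitute a placeholder.
    """
    if client_address:
        return client_address[0]
    return "<local>"
```

Plugging something like this into get_environ() keeps request handling alive whether the listener is TCP or a unix socket.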
[Yahoo-eng-team] [Bug 1629816] [NEW] Misleading "DVR: Duplicate DVR router interface detected for subnet"
Public bug reported:

The error message is seen on each ovs agent resync on a compute node where there are dvr serviced ports. The resync can be triggered by any error - that is unrelated to this bug.

The error message appears when processing a distributed router port for a subnet which is already in the agent's local_dvr_map, see _bind_distributed_router_interface_port in ovs_dvr_neutron_agent.py:

    if subnet_uuid in self.local_dvr_map:
        ldm = self.local_dvr_map[subnet_uuid]
        csnat_ofport = ldm.get_csnat_ofport()
        if csnat_ofport == constants.OFPORT_INVALID:
            LOG.error(_LE("DVR: Duplicate DVR router interface detected "
                          "for subnet %s"), subnet_uuid)
            return

where csnat_ofport = OFPORT_INVALID by default and can only change when the agent processes the csnat port of the router - which will never happen on a compute node, so the misleading log appears on every resync. The proposal is to delete the condition and the log, as they are useless.

** Affects: neutron
     Importance: Medium
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1629816

Title:
  Misleading "DVR: Duplicate DVR router interface detected for subnet"

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1629816/+subscriptions
[Yahoo-eng-team] [Bug 1628017] Re: unable to access vm by floating ip from vm without floating
Ok, so "Connection refused" was the result of a stale ip address on the rfp device that was not deleted after the l3 agent was restarted with new code. If the instances/floating ip are recreated from scratch, everything works fine. I'm going to backport the fix to stable/mitaka. Marking this as invalid since the problem should be fixed on master.

** Changed in: neutron
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1628017

Title:
  unable to access vm by floating ip from vm without floating

Status in neutron:
  Invalid

Bug description:
  Steps to reproduce:
  1. create 2 machines in one internal network. Make sure the VMs are created on one compute node
  2. assign a floating ip to one vm
  3. try to connect from the second vm (without a floating ip) to the vm with the floating ip, e.g. nc -v floating_ip 22

  Expected result:
  nc -v 192.168.120.116 22
  Connection to 192.168.120.116 22 port [tcp/ssh] succeeded!
  SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7

  Actual result:
  nc -v 192.168.120.116 22
  nc: connect to 192.168.120.116 port 22 (tcp) failed: Connection timed out

  BTW, if I ping the floating ip - I get an internal icmp response:
  ping 192.168.120.116
  PING 192.168.120.116 (192.168.120.116) 56(84) bytes of data.
  64 bytes from 192.168.111.20: icmp_seq=1 ttl=64 time=0.513 ms
  64 bytes from 192.168.111.20: icmp_seq=2 ttl=64 time=0.538 ms

  This bug can be reproduced only if the VMs are created on the same compute node; if I migrate one VM to another node, I am able to access the floating ip.

  Environment:
  1. Mirantis Openstack 9.0.1 with DVR enabled

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1628017/+subscriptions
[Yahoo-eng-team] [Bug 1625557] [NEW] Concurrent security groups creation fails with DBDuplicateEntry
Public bug reported:

- create_security_group() is wrapped with a db retry decorator
- it calls _ensure_default_security_group() to create a default security group for a tenant if one does not exist
- _ensure_default_security_group() in turn calls back into create_security_group() to create the default security group
- due to concurrency the creation of the default security group may fail with DBDuplicateEntry
- this is retried for the max attempts and the request eventually fails

Traceback: http://paste.openstack.org/show/581903/
Example of a failed job in rally: http://logs.openstack.org/04/371604/1/check/gate-rally-dsvm-neutron-rally/b1c384d/

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: In Progress

** Tags: sg-fw

** Changed in: neutron
       Status: New => In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1625557

Title:
  Concurrent security groups creation fails with DBDuplicateEntry

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1625557/+subscriptions
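One way out of the retry loop described above is to treat DBDuplicateEntry in the ensure-default path as "another request already created it" and re-read, instead of bubbling the error up to the outer retry decorator. A hedged, self-contained sketch (the exception class is a stand-in for oslo_db.exception.DBDuplicateEntry; the lookup/create callables are hypothetical):

```python
class DBDuplicateEntry(Exception):
    """Stand-in for oslo_db.exception.DBDuplicateEntry."""


def ensure_default_sg(lookup, create):
    """Fetch-or-create a tenant's default security group.

    lookup() returns the existing group or None; create() inserts it,
    raising DBDuplicateEntry when a concurrent request won the race.
    Losing the race is treated as success: the group exists either way.
    """
    existing = lookup()
    if existing is not None:
        return existing
    try:
        return create()
    except DBDuplicateEntry:
        # A concurrent request created the group first - just re-read.
        return lookup()
```

This keeps the unique constraint as the arbiter of the race while ensuring no caller fails merely because it lost.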
[Yahoo-eng-team] [Bug 1625209] Re: ipv6 options are being checked for ipv4 subnet
Right, the neutron DB layer has these fields set for both IPv4 and IPv6 subnets, and it also adds them when making the subnet dict. So other places in the code expect these fields, and the reported case might not be the only one. I'd suggest fixing this on the Calico plugin side.

** Also affects: networking-calico
     Importance: Undecided
         Status: New

** Changed in: neutron
       Status: Confirmed => Invalid

** Changed in: neutron
     Assignee: Oleg Bondarev (obondarev) => (unassigned)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1625209

Title:
  ipv6 options are being checked for ipv4 subnet

Status in networking-calico:
  New
Status in neutron:
  Invalid

Bug description:
  When the DHCP agent tries to set the fixed_ips parameter for the DHCP port (see https://bugs.launchpad.net/networking-calico/+bug/1541490), Neutron checks the ipv6_address_mode and ipv6_ra_mode options of the subnet that corresponds to the given fixed IP, even for an IPv4 subnet. This fails, as an IPv4 subnet does not have such options (see traceback http://paste.openstack.org/show/580996/). And, of course, you cannot set such flags for an IPv4 subnet. I'd expect such a check to be performed for IPv6 subnets only. Probably, this situation is possible not only while using a non-native DHCP agent.

  Neutron version: Newton (7f6b5b5d8953159740f74b0a4a5280527f6baa69).
  Environment: Calico (https://github.com/openstack/networking-calico) over Neutron.
  Point of failure: https://github.com/openstack/neutron/blob/7f6b5b5d8953159740f74b0a4a5280527f6baa69/neutron/agent/linux/dhcp.py#L1342
  Traceback: http://paste.openstack.org/show/580996/
  Failure rate: always.

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-calico/+bug/1625209/+subscriptions
[Yahoo-eng-team] [Bug 1614452] [NEW] Port create time grows at scale due to dvr arp update
Public bug reported:

Scale tests show that sometimes VMs are not able to spawn because of timeouts on port creation. Neutron server logs show that port creation time grows because dvr arp table updates are sent to each l3 dvr agent hosting the router one by one - this takes > 90% of the time: http://paste.openstack.org/show/560761/

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: Confirmed

** Tags: l3-dvr-backlog loadimpact

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1614452

Title:
  Port create time grows at scale due to dvr arp update

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1614452/+subscriptions
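The scaling problem above is the per-agent cast loop: the message count grows linearly with the number of l3 agents hosting the router. A common remedy in this situation is a single fanout cast (whether that is the fix that was adopted for this bug is an assumption); a toy model of the message counts:

```python
def arp_update_messages(l3_agent_hosts, fanout=False):
    """Model the RPC casts issued per port create for a DVR ARP update.

    Per-agent casting costs one message per hosting agent; a fanout
    cast costs a single publish regardless of agent count.
    """
    if fanout:
        return ["dvr_arp_update (fanout)"]
    return ["dvr_arp_update -> %s" % host for host in l3_agent_hosts]
```

On a router hosted by dozens of agents, collapsing the loop into one publish removes the serial per-agent latency that dominated port creation here.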
[Yahoo-eng-team] [Bug 1610303] [NEW] l2pop mech fails to update_port_postcommit on a loaded cluster
Public bug reported: On a cluster where VMs boots and deletes happen pretty intensively following traces can pop up in neutron server log: 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Mechanism driver 'l2population' failed in update_port_postcommit 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers Traceback (most recent call last): 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 401, in _call_on_drivers 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 120, in update_port_postcommit 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers self._update_port_up(context) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 227, in _update_port_up 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers network_id) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 176, in _create_agent_fdb 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers fdbs.extend(self._get_port_fdb_entries(binding.port)) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 45, in _get_port_fdb_entries 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers for ip in port['fixed_ips']] 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers TypeError: 'NoneType' object has no attribute '__getitem__' 2016-08-05 14:08:29.575 9560 ERROR 
neutron.plugins.ml2.managers 2016-08-05 14:08:29.578 9560 ERROR neutron.plugins.ml2.rpc [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Failed to update device 4c499a14-7211-4714-afa2-95b280d595a2 up This leads to the device not being set to Active state, and hence Nova times out waiting for the interface to be ready. ** Affects: neutron Importance: Undecided Status: New ** Description changed: On a cluster where VMs boots and deletes happen pretty intensively following traces can pop up in neutron server log: 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Mechanism driver 'l2population' failed in update_port_postcommit 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers Traceback (most recent call last): 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 401, in _call_on_drivers 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 120, in update_port_postcommit 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers self._update_port_up(context) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 227, in _update_port_up 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers network_id) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 176, in _create_agent_fdb 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers fdbs.extend(self._get_port_fdb_entries(binding.port)) 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 45, in _get_port_fdb_entries 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers for ip in port['fixed_ips']] 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers TypeError: 'NoneType' object has no attribute '__getitem__' 2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 2016-08-05 14:08:29.578 9560 ERROR neutron.plugins.ml2.rpc [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Failed to update device 4c499a14-7211-4714-afa2-95b280d595a2 up + + This leads to the device not being set to Active state and hence Nova + times out waiting for the interface to be ready. -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1610303 Title: l2pop
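A self-contained sketch of the failure mode in the traceback above, with simplified, hypothetical data structures (plain dicts rather than the real ML2 binding objects): under heavy boot/delete churn a binding can reference a port that was already deleted, so `binding.port` is `None` and the `port['fixed_ips']` comprehension raises `TypeError`. One defensive fix direction is to skip such bindings.

```python
def get_port_fdb_entries(port):
    # Mirrors the failing list comprehension: assumes 'port' is a dict.
    return [(port['mac_address'], ip['ip_address'])
            for ip in port['fixed_ips']]

def create_agent_fdb(bindings):
    fdbs = []
    for binding in bindings:
        port = binding.get('port')
        if port is None:
            # Port deleted concurrently with the update: skip it instead
            # of raising TypeError on None.
            continue
        fdbs.extend(get_port_fdb_entries(port))
    return fdbs

bindings = [
    {'port': {'mac_address': 'fa:16:3e:aa:bb:cc',
              'fixed_ips': [{'ip_address': '10.0.0.3'}]}},
    {'port': None},   # binding whose port vanished under churn
]
fdb = create_agent_fdb(bindings)
```

Whether skipping (versus retrying or failing fast) is the right behavior for l2pop specifically is a judgment the real fix would need to make.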
[Yahoo-eng-team] [Bug 1610153] [NEW] nova list can sometimes return 404
Public bug reported: On a large number of instances 'nova list' may return 404, probably this is because some instances are deleted during command execution. Trace: 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack [req-707a0e40-67cf-43a9-865d-c44a678b2986 2e2a43e956f344d184e40771d59c991d 13f508a4dd0e4b538561be2afcf5d699 - - -] Caught error: Instance 28c33ed4-c1a4-432c-96de-059b94a3dd91 could not be found. 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack Traceback (most recent call last): 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/nova/api/openstack/__init__.py", line 139, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return req.get_response(self.application) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack application, catch_exc_info=False) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in call_application 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack app_iter = application(self.environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack resp = self.call_func(req, *args, **self.kwargs) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return self.func(req, *args, **kwargs) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File 
"/usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", line 467, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack response = req.get_response(self._app) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack application, catch_exc_info=False) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in call_application 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack app_iter = application(self.environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/routes/middleware.py", line 136, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack response = self.app(environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, start_response) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack resp = self.call_func(req, *args, **self.kwargs) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return self.func(req, *args, 
**kwargs) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 672, in __call__ 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack content_type, body, accept) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 756, in _process_stack 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack request, action_args) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 619, in post_process_extensions 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack **action_args) 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/extended_server_attributes.py", line 97, in detail 2016-08-05 09:30:52.666 878 ERROR nova.api.openstack
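A hedged, simplified sketch of the race described in this report (hypothetical names and a dict standing in for the database; not the actual nova code): while the listing is being built, a per-instance detail lookup can hit an instance that was deleted mid-command, and propagating that not-found error fails the whole request with a 404. Skipping vanished instances keeps the listing consistent.

```python
class InstanceNotFound(Exception):
    pass

def load_details(instance_id, db):
    try:
        return db[instance_id]
    except KeyError:
        raise InstanceNotFound(instance_id)

def list_servers(instance_ids, db):
    servers = []
    for instance_id in instance_ids:
        try:
            servers.append(load_details(instance_id, db))
        except InstanceNotFound:
            # Instance deleted during command execution: drop it from the
            # result rather than failing the whole listing with a 404.
            continue
    return servers

db = {'a': {'id': 'a', 'status': 'ACTIVE'}}
servers = list_servers(['a', 'b'], db)   # 'b' vanished mid-listing
```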
[Yahoo-eng-team] [Bug 1606844] [NEW] Neutron constantly resyncing deleted router
Public bug reported: No need to constantly resync router which was deleted and for which there is no namespace. Observed: l3 agent log full of 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._router_removed(router_id) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in _router_removed 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ri.delete(self) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 347, in delete 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.process_delete(agent) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 385, in call 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.logger(e) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.force_reraise() 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 382, in call 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent return func(*args, 
**kwargs) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 947, in process_delete 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 530, in _process_internal_ports 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent existing_devices = self._get_existing_devices() 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 413, in _get_existing_devices 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ip_devs = ip_wrapper.get_devices(exclude_loopback=True) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 130, in get_devices 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent log_fail_as_error=self.log_fail_as_error 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in execute 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent raise RuntimeError(msg) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 2016-07-26 14:00:45.236 13360 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory this consumes memory, cpu, disk. 
** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606844 Title: Neutron constantly resyncing deleted router Status in neutron: New Bug description: No need to constantly resync router which was deleted and for which there is no namespace. Observed: l3 agent log full of 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/di
[Yahoo-eng-team] [Bug 1606845] [NEW] L3 agent constantly resyncing deleted router
Public bug reported: No need to constantly resync router which was deleted and for which there is no namespace. Observed: l3 agent log full of 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._router_removed(router_id) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in _router_removed 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ri.delete(self) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 347, in delete 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.process_delete(agent) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 385, in call 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.logger(e) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.force_reraise() 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 382, in call 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent return func(*args, 
**kwargs) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 947, in process_delete 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 530, in _process_internal_ports 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent existing_devices = self._get_existing_devices() 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 413, in _get_existing_devices 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ip_devs = ip_wrapper.get_devices(exclude_loopback=True) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 130, in get_devices 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent log_fail_as_error=self.log_fail_as_error 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in execute 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent raise RuntimeError(msg) 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 2016-07-26 14:00:45.236 13360 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory this consumes memory, cpu, disk. 
** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp ** Summary changed: - Neutron constantly resyncing deleted router + L3 agent constantly resyncing deleted router -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1606845 Title: L3 agent constantly resyncing deleted router Status in neutron: New Bug description: No need to constantly resync router which was deleted and for which there is no namespace. Observed: l3 agent log full of 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a 2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most
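A hypothetical sketch of the retry loop described above (simplified names; a set stands in for `ip netns list`): router deletion fails because the qrouter- namespace is already gone, the failure queues the same router for resync, and the identical failure repeats forever. Treating a missing namespace as "already cleaned up" breaks the loop.

```python
existing_namespaces = set()   # stand-in for the output of 'ip netns list'

def safe_router_removed(router_id):
    ns = 'qrouter-%s' % router_id
    if ns not in existing_namespaces:
        # Nothing left to clean up: report success instead of raising
        # RuntimeError and triggering yet another resync.
        return True
    existing_namespaces.discard(ns)
    return True

resync = []
removed = safe_router_removed('81ef46de-f7f9-4c5e-b787-c935e0af253a')
if not removed:
    resync.append('81ef46de-f7f9-4c5e-b787-c935e0af253a')
# resync stays empty: no CPU/memory/disk burned on an endless retry loop
```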
[Yahoo-eng-team] [Bug 1599089] [NEW] DVR: floating ip stops working after reassignment
Public bug reported: When reassigning a floating IP from one VM to another on the same host, it stops responding. This happens because the l3 agent just checks that the IP address is configured on the interface and does not update ip rules to reflect the new fixed IP. Reassignment works if you disassociate the floating IP first and then associate it with another fixed IP. However, the API allows reassignment without disassociation, so it should work as well. ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-dvr-backlog liberty-backport-potential mitaka-backport-potential ** Tags added: mitaka-backport-potential ** Tags added: liberty-backport-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1599089 Title: DVR: floating ip stops working after reassignment Status in neutron: New Bug description: When reassigning a floating IP from one VM to another on the same host, it stops responding. This happens because the l3 agent just checks that the IP address is configured on the interface and does not update ip rules to reflect the new fixed IP. Reassignment works if you disassociate the floating IP first and then associate it with another fixed IP. However, the API allows reassignment without disassociation, so it should work as well. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1599089/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
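A sketch of the fix direction the report implies, with entirely hypothetical data structures (a dict standing in for the host's ip rules; not the actual l3 agent code): track which fixed IP each floating IP's rule points at and rewrite the rule on reassignment, instead of treating "the floating IP is already on the interface" as "nothing to do".

```python
fip_rules = {}   # floating_ip -> fixed_ip that the host's ip rule points at

def associate_floating_ip(floating_ip, fixed_ip):
    current = fip_rules.get(floating_ip)
    if current == fixed_ip:
        return False                      # rule already correct
    # On reassignment the floating IP is already configured on the
    # interface, so an address-presence check alone would wrongly skip
    # this step; the stale rule must still be replaced.
    fip_rules[floating_ip] = fixed_ip
    return True

associate_floating_ip('172.24.4.10', '10.0.0.5')            # first VM
changed = associate_floating_ip('172.24.4.10', '10.0.0.7')  # reassigned VM
```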
[Yahoo-eng-team] [Bug 1595878] [NEW] Memory leak in unit tests
Public bug reported: tests.unit.agent.ovsdb.native.test_connection.TestOVSNativeConnection calls Connection.start() which starts a daemon with a while True loop full of mocks. mock._CallList of those mocks start to grow very quick and finally eat all available memory. mem_top output during unit tests run: refs: 18118[call(1), call().get_nowait(), call().get_nowait().do_commit(), call().get_nowait().results.put(), call().task_do 18117[call.get_nowait(), call.get_nowait().do_commit(), call.get_nowait().results.put(), call.task_done(), call.get_no 17990[call(1), call().get_nowait(), call().get_nowait().do_commit(), call().get_nowait().results.put(), call().task_do 17989[call.get_nowait(), call.get_nowait().do_commit(), call.get_nowait().results.put(), call.task_done(), call.get_no 13592[call(), call().fd_wait(, 1), call().timer_wait(), call().block(), call().fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call(), call().do_commit(), call().results.put(), call(), call().do_commit(), call().results.put( [call(), call().fd_wait(, 1), call().timer_wait(), call().block(), call().fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call(), call().do_commit(), call().results.put(), call(), call().do_commit(), call().results.put( {'keystoneclient.service_catalog': , 'oslo_messaging.r 9061 [call(, ), call().wait(), call().run(), call().wait( [call.wait(), call.run(), call.wait(), call.run(), call.wait( [call.wait(), call.run(), call.wait(), call.run(), call.wait( [call.do_commit(), call.results.put(), call.do_commit(), call.results.put( [call.do_commit(), call.results.put(), call.do_commit(), call.results.put( [call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), 
call 8997 [call(, ), call().wait(), call().run(), call().wait( 79091 47269 45542 30758 14696 8601 6579 5639 4940 3858 3291 3275 3267 2439 2304 2219 1869 1424 ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: unittest -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1595878 Title: Memory leak in unit tests Status in neutron: In Progress Bug description: tests.unit.agent.ovsdb.native.test_connection.TestOVSNativeConnection calls Connection.start() which starts a daemon with a while True loop full of mocks. mock._CallList of those mocks start to grow very quick and finally eat all available memory. mem_top output during unit tests run: refs: 18118 [call(1), call().get_nowait(), call().get_nowait().do_commit(), call().get_nowait().results.put(), call().task_do 18117 [call.get_nowait(), call.get_nowait().do_commit(), call.get_nowait().results.put(), call.task_done(), call.get_no 17990 [call(1), call().get_nowait(), call().get_nowait().do_commit(), call().get_nowait().results.put(), call().task_do 17989 [call.get_nowait(), call.get_nowait().do_commit(), call.get_nowait().results.put(), call.task_done(), call.get_no 13592 [call(), call().fd_wait(, 1), call().timer_wait(), call().block(), call().fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call(), call().do_commit(), call().results.put(), call(), call().do_commit(), call().results.put( [call(), call().fd_wait(, 1), call().timer_wait(), call().block(), call().fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call.fd_wait(, 1), call.timer_wait(), call.block(), call.fd_wait( [call(), call().do_commit(), call().results.put(), call(), call().do_commit(), call().results.put( {'keystoneclient.service_catalog': , 'oslo_messaging.r 9061 [call(, ), 
call().wait(), call().run(), call().wait( [call.wait(), call.run(), call.wait(), call.run(), call.wait( [call.wait(), call.run(), call.wait(), call.run(), call.wait( [call.do_commit(), call.results.put(), call.do_commit(), call.results.put( [call.do_commit(), call.results.put(), call.do_commit(), call.results.put( [call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call.get_nowait(), call.task_done(), call 8997
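The leak mechanism this report describes can be demonstrated in isolation with the standard library: every call on a `Mock` is appended to its `mock_calls` history, so a mocked object driven by a `while True` daemon loop accumulates call records without bound.

```python
from unittest import mock

m = mock.Mock()
for _ in range(10000):
    # Each iteration records call history on the shared mock, much like a
    # mocked queue polled by a 'while True' daemon loop.
    m.get_nowait().do_commit()

grown = len(m.mock_calls)    # two entries recorded per iteration
m.reset_mock()               # clearing the history (or stopping the
cleared = len(m.mock_calls)  # daemon in test cleanup) frees the memory
```

This is why a test that starts a long-lived loop over mocks must stop the loop or reset its mocks in cleanup.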
[Yahoo-eng-team] [Bug 1593653] [NEW] DVR: cannot manually remove router from l3 agent
Public bug reported: This is a regression from commit c198710dc551bc0f79851a7801038b033088a8c2: if there are dvr serviceable ports on the node with the agent, the server will now notify the agent with router_updated rather than router_removed; however, when updating the router, the agent will request router_info, and that is where the server schedules the router back to this l3 agent because autoscheduling is enabled. ** Affects: neutron Importance: High Status: New ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1593653 Title: DVR: cannot manually remove router from l3 agent Status in neutron: New Bug description: This is a regression from commit c198710dc551bc0f79851a7801038b033088a8c2: if there are dvr serviceable ports on the node with the agent, the server will now notify the agent with router_updated rather than router_removed; however, when updating the router, the agent will request router_info, and that is where the server schedules the router back to this l3 agent because autoscheduling is enabled. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1593653/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1590041] [NEW] DVR: regression with router rescheduling
Public bug reported: L3 agent may not fully process a dvr router being rescheduled to it, which leads to loss of external connectivity. The reason is that with commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce the dvr edge router now creates the snat_namespace object in its constructor, while some logic in the module still checks for the existence of this object: for example, external_gateway_updated() will not fully process the router if the snat_namespace object exists. The proposal is to revert commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce and then make another attempt to fix bug 1557909. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: In Progress ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1590041 Title: DVR: regression with router rescheduling Status in neutron: In Progress Bug description: L3 agent may not fully process a dvr router being rescheduled to it, which leads to loss of external connectivity. The reason is that with commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce the dvr edge router now creates the snat_namespace object in its constructor, while some logic in the module still checks for the existence of this object: for example, external_gateway_updated() will not fully process the router if the snat_namespace object exists. The proposal is to revert commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce and then make another attempt to fix bug 1557909. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1590041/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
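A minimal, hypothetical sketch of the inconsistency described above (toy classes, not the real agent code): once the attribute is created eagerly in the constructor, "does the object exist?" is always true and stops being a valid proxy for "is the namespace already set up?", so gateway updates get skipped.

```python
class SnatNamespace:
    """Toy stand-in for the DVR edge router's SNAT namespace wrapper."""
    def __init__(self):
        self.created = False    # track the real namespace state explicitly

class DvrEdgeRouter:
    def __init__(self):
        # After the commit in question the attribute exists from the start,
        # so object truthiness no longer implies "namespace is set up".
        self.snat_namespace = SnatNamespace()

    def external_gateway_updated_buggy(self):
        if self.snat_namespace:          # always truthy now: update skipped
            return 'skipped'
        return 'processed'

    def external_gateway_updated_fixed(self):
        if self.snat_namespace.created:  # check actual state instead
            return 'skipped'
        self.snat_namespace.created = True
        return 'processed'

router = DvrEdgeRouter()
buggy = router.external_gateway_updated_buggy()
fixed = router.external_gateway_updated_fixed()
```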
[Yahoo-eng-team] [Bug 1585623] [NEW] A vm's port is in down state after compute node reboot
Public bug reported: After compute node reboot some ports may end up in DOWN state and corresponding VMs lose net access. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: Confirmed ** Tags: mitaka-backport-potential ovs -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1585623 Title: A vm's port is in down state after compute node reboot Status in neutron: Confirmed Bug description: After compute node reboot some ports may end up in DOWN state and corresponding VMs lose net access. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1585623/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1585149] [NEW] Do not inherit test case classes from regular Neutron classes
Public bug reported: It's a bad practice itself and it may lead to errors during tests initialization. Test case classes are initialized during test loading stage by testing framework. Some neutron classes may not be ready to be created at this stage, for example those requiring rpc messaging system to be initialized first. I faced this bug after I added an rpc notifier to AgentDBMixin: unit tests started failing with: Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py", line 149, in main() File "/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py", line 145, in main stdout=stdout, exit=False) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py", line 171, in __init__ self.parseArgs(argv) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/main.py", line 113, in parseArgs self._do_discovery(argv[2:]) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py", line 211, in _do_discovery super(TestProgram, self)._do_discovery(argv, Loader=Loader) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/main.py", line 223, in _do_discovery self.test = loader.discover(self.start, self.pattern, self.top) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 374, in discover tests = list(self._find_tests(start_dir, pattern)) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 440, in _find_tests for test in path_tests: File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 440, in _find_tests for test in path_tests: File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 431, in _find_tests full_path, pattern, namespace) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 487, in _find_test_path return self.loadTestsFromModule(module, pattern=pattern), False File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 148, in loadTestsFromModule tests.append(self.loadTestsFromTestCase(obj)) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py", line 112, in loadTestsFromTestCase loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames)) File "neutron/db/agents_db.py", line 190, in __init__ resources_rpc.ResourcesPushToServersRpcApi()) File "neutron/api/rpc/handlers/resources_rpc.py", line 135, in __init__ self.client = n_rpc.get_client(target) File "neutron/common/rpc.py", line 174, in get_client assert TRANSPORT is not None AssertionError ** Affects: neutron Importance: Low Assignee: Oleg Bondarev (obondarev) Status: Confirmed ** Tags: unittest -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1585149 Title: Do not inherit test case classes from regular Neutron classes Status in neutron: Confirmed Bug description: It's a bad practice itself and it may lead to errors during tests initialization. Test case classes are initialized during test loading stage by testing framework. Some neutron classes may not be ready to be created at this stage, for example those requiring rpc messaging system to be initialized first. 
I faced this bug after I added an rpc notifier to AgentDBMixin: unit tests started failing with: Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py", line 149, in main() File "/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py", line 145, in main stdout=stdout, exit=False) File "/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py", line 171, in __init__
[Yahoo-eng-team] [Bug 1576757] [NEW] SRIOV: ESwitchManager should handle multiple NICs per physical net
Public bug reported: Commit 46ddaf4288a1cac44d8afc0525b4ecb3ae2186a3 made it possible to specify multiple NICs per network. However ESwitchManager now stores only one EmbSwitch per physical net (the last one). ** Affects: neutron Importance: High Assignee: Vladimir Eremin (yottatsa) Status: Confirmed ** Tags: mitaka-backport-potential sriov-pci-pt -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1576757 Title: SRIOV: ESwitchManager should handle multiple NICs per physical net Status in neutron: Confirmed Bug description: Commit 46ddaf4288a1cac44d8afc0525b4ecb3ae2186a3 made it possible to specify multiple NICs per network. However ESwitchManager now stores only one EmbSwitch per physical net (the last one). To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1576757/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
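The data-structure problem can be shown with a minimal sketch (hypothetical shapes, not the actual ESwitchManager code): mapping each physical network to a single embedded-switch object means a later NIC for the same physnet silently overwrites the earlier one.

```python
class EmbSwitch(object):
    """Stand-in for the per-NIC embedded switch wrapper."""

    def __init__(self, dev):
        self.dev = dev


mappings = [("physnet2", "enp1s0f0"), ("physnet2", "enp1s0f1")]

# Buggy shape: one EmbSwitch per physnet key -- the second NIC
# silently overwrites the first.
buggy = {}
for physnet, dev in mappings:
    buggy[physnet] = EmbSwitch(dev)

# Fixed shape: keep every EmbSwitch registered for the physnet.
fixed = {}
for physnet, dev in mappings:
    fixed.setdefault(physnet, []).append(EmbSwitch(dev))
```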
[Yahoo-eng-team] [Bug 1558626] Re: [sriov] physical_device_mappings allows only one physnet per NIC
New bug was filed for handling multiple NICs per physnet: https://bugs.launchpad.net/neutron/+bug/1576757 ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1558626 Title: [sriov] physical_device_mappings allows only one physnet per NIC Status in neutron: Fix Released Bug description: Mitaka, ML2, ml2_sriov.agent_required=True sriov_nic.physical_device_mappings allows specifying only one NIC per physnet. If I try to specify two NICs like the following [sriov_nic] physical_device_mappings=physnet2:enp1s0f0,physnet2:enp1s0f1 I get the following error on start 2016-03-17 15:26:48.818 6832 INFO neutron.common.config [-] Logging enabled! 2016-03-17 15:26:48.819 6832 INFO neutron.common.config [-] /usr/bin/neutron-sriov-nic-agent version 8.0.0.0b3 2016-03-17 15:26:48.819 6832 DEBUG neutron.common.config [-] command line: /usr/bin/neutron-sriov-nic-agent --config-file=/etc/neutron/plugins/ml2/sriov_agent.ini --log-file=/var/log/neutron/neutron-sriov-agent.log --config-file=/etc/neutron/neutron.conf setup_logging /usr/lib/python2.7/dist-packages/neutron/common/config.py:266 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Failed on Agent configuration parse. Agent terminated!
2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 436, in main 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent config_parser.parse() 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 411, in parse 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent cfg.CONF.SRIOV_NIC.physical_device_mappings) 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 240, in parse_mappings 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent "unique") % {'key': key, 'mapping': mapping}) 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent ValueError: Key physnet2 in mapping: 'physnet2:enp1s0f1' not unique 2016-03-17 15:26:48.819 6832 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1558626/+subscriptions
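The ValueError above comes from the mapping parser rejecting a duplicate key. A simplified re-implementation (hypothetical, loosely modeled on `neutron.common.utils.parse_mappings`) shows why the duplicate physnet is rejected, and what a multi-NIC-aware parsing mode looks like:

```python
def parse_mappings(mapping_list, unique_keys=True):
    """Parse 'key:value' strings into a dict (simplified sketch)."""
    mappings = {}
    for mapping in mapping_list:
        key, sep, value = mapping.partition(":")
        if not sep:
            raise ValueError("Invalid mapping: '%s'" % mapping)
        if unique_keys:
            # strict mode: a second entry for the same physnet fails,
            # reproducing the agent's startup error
            if key in mappings:
                raise ValueError(
                    "Key %s in mapping: '%s' not unique" % (key, mapping))
            mappings[key] = value
        else:
            # multi-NIC-aware shape: collect all devices per physnet
            mappings.setdefault(key, []).append(value)
    return mappings
```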
[Yahoo-eng-team] [Bug 1573843] [NEW] Minimize agent state reports handling on server side
Public bug reported: Agent state reports are mostly needed in order for neutron server to properly (re)schedule resources among agents. New features may require more precise scheduling which in turn requires agents to report more and servers to handle more data. However adding new logic to agent state reports handling has a negative effect on scalability and overall neutron server performance. Here is one example: https://bugs.launchpad.net/neutron/+bug/1567497 with more cases possibly coming in the future: like https://review.openstack.org/#/c/285548 which is adding a new db update request for each state report. One of the things that could be done is to not include (or just to ignore on server side) the data which can't be changed during runtime (like config parameters) in each state report. Such data should only be processed on agent (re)start/revival. So mainly it's about separating static and dynamic data in state reports handling to reduce the amount of db updates. ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: loadimpact -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1573843 Title: Minimize agent state reports handling on server side Status in neutron: New Bug description: Agent state reports are mostly needed in order for neutron server to properly (re)schedule resources among agents. New features may require more precise scheduling which in turn requires agents to report more and servers to handle more data. However adding new logic to agent state reports handling has a negative effect on scalability and overall neutron server performance. Here is one example: https://bugs.launchpad.net/neutron/+bug/1567497 with more cases possibly coming in the future: like https://review.openstack.org/#/c/285548 which is adding a new db update request for each state report.
One of the things that could be done is to not include (or just to ignore on server side) the data which can't be changed during runtime (like config parameters) in each state report. Such data should only be processed on agent (re)start/revival. So mainly it's about separating static and dynamic data in state reports handling to reduce the amount of db updates. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1573843/+subscriptions
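The proposed static/dynamic split can be sketched as a small filter (hypothetical helper, not actual neutron code; the key names in `STATIC_KEYS` are assumed for illustration): config-like fields are only persisted on agent (re)start/revival, so routine heartbeats touch far less DB state.

```python
# assumed set of config-like report fields that cannot change at runtime
STATIC_KEYS = {"binary", "agent_type", "configurations"}


def fields_to_persist(report, agent_is_new):
    """Return only the report fields the server should process now."""
    if agent_is_new:
        return dict(report)  # full processing on (re)start / revival
    # steady state: drop static fields, keep the dynamic ones
    return {k: v for k, v in report.items() if k not in STATIC_KEYS}
```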
[Yahoo-eng-team] [Bug 1567497] [NEW] resource_versions in agents state reports led to performance degradation
Public bug reported: resource_versions were included into agent state reports recently to support rolling upgrades (commit 97a272a892fcf488949eeec4959156618caccae8) The downside is that it brought additional processing when handling state reports on server side: update of local resources versions cache and more seriously rpc casts to all other servers to do the same. All this led to a visible performance degradation at scale with hundreds of agents constantly sending reports. Under load (rally test) agents may start "blinking" which makes cluster very unstable. Need to optimize agents notifications about resource_versions. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1567497 Title: resource_versions in agents state reports led to performance degradation Status in neutron: In Progress Bug description: resource_versions were included into agent state reports recently to support rolling upgrades (commit 97a272a892fcf488949eeec4959156618caccae8) The downside is that it brought additional processing when handling state reports on server side: update of local resources versions cache and more seriously rpc casts to all other servers to do the same. All this led to a visible performance degradation at scale with hundreds of agents constantly sending reports. Under load (rally test) agents may start "blinking" which makes cluster very unstable. Need to optimize agents notifications about resource_versions. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1567497/+subscriptions
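One way to cut the per-heartbeat cost is to make the fan-out conditional (an illustrative sketch, not the actual fix that landed; class and callback names are invented): only update the cache and cast to peer servers when an agent's reported resource_versions actually changed.

```python
class ResourceVersionsCache(object):
    """Caches per-agent resource_versions; notifies peers on change."""

    def __init__(self, notify_peers):
        self._versions = {}          # agent host -> versions dict
        self._notify_peers = notify_peers

    def report(self, host, resource_versions):
        if self._versions.get(host) == resource_versions:
            return False  # unchanged heartbeat: no cache write, no RPC
        self._versions[host] = dict(resource_versions)
        self._notify_peers(host, resource_versions)  # fan-out only here
        return True
```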
[Yahoo-eng-team] [Bug 1566291] [NEW] L3 agent: at some point an agent becomes unable to handle new routers
Public bug reported: Following seen in l3 agent logs: 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f' 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 434, in _process_router_if_compatible 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_added_router(router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 439, in _process_added_router 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._router_added(router['id'], router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 340, in _router_added 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent ri = self._create_router(router_id, router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 337, in _create_router 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent return legacy_router.LegacyRouter(*args, **kwargs) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 61, in __init__ 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent DEFAULT_ADDRESS_SCOPE: ADDRESS_SCOPE_MARK_IDS.pop()} 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent KeyError: 'pop from an empty set' 2016-04-05 09:30:09.033 24216 ERROR 
neutron.agent.l3.agent 2016-04-05 09:30:09.034 24216 DEBUG neutron.agent.l3.agent [-] Starting router update for e341e0e2-5089-46e9-91f9-2099a156b27f, action None, priority 1 _process_router_update /usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:463 2016-04-05 09:30:09.035 24216 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 6295fbe9cf2040d79c68f5c5f8b1e963 exchange 'neutron' topic 'q-l3-plugin' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:454 2016-04-05 09:30:09.417 24216 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6295fbe9cf2040d79c68f5c5f8b1e963 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302 2016-04-05 09:30:09.418 24216 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f' So agent is constantly resyncing (causing load on neutron server) and unable to handle new routers. I believe that set "ADDRESS_SCOPE_MARK_IDS = set(range(1024, 2048))" from router_info.py should not be agent global but it should be ADDRESS_SCOPE_MARK_IDS per router. Or at least need to return values back to the set when router is deleted. ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1566291 Title: L3 agent: at some point an agent becomes unable to handle new routers Status in neutron: New Bug description: Following seen in l3 agent logs: 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f' 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 497, in _process_router_update 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 434, in _process_router_if_compatible 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_added_router(router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 439, in _process_added_router 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._router_added(router['id'], router) 2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/age
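The fix suggested in the report — return mark values to the set when a router is deleted — amounts to treating the range as a real pool (illustrative sketch; `MarkIdPool` is a hypothetical name, not the actual router_info.py code):

```python
class MarkIdPool(object):
    """ID pool whose values are returned on router deletion."""

    def __init__(self, start=1024, end=2048):
        self._free = set(range(start, end))
        self._used = {}  # router_id -> allocated mark

    def allocate(self, router_id):
        if not self._free:
            # today's failure mode: KeyError 'pop from an empty set'
            raise RuntimeError("mark ID pool exhausted")
        mark = self._free.pop()
        self._used[router_id] = mark
        return mark

    def release(self, router_id):
        mark = self._used.pop(router_id, None)
        if mark is not None:
            self._free.add(mark)  # mark becomes reusable again
```

With release() wired into router deletion, an agent that churns through many routers never drains the range the way the module-global set does.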
[Yahoo-eng-team] [Bug 1522436] Re: No need to autoreschedule routers if l3 agent is back online
It appeared the fix was not complete. I'm reopening the bug, will upload a fix shortly ** Changed in: neutron Status: Fix Released => Triaged ** Tags removed: in-stable-liberty -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1522436 Title: No need to autoreschedule routers if l3 agent is back online Status in neutron: Triaged Bug description: - in case l3 agent goes offline the auto-rescheduling task is triggered and starts to reschedule each router from dead agent one by one - If there are a lot of routers scheduled to the agent, rescheduling all of them might take some time - during that time the agent might get back online - currently autorescheduling will be continued until all routers are rescheduled from the (already alive!) agent The proposal is to skip rescheduling if agent is back online. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1522436/+subscriptions
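The proposal boils down to re-checking the agent's liveness on each iteration of the (possibly long) rescheduling loop and stopping as soon as it has revived. A minimal sketch, with invented function names:

```python
def reschedule_from_dead_agent(routers, agent_is_alive, reschedule):
    """Move routers off a dead agent, stopping if it comes back."""
    moved = []
    for router in routers:
        if agent_is_alive():
            break  # agent revived mid-loop: leave its remaining routers
        reschedule(router)
        moved.append(router)
    return moved
```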
[Yahoo-eng-team] [Bug 1546110] [NEW] DB error causes router rescheduling loop to fail
y", line 713, in _checkout 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall fairy = _ConnectionRecord.checkout(pool) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 485, in checkout 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall rec.checkin() 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 482, in checkout 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall dbapi_connection = rec.get_connection() 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 594, in get_connection 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall self.connection = self.__connect() 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall connection = self.__pool._invoke_creator(self) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return dialect.connect(*cargs, **cparams) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return self.dbapi.connect(*cargs, **cparams) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in 
Connect 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return Connection(*args, **kwargs) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 206, in __init__ 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall super(Connection, self).__init__(*args, **kwargs2) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall DBConnectionError: (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") ** Affects: neutron Importance: Medium Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp liberty-backport-potential ** Tags added: liberty-backport-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1546110 Title: DB error causes router rescheduling loop to fail Status in neutron: New Bug description: In the router rescheduling looping task, the db call to get down bindings is done outside of the try/except block, which may cause the task to fail (see traceback below). The db operation needs to be moved inside the try/except.
2016-02-15T10:44:44.259995+00:00 err: 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall [req-79bce4c3-2e81-446c-8b37-6d30e3a964e2 - - - - -] Fixed interval looping call 'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin.reschedule_routers_from_down_agents' failed 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall Traceback (most recent call last): 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 113, in _run_loop 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 101, in reschedule_routers_from_down_agents 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall down_bindings = self._get_down_bindings(context, cutoff) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 460, in _get_down_bindings 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall context, cutoff) 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 149, in _get_down_bindings 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return query.all() 2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/dist-packages/
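The fix described above can be sketched like this (illustrative shape only; the callables are stand-ins for the real DB query and rescheduling code): with the DB call inside the try/except, a transient connection error is logged and the looping task simply retries on its next interval instead of dying.

```python
import logging

LOG = logging.getLogger(__name__)


def reschedule_routers_from_down_agents(get_down_bindings, reschedule):
    try:
        # previously outside the try block: a DBConnectionError here
        # propagated up and killed the looping call for good
        for binding in get_down_bindings():
            reschedule(binding)
    except Exception:
        LOG.exception("Router rescheduling failed; retrying on the "
                      "next loop iteration")
```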
[Yahoo-eng-team] [Bug 1545695] [NEW] L3 agent: traceback is suppressed on floating ip setup failure
Public bug reported: The following traceback says nothing about the actual exception, which makes issues hard to debug: 2016-02-10 05:26:54.025 682 ERROR neutron.agent.l3.router_info [-] L3 agent failure to setup floating IPs 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info Traceback (most recent call last): 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 604, in process_external 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info fip_statuses = self.configure_fip_addresses(interface_name) 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 268, in configure_fip_addresses 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info raise n_exc.FloatingIpSetupException('L3 agent failure to setup ' 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info FloatingIpSetupException: L3 agent failure to setup floating IPs The actual exception needs to be logged with its traceback before reraising. ** Affects: neutron Importance: Low Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1545695 Title: L3 agent: traceback is suppressed on floating ip setup failure Status in neutron: New Bug description: The following traceback says nothing about the actual exception, which makes issues hard to debug: 2016-02-10 05:26:54.025 682 ERROR neutron.agent.l3.router_info [-] L3 agent failure to setup floating IPs 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info Traceback (most recent call last): 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 604, in process_external 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info fip_statuses = self.configure_fip_addresses(interface_name) 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 268, in configure_fip_addresses 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info raise n_exc.FloatingIpSetupException('L3 agent failure to setup ' 2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info FloatingIpSetupException: L3 agent failure to setup floating IPs The actual exception needs to be logged with its traceback before reraising. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1545695/+subscriptions
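The suggested fix is to log the original exception (with its traceback) before raising the generic one. A minimal sketch, with the exception class and setup callable as stand-ins for the real neutron code:

```python
import logging

LOG = logging.getLogger(__name__)


class FloatingIpSetupException(Exception):
    pass


def configure_fip_addresses(setup_fips):
    try:
        return setup_fips()
    except Exception:
        # LOG.exception attaches the active traceback to the record,
        # so the root cause survives the generic re-raise below
        LOG.exception("L3 agent failure to setup floating IPs")
        raise FloatingIpSetupException(
            "L3 agent failure to setup floating IPs")
```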
[Yahoo-eng-team] [Bug 1543513] [NEW] Bring back dvr routers autoscheduling
Public bug reported: Commit 1105d732b2cb6ec66d042c85968d47fe6d733f5f disabled auto scheduling for dvr routers because of the complexity of DVR scheduling itself which led to a number of logical and DB issues. Now after blueprint improve-dvr-l3-agent-binding is merged DVR scheduling is almost no different from legacy scheduling (no extra DVR logic required for auto scheduling) so we can bring auto scheduling for DVR routers back. This is better for consistency and improves UX. ** Affects: neutron Importance: Wishlist Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1543513 Title: Bring back dvr routers autoscheduling Status in neutron: New Bug description: Commit 1105d732b2cb6ec66d042c85968d47fe6d733f5f disabled auto scheduling for dvr routers because of the complexity of DVR scheduling itself which led to a number of logical and DB issues. Now after blueprint improve-dvr-l3-agent-binding is merged DVR scheduling is almost no different from legacy scheduling (no extra DVR logic required for auto scheduling) so we can bring auto scheduling for DVR routers back. This is better for consistency and improves UX. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1543513/+subscriptions
[Yahoo-eng-team] [Bug 1541348] [NEW] Regression in routers auto scheduling logic
Public bug reported: Routers auto scheduling works when an l3 agent starts and performs a full sync with neutron server. Neutron server looks for all unscheduled routers (non-dvr routers only) and schedules them to that agent if applicable. This was broken by commit 0e97feb0f30bc0ef6f4fe041cb41b7aa81042263 which changed full sync logic a bit: now l3 agent requests all ids of routers scheduled to it first. get_router_ids() didn't call routers auto scheduling which caused the regression. ** Affects: neutron Importance: High Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp liberty-backport-potential ** Summary changed: - regression in routers auto scheduling logic + Regression in routers auto scheduling logic -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1541348 Title: Regression in routers auto scheduling logic Status in neutron: New Bug description: Routers auto scheduling works when an l3 agent starts and performs a full sync with neutron server. Neutron server looks for all unscheduled routers (non-dvr routers only) and schedules them to that agent if applicable. This was broken by commit 0e97feb0f30bc0ef6f4fe041cb41b7aa81042263 which changed full sync logic a bit: now l3 agent requests all ids of routers scheduled to it first. get_router_ids() didn't call routers auto scheduling which caused the regression. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1541348/+subscriptions
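The missing step can be sketched with toy data structures (entirely hypothetical shapes; the real `get_router_ids()` works against the DB and scheduler): the new full-sync entry point must bind unscheduled routers to the syncing agent before answering, as the old sync path did.

```python
def get_router_ids(host, bindings, unscheduled_routers,
                   auto_schedule=True):
    """Return router ids bound to `host`, auto-scheduling first."""
    if auto_schedule:
        # the regression: this step was skipped, so unscheduled
        # (non-DVR) routers were never picked up on agent start
        bindings.setdefault(host, []).extend(unscheduled_routers)
        del unscheduled_routers[:]
    return list(bindings.get(host, []))
```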
[Yahoo-eng-team] [Bug 1538163] [NEW] DVR: race in dvr serviceable port deletion
Public bug reported: In the ml2 plugin, when a dvr serviceable port is deleted, we check if any dvr routers should be deleted from port's host. This is done prior to actual port deletion from db by checking if there are any more dvr serviceable ports on this host. This is prone to races: if two last compute ports on the host are deleted concurrently, the check might not return any routers as in both cases it will see yet another dvr serviceable port on the host: - p1 and p2 are last compute ports on compute host 'host1' - p1 and p2 are on the same subnet connected to a dvr router 'r1' - p1 and p2 are deleted concurrently - on p1 deletion plugin checks if there are any more dvr serviceable ports on host1 - sees p2 -> no dvr routers should be deleted - same on p2 deletion plugin checks if there are any more dvr serviceable ports on host1 - sees p1 -> no dvr routers should be deleted - p1 is deleted from DB - p2 is deleted from DB - r1 is not deleted from host1 though there are no more ports on it ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-dvr-backlog -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1538163 Title: DVR: race in dvr serviceable port deletion Status in neutron: New Bug description: In the ml2 plugin, when a dvr serviceable port is deleted, we check if any dvr routers should be deleted from port's host. This is done prior to actual port deletion from db by checking if there are any more dvr serviceable ports on this host.
This is prone to races: if two last compute ports on the host are deleted concurrently, the check might not return any routers as in both cases it will see yet another dvr serviceable port on the host: - p1 and p2 are last compute ports on compute host 'host1' - p1 and p2 are on the same subnet connected to a dvr router 'r1' - p1 and p2 are deleted concurrently - on p1 deletion plugin checks if there are any more dvr serviceable ports on host1 - sees p2 -> no dvr routers should be deleted - same on p2 deletion plugin checks if there are any more dvr serviceable ports on host1 - sees p1 -> no dvr routers should be deleted - p1 is deleted from DB - p2 is deleted from DB - r1 is not deleted from host1 though there are no more ports on it To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1538163/+subscriptions
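The check-then-delete interleaving above can be modeled with a plain set standing in for the DB (an illustrative sketch only; the real fix must also run the delete and re-check under one transaction or suitable locking): checking for other serviceable ports before deleting your own row lets both deletions see the other's port, while re-checking after the delete closes the window.

```python
def other_ports_remain(ports, port):
    """Buggy pre-delete check: still sees the other in-flight port."""
    return bool(ports - {port})


def delete_port_fixed(ports, port, remove_router_from_host):
    ports.discard(port)  # delete our row first ...
    if not ports:        # ... then check what actually remains
        remove_router_from_host()
```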
[Yahoo-eng-team] [Bug 1536110] [NEW] OVS agent should fail if can't get DVR mac address
Public bug reported:

If the OVS agent is configured to run in DVR mode, it has to get its
unique MAC address from the server on startup. In case it cannot get it
after several attempts (commit 51303b5fe4785d0cda76f095c95eb4d746d7d783)
due to some error, it falls back to non-DVR mode.

The question is: what is the purpose of the OVS agent running in non-DVR
mode when it was configured for DVR? The server code does not handle the
OVS agent's 'in_distributed_mode' flag in any way and will continue
scheduling DVR routers to such nodes. This may lead to connectivity
issues which are hard to debug. Example:

2016-01-12 11:29:15.186 16238 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-e3b3643d-6976-4656-b247-ab291e6a4b27 - - - - -] L2 agent could not get DVR MAC address at startup due to RPC error. It happens when the server does not support this RPC API. Detailed message: Remote error: DBConnectionError (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

There were some issues with mysql on startup which led to half of the OVS
agents silently running in non-DVR mode. The proposal is to fail in case
the agent cannot operate in the mode it was configured for.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1536110

Title:
  OVS agent should fail if can't get DVR mac address

Status in neutron:
  New

Bug description:
  If the OVS agent is configured to run in DVR mode, it has to get its
  unique MAC address from the server on startup. In case it cannot get it
  after several attempts (commit 51303b5fe4785d0cda76f095c95eb4d746d7d783)
  due to some error, it falls back to non-DVR mode.

  The question is: what is the purpose of the OVS agent running in
  non-DVR mode when it was configured for DVR? The server code does not
  handle the OVS agent's 'in_distributed_mode' flag in any way and will
  continue scheduling DVR routers to such nodes. This may lead to
  connectivity issues which are hard to debug. Example:

  2016-01-12 11:29:15.186 16238 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-e3b3643d-6976-4656-b247-ab291e6a4b27 - - - - -] L2 agent could not get DVR MAC address at startup due to RPC error. It happens when the server does not support this RPC API. Detailed message: Remote error: DBConnectionError (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

  There were some issues with mysql on startup which led to half of the
  OVS agents silently running in non-DVR mode. The proposal is to fail in
  case the agent cannot operate in the mode it was configured for.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1536110/+subscriptions
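A minimal sketch of the proposed fail-fast behaviour (the function and exception names below are made up for illustration; the real agent code deals with oslo RPC errors and process exit, not a bare `raise`):

```python
# Hypothetical sketch: retry getting the DVR MAC from the server, and
# raise instead of silently falling back to non-DVR mode on failure.
MAX_RETRIES = 5


class DVRMacAddressError(RuntimeError):
    """Raised when the agent cannot obtain its DVR MAC address."""


def get_dvr_mac_with_retry(rpc_call, retries=MAX_RETRIES):
    last_exc = None
    for _attempt in range(retries):
        try:
            return rpc_call()
        except Exception as exc:  # the real agent catches RPC errors here
            last_exc = exc
    # Proposed fix: fail hard so the deployer notices, rather than
    # degrading to non-DVR mode while the server keeps scheduling DVR
    # routers to this node.
    raise DVRMacAddressError(
        "Could not get DVR MAC address after %d attempts: %s"
        % (retries, last_exc))


# Example: a server that only answers on the third attempt still succeeds.
calls = {'n': 0}
def flaky_rpc():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("Lost connection to MySQL server")
    return 'fa:16:3f:00:00:01'

assert get_dvr_mac_with_retry(flaky_rpc) == 'fa:16:3f:00:00:01'

# A server that never answers makes the agent fail instead of falling back.
def always_fails():
    raise ConnectionError("Lost connection to MySQL server")

try:
    get_dvr_mac_with_retry(always_fails, retries=2)
    failed_hard = False
except DVRMacAddressError:
    failed_hard = True
assert failed_hard
```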
[Yahoo-eng-team] [Bug 1530179] [NEW] get_subnet_for_dvr() returns wrong gateway mac
Public bug reported:

get_subnet_for_dvr() should return the proper gateway MAC address in order
for the OVS agent to add the proper flows for the DVR interface on br-int.
Commit e82b0e108332964c90e9d2cfaf3d334a92127155 added a 'fixed_ips'
parameter to the handler to filter the gateway port of the subnet. However
the actual filtering was applied improperly, which leads to a wrong
gateway MAC being returned:

    if fixed_ips:
        filter = fixed_ips[0]
    else:
        filter = {'fixed_ips': {'subnet_id': [subnet],
                                'ip_address': [subnet_info['gateway_ip']]}}
    internal_gateway_ports = self.plugin.get_ports(
        context, filters=filter)
    internal_port = internal_gateway_ports[0]
    subnet_info['gateway_mac'] = internal_port['mac_address']

get_ports() here actually returns _all_ ports, so the MAC address of a
random port is returned as 'gateway_mac'. In most cases it doesn't lead to
any noticeable side effects, but in some cases it may cause very weird
behavior. The case that we faced was:

root@node-9:~# ovs-ofctl dump-flows br-int
...
cookie=0x971c69a135b8ce1f, duration=23023.412s, table=2, n_packets=1339, n_bytes=131234, idle_age=19050, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:da:53:f1 actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:6
cookie=0x971c69a135b8ce1f, duration=31946.414s, table=2, n_packets=25320, n_bytes=2481408, idle_age=1, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:2c:24:86 actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:5
...

fa:16:3e:2c:24:86 is the MAC address of a VM port, and it was returned as
the gateway MAC due to the bug. This VM was unreachable from other subnets
connected to the same DVR router, while another VM on the same host and
the same subnet was fine. It took a while to find out what was wrong :)

** Affects: neutron
     Importance: Medium
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1530179

Title:
  get_subnet_for_dvr() returns wrong gateway mac

Status in neutron:
  New

Bug description:
  get_subnet_for_dvr() should return the proper gateway MAC address in
  order for the OVS agent to add the proper flows for the DVR interface
  on br-int. Commit e82b0e108332964c90e9d2cfaf3d334a92127155 added a
  'fixed_ips' parameter to the handler to filter the gateway port of the
  subnet. However the actual filtering was applied improperly, which
  leads to a wrong gateway MAC being returned:

      if fixed_ips:
          filter = fixed_ips[0]
      else:
          filter = {'fixed_ips': {'subnet_id': [subnet],
                                  'ip_address': [subnet_info['gateway_ip']]}}
      internal_gateway_ports = self.plugin.get_ports(
          context, filters=filter)
      internal_port = internal_gateway_ports[0]
      subnet_info['gateway_mac'] = internal_port['mac_address']

  get_ports() here actually returns _all_ ports, so the MAC address of a
  random port is returned as 'gateway_mac'. In most cases it doesn't lead
  to any noticeable side effects, but in some cases it may cause very
  weird behavior. The case that we faced was:

  root@node-9:~# ovs-ofctl dump-flows br-int
  ...
  cookie=0x971c69a135b8ce1f, duration=23023.412s, table=2, n_packets=1339, n_bytes=131234, idle_age=19050, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:da:53:f1 actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:6
  cookie=0x971c69a135b8ce1f, duration=31946.414s, table=2, n_packets=25320, n_bytes=2481408, idle_age=1, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:2c:24:86 actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:5
  ...

  fa:16:3e:2c:24:86 is the MAC address of a VM port, and it was returned
  as the gateway MAC due to the bug. This VM was unreachable from other
  subnets connected to the same DVR router, while another VM on the same
  host and the same subnet was fine. It took a while to find out what was
  wrong :)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1530179/+subscriptions
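The filtering bug can be shown with a toy stand-in for `get_ports()` (the fake port list, the gateway MAC `fa:16:3e:aa:bb:cc`, and the simplified filter handling are all illustrative assumptions; only the malformed-filter-matches-everything behaviour mirrors the report):

```python
# Hypothetical stand-in: get_ports() only understands a 'fixed_ips'
# filter; a filter dict without that key matches every port, which is
# how the malformed filter returned a random port's MAC.
PORTS = [
    {'mac_address': 'fa:16:3e:2c:24:86',   # a VM port
     'fixed_ips': [{'subnet_id': 'subnet-1', 'ip_address': '10.0.0.5'}]},
    {'mac_address': 'fa:16:3e:aa:bb:cc',   # the actual gateway port
     'fixed_ips': [{'subnet_id': 'subnet-1', 'ip_address': '10.0.0.1'}]},
]

def get_ports(filters):
    wanted = filters.get('fixed_ips')
    if not wanted:
        return list(PORTS)
    return [p for p in PORTS
            if any(f['subnet_id'] in wanted['subnet_id'] and
                   f['ip_address'] in wanted['ip_address']
                   for f in p['fixed_ips'])]

fixed_ips = [{'subnet_id': 'subnet-1', 'ip_address': '10.0.0.1'}]

# Buggy: the first fixed_ip dict is passed directly as the filters dict,
# so nothing is filtered and the first (wrong) port's MAC is picked up.
buggy = get_ports(fixed_ips[0])
assert buggy[0]['mac_address'] == 'fa:16:3e:2c:24:86'

# Fixed: wrap the values in the 'fixed_ips' filter shape get_ports expects.
good_filter = {'fixed_ips': {'subnet_id': [fixed_ips[0]['subnet_id']],
                             'ip_address': [fixed_ips[0]['ip_address']]}}
good = get_ports(good_filter)
assert good[0]['mac_address'] == 'fa:16:3e:aa:bb:cc'
```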
[Yahoo-eng-team] [Bug 1424096] Re: DVR routers attached to shared networks aren't being unscheduled from a compute node after deleting the VMs using the shared net
I faced the bug while reworking unit tests into functional tests: when
performing the steps described in the description I get:

2015-12-15 17:41:23,484 ERROR [neutron.callbacks.manager] Error during notification for neutron.db.l3_dvrscheduler_db._notify_port_delete port, after_delete
Traceback (most recent call last):
  File "neutron/callbacks/manager.py", line 141, in _notify_loop
    callback(resource, event, trigger, **kwargs)
  File "neutron/db/l3_dvrscheduler_db.py", line 485, in _notify_port_delete
    context, router['agent_id'], router['router_id'])
  File "neutron/db/l3_dvrscheduler_db.py", line 439, in remove_router_from_l3_agent
    router = self.get_router(context, router_id)
  File "neutron/db/l3_db.py", line 451, in get_router
    router = self._get_router(context, id)
  File "neutron/db/l3_db.py", line 137, in _get_router
    raise l3.RouterNotFound(router_id=router_id)
RouterNotFound: Router 7d52836b-8fe5-4417-842f-3cbe0920c89c could not be found

and the router is not removed from the host, which has no more DVR
serviceable ports. Looks like we also need an admin context in order to
remove an admin router from a host when a non-admin tenant removes the
last DVR serviceable port on a shared network.

** Changed in: neutron
       Status: Fix Released => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1424096

Title:
  DVR routers attached to shared networks aren't being unscheduled from
  a compute node after deleting the VMs using the shared net

Status in neutron:
  Confirmed
Status in neutron juno series:
  Fix Released
Status in neutron kilo series:
  New

Bug description:
  As the administrator, a DVR router is created and attached to a shared
  network. The administrator also created the shared network. As a
  non-admin tenant, a VM is created with the port using the shared
  network. The only VM using the shared network is scheduled to a compute
  node. When the VM is deleted, it is expected the qrouter namespace of
  the DVR router is removed. But it is not. This doesn't happen with
  routers attached to networks that are not shared.

  The environment consists of 1 controller node and 1 compute node.
  Routers having the problem are created by the administrator, attached
  to shared networks that are also owned by the admin. As the
  administrator, do the following commands on a setup having 1 compute
  node and 1 controller node:

  1. neutron net-create shared-net -- --shared True
     Shared net's uuid is f9ccf1f9-aea9-4f72-accc-8a03170fa242.
  2. neutron subnet-create --name shared-subnet shared-net 10.0.0.0/16
  3. neutron router-create shared-router
     Router's UUID is ab78428a-9653-4a7b-98ec-22e1f956f44f.
  4. neutron router-interface-add shared-router shared-subnet
  5. neutron router-gateway-set shared-router public

  As a non-admin tenant (tenant-id: 95cd5d9c61cf45c7bdd4e9ee52659d13),
  boot a VM using the shared-net network:

  1. neutron net-show shared-net
  +-----------------+--------------------------------------+
  | Field           | Value                                |
  +-----------------+--------------------------------------+
  | admin_state_up  | True                                 |
  | id              | f9ccf1f9-aea9-4f72-accc-8a03170fa242 |
  | name            | shared-net                           |
  | router:external | False                                |
  | shared          | True                                 |
  | status          | ACTIVE                               |
  | subnets         | c4fd4279-81a7-40d6-a80b-01e8238c1c2d |
  | tenant_id       | 2a54d6758fab47f4a2508b06284b5104     |
  +-----------------+--------------------------------------+

  At this point, there are no VMs using the shared-net network running in
  the environment.

  2. Boot a VM that uses the shared-net network:
     nova boot ... --nic net-id=f9ccf1f9-aea9-4f72-accc-8a03170fa242 ... vm_sharednet
  3. Assign a floating IP to the VM "vm_sharednet"
  4. Delete "vm_sharednet". On the compute node, the qrouter namespace of
     the shared router (qrouter-ab78428a-9653-4a7b-98ec-22e1f956f44f) is
     left behind:

     stack@DVR-CN2:~/DEVSTACK/manage$ ip netns
     qrouter-ab78428a-9653-4a7b-98ec-22e1f956f44f
     ...

  This is consistent with the output of the "neutron
  l3-agent-list-hosting-router" command. It shows the router is still
  being hosted on the compute node:

  $ neutron l3-agent-list-hosting-router ab78428a-9653-4a7b-98ec-22e1f956f44f
  +--------------------------------------+----------------+----------------+-------+
  | id                                   | host           | admin_state_up | alive |
  +--------------------------------------+----------------+----------------+-------+
  | 42f12eb0-51bc-4861-928a-48de51ba7ae1 | DVR-Controller | True           | :-)   |
  |
[Yahoo-eng-team] [Bug 1524908] [NEW] Router may be removed from dvr_snat agent by accident
Public bug reported:

This popped up during https://review.openstack.org/#/c/238478:

- when a DVR serviceable port is deleted/migrated, the DVR callback checks
  whether there are any more DVR serviceable ports on the host, and if
  there are none, removes the router from the agent on that host
- in case a DHCP port is deleted/migrated, this may lead to the router
  being deleted from a dvr_snat agent, which includes deletion of the SNAT
  namespace

We need to check the agent mode and, in this case, only remove the router
from DVR agents running on compute nodes.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1524908

Title:
  Router may be removed from dvr_snat agent by accident

Status in neutron:
  New

Bug description:
  This popped up during https://review.openstack.org/#/c/238478:

  - when a DVR serviceable port is deleted/migrated, the DVR callback
    checks whether there are any more DVR serviceable ports on the host,
    and if there are none, removes the router from the agent on that host
  - in case a DHCP port is deleted/migrated, this may lead to the router
    being deleted from a dvr_snat agent, which includes deletion of the
    SNAT namespace

  We need to check the agent mode and, in this case, only remove the
  router from DVR agents running on compute nodes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1524908/+subscriptions
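The proposed mode check can be sketched as follows (constants match neutron's agent mode names, but the function and its callers are hypothetical simplifications of the DVR callback):

```python
# Hypothetical sketch of the proposed guard: only agents in plain 'dvr'
# mode (compute nodes) have the router removed when the last DVR
# serviceable port leaves the host; 'dvr_snat' agents keep it, since
# removal there would also tear down the SNAT namespace.
AGENT_MODE_DVR = 'dvr'
AGENT_MODE_DVR_SNAT = 'dvr_snat'


def should_remove_router_from_agent(agent_mode, host_has_dvr_ports):
    if host_has_dvr_ports:
        return False  # host still serves DVR ports: keep the router
    return agent_mode == AGENT_MODE_DVR


# Last DVR serviceable port gone from a compute node: remove the router.
assert should_remove_router_from_agent(AGENT_MODE_DVR, False) is True
# Same situation on a dvr_snat node: keep the router (and its SNAT ns).
assert should_remove_router_from_agent(AGENT_MODE_DVR_SNAT, False) is False
# Ports still present: never remove, regardless of mode.
assert should_remove_router_from_agent(AGENT_MODE_DVR, True) is False
```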
[Yahoo-eng-team] [Bug 1522824] Re: DVR multinode job: test_shelve_instance failure due to SSHTimeout
On second thought, it might not be fair to require nova to wait for some
events from neutron on cleanup. Also, in the case of live migration, vifs
on the source node are deleted after the VM has already migrated and the
ports are active on the destination node, so neutron will not send any
network-vif-unplugged events in that case. Shelve-unshelve seems a corner
case and I'd like to avoid hacks in the VM cleanup logic.

The other idea for the fix (on the neutron side now) would be to change
the port status to something like PENDING_BUILD right after the DB update.
Nova will count such ports as non-ACTIVE and will wait for
network-vif-plugged events for them. When the agent requests info for the
port, the neutron server will update the status to BUILD. Later, when the
agent reports the device up, the port will be put back into the ACTIVE
state and a network-vif-plugged event will be sent to nova.

Changing project back to neutron.

** Project changed: nova => neutron

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in neutron:
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 |     return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 |     self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 93, in _create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 |     private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, in get_timestamp
  2015-12-04 01:17:12.571 |     private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, in get_remote_client
  2015-12-04 01:17:12.571 |     linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File "tempest/common/utils/linux/remote_client.py", line 63, in validate_authentication
  2015-12-04 01:17:12.571 |     self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 167, in test_connection_auth
  2015-12-04 01:17:12.571 |     connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 |     password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1522824/+subscriptions
[Yahoo-eng-team] [Bug 1522824] [NEW] DVR multinode job: test_shelve_instance failure due to SSHTimeout
Public bug reported:

gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
failure:

Captured traceback:
2015-12-04 01:17:12.569 | ~~~
2015-12-04 01:17:12.569 | Traceback (most recent call last):
2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
2015-12-04 01:17:12.570 |     return f(self, *func_args, **func_kwargs)
2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
2015-12-04 01:17:12.570 |     self._create_server_then_shelve_and_unshelve()
2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 93, in _create_server_then_shelve_and_unshelve
2015-12-04 01:17:12.570 |     private_key=keypair['private_key'])
2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, in get_timestamp
2015-12-04 01:17:12.571 |     private_key=private_key)
2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, in get_remote_client
2015-12-04 01:17:12.571 |     linux_client.validate_authentication()
2015-12-04 01:17:12.571 |   File "tempest/common/utils/linux/remote_client.py", line 63, in validate_authentication
2015-12-04 01:17:12.571 |     self.ssh_client.test_connection_auth()
2015-12-04 01:17:12.571 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 167, in test_connection_auth
2015-12-04 01:17:12.571 |     connection = self._get_ssh_connection()
2015-12-04 01:17:12.572 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 87, in _get_ssh_connection
2015-12-04 01:17:12.572 |     password=self.password)
2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection to the 172.24.5.209 via SSH timed out.
2015-12-04 01:17:12.572 | User: cirros, Password: None

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-dvr-backlog

** Description changed:

  gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance:
+ tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
+ failure:

  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 |     return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 |     self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 93, in _create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 |     private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, in get_timestamp
  2015-12-04 01:17:12.571 |     private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, in get_remote_client
  2015-12-04 01:17:12.571 |     linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File "tempest/common/utils/linux/remote_client.py", line 63, in validate_authentication
  2015-12-04 01:17:12.571 |     self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 167, in test_connection_auth
  2015-12-04 01:17:12.571 |     connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 |     password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in neutron:
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12
[Yahoo-eng-team] [Bug 1522824] Re: DVR multinode job: test_shelve_instance failure due to SSHTimeout
Changing project to nova due to reasons described in comment #3

** Project changed: neutron => nova

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in OpenStack Compute (nova):
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 |     return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 |     self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File "tempest/scenario/test_shelve_instance.py", line 93, in _create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 |     private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, in get_timestamp
  2015-12-04 01:17:12.571 |     private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, in get_remote_client
  2015-12-04 01:17:12.571 |     linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File "tempest/common/utils/linux/remote_client.py", line 63, in validate_authentication
  2015-12-04 01:17:12.571 |     self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 167, in test_connection_auth
  2015-12-04 01:17:12.571 |     connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py", line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 |     password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1522824/+subscriptions
[Yahoo-eng-team] [Bug 1522436] [NEW] No need to autoreschedule routers if l3 agent is back online
Public bug reported:

- in case an l3 agent goes offline, the auto-rescheduling task is
  triggered and starts to reschedule each router from the dead agent one
  by one
- if there are a lot of routers scheduled to the agent, rescheduling all
  of them might take some time
- during that time the agent might get back online
- currently auto-rescheduling will continue until all routers are
  rescheduled from the (already alive!) agent

The proposal is to skip rescheduling if the agent is back online.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-ipam-dhcp

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522436

Title:
  No need to autoreschedule routers if l3 agent is back online

Status in neutron:
  New

Bug description:
  - in case an l3 agent goes offline, the auto-rescheduling task is
    triggered and starts to reschedule each router from the dead agent
    one by one
  - if there are a lot of routers scheduled to the agent, rescheduling
    all of them might take some time
  - during that time the agent might get back online
  - currently auto-rescheduling will continue until all routers are
    rescheduled from the (already alive!) agent

  The proposal is to skip rescheduling if the agent is back online.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1522436/+subscriptions
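The proposal amounts to re-checking agent liveness inside the rescheduling loop; a minimal sketch (all names here are illustrative, not the actual scheduler code):

```python
# Hypothetical sketch: the auto-rescheduling loop re-checks whether the
# agent is alive before moving each router, so an agent that revives
# mid-loop keeps its remaining routers.
def reschedule_routers_from_dead_agent(router_ids, is_agent_alive, reschedule):
    moved = []
    for router_id in router_ids:
        if is_agent_alive():
            break  # agent is back online: stop rescheduling the rest
        reschedule(router_id)
        moved.append(router_id)
    return moved


# Example: the agent comes back online after two routers were moved.
state = {'checks': 0}
def alive():
    state['checks'] += 1
    return state['checks'] > 2   # dead for the first two checks

moved = reschedule_routers_from_dead_agent(
    ['r1', 'r2', 'r3', 'r4'], alive, lambda router_id: None)
assert moved == ['r1', 'r2']    # r3 and r4 stay on the revived agent
```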
[Yahoo-eng-team] [Bug 1521524] [NEW] With DVR enabled instances sometimes fail to get metadata
Public bug reported:

A Rally scenario which creates VMs with floating IPs at a high rate
sometimes fails with SSHTimeout when trying to connect to the VM by its
floating IP. At the same time, pings to the VM are fine. It appeared that
VMs may sometimes fail to get the public key from metadata. That happens
because the metadata proxy process was started after the VM booted.
Further analysis showed that the l3 agent on the compute node was not
notified about the new VM port at the time this port was created.

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
       Status: In Progress

** Tags: l3-dvr-backlog liberty-backport-potential

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1521524

Title:
  With DVR enabled instances sometimes fail to get metadata

Status in neutron:
  In Progress

Bug description:
  A Rally scenario which creates VMs with floating IPs at a high rate
  sometimes fails with SSHTimeout when trying to connect to the VM by its
  floating IP. At the same time, pings to the VM are fine. It appeared
  that VMs may sometimes fail to get the public key from metadata. That
  happens because the metadata proxy process was started after the VM
  booted. Further analysis showed that the l3 agent on the compute node
  was not notified about the new VM port at the time this port was
  created.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1521524/+subscriptions
[Yahoo-eng-team] [Bug 1414559] Re: OVS drops RARP packets by QEMU upon live-migration - VM temporarily disconnected
Nova patch: https://review.openstack.org/246910/

** Also affects: nova
   Importance: Undecided
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Oleg Bondarev (obondarev)

** Changed in: nova
       Status: New => In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414559

Title:
  OVS drops RARP packets by QEMU upon live-migration - VM temporarily
  disconnected

Status in neutron:
  In Progress
Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When live-migrating a VM, QEMU sends 5 RARP packets in order to allow
  re-learning of the new location of the VM's MAC address. However, the
  VIF creation scheme between nova-compute and neutron-ovs-agent drops
  these RARPs:

  1. nova creates a port on OVS, but without the internal tagging.
  2. At this stage all the packets that come out from the VM, or the QEMU
     process it runs in, will be dropped.
  3. QEMU sends five RARP packets in order to allow MAC learning. These
     packets are dropped as described in #2.
  4. Meanwhile, neutron-ovs-agent loops every POLLING_INTERVAL and scans
     for new ports. Once it detects a new port is added, it will read the
     properties of the new port and assign the correct internal tag,
     which will allow connection of the VM.

  The flow above suggests that:

  1. RARP packets are dropped, so MAC learning takes much longer and
     depends on internal traffic and advertising by the VM.
  2. The VM is disconnected from the network for a mean period of
     POLLING_INTERVAL/2.

  Seems like this could be solved by direct messages between the nova vif
  driver and neutron-ovs-agent.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414559/+subscriptions
[Yahoo-eng-team] [Bug 1509295] [NEW] L3: agent may do double work upon start/resync
Public bug reported:

The issue was noticed during scale testing of DVR. When the l3 agent
starts up, it initiates a full sync with the neutron server: it requests
full info about all the routers scheduled to it. At the same time the
agent may receive various notifications (router_added/updated/deleted)
which were sent while the agent was offline or starting up. For each such
notification the agent will request router info again, so the server will
have to process it twice (the first time for the resync request).

The following optimization makes sense: when the agent is about to
fullsync, we can skip all router notifications, since the full sync should
bring the agent up to date anyway.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
       Status: In Progress

** Tags: l3-ipam-dhcp loadimpact

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1509295

Title:
  L3: agent may do double work upon start/resync

Status in neutron:
  In Progress

Bug description:
  The issue was noticed during scale testing of DVR. When the l3 agent
  starts up, it initiates a full sync with the neutron server: it
  requests full info about all the routers scheduled to it. At the same
  time the agent may receive various notifications
  (router_added/updated/deleted) which were sent while the agent was
  offline or starting up. For each such notification the agent will
  request router info again, so the server will have to process it twice
  (the first time for the resync request).

  The following optimization makes sense: when the agent is about to
  fullsync, we can skip all router notifications, since the full sync
  should bring the agent up to date anyway.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1509295/+subscriptions
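The optimization can be sketched with a simplified agent class (a hedged stand-in, not the real `L3NATAgent`; only the idea of a pending-fullsync flag that short-circuits per-router notifications is taken from the proposal above):

```python
# Hypothetical simplified agent: while a full sync is pending, individual
# router notifications are dropped, since the full sync will fetch the
# current state of every scheduled router anyway.
class L3AgentSketch:
    def __init__(self):
        self.fullsync = True      # set at startup (and on resync)
        self.processed = []       # routers whose info was fetched

    def router_updated(self, router_id):
        if self.fullsync:
            return                # skip: the pending full sync covers it
        self.processed.append(router_id)

    def periodic_sync_routers_task(self, scheduled_router_ids):
        # One bulk fetch instead of one RPC round-trip per notification.
        self.processed.extend(scheduled_router_ids)
        self.fullsync = False


agent = L3AgentSketch()
agent.router_updated('r1')                      # dropped: fullsync pending
agent.periodic_sync_routers_task(['r1', 'r2'])  # brings agent up to date
agent.router_updated('r3')                      # processed normally now
assert agent.processed == ['r1', 'r2', 'r3']    # r1 handled exactly once
```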
[Yahoo-eng-team] [Bug 1508869] [NEW] DVR: handle dvr serviceable port's host change
Public bug reported:

When a VM port's host is changed, we need to check whether the router
should be unscheduled from the old host and send the corresponding
notifications. Commit d5a8074ec3c67ed68e64a96827da990f1c34e10f added such
a check for when a port is unbound. We need to add a similar check for the
host-change case.

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
       Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1508869

Title:
  DVR: handle dvr serviceable port's host change

Status in neutron:
  New

Bug description:
  When a VM port's host is changed, we need to check whether the router
  should be unscheduled from the old host and send the corresponding
  notifications. Commit d5a8074ec3c67ed68e64a96827da990f1c34e10f added
  such a check for when a port is unbound. We need to add a similar check
  for the host-change case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1508869/+subscriptions
[Yahoo-eng-team] [Bug 1505557] [NEW] L3 agent not always properly update floatingip status on server
Public bug reported:

commit c44506bfd60b2dd6036e113464f1ea682cfaeb6c introduced an optimization to not send a floating ip status update when the status didn't change: if the server returned a floating ip as ACTIVE, we don't need to update its status after successful processing. This might be wrong in the DVR case: when a floating ip's associated fixed port is moved from one host to another, the notification is sent to both l3 agents on the compute nodes (old and new).

Here is what happens next:
- old agent receives the notification and requests router info from the server
- same for the new agent
- server returns router info without the floating ip to the old agent
- server returns router info with the floating ip to the new agent; the status of the floating ip is ACTIVE
- old agent removes the floating ip and sends a status update, so the server puts the floating ip into DOWN state
- new agent adds the floating ip and doesn't send a status update since it didn't change from the agent's point of view
- the floating ip stays in DOWN state though it's actually active

The fix would be to always update the status of a floating ip if the agent actually applies it.

** Affects: neutron
   Importance: Undecided
   Assignee: Oleg Bondarev (obondarev)
   Status: New

** Tags: l3-dvr-backlog

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505557

Title:
  L3 agent not always properly update floatingip status on server

Status in neutron:
  New

Bug description:
  commit c44506bfd60b2dd6036e113464f1ea682cfaeb6c introduced an
  optimization to not send a floating ip status update when the status
  didn't change: if the server returned a floating ip as ACTIVE, we
  don't need to update its status after successful processing. This
  might be wrong in the DVR case: when a floating ip's associated fixed
  port is moved from one host to another, the notification is sent to
  both l3 agents on the compute nodes (old and new).

  Here is what happens next:
  - old agent receives the notification and requests router info from the server
  - same for the new agent
  - server returns router info without the floating ip to the old agent
  - server returns router info with the floating ip to the new agent; the status of the floating ip is ACTIVE
  - old agent removes the floating ip and sends a status update, so the server puts the floating ip into DOWN state
  - new agent adds the floating ip and doesn't send a status update since it didn't change from the agent's point of view
  - the floating ip stays in DOWN state though it's actually active

  The fix would be to always update the status of a floating ip if the
  agent actually applies it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1505557/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
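[Editor's note] A stdlib-only sketch of the two behaviours. The helper name and the `always_report` flag are illustrative: with `always_report=False` the function models the optimization from commit c44506b (skip the update when the server already reported the same status); with the default `True` it models the proposed fix (report whatever status the agent actually applied):

```python
def report_fip_status(fip, applied_ok, send_status_update, always_report=True):
    """Report the status the agent applied for a floating ip.

    fip: dict with 'id' and the server-reported 'status'.
    applied_ok: whether the agent successfully configured the fip.
    """
    new_status = 'ACTIVE' if applied_ok else 'ERROR'
    if always_report or new_status != fip['status']:
        # Fix: report even when the server already said ACTIVE, so the new
        # host's agent re-activates a fip that the old host's agent has
        # meanwhile put into DOWN state.
        send_status_update(fip['id'], new_status)
    return new_status
```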
[Yahoo-eng-team] [Bug 1505661] [NEW] RetryRequest failure on create_security_group_bulk
Public bug reported: <163>Oct 5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 ERROR neutron.api.v2.resource [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] create failed 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource Traceback (most recent call last): 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource result = method(request=request, **args) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource return f(*args, **kwargs) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 448, in create 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource objs = obj_creator(request.context, body, **kwargs) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 123, in create_security_group_bulk 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource security_group_rule) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 954, in _create_bulk 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource {'resource': resource, 'item': item}) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__ 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 947, in _create_bulk 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource 
objects.append(obj_creator(context, item))
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 150, in create_security_group
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource     self._ensure_default_security_group(context, tenant_id)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 663, in _ensure_default_security_group
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource     raise db_exc.RetryRequest(ex)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource RetryRequest
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource

<167>Oct 5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.820 34082 DEBUG neutron.db.securitygroups_db [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] Duplicate default security group 9839de92fb8049598f1c3ea8f32b9cf9 was not created _ensure_default_security_group /usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py:679

<163>Oct 5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 ERROR neutron.db.db_base_plugin_v2 [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] An exception occurred while creating the security_group: {u'security_group': {'tenant_id': u'9839de92fb8049598f1c3ea8f32b9cf9', u'name': u'rally_neutronsecgrp_F44SF1uvTciIQJlu', u'description': u'Rally SG'}}

** Affects: neutron
   Importance: Undecided
   Assignee: Oleg Bondarev (obondarev)
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505661 Title: RetryRequest failure on create_security_group_bulk Status in neutron: New Bug description: <163>Oct 5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 ERROR neutron.api.v2.resource [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] create failed 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource Traceback (most recent call last): 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource result = method(request=request, **args) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource return f(*args, **kwargs) 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 448, in create 2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.r
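[Editor's note] RetryRequest is oslo.db's signal that the whole surrounding transaction should be rolled back and re-run (here, because another worker created the default security group first). It only has an effect when a retry wrapper is active on the call path; in this trace it escapes to the API resource instead. The stdlib-only model below illustrates the contract (the real decorator is oslo_db's wrap_db_retry; this is a simplified stand-in, not its actual code):

```python
class RetryRequest(Exception):
    """Stand-in for oslo.db's RetryRequest exception."""

def wrap_db_retry(max_retries=3):
    """Simplified model of a DB retry decorator: re-run the wrapped
    function when it asks for a retry, up to max_retries times."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except RetryRequest:
                    if attempt == max_retries:
                        raise   # give up, surface the failure
        return wrapper
    return decorator
```

The bulk-create path in the trace effectively lacks such a wrapper around the inner creation, so the RetryRequest raised by _ensure_default_security_group propagates as a 500 instead of triggering a retry.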
[Yahoo-eng-team] [Bug 1505282] [NEW] L3 agent: explicit call to resync on init may lead to double syncing
Public bug reported:

Currently the L3 agent has an explicit call to self.periodic_sync_routers_task() after initialization. Given that the periodic job spacing is set to 1 second, this may lead to double syncing with the server on initialization (especially if there are a lot of routers scheduled to the agent):

- agent starts, fullsync flag is True
- periodic_sync_routers_task is called from after_start(), agent requests router info from the server, fullsync flag is True
- periodic_sync_routers_task is called by the periodic task framework, fullsync flag is still True, agent requests router info from the server once again

So it's double work on both the server and agent sides, which might be quite expensive at scale. The proposal is to just use the run_immediately parameter.

** Affects: neutron
   Importance: Undecided
   Assignee: Oleg Bondarev (obondarev)
   Status: New

** Tags: l3-ipam-dhcp

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505282

Title:
  L3 agent: explicit call to resync on init may lead to double syncing

Status in neutron:
  New

Bug description:
  Currently the L3 agent has an explicit call to
  self.periodic_sync_routers_task() after initialization. Given that
  the periodic job spacing is set to 1 second, this may lead to double
  syncing with the server on initialization (especially if there are a
  lot of routers scheduled to the agent):

  - agent starts, fullsync flag is True
  - periodic_sync_routers_task is called from after_start(), agent requests router info from the server, fullsync flag is True
  - periodic_sync_routers_task is called by the periodic task framework, fullsync flag is still True, agent requests router info from the server once again

  So it's double work on both the server and agent sides, which might
  be quite expensive at scale. The proposal is to just use the
  run_immediately parameter.
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1505282/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
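[Editor's note] The race in the steps above can be modeled deterministically with the stdlib alone. Each invocation of the sync task is split into a check phase and a completion phase so the overlap can be reproduced without threads; the real agent uses oslo.service periodic tasks, and all names here are illustrative:

```python
class L3Agent:
    """Illustrative model of the agent's fullsync flag handling."""

    def __init__(self):
        self.fullsync = True
        self.server_syncs = 0   # expensive full router-info fetches

    def begin_sync(self):
        # Check-then-act: the flag is only cleared in finish_sync(), so a
        # second invocation that starts before the first one finishes also
        # sees fullsync=True and triggers another full fetch.
        if self.fullsync:
            self.server_syncs += 1
            return True
        return False

    def finish_sync(self, started):
        if started:
            self.fullsync = False

# Buggy startup: the explicit call from after_start() overlaps with the
# periodic invocation (spacing=1s) before either completes -> two fetches.
# Proposed fix: drop the explicit call and register the periodic task with
# run_immediately=True, giving a single initial invocation.
```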
[Yahoo-eng-team] [Bug 1494157] [NEW] Regression: ObjectDeletedError on network delete
251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 614, in load_scalar_attributes 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource raise orm_exc.ObjectDeletedError(state) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource ObjectDeletedError: Instance '' has been deleted, or its row is otherwise not present. 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1494157 Title: Regression: ObjectDeletedError on network delete Status in neutron: New Bug description: Exception is raised when deleting network ports: 2015-09-09T01:24:36.253938+00:00 err: 2015-09-09 01:24:36.251 10128 ERROR neutron.api.v2.resource [req-81135bfb-f40b-41ee-b6ce-279eafba97dd ] delete failed 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource Traceback (most recent call last): 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource result = method(request=request, **args) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource return f(*args, **kwargs) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 495, in delete 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 780, in delete_network 2015-09-09 
01:24:36.251 10128 TRACE neutron.api.v2.resource self._delete_ports(context, port_ids) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 693, in _delete_ports 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource port_id) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__ 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 685, in _delete_ports 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource self.delete_port(context, port_id) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1292, in delete_port 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource super(Ml2Plugin, self).delete_port(context, id) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1915, in delete_port 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource self._delete_port(context, id) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1938, in _delete_port 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource query.delete() 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2670, in delete 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource delete_op.exec_() 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 896, in exec_ 2015-09-09 01:24:36.251 10128 TRACE 
neutron.api.v2.resource self._do_pre_synchronize() 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 958, in _do_pre_synchronize 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource eval_condition(obj)] 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 115, in evaluate 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource left_val = eval_left(obj) 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 72, in 2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource return lamb
[Yahoo-eng-team] [Bug 1491922] [NEW] ovs agent doesn't configure new ovs-port for an instance
Public bug reported:

In case of massive resource deletion (networks, ports) it may take the agent quite a long time to process. Port delete processing happens during the ovs agent periodic task. It takes the agent ~0.25s to process one port deletion.

From the attached log we can see that on a certain iteration the agent had to process deletion of 1625 ports: 1625 * 0.25 = 406 seconds. Indeed:

2015-08-29 09:13:46.004 21292 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - starting polling. Elapsed:0.047 rpc_loop /usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1733
... (ports deletion handling)
2015-08-29 09:20:28.569 21292 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - port information retrieved. Elapsed:402.612 rpc_loop /usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1748
... (from here the agent starts processing new ports)

402 seconds is not acceptable: nova waits for 300 seconds by default and then fails with a vif plugging timeout.

From the log we can also see that a new ovs port appeared while the agent was busy with port deletions:

2015-08-29 09:13:52.432 21292 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output received from ovsdb monitor: {"data":[["8fd481a4-1267-445b-bedc-f1f6b3a47898","old",null,["set",[]]],["","new","qvoced59c11-1b",76]],"headings":["row","action","name","ofport"]} _read_stdout /usr/lib/python2.7/dist-packages/neutron/agent/linux/ovsdb_monitor.py:44

Port deletion handling needs to be optimised on the agent side.

** Affects: neutron
   Importance: Undecided
   Assignee: Oleg Bondarev (obondarev)
   Status: New

** Tags: ovs

** Attachment added: "ovs-agent.log.gz"
   https://bugs.launchpad.net/bugs/1491922/+attachment/4456894/+files/ovs-agent.log.gz

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1491922

Title:
  ovs agent doesn't configure new ovs-port for an instance

Status in neutron:
  New

Bug description:
  In case of massive resource deletion (networks, ports) it may take
  the agent quite a long time to process. Port delete processing
  happens during the ovs agent periodic task. It takes the agent ~0.25s
  to process one port deletion.

  From the attached log we can see that on a certain iteration the
  agent had to process deletion of 1625 ports: 1625 * 0.25 = 406
  seconds. Indeed:

  2015-08-29 09:13:46.004 21292 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - starting polling. Elapsed:0.047 rpc_loop /usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1733
  ... (ports deletion handling)
  2015-08-29 09:20:28.569 21292 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - port information retrieved. Elapsed:402.612 rpc_loop /usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1748
  ... (from here the agent starts processing new ports)

  402 seconds is not acceptable: nova waits for 300 seconds by default
  and then fails with a vif plugging timeout.

  From the log we can also see that a new ovs port appeared while the
  agent was busy with port deletions:

  2015-08-29 09:13:52.432 21292 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output received from ovsdb monitor: {"data":[["8fd481a4-1267-445b-bedc-f1f6b3a47898","old",null,["set",[]]],["","new","qvoced59c11-1b",76]],"headings":["row","action","name","ofport"]} _read_stdout /usr/lib/python2.7/dist-packages/neutron/agent/linux/ovsdb_monitor.py:44

  Port deletion handling needs to be optimised on the agent side.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1491922/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
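[Editor's note] The report's arithmetic: ~0.25 s per deleted port times 1625 ports is about 406 s in a single rpc_loop iteration, past nova's 300 s vif-plug timeout. One possible shape of the optimisation (purely illustrative, not neutron's actual fix) is to cap per-iteration deletion work so that new ports still get wired up between batches:

```python
def drain_deleted_ports(deleted_ports, process_deletion, batch_size=200):
    """Process at most batch_size deletions this iteration; return the
    remainder so the rpc_loop can handle new ports in between."""
    batch, rest = deleted_ports[:batch_size], deleted_ports[batch_size:]
    for port_id in batch:
        process_deletion(port_id)   # ~0.25s each in the reported setup
    return rest
```

At 200 deletions per iteration, each iteration's deletion work stays around 50 s, well under the vif-plug timeout, at the cost of spreading the cleanup over several iterations.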
[Yahoo-eng-team] [Bug 1428713] Re: migrate non-dvr to dvr case, snat netns not created
I think we need to add explicit validation for the router being set to admin state down prior to the upgrade. This should eliminate the confusion.

** Changed in: neutron
       Status: Invalid => Triaged

** Changed in: neutron
     Assignee: ZongKai LI (lzklibj) => Oleg Bondarev (obondarev)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1428713

Title:
  migrate non-dvr to dvr case, snat netns not created

Status in neutron:
  In Progress

Bug description:
  On a 1+2 env, a router has an external network attached. Use the
  following steps to migrate from non-dvr to dvr:
  1) modify related config files.
  2) restart related services.
  3) run command: neutron router-update --distributed=True ROUTER.

  Now, there's no snat-* netns created on the controller node. As a
  workaround, restarting neutron-l3-agent on the controller node works.
  And in l3-agent.log, we can find:

  2015-02-28 01:26:21.377 5283 ERROR neutron.agent.l3.agent [-] 'LegacyRouter' object has no attribute 'dist_fip_count'
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent Traceback (most recent call last):
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 342, in call
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     return func(*args, **kwargs)
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 592, in process_router
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     self.scan_fip_ports(ri)
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/dvr.py", line 128, in scan_fip_ports
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     if not ri.router.get('distributed') or ri.dist_fip_count is not None:
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent AttributeError: 'LegacyRouter' object has no attribute 'dist_fip_count'

  It seems the current code is not ready to migrate a LegacyRouter to a
  DvrRouter.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1428713/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
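[Editor's note] The validation proposed in the comment can be sketched as follows (names are illustrative): refuse the legacy-to-DVR migration unless the router has been set to admin state down first, instead of letting the agents fail later with an AttributeError on a half-migrated router:

```python
def validate_distributed_update(router, make_distributed):
    """Reject legacy->DVR migration while the router is admin-up."""
    migrating = make_distributed and not router.get('distributed')
    if migrating and router.get('admin_state_up'):
        raise ValueError(
            "cannot migrate router %s to distributed while it is "
            "admin_state_up; set admin state down first" % router['id'])
```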
[Yahoo-eng-team] [Bug 1447397] Re: Removing one interface from a Router, deletes the qrouter namespace
*** This bug is a duplicate of bug 1443524 ***
    https://bugs.launchpad.net/bugs/1443524

** This bug is no longer a duplicate of bug 1443596
   Removing an interface from a DVR router removes all SNAT ports of all connected subnets
** This bug has been marked a duplicate of bug 1443524
   Removing an interface by port from a DVR router deletes all SNAT ports

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447397

Title:
  Removing one interface from a Router, deletes the qrouter namespace

Status in neutron:
  Confirmed

Bug description:
  In DVR mode, when an interface is removed from the router, the
  qrouter namespace itself is deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447397/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1484135] [NEW] DetachedInstanceError on network delete
Public bug reported:

DetachedInstanceError occurs when logging that a dhcp port was deleted concurrently: the db object is accessed after it was already expunged from the session.

Code in question:

    def _delete_ports(self, context, ports):
        for port in ports:
            try:
                self.delete_port(context, port.id)
            except (exc.PortNotFound, sa_exc.ObjectDeletedError):
                context.session.expunge(port)
                # concurrent port deletion can be performed by
                # release_dhcp_port caused by concurrent subnet_delete
                LOG.info(_LI("Port %s was deleted concurrently"), port.id)

Traceback:

2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource Traceback (most recent call last):
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 490, in delete
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     obj_deleter(request.context, id, **kwargs)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 775, in delete_network
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     self._delete_ports(context, ports)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 686, in _delete_ports
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     LOG.info(_LI("Port %s was deleted concurrently"), port.id)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 239, in __get__
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     return self.impl.get(instance_state(instance), dict_)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 589, in get
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     value = callable_(state, passive)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/state.py", line 424, in __call__
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     self.manager.deferred_scalar_loader(self, toload)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 563, in load_scalar_attributes
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     (state_str(state)))
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource DetachedInstanceError: Instance <Port at 0x7f8f7d544dd0> is not bound to a Session; attribute refresh operation cannot proceed
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource

2015-08-12T09:26:42.990805+00:00 info: 2015-08-12 09:26:42.987 4250 INFO neutron.wsgi [req-2bbc2b06-40f1-41e7-a230-3026ea94414d ] 10.109.2.3 - - [12/Aug/2015 09:26:42] "DELETE /v2.0/networks/a3322fce-2fc9-4be3-88d7-ba1d4f4294df.json HTTP/1.1" 500 378 0.938119

** Affects: neutron
   Importance: High
   Assignee: Oleg Bondarev (obondarev)
   Status: Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1484135

Title:
  DetachedInstanceError on network delete

Status in neutron:
  Confirmed

Bug description:
  DetachedInstanceError occurs when logging that a dhcp port was
  deleted concurrently: the db object is accessed after it was already
  expunged from the session.

  Code in question:

      def _delete_ports(self, context, ports):
          for port in ports:
              try:
                  self.delete_port(context, port.id)
              except (exc.PortNotFound, sa_exc.ObjectDeletedError):
                  context.session.expunge(port)
                  # concurrent port deletion can be performed by
                  # release_dhcp_port caused by concurrent subnet_delete
                  LOG.info(_LI("Port %s was deleted concurrently"), port.id)

  Traceback:

  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource Traceback (most recent call last):
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     result = method(request=request, **args)
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 490, in delete
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     obj_deleter(request.context, id, **kwargs)
  2015-08-12 09
[Yahoo-eng-team] [Bug 1482630] [NEW] Router resources lost after rescheduling
Public bug reported: Currently router_added_to_agent (and other) notifications are sent to agents with an RPC cast() method which does not ensure that the message is actually delivered to the recipient. If the message is lost (for example due to instability of messaging system during failover scenarios) neither server nor agent will be aware of that and router namespace will not be created by the hosting agent till the next resync. Resync will only happen in case of errors on agent side or restart which might take quite a long time. The proposal would be to use RPC call() to notify agents about added routers thus ensuring no routers will be lost by agents. ** Affects: neutron Importance: Undecided Assignee: Oleg Bondarev (obondarev) Status: New ** Tags: l3-ipam-dhcp -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1482630 Title: Router resources lost after rescheduling Status in neutron: New Bug description: Currently router_added_to_agent (and other) notifications are sent to agents with an RPC cast() method which does not ensure that the message is actually delivered to the recipient. If the message is lost (for example due to instability of messaging system during failover scenarios) neither server nor agent will be aware of that and router namespace will not be created by the hosting agent till the next resync. Resync will only happen in case of errors on agent side or restart which might take quite a long time. The proposal would be to use RPC call() to notify agents about added routers thus ensuring no routers will be lost by agents. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1482630/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
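[Editor's note] A stdlib-only model of the distinction the report relies on, loosely shaped after oslo.messaging semantics (the class and message names are illustrative): cast() is fire-and-forget, so a dropped router_added_to_agent message vanishes silently, while call() waits for a reply, so a loss surfaces as an error the server can react to (e.g. reschedule the router):

```python
class FlakyTransport:
    """Toy transport where delivery can fail, to contrast cast() and call()."""

    def __init__(self):
        self.drop_next = False   # simulate messaging-system instability
        self.delivered = []

    def cast(self, method, **payload):
        # Fire-and-forget: a dropped message is lost and nobody notices;
        # the agent never creates the router namespace until the next resync.
        if self.drop_next:
            self.drop_next = False
            return
        self.delivered.append((method, payload))

    def call(self, method, **payload):
        # Request/response: loss surfaces as an error the caller can handle.
        if self.drop_next:
            self.drop_next = False
            raise TimeoutError("no reply from l3 agent")
        self.delivered.append((method, payload))
        return "ack"
```

The trade-off, not discussed in the report, is that call() blocks the notifying thread and needs a timeout policy; the gain is that the server learns about the failure instead of leaving the router silently unhosted.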