[Yahoo-eng-team] [Bug 1611119] [NEW] Add new configuration options for DHCP
Public bug reported:

In networking-ovn, DHCP is provided natively. We want to refactor the DHCP agent to be used for metadata. This can be achieved with the following new configuration options, whose defaults are set to the values currently used for DHCP.

In ML2 plugin:

1) dhcp_rpc_callback, default: "neutron.api.rpc.handlers.DhcpRpcCallback"
2) dhcp_notifier, default: "neutron.api.rpc.agentnotifiers.DhcpAgentNotifyAPI"

In DHCP agent:

3) dhcp_plugin_api, default: "neutron.agent.dhcp.agent.DhcpPluginApi"
4) dhcp_device_manager, default: "neutron.agent.linux.dhcp.DeviceManager"
5) notifies_port_ready, default: True

** Affects: neutron
   Importance: Undecided
   Assignee: Ramu Ramamurthy (ramu-ramamurthy)
   Status: New

** Changed in: neutron
   Assignee: (unassigned) => Ramu Ramamurthy (ramu-ramamurthy)
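Options like these hold dotted class paths, which implies dynamically importing the configured class at startup. A minimal sketch of that mechanism using only the standard library; the helper name `load_class` is illustrative (neutron itself uses oslo.utils.importutils for this), and a stdlib class stands in for the neutron defaults since neutron is not importable here:

```python
import importlib

def load_class(dotted_path):
    """Import a class from a dotted path such as
    'neutron.agent.linux.dhcp.DeviceManager' (illustrative helper)."""
    module_path, _, class_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Stand-in for e.g. cfg.CONF.dhcp_device_manager resolution:
OrderedDict = load_class('collections.OrderedDict')
d = OrderedDict(a=1)
```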
[Yahoo-eng-team] [Bug 1555717] [NEW] Change log level at runtime
Public bug reported:

We need the ability to change the debug levels on the fly on neutron components (neutron-server, agents) to debug problems as they are happening. Currently, changing the log level requires a service restart (after changing the level in the config file). Changing the debug level must not require a restart: we want the ability to change the log level while the service is running.

** Affects: neutron
   Importance: Undecided
   Status: New

** Summary changed: Change log level when running => Change log level at runtime

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1555717

Title: Change log level at runtime
Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1555717/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to: yahoo-eng-team@lists.launchpad.net
Unsubscribe: https://launchpad.net/~yahoo-eng-team
More help: https://help.launchpad.net/ListHelp
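A minimal sketch of changing a logger's effective level while the process runs, using only Python's stdlib logging. The function name is illustrative and this is not neutron's mechanism (oslo.log later grew its own runtime reload support); it only demonstrates that no restart is inherently required:

```python
import logging

def set_log_level(logger_name, level_name):
    """Change the level of a running logger without a restart (sketch)."""
    level = getattr(logging, level_name.upper())
    logging.getLogger(logger_name).setLevel(level)
    return level

logger = logging.getLogger('neutron.demo')      # hypothetical logger name
set_log_level('neutron.demo', 'debug')          # now emits DEBUG records
debug_on = logger.isEnabledFor(logging.DEBUG)
set_log_level('neutron.demo', 'warning')        # back to quieter output
debug_off = logger.isEnabledFor(logging.DEBUG)
```

In a real agent this setter would be invoked from a signal handler or an admin API rather than inline.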
[Yahoo-eng-team] [Bug 1519537] [NEW] [RFE] - Diagnostics Extension for Neutron
Public bug reported:

Problem:
Debugging common networking/neutron problems (1. cannot ping VM, 2. cannot ping FIP) tends to be manual, and requires root access to look into the state of the agents or the datapath on different hosts. Neutron needs to provide a "diagnostics" extension API which can be used for debugging networking problems. Each agent/driver exposes its own state in a structured (JSON) format via the diagnostics extension. The structured content can be parsed by automation to pinpoint problems, or at least reach the next step of the debugging much faster than can be done manually. In addition, there should be diagnostics/operational support to ping a neutron port from the DHCP or L3 agents.

Nova "diagnostics" serves as an example here: https://wiki.openstack.org/wiki/Nova_VM_Diagnostics

Fix:
A "diagnostics" extension is added to neutron. Each agent and its corresponding drivers support a get_diagnostics() API, invoked from neutron-server upon the following GET APIs (limited by policy to admin-only). The outputs are structured so they can be processed by other tools.

GET: /agent/:id/diagnostics
  example output from neutron-ovs agent: OVS bridges, ports and flows

GET: /agent/:id/diagnostics/network/:id
  example output from dhcp-agent (dnsmasq driver): contents of host, lease files

GET: /agent/:id/diagnostics/port/:id
  example output from dhcp-agent: dhcp transactions for that port (from dnsmasq logs)
  example output from ovs-agent: stats on qvo, qbr, tap interfaces

GET: /agent/:id/diagnostics/port/:id/security-groups
  example output from l2-agent (iptables firewall driver): iptables rules programmed (ingress/egress/spoofing) for that port

GET: /agent/:id/diagnostics/port/:id/ping
  This is an "operational" command - ping the port from the agent (dhcp/l3) network/router namespace

The Neutron command-line client supports the following new commands:

neutron l2-diagnostics --network-id <> --port-id <> agent
neutron dhcp-diagnostics --network-id <> --port-id <> --ping agent
neutron l3-diagnostics --network-id <> --port-id <> --ping agent

Sample diagnostics extension code: see attached code diff.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: rfe

** Patch added: "sample code diff for diagnostics extension"
   https://bugs.launchpad.net/bugs/1519537/+attachment/4524935/+files/diagnostics-extension-code-diff.txt
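A hedged sketch of what an agent-side get_diagnostics() payload could look like for the first GET endpoint: structured, JSON-serializable data instead of raw command output. All field names here are invented for illustration; the attached code diff defines the actual interface:

```python
import json

def get_diagnostics():
    """Illustrative diagnostics payload for an OVS-style agent:
    bridges, their ports, and flow counts, as plain serializable data."""
    return {
        'agent': 'neutron-ovs-agent',
        'bridges': [
            {'name': 'br-int', 'ports': ['patch-tun', 'tap1234'], 'flows': 42},
            {'name': 'br-tun', 'ports': ['patch-int'], 'flows': 7},
        ],
    }

# What the GET /agent/:id/diagnostics API would serve:
payload = json.dumps(get_diagnostics())
```

Because the payload is structured rather than free text, a troubleshooting tool can assert on specific fields (e.g. "is tap1234 plugged into br-int?") instead of screen-scraping ovs-vsctl output.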
[Yahoo-eng-team] [Bug 1512864] [NEW] Application Metrics for Neutron
Public bug reported:

Problem:
Currently, there is no application-metrics framework to measure neutron behavior and identify trends and problems in operation. RPC latencies, HW programming latency (for security-groups, OVS), API latencies, and DB latencies come to mind immediately as metrics which need to be monitored, trended, and analyzed. Logs are one piece of the puzzle, but they do not capture tagged metrics data in a format that can be easily analyzed. This is filed as an RFE bug (following the blueprint protocol) and can be upgraded to a blueprint after discussion/inputs.

Fix:
Provide a simple framework that has the following components:

1) A metrics front end providing at least the following APIs:
   @timeit(tag) - a decorator to time functions or code blocks
   increment(tag)
   decrement(tag)
   set_value(tag, value)

2) A metrics processor: a class that can provide functionality like sampling and throttling of metrics data.

3) A configurable metrics output driver:
   statsd driver: output to a statsd server
   graphite/carbon driver: output to a carbon server
   logfile driver: output to a logfile

With the above library, neutron code is instrumented to add tagged metrics. I have some code, and can contribute it after community input/discussion on the RFE (following the blueprint protocol).

Example use cases:

1) Time a function or code block:

   @metrics.timeit
   def _apply(self):
       lock_name = 'iptables'
       if self.namespace:
           lock_name += '-' + self.namespace

Example output data (in graphite format):

   neutron.agent.linux.iptables_manager._apply 319.076061249 1446328923
   neutron.agent.linux.iptables_manager._apply 274.07002449 1446328925

Notes:

1) Swift (and other projects?) already have "statsd"-based application metrics. However, we want a design not entirely tied to statsd. For example, see https://review.openstack.org/#/c/6058/

2) Such a metrics library ideally needs to be an oslo project, as it can be used by all OpenStack projects, but it can be proved in neutron first.
** Affects: neutron
   Importance: Undecided
   Assignee: Ramu Ramamurthy (ramu-ramamurthy)
   Status: New

** Tags: rfe
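A minimal sketch of the @timeit(tag) front end described above, with an in-memory store standing in for the statsd/graphite/logfile output drivers. The names (`METRICS`, `timeit`, the sample function) are illustrative, not the proposed library's actual API:

```python
import functools
import time

METRICS = {}  # tag -> list of elapsed seconds; a real output driver would emit these

def timeit(tag):
    """Decorator that records wall-clock duration of each call under `tag`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                METRICS.setdefault(tag, []).append(time.monotonic() - start)
        return wrapper
    return decorator

@timeit('iptables_manager._apply')   # hypothetical tag
def apply_rules(n):
    return sum(range(n))

result = apply_rules(1000)
```

A production version would hand each sample to the configured driver (statsd, carbon, logfile) instead of appending to a dict, with the processor layer sampling or throttling in between.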
[Yahoo-eng-team] [Bug 1492456] Re: cProfile - fix Security Groups hotfunctions
As per the comment above, I will close this as Fix Released (the patch noted above) and create an RFE bug to track further improvements needed. I will also create an RFE bug to provide an option to run neutron agents in a "profiled" mode.

** Changed in: neutron
   Status: Triaged => Fix Released

--
https://bugs.launchpad.net/bugs/1492456

Title: cProfile - fix Security Groups hotfunctions
Status in neutron: Fix Released

Bug description:
I used cProfile to profile neutron-ovs-agent (from neutron kilo 2015.1.0) as VMs are provisioned (see the code sample below to reproduce). I find a couple of functions in the IptablesManager scaling poorly with the number of VMs (_modify_rules, and its callee _find_last_entry). As the number of current VMs doubles, the time spent in these functions to provision 10 new VMs also roughly doubles. While we wait for the new iptables firewall driver (https://blueprints.launchpad.net/neutron/+spec/new-iptables-driver), can we improve the performance of the current iptables firewall in those 2 functions, which do a lot of string processing on iptables rule strings checking for dups?
Current: #VMs: 20, # iptables rules: 657, provision 10 new VMs

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       60    0.143    0.002    3.979    0.066  iptables_manager.py:511(_modify_rules)
    25989    2.752    0.000    3.332    0.000  iptables_manager.py:504(_find_last_entry)
Cumulative time spent in _find_last_entry: 3.3 sec

Current: #VMs: 40, # iptables rules: 1277, provision 10 new VMs
       65    0.220    0.003    7.974    0.123  iptables_manager.py:511(_modify_rules)
    38891    5.782    0.000    6.986    0.000  iptables_manager.py:504(_find_last_entry)
Cumulative time spent in _find_last_entry: 6.9 sec

Current: #VMs: 80, # iptables rules: 2517, provision 10 new VMs
       30    0.274    0.009   20.496    0.683  iptables_manager.py:511(_modify_rules)
    43862   15.920    0.000   19.292    0.000  iptables_manager.py:504(_find_last_entry)
Cumulative time spent in _find_last_entry: 19.2 sec

Current: #VMs: 160, # iptables rules: 4997, provision 10 new VMs
       20    0.375    0.019   49.255    2.463  iptables_manager.py:511(_modify_rules)
    56478   39.275    0.001   47.629    0.001  iptables_manager.py:504(_find_last_entry)
Cumulative time spent in _find_last_entry: 47.6 sec

To Reproduce: This is one way, where we can control start/stop of profiling based on the presence of a file (/tmp/cprof) in the filesystem. Make the following change to neutron_ovs_agent.py to enable/disable cProfile for a given scenario.
import cProfile
import os.path

pr_enabled = False
pr = None

In OVSNeutronAgent, add the method:

    def toggle_cprofile(self):
        global pr, pr_enabled
        start = False
        fname = "vm.profile"
        try:
            if os.path.isfile("/tmp/cprof"):
                start = True
        except IOError as e:
            LOG.warn("Error %s", e.strerror)
        if start and not pr_enabled:
            pr = cProfile.Profile()
            pr.enable()
            pr_enabled = True
            LOG.warn("enabled cprofile")
        if not start and pr_enabled:
            pr.disable()
            pr.create_stats()
            pr.dump_stats("/tmp/%s" % fname)
            pr_enabled = False
            LOG.warn("disabled cprofile")

In the polling loop:

    self.toggle_cprofile()

This is another way to run cProfile, but here there is no way to control start/stop of profiling, and the profile includes the initialization as well. Run neutron-ovs-agent as follows:

sudo -u neutron bash -c "/usr/bin/python -m cProfile -o /tmp/vm_1.profile /usr/bin/neutron-openvswitch-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini --log-file /var/log/neutron/openvswitch-agent.log"

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1492456/+subscriptions
[Yahoo-eng-team] [Bug 1502297] [NEW] Improve SG performance as VMs/containers scale on compute node
Public bug reported:

Please refer to the comments in the following bug: https://bugs.launchpad.net/neutron/+bug/1492456, in which it was suggested to handle improving SG programming performance as an RFE bug.

To summarize the problem: when there are about 160 VMs, the neutron-ovs-agent takes more than 2 seconds per VM to program the iptables rules, mainly because of inefficiency in the iptables programming code.

Cumulative time in _modify_rules to provision 10 new VMs on the compute node, before and after the fix:

#VMs = 0:   Before: 0.34   After: 0.20
#VMs = 10:  Before: 1.68   After: 0.94
#VMs = 20:  Before: 4.27   After: 2.12
#VMs = 40:  Before: 11.8   After: 6.44
#VMs = 80:  Before: 20.2   After: 13.6
#VMs = 160: Before: 50     After: 23.2   (more than 2 seconds per VM!)

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: rfe
--
https://bugs.launchpad.net/bugs/1502297

Title: Improve SG performance as VMs/containers scale on compute node
Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1502297/+subscriptions
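Much of the quadratic cost above comes from rescanning the full iptables rule list for every new rule to avoid duplicates. A hedged sketch of the set-based membership check that removes that linear scan; this is illustrative, not the actual neutron patch, and `merge_rules` is an invented helper name:

```python
def merge_rules(existing_rules, new_rules):
    """Append only rules not already present, using an O(1) set lookup
    instead of rescanning the full rule list for every new rule."""
    seen = set(existing_rules)
    merged = list(existing_rules)
    for rule in new_rules:
        if rule not in seen:
            merged.append(rule)
            seen.add(rule)
    return merged

rules = merge_rules(
    ['-A neutron-openvswi-sg-chain -j ACCEPT'],
    ['-A neutron-openvswi-sg-chain -j ACCEPT',       # duplicate, skipped
     '-A neutron-openvswi-i76845da1-5 -j RETURN'],   # new, appended
)
```

With this shape, adding R rules against N existing ones costs O(N + R) rather than O(N * R), which is the difference the before/after numbers above reflect.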
[Yahoo-eng-team] [Bug 1502301] [NEW] add config option to start agents in "cprofiled" mode
Public bug reported:

For motivation see this bug, where this RFE was requested: https://bugs.launchpad.net/neutron/+bug/1502297

By running the neutron agents under a profiler like cProfile, we can obtain valuable insights into the performance of the agent code and identify areas for improvement (see the bug above for a concrete example).

This bug seeks to add a config option in the agent config file like:

profiled_mode = false

By default the above is false; when set to true, the agent starts under cProfile and dumps the profiled output to a file.

** Affects: neutron
   Importance: Undecided
   Assignee: Ramu Ramamurthy (ramu-ramamurthy)
   Status: New

** Tags: rfe

** Changed in: neutron
   Assignee: (unassigned) => Ramu Ramamurthy (ramu-ramamurthy)

--
https://bugs.launchpad.net/bugs/1502301

Title: add config option to start agents in "cprofiled" mode
Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1502301/+subscriptions
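A hedged sketch of what `profiled_mode = true` could do at agent startup: wrap the main loop in cProfile and dump or print stats when it returns. The function and parameter names are illustrative, not a proposed neutron API:

```python
import cProfile
import io
import pstats

def run_main(main, profiled_mode=False, stats_path=None):
    """Run `main` normally, or under cProfile when profiled_mode is set
    (sketch of the proposed config-driven behavior)."""
    if not profiled_mode:
        return main()
    profiler = cProfile.Profile()
    result = profiler.runcall(main)      # profile only the main loop
    if stats_path:
        profiler.dump_stats(stats_path)  # e.g. a file for later pstats analysis
    else:
        out = io.StringIO()              # or print the top offenders directly
        pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
    return result

value = run_main(lambda: sum(range(100)), profiled_mode=True)
```

The agent's launcher would read `profiled_mode` (and presumably an output-path option) from its config file and pass them here, so the default-off case adds no overhead.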
[Yahoo-eng-team] [Bug 1494039] [NEW] Must audit SG chains on ovs-agent restart
Public bug reported:

I am running Kilo 2015.1.0 with neutron-OVS and the iptables firewall. I run into situations where the iptables SG chains/rules are inconsistent with the OVS ports and system interfaces - see below for an example. In these situations, when I restart neutron-ovs-agent, I expect such inconsistencies to be cleaned up. Arguably, the agent should not allow such situations to happen in the first place (but that's expecting the code to be perfect).

By design, the neutron-ovs-agent must audit hw resources (iptables in this case) during startup and clean up inconsistencies/deltas between what is in iptables and what is in the control plane (ports). The audit can look something like this:

1. IptablesManager recovers iptables chains during startup and marks all of them as "to be audited".
2. NeutronOvsAgent programs firewall rules for devices during startup.
3. As chains are programmed, IptablesManager clears the "to be audited" state.
4. Chains which still have the "to be audited" flag at the end of the init phase are removed.
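The four audit steps above amount to a mark-and-sweep over iptables chains. A minimal sketch with plain sets standing in for IptablesManager state; the class and method names are invented for illustration:

```python
class ChainAuditor:
    """Mark recovered chains at startup, clear the mark as each chain is
    reprogrammed, then sweep away chains nobody claimed (sketch)."""

    def __init__(self, recovered_chains):
        # Step 1: everything recovered from iptables starts flagged.
        self.to_be_audited = set(recovered_chains)
        self.chains = set(recovered_chains)

    def program_chain(self, chain):
        # Steps 2-3: programming a chain clears its audit flag.
        self.chains.add(chain)
        self.to_be_audited.discard(chain)

    def sweep(self):
        # Step 4: drop chains still flagged at the end of the init phase.
        garbage = self.to_be_audited
        self.chains -= garbage
        self.to_be_audited = set()
        return garbage

auditor = ChainAuditor({'neutron-openvswi-i76845da1-5',
                        'neutron-openvswi-sg-chain'})
auditor.program_chain('neutron-openvswi-sg-chain')  # claimed by a live port
removed = auditor.sweep()                           # unclaimed chain is garbage
```

In the real agent, `sweep()` would translate into iptables flush/delete commands for the garbage chains rather than a set difference.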
Example inconsistency between OVS ports, system interfaces, and iptables:

[root@rhel7-25 agent]# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N neutron-filter-top
-N neutron-openvswi-FORWARD
-N neutron-openvswi-INPUT
-N neutron-openvswi-OUTPUT
-N neutron-openvswi-i76845da1-5    <- inconsistent chains; these remain as garbage
-N neutron-openvswi-ie3cb2e38-a
-N neutron-openvswi-local
-N neutron-openvswi-o76845da1-5
-N neutron-openvswi-oe3cb2e38-a
-N neutron-openvswi-s76845da1-5
-N neutron-openvswi-se3cb2e38-a
-N neutron-openvswi-sg-chain
-N neutron-openvswi-sg-fallback

[root@rhel7-25 agent]# ovs-vsctl show
ce7f5dac-9d4d-4354-9cfd-4d94dfaf1697
    Bridge br-int
        fail_mode: secure
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port "vxlan-0a0a0a1e"
            Interface "vxlan-0a0a0a1e"
                type: vxlan
                options: {csum="true", df_default="true", in_key=flow, local_ip="10.10.10.25", out_key=flow, remote_ip="10.10.10.30"}
        Port "vxlan-0a0a0a17"
            Interface "vxlan-0a0a0a17"
                type: vxlan
                options: {csum="true", df_default="true", in_key=flow, local_ip="10.10.10.25", out_key=flow, remote_ip="10.10.10.23"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-0a0a0a15"
            Interface "vxlan-0a0a0a15"
                type: vxlan
                options: {csum="true", df_default="true", in_key=flow, local_ip="10.10.10.25", out_key=flow, remote_ip="10.10.10.21"}
        Port br-tun
            Interface br-tun
                type: internal
    ovs_version: "2.3.0"

[root@rhel7-25 agent]# ip link
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 9000 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether a0:36:9f:09:2c:20 brd ff:ff:ff:ff:ff:ff
3: eth1: mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether a0:36:9f:09:2c:21 brd ff:ff:ff:ff:ff:ff
4: eth2: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 34:40:b5:e5:68:60 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 34:40:b5:e5:68:64 brd ff:ff:ff:ff:ff:ff
6: eth4: mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 34:40:b5:e5:68:62 brd ff:ff:ff:ff:ff:ff
7: eth5: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 34:40:b5:e5:68:66 brd ff:ff:ff:ff:ff:ff
8: usb0: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
    link/ether 36:40:b5:e8:b4:37 brd ff:ff:ff:ff:ff:ff
29133: br-int: mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 56:bf:95:17:ad:4d brd ff:ff:ff:ff:ff:ff
29134: br-tun: mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 66:6a:80:04:d8:4b brd ff:ff:ff:ff:ff:ff
20695: ovs-system: mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 6e:c5:a4:c5:89:b5 brd ff:ff:ff:ff:ff:ff

** Affects: neutron
   Importance: Undecided
   Status: New
[Yahoo-eng-team] [Bug 1492456] [NEW] cProfile - fix Security Groups hotfunctions
Public bug reported:

I used cProfile to profile neutron-ovs-agent (from neutron kilo 2015.1.0) as VMs are provisioned (see the code sample below to reproduce). I find a couple of functions in the IptablesManager scaling poorly with the number of VMs (_modify_rules, and its callee _find_last_entry). As the number of current VMs doubles, the time spent in these functions to provision 10 new VMs also roughly doubles. While we wait for the new iptables firewall driver (https://blueprints.launchpad.net/neutron/+spec/new-iptables-driver), can we improve the performance of the current iptables firewall in those 2 functions, which do a lot of string processing?

Current: #VMs: 20, # SG rules: 657, provision 10 new VMs

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       60    0.143    0.002    3.979    0.066  iptables_manager.py:511(_modify_rules)
    25989    2.752    0.000    3.332    0.000  iptables_manager.py:504(_find_last_entry)

Current: #VMs: 40, # SG rules: 1277, provision 10 new VMs
       65    0.220    0.003    7.974    0.123  iptables_manager.py:511(_modify_rules)
    38891    5.782    0.000    6.986    0.000  iptables_manager.py:504(_find_last_entry)

Current: #VMs: 80, # SG rules: 2517, provision 10 new VMs
       30    0.274    0.009   20.496    0.683  iptables_manager.py:511(_modify_rules)
    43862   15.920    0.000   19.292    0.000  iptables_manager.py:504(_find_last_entry)

Current: #VMs: 160, # SG rules: 4997, provision 10 new VMs
       20    0.375    0.019   49.255    2.463  iptables_manager.py:511(_modify_rules)
    56478   39.275    0.001   47.629    0.001  iptables_manager.py:504(_find_last_entry)

To Reproduce: Make the following change to neutron_ovs_agent.py to enable/disable cProfile for a given scenario.
< import cProfile < import os.path < pr_enabled = False < pr = None In OVSNeutronAgent add method: < def toggle_cprofile(self): < global pr, pr_enabled < start = False < data = "" < fname = "vm.profile" < try: < if os.path.isfile("/tmp/cprof"): < start = True < except IOError as e: < LOG.warn("Error %s", e.strerror) < < if start and not pr_enabled: < pr = cProfile.Profile() < pr.enable() < pr_enabled = True < LOG.warn("enabled cprofile") < if not start and pr_enabled: < pr.disable() < pr.create_stats() < pr.dump_stats("/tmp/%s"%fname) < pr_enabled = False < LOG.warn("disabled cprofile") In polling loop: < self.toggle_cprofile() ** Affects: neutron Importance: Undecided Status: New ** Description changed: + I used cProfile to profile neutron-ovs-agent (from neutron 2015.1.0) as + VMs are provisioned (see code sample below to reproduce). - I used cProfile to profile neutron-ovs-agent (from neutron 2015.1.0) as VMs are provisioned (see code sample below to reproduce). - - I find a couple of functions in the IptablesManager scaling poorly with # of VMs (_modify_rules, and its callee find_last_entry). + I find a couple of functions in the IptablesManager scaling poorly with # of VMs (_modify_rules, and its callee find_last_entry). As the # of current VMs doubles, the time spent in these functions to provision 10 new VMs also roughly doubles, While we wait for the new IptablesTables firewall driver: https://blueprints.launchpad.net/neutron/+spec/new-iptables-driver - Can we improve the performance of the current iptables firewall driver to perform better ? + Can we improve the performance of the current iptables firewall on those 2 functions which do a lot of string processing ? 
[Yahoo-eng-team] [Bug 1489200] Re: Upon VM deletes, SG iptables not cleaned up, garbage piles up
I applied the following patch released in the later Kilo release (neutron/2015.1.1):

[81e043f] Don't delete port from bridge on delete_port event
https://bugs.launchpad.net/neutron/+bug/165

and the problem is not seen anymore.

** Changed in: neutron
   Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1489200

Title: Upon VM deletes, SG iptables not cleaned up, garbage piles up

Status in neutron: Fix Released

Bug description:
Summary: 40 VMs are created and then deleted on the same host. At the end of this, I find that iptables rules for some ports are not cleaned up and remain as garbage. This garbage keeps piling up as more VMs are created and deleted.

Topology: OpenStack Kilo, with Neutron networking using OVS & neutron security groups. Kilo component versions are as follows:
openstack-neutron-2015.1.0.2
openstack-neutron-ml2-2015.1.0.2
openstack-neutron-openvswitch-2015.1.0.2

Test Case:
1) create 1 network, 1 subnetwork
2) boot 40 VMs on one hypervisor and 40 VMs on another hypervisor using the default Security Group
3) run some traffic tests between VMs
4) delete all VMs

Result: iptables rules are not cleaned up for the ports of the VMs.

Root Cause: In the neutron-ovs-agent polling loop, there is an exception during the processing of port events. As a result of this exception, the neutron-ovs-agent resyncs with the plugin. This takes a while. At the same time, VM ports are getting deleted. In this scenario, the neutron-ovs-agent "misses" some deleted ports, and does not clean up SG filters for those "missed" ports.

Reproducibility: Happens almost every time; it is more likely with more VMs.

Logs: Attached are a set of neutron-ovs-agent logs, and the garbage iptables rules that remain.
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1489200/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1489200] [NEW] Upon VM deletes, SG iptables not cleaned up, garbage piles up
Public bug reported:

Summary: 40 VMs are created and then deleted on the same host. At the end of this, I find that iptables rules for some ports are not cleaned up and remain as garbage. This garbage keeps piling up as more VMs are created and deleted.

Topology: Neutron networking using OVS & neutron security groups.

Test Case:
1) create 1 network, 1 subnetwork
2) boot 40 VMs on one hypervisor and 40 VMs on another hypervisor using the default Security Group
3) run some traffic tests between VMs
4) delete all VMs

Result: iptables rules are not cleaned up for the ports of the VMs.

Root Cause: In the neutron-ovs-agent polling loop, there is an exception during the processing of port events. As a result of this exception, the neutron-ovs-agent resyncs with the plugin. This takes a while. At the same time, VM ports are getting deleted. In this scenario, the neutron-ovs-agent misses some deleted ports, and does not clean up SG filters for those missed ports.

Reproducibility: Happens almost every time; it is more likely with more VMs.

Logs: Attached are a set of neutron-ovs-agent logs, and the garbage iptables rules that remain.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1489200

Title: Upon VM deletes, SG iptables not cleaned up, garbage piles up

Status in neutron: New

Bug description:
Summary: 40 VMs are created and then deleted on the same host. At the end of this, I find that iptables rules for some ports are not cleaned up and remain as garbage. This garbage keeps piling up as more VMs are created and deleted.

Topology: Neutron networking using OVS & neutron security groups.

Test Case:
1) create 1 network, 1 subnetwork
2) boot 40 VMs on one hypervisor and 40 VMs on another hypervisor using the default Security Group
3) run some traffic tests between VMs
4) delete all VMs

Result: iptables rules are not cleaned up for the ports of the VMs.

Root Cause: In the neutron-ovs-agent polling loop, there is an exception during the processing of port events. As a result of this exception, the neutron-ovs-agent resyncs with the plugin. This takes a while. At the same time, VM ports are getting deleted. In this scenario, the neutron-ovs-agent misses some deleted ports, and does not clean up SG filters for those missed ports.

Reproducibility: Happens almost every time; it is more likely with more VMs.

Logs: Attached are a set of neutron-ovs-agent logs, and the garbage iptables rules that remain.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1489200/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
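The missed-delete race described in the root cause above can be reduced to a small sketch (hypothetical, simplified code; `diff_ports` and the port names are illustrative, not the agent's actual loop): when the agent's cached port set is dropped by a resync while a VM is being deleted, the delete never appears in the computed diff, so its SG filters are never removed.

```python
# Hypothetical sketch of the missed-delete race described above.

def diff_ports(previous, current):
    # the agent cleans up filters only for ports that land in 'removed'
    return {'added': current - previous, 'removed': previous - current}

known = {'port-a', 'port-b'}

# An exception in the polling loop forces a resync; cached state is dropped.
known = set()

# Meanwhile port-b's VM was deleted.  After the resync the agent compares
# against an empty cache, so the delete is invisible.
events = diff_ports(known, {'port-a'})
assert 'port-b' not in events['removed']   # cleanup for port-b is skipped
```

Had the cache survived the resync, the same diff would have reported port-b as removed and its filters would have been cleaned up.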
[Yahoo-eng-team] [Bug 1420056] [NEW] Deleting last rule in Security Group does not update firewall
Public bug reported:

Scenario: VM port with 1 Security Group with 1 egress icmp rule
(example rule: {u'ethertype': u'IPv4', u'direction': u'egress', u'protocol': u'icmp', u'dest_ip_prefix': u'0.0.0.0/0'})

Steps: Delete the (last) rule from the above Security Group via Horizon

Result: iptables shows the egress icmp rule even after its deletion

Root Cause: In this scenario, security_group_info_for_devices() returns the following to the agent. Note that the 'security_groups' field is an empty dictionary {}!! This causes _update_security_groups_info in the agent to NOT update the firewall. The security_groups field must contain the security_group_id as key, with an empty list for the rules.

{u'sg_member_ips': {}, u'devices': {u'ea19fb55-39bb-4e59-9d10-26c74eb3ff95': {u'status': u'ACTIVE', u'security_group_source_groups': [], u'binding:host_id': u'vRHEL29-1', u'name': u'', u'allowed_address_pairs': [{u'ip_address': u'10.0.0.201', u'mac_address': u'fa:16:3e:02:4b:b3'}, {u'ip_address': u'10.0.10.202', u'mac_address': u'fa:16:3e:02:4b:b3'}, {u'ip_address': u'10.0.20.203', u'mac_address': u'fa:16:3e:02:4b:b3'}], u'admin_state_up': True, u'network_id': u'f665dc8c-76da-4fde-8d26-535871487e4c', u'tenant_id': u'f5019aeae9e64443970bb0842e22e2b3', u'extra_dhcp_opts': [], u'security_group_rules': [{u'source_port_range_min': 67, u'direction': u'ingress', u'protocol': u'udp', u'ethertype': u'IPv4', u'port_range_max': 68, u'source_port_range_max': 67, u'source_ip_prefix': u'10.0.2.3', u'port_range_min': 68}], u'binding:vif_details': {u'port_filter': False}, u'binding:vif_type': u'bridge', u'device_owner': u'compute:nova', u'mac_address': u'fa:16:3e:02:4b:b3', u'device': u'tapea19fb55-39', u'binding:profile': {}, u'binding:vnic_type': u'normal', u'fixed_ips': [u'10.0.2.6'], u'id': u'ea19fb55-39bb-4e59-9d10-26c74eb3ff95', u'security_groups': [u'849ee59c-d100-4940-930b-44e358775ed3'], u'device_id': u'2b330c29-c16f-4bbf-b80a-bd5bae41b514'}}, u'security_groups': {}}

security_group_info_for_devices /usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py:104

** Affects: neutron
   Importance: Undecided
   Status: New

** Summary changed:

- Deleting last rule in Security Group does not work
+ Deleting last rule in Security Group does not update firewall

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1420056

Title: Deleting last rule in Security Group does not update firewall

Status in OpenStack Neutron (virtual network service): New

Bug description:
Scenario: VM port with 1 Security Group with 1 egress icmp rule
(example rule: {u'ethertype': u'IPv4', u'direction': u'egress', u'protocol': u'icmp', u'dest_ip_prefix': u'0.0.0.0/0'})

Steps: Delete the (last) rule from the above Security Group via Horizon

Result: iptables shows the egress icmp rule even after its deletion

Root Cause: In this scenario, security_group_info_for_devices() returns the following to the agent. Note that the 'security_groups' field is an empty dictionary {}!! This causes _update_security_groups_info in the agent to NOT update the firewall. The security_groups field must contain the security_group_id as key, with an empty list for the rules.

{u'sg_member_ips': {}, u'devices': {u'ea19fb55-39bb-4e59-9d10-26c74eb3ff95': {u'status': u'ACTIVE', u'security_group_source_groups': [], u'binding:host_id': u'vRHEL29-1', u'name': u'', u'allowed_address_pairs': [{u'ip_address': u'10.0.0.201', u'mac_address': u'fa:16:3e:02:4b:b3'}, {u'ip_address': u'10.0.10.202', u'mac_address': u'fa:16:3e:02:4b:b3'}, {u'ip_address': u'10.0.20.203', u'mac_address': u'fa:16:3e:02:4b:b3'}], u'admin_state_up': True, u'network_id': u'f665dc8c-76da-4fde-8d26-535871487e4c', u'tenant_id': u'f5019aeae9e64443970bb0842e22e2b3', u'extra_dhcp_opts': [], u'security_group_rules': [{u'source_port_range_min': 67, u'direction': u'ingress', u'protocol': u'udp', u'ethertype': u'IPv4', u'port_range_max': 68, u'source_port_range_max': 67, u'source_ip_prefix': u'10.0.2.3', u'port_range_min': 68}], u'binding:vif_details': {u'port_filter': False}, u'binding:vif_type': u'bridge', u'device_owner': u'compute:nova', u'mac_address': u'fa:16:3e:02:4b:b3', u'device': u'tapea19fb55-39', u'binding:profile': {}, u'binding:vnic_type': u'normal', u'fixed_ips': [u'10.0.2.6'], u'id': u'ea19fb55-39bb-4e59-9d10-26c74eb3ff95', u'security_groups': [u'849ee59c-d100-4940-930b-44e358775ed3'], u'device_id': u'2b330c29-c16f-4bbf-b80a-bd5bae41b514'}}, u'security_groups': {}}

security_group_info_for_devices /usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py:104

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1420056/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
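A minimal sketch of the agent-side behavior described in that root cause (hypothetical, simplified structures; `update_security_group_rules` and the truncated sg id are illustrative, not the actual neutron code) shows why the empty dict leaves the stale rule in place and why a payload keyed by the group id with an empty rule list fixes it:

```python
# Hypothetical sketch: the agent only touches groups that appear in the
# 'security_groups' dict it receives, so {} means "nothing to update".

def update_security_group_rules(cache, info):
    # cache: agent-side view, {sg_id: [rules]}
    for sg_id, rules in info['security_groups'].items():
        cache[sg_id] = rules

cache = {'849ee59c': [{'direction': 'egress', 'protocol': 'icmp'}]}

# Buggy payload: the last rule was deleted, but the server sends {}.
update_security_group_rules(cache, {'security_groups': {}})
assert cache['849ee59c']          # stale egress icmp rule survives

# Correct payload: the group is present, with an empty rule list.
update_security_group_rules(cache, {'security_groups': {'849ee59c': []}})
assert cache['849ee59c'] == []    # the agent now rebuilds the firewall
```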
[Yahoo-eng-team] [Bug 1417699] Re: Security Groups anti-spoofing rule blocks traffic on multi-nic VMs
Marking this as invalid because a solution to the problem exists, and as such it is not a code bug.

** Changed in: neutron
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1417699

Title: Security Groups anti-spoofing rule blocks traffic on multi-nic VMs

Status in OpenStack Neutron (virtual network service): Invalid

Bug description:
Scenario: MultiNic VM
- eth0 (192.168.100.44)
- eth1 (192.168.0.10)
- eth2 (192.168.20.10)

Test:
Ping 192.168.0.10 does not work
Ping 192.168.100.44 works

RootCause: The default route on the VM points to eth0. Ping requests arrive at the VM on eth1, but the ping responses go out of eth0. The security anti-spoofing rule drops this ping response because the source IP address does not match.

Fix: Provide a configurable knob in Security Groups or the PortSecurity extension to disable just the anti-spoofing rules, but keep the other ingress/egress filters. We don't want to disable security groups entirely on such VMs.

Notes: Workarounds include multiple default routes in the guest VM via Linux route tables (works only on Linux). Any other ideas for a fix or a workaround?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1417699/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
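The root cause above reduces to a very small check (a hypothetical simplification; `egress_allowed` is illustrative, the real mechanism is a per-port iptables rule): a port may only emit packets whose source IP belongs to that port, so a reply for eth1's address that follows the default route out of eth0 is dropped.

```python
# Hypothetical sketch of the per-port anti-spoofing check described above.

def egress_allowed(port_ips, packet_src_ip):
    # a port may only emit packets sourced from its own addresses
    return packet_src_ip in port_ips

eth0_ips = {"192.168.100.44"}

# Ping reply for eth1's address leaves via eth0 (the default route),
# so its source IP does not belong to eth0 and it is dropped.
assert not egress_allowed(eth0_ips, "192.168.0.10")
# Traffic sourced from eth0's own address passes normally.
assert egress_allowed(eth0_ips, "192.168.100.44")
```

This is also why the guest-side workaround (per-interface routing tables) helps: it makes replies for eth1's address leave through eth1, where the source IP does match.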
[Yahoo-eng-team] [Bug 1417699] [NEW] Security Groups anti-spoofing rule blocks traffic on multi-nic VMs
Public bug reported:

Scenario: MultiNic VM
- eth0 (192.168.100.44)
- eth1 (192.168.0.10)
- eth2 (192.168.20.10)

Test:
Ping 192.168.0.10 does not work
Ping 192.168.100.44 works

RootCause: The default route on the VM points to eth0. Ping requests arrive at the VM on eth1, but the ping responses go out of eth0. The security anti-spoofing rule drops this ping response because the source IP address does not match.

Fix: Provide a configurable knob in Security Groups or the PortSecurity extension to disable just the anti-spoofing rules, but keep the other ingress/egress filters. We don't want to disable security groups entirely on such VMs.

Notes: Workarounds include multiple default routes in the guest VM via Linux route tables (works only on Linux). Any other ideas for a fix or a workaround?

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1417699

Title: Security Groups anti-spoofing rule blocks traffic on multi-nic VMs

Status in OpenStack Neutron (virtual network service): New

Bug description:
Scenario: MultiNic VM
- eth0 (192.168.100.44)
- eth1 (192.168.0.10)
- eth2 (192.168.20.10)

Test:
Ping 192.168.0.10 does not work
Ping 192.168.100.44 works

RootCause: The default route on the VM points to eth0. Ping requests arrive at the VM on eth1, but the ping responses go out of eth0. The security anti-spoofing rule drops this ping response because the source IP address does not match.

Fix: Provide a configurable knob in Security Groups or the PortSecurity extension to disable just the anti-spoofing rules, but keep the other ingress/egress filters. We don't want to disable security groups entirely on such VMs.

Notes: Workarounds include multiple default routes in the guest VM via Linux route tables (works only on Linux). Any other ideas for a fix or a workaround?
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1417699/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp