[openstack-dev] [TripleO][OVN] Switching the default network backend to ML2/OVN
Hi Stackers!

The purpose of this email is to share with the community the intention of switching the default network backend in TripleO from ML2/OVS to ML2/OVN by changing the mechanism driver from openvswitch to ovn. This doesn't mean that ML2/OVS will be dropped, but users deploying OpenStack without explicitly specifying a network driver will get ML2/OVN by default.

OVN in short
============

Open Virtual Network is managed under the OVS project and was created by the original authors of OVS. It is an attempt to redo the ML2/OVS control plane using lessons learned throughout the years. It is intended to be used in projects such as OpenStack and Kubernetes. OVN has a different architecture, moving us away from Python agents communicating with the Neutron API service via RabbitMQ to daemons written in C communicating via OpenFlow and OVSDB.

OVN is built with a modern architecture that offers better foundations for a simpler and more performant solution. What does this mean? For example, at Red Hat we executed some preliminary testing during the Queens cycle and found significant CPU savings due to OVN not using RabbitMQ (CPU utilization during a Rally scenario using ML2/OVS [0] or ML2/OVN [1]). We also tested API performance and found that most operations are significantly faster with ML2/OVN. Please see more details in the FAQ section.

Here are a few useful links about OpenStack's integration of OVN:

* OpenStack Boston Summit talk on OVN [2]
* OpenStack networking-ovn documentation [3]
* OpenStack networking-ovn code repository [4]

How?
====

The goal is to merge this patch [5] during the Stein cycle, which pursues the following actions:

1. Switch the default mechanism driver from openvswitch to ovn.
2. Adapt all jobs so that they use ML2/OVN as the network backend.
3. Create a legacy environment file for ML2/OVS to allow deployments based on it.
4. Flip the scenario007 job from ML2/OVN to ML2/OVS so that we continue testing it.
5. Continue using ML2/OVS in the undercloud.
6. Ensure that updates/upgrades from ML2/OVS don't break and don't switch automatically to the new default. As some parity gaps exist right now, we don't want to change the network backend automatically. Instead, if the user wants to migrate from ML2/OVS to ML2/OVN, we'll provide an Ansible-based tool that will perform the operation. More info and code at [6].

Reviews, comments and suggestions are really appreciated :)

FAQ
===

Can you talk about the advantages of OVN over ML2/OVS?
------------------------------------------------------

If asked to describe the ML2/OVS control plane (OVS, L3, DHCP and metadata agents using the messaging bus to sync with the Neutron API service), one would not tend to use the term 'simple'. There is liberal use of a smattering of Linux networking technologies such as:

* iptables
* network namespaces
* ARP manipulation
* different forms of NAT
* keepalived, radvd, haproxy, dnsmasq
* source-based routing
* ... and of course OVS flows.

OVN simplifies this to a single process running on compute nodes and another process running on centralized nodes, communicating via OVSDB and OpenFlow and ultimately setting OVS flows. The simplified, new architecture allows us to redo features like DVR and L3 HA in more efficient and elegant ways. For example, L3 HA failover is faster: it doesn't use keepalived; rather, OVN monitors neighbor tunnel endpoints. OVN supports enabling both DVR and L3 HA simultaneously, something we never supported with ML2/OVS.

We also found that not depending on RPC messages for agent communication brings a lot of benefits. From our experience, RabbitMQ sometimes represents a bottleneck and can be very heavy on resource utilization.

What about the undercloud?
--------------------------

ML2/OVS will still be used in the undercloud, as OVN has some limitations, mainly with regard to baremetal provisioning (keep reading about the parity gaps). We aim to convert the undercloud to ML2/OVN as soon as possible, to provide the operator a more consistent experience. It would, however, be possible to use the Neutron DHCP agent in the short term to work around this limitation; in the long term, we intend to implement support for baremetal provisioning in the OVN built-in DHCP server.

What about CI?
--------------

* networking-ovn has:
  * Devstack-based Tempest (API and scenario tests from Tempest and the Neutron Tempest plugin) against the latest released OVS version and against OVS master (thus also OVN master)
  * Devstack-based Rally
  * Grenade
  * A multinode, container-based TripleO job that installs and runs a basic VM connectivity scenario test
  * Support for Python 3 and 2
* TripleO currently has OVN enabled in one quickstart featureset (fs30).

Are there any known parity issues with ML2/OVS?
-----------------------------------------------

* OVN supports VLAN provider networks, but not VLAN tenant networks. This wil
Re: [openstack-dev] [tripleo] [quickstart] [networking-ovn] No more overcloud_prep-containers.sh script
Hi Miguel,

This patch should fix it [0]. I ran into the same issues and had to manually patch and/or generate the OVN containers myself. Try it out and let me know if the problem persists. To confirm that this is the same issue, check which images you have in your local registry (ODL images may be present while OVN ones are not).

[0] https://review.openstack.org/#/c/604953/5

Cheers,
Daniel

On Wed, Oct 3, 2018 at 10:15 AM Miguel Angel Ajo Pelayo wrote:
> Hi folks,
>
> I was trying to deploy neutron with networking-ovn via tripleo-quickstart
> scripts on master, and this config file [1]. It doesn't work; overcloud
> deploy cries with:
>
> 1) trying to deploy ovn I end up with: 2018-10-02 17:48:12 | "2018-10-02
> 17:47:51,864 DEBUG: 26691 -- Error: image
> tripleomaster/centos-binary-ovn-controller:current-tripleo not found"
>
> It seems like overcloud_prep-containers.sh is not there anymore (I guess
> overcloud deploy handles it automatically now? But it fails to generate
> the ovn containers for some reason.)
>
> Also, if you look at [2], which are our ansible migration scripts to
> migrate ml2/ovs to ml2/networking-ovn, you will see that we make use of
> overcloud_prep-containers.sh. I guess that if we make sure [1] works, we
> will get [2] for free.
>
> [1] https://github.com/openstack/networking-ovn/blob/master/tripleo/ovn.yml
> [2] https://docs.openstack.org/networking-ovn/latest/install/migration.html
> --
> Miguel Ángel Ajo
> OSP / Networking DFG, OVN Squad Engineering

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
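To check, as suggested above, which images actually landed in the local registry, something along these lines could work (illustrative sketch only: the registry URL is an assumption — the undercloud registry commonly listens on port 8787 — and `image_names`/`diagnose` are hypothetical helpers, not part of any TripleO tooling):

```python
# Query a Docker registry's catalog over the standard v2 HTTP API and
# report whether OVN images are present while ODL ones are.

import json
from urllib.request import urlopen

def image_names(registry="http://192.168.24.1:8787"):
    # /v2/_catalog is part of the Docker Registry HTTP API V2.
    with urlopen(registry + "/v2/_catalog") as resp:
        return json.load(resp)["repositories"]

def diagnose(names):
    has_ovn = any("ovn" in n for n in names)
    has_odl = any("opendaylight" in n for n in names)
    if has_odl and not has_ovn:
        return "ODL images present but OVN missing: likely the prep bug"
    return "OVN images present" if has_ovn else "neither family found"

# Example with a catalog shaped like the bug report describes:
assert diagnose(["tripleomaster/centos-binary-opendaylight"]).startswith("ODL")
```

The interesting part is only the `diagnose` logic: if ODL images were generated but OVN ones were not, you are hitting the same container-prep issue the patch above addresses.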
Re: [openstack-dev] [Neutron] Stepping down from Neutron core team
Thanks a lot Kuba for all your contributions! You've been a great mentor to me since I joined OpenStack and I'm so happy that I got to work with you. Great engineer and even better person! All the best, my friend! On Fri, Aug 31, 2018 at 10:25 AM Jakub Libosvar wrote: > Hi all, > > as you have might already heard, I'm no longer involved in Neutron > development due to some changes. Therefore I'm officially stepping down > from the core team because I can't provide same quality reviews as I > tried to do before. > > I'd like to thank you all for the opportunity I was given in the Neutron > team, thank you for all I have learned over the years professionally, > technically and personally. Tomorrow it's gonna be exactly 5 years since > I started hacking Neutron and I must say I really enjoyed working with > all Neutrinos here and I had privilege to meet most of you in person and > that has an extreme value for me. Keep on being a great community! > > Thank you again! > Kuba > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] [OVN] Tempest API / Scenario tests and OVN metadata
Hi,

Thanks Lucas for writing this down.

On Thu, Apr 5, 2018 at 11:35 AM, Lucas Alvares Gomes wrote:
> Hi,
>
> The tests below are failing in the tempest API / Scenario job that
> runs in the networking-ovn gate (non-voting):
>
> neutron_tempest_plugin.api.admin.test_quotas_negative.QuotasAdminNegativeTestJSON.test_create_port_when_quotas_is_full
> neutron_tempest_plugin.api.test_routers.RoutersIpV6Test.test_router_interface_status
> neutron_tempest_plugin.api.test_routers.RoutersTest.test_router_interface_status
> neutron_tempest_plugin.api.test_subnetpools.SubnetPoolsTest.test_create_subnet_from_pool_with_prefixlen
> neutron_tempest_plugin.api.test_subnetpools.SubnetPoolsTest.test_create_subnet_from_pool_with_quota
> neutron_tempest_plugin.api.test_subnetpools.SubnetPoolsTest.test_create_subnet_from_pool_with_subnet_cidr
>
> Digging a bit into it, I noticed that with the exception of the two
> "test_router_interface_status" tests (IPv6 and IPv4), all other tests are
> failing because of the way metadata works in networking-ovn.
>
> Taking "test_create_port_when_quotas_is_full" as an example: the reason
> it fails is that when OVN metadata is enabled, networking-ovn creates a
> metadata port at the moment a network is created [0], and that already
> fulfills the quota limit set by that test [1].
>
> That port will also allocate an IP from the subnet, which causes the
> rest of the tests to fail with a "No more IP addresses available on
> network ..." error.

With ML2/OVS we would run into the same quota problem if DHCP were enabled for the created subnets. This means that if we modify the current tests to enable DHCP on them and account for this extra port, they would be valid for networking-ovn as well. Does that sound good, or do we still want to isolate quotas?

> This is not very trivial to fix because:
>
> 1. Tempest should be backend agnostic. So adding a conditional in the
> tempest test to check whether OVN is being used or not doesn't sound
> correct.
>
> 2. Creating a port to be used by the metadata agent is a core part of
> the design implementation for the metadata functionality [2].
>
> So, I'm sending this email to try to figure out the best approach to
> deal with this problem and start working towards making that job voting
> in our gate. Here are some ideas:
>
> 1. Simply disable the tests that are affected by the metadata approach.
>
> 2. Disable metadata for the tempest API / Scenario tests (here's a
> test patch doing it [3]).

IMHO, we don't want to do this, as metadata is likely to be enabled in all clouds using either ML2/OVS or OVN, so it's good to keep exercising this part.

> 3. Same as 1., but also create similar tempest tests specific to OVN
> somewhere else (in the networking-ovn tree?!).

As we discussed on IRC, I'm keen on doing this instead of getting bits into tempest that do different things depending on the backend used. Unless we want to enable DHCP on the subnets that these tests create :)

> What do you think would be the best way to work around this problem, any
> other ideas?
>
> As for the "test_router_interface_status" tests that are failing
> independently of the metadata, there's a bug reporting the problem here
> [4]. So we should just fix it.
> > [0] https://github.com/openstack/networking-ovn/blob/ > f3f5257fc465bbf44d589cc16e9ef7781f6b5b1d/networking_ovn/ > common/ovn_client.py#L1154 > [1] https://github.com/openstack/neutron-tempest-plugin/blob/ > 35bf37d1830328d72606f9c790b270d4fda2b854/neutron_tempest_ > plugin/api/admin/test_quotas_negative.py#L66 > [2] https://docs.openstack.org/networking-ovn/latest/ > contributor/design/metadata_api.html#overview-of-proposed-approach > [3] https://review.openstack.org/#/c/558792/ > [4] https://bugs.launchpad.net/networking-ovn/+bug/1713835 > > Cheers, > Lucas > Thanks, Daniel > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
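The quota failure described above comes down to simple arithmetic: the backend silently consumes one port per network before the test creates any of its own. A toy sketch (illustrative only — `ports_available` is not real Neutron or Tempest code) of why a quota set to exactly the test's own port count breaks:

```python
# Why a per-network metadata port breaks a quota test that sets the port
# quota to exactly the number of ports it plans to create itself.

def ports_available(port_quota, test_ports, metadata_ports_per_network=1):
    """Return how many of the test's own port creations can succeed."""
    # With ML2/OVN (or ML2/OVS with DHCP enabled), creating the network
    # already consumes one port from the quota.
    remaining = port_quota - metadata_ports_per_network
    return min(test_ports, max(remaining, 0))

# Tempest sets quota == number of ports it will create, so the last
# create fails once the metadata port is counted against the quota.
assert ports_available(port_quota=1, test_ports=1) == 0
# Bumping the quota by one (accounting for the extra port) fixes it.
assert ports_available(port_quota=2, test_ports=1) == 1
```

This is why enabling DHCP on the test subnets and accounting for the extra port would make the same tests valid for both backends.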
Re: [openstack-dev] [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane
On Wed, Feb 14, 2018 at 5:40 AM, Brian Haley wrote: > On 02/13/2018 05:08 PM, Armando M. wrote: > >> >> >> On 13 February 2018 at 14:02, Brent Eagles > beag...@redhat.com>> wrote: >> >> Hi, >> >> The neutron agents are implemented in such a way that key >> functionality is implemented in terms of haproxy, dnsmasq, >> keepalived and radvd configuration. The agents manage instances of >> these services but, by design, the parent is the top-most (pid 1). >> >> On baremetal this has the advantage that, while control plane >> changes cannot be made while the agents are not available, the >> configuration at the time the agents were stopped will work (for >> example, VMs that are restarted can request their IPs, etc). In >> short, the dataplane is not affected by shutting down the agents. >> >> In the TripleO containerized version of these agents, the supporting >> processes (haproxy, dnsmasq, etc.) are run within the agent's >> container so when the container is stopped, the supporting processes >> are also stopped. That is, the behavior with the current containers >> is significantly different than on baremetal and stopping/restarting >> containers effectively breaks the dataplane. At the moment this is >> being considered a blocker and unless we can find a resolution, we >> may need to recommend running the L3, DHCP and metadata agents on >> baremetal. >> > > I didn't think the neutron metadata agent was affected but just the > ovn-metadata agent? Or is there a problem with the UNIX domain sockets the > haproxy instances use to connect to it when the container is restarted? That's right. In ovn-metadata-agent we spawn haproxy inside the q-ovnmeta namespace and this is where we'll find a problem if the process goes away. As you said, neutron metadata agent is basically receiving the proxied requests from haproxies residing in either q-router or q-dhcp namespaces on its UNIX socket and sending them to Nova. 
> > > There's quite a bit to unpack here: are you suggesting that running these >> services in HA configuration doesn't help either with the data plane being >> gone after a stop/restart? Ultimately this boils down to where the state is >> persisted, and while certain agents rely on namespaces and processes whose >> ephemeral nature is hard to persist, enough could be done to allow for a >> non-disruptive bumping of the afore mentioned services. >> > > Armando - https://review.openstack.org/#/c/542858/ (if accepted) should > help with dataplane downtime, as sharing the namespaces lets them persist, > which eases what the agent has to configure on the restart of a container > (think of what the l3-agent needs to create for 1000 routers). > > But it doesn't address dnsmasq being unavailable when the dhcp-agent > container is restarted like it is today. Maybe one way around that is to > run 2+ agents per network, but that still leaves a regression from how it > works today. Even with l3-ha I'm not sure things are perfect, might > wind-up with two masters sometimes. > > I've seen one suggestion of putting all these processes in their own > container instead of the agent container so they continue to run, it just > might be invasive to the neutron code. Maybe there is another option? I had some idea based on that one to reduce the impact on neutron code and its dependency on containers. Basically, we would be running dnsmasq, haproxy, keepalived, radvd, etc in separate containers (it makes sense as they have independent lifecycles) and we would drive those through the docker socket from neutron agents. In order to reduce this dependency, I thought of having some sort of 'rootwrap-daemon-docker' which takes the commands and checks if it has to spawn the process in a separate container (for example, iptables wouldn't be the case) and if so, it'll use the docker socket to do it. We'll also have to monitor the PID files on those containers to respawn them in case they die. 
IMHO, this is far from the containers philosophy since we're using host networking, privileged access, sharing namespaces, relying on 'sidecar' containers... but I can't think of a better way to do it. > > -Brian > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
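The 'rootwrap-daemon-docker' idea described above could look roughly like this — a hedged sketch only, under the assumption of a dispatch table of long-lived dataplane processes; `SIDECAR_COMMANDS`, `run_command` and the image name are all illustrative, not an existing Neutron/TripleO interface (the Docker call uses the real docker-py `containers.run` API):

```python
# Decide per-command whether to exec on the host (plain rootwrap
# behavior) or spawn a 'sidecar' container through the Docker API.

import shlex
import subprocess

# Long-lived dataplane processes get their own container so they survive
# agent-container restarts; everything else (e.g. iptables) runs directly.
SIDECAR_COMMANDS = {"dnsmasq", "haproxy", "keepalived", "radvd"}

def run_command(cmdline, docker_client=None):
    argv = shlex.split(cmdline)
    if argv[0] in SIDECAR_COMMANDS and docker_client is not None:
        # Spawn a detached sidecar sharing the host network namespace, so
        # the process keeps running if the agent container is restarted.
        return docker_client.containers.run(
            image="neutron-sidecar",  # illustrative image name
            command=argv,
            network_mode="host",
            privileged=True,
            detach=True,
        )
    # Fall back to today's behavior: run the command on the host.
    return subprocess.run(argv, check=True)
```

The agents would additionally have to monitor the PID files (or container state) of those sidecars to respawn them if they die, as noted above.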
Re: [openstack-dev] [Neutron][ovn] networking-ovn core team update
Thanks a lot guys! It's a pleasure to work with you all :) Cheers, Daniel On Fri, Dec 1, 2017 at 5:48 PM, Miguel Angel Ajo Pelayo wrote: > Welcome Daniel! :) > > On Fri, Dec 1, 2017 at 5:45 PM, Lucas Alvares Gomes > wrote: > >> Hi all, >> >> I would like to welcome Daniel Alvarez to the networking-ovn core team! >> >> Daniel has been contributing with the project for a good time already >> and helping *a lot* with reviews and code. >> >> Welcome onboard man! >> >> Cheers, >> Lucas >> >> >> __ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib >> e >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [OVN] Functional tests failures
Hi folks, We've seen failures in functional tests lately [0] since ovsdbapp was bumped to 0.8.0. Not sure if it's related since according to logs, it looks like the connection to OVSDB is lost and then it's not recovered. We're running functional tests on OVS master so I sent this patch [1] to test it out with OVS 2.8 branch and, even though it's a bit early to confirm, it looks like it may solve it. We've had some IRC discussion around merging [1] or setting up new jobs but it's still unclear. I'm keen on switching to stable release for our CI (actually our tempest job against master is nv while the one voting is against 2.8 branch) and maybe set up a new rally job against master to detect regressions and also serve as comparison between the two. Thoughts? Thanks, Daniel [0] https://bugs.launchpad.net/networking-ovn/+bug/1734090 [1] https://review.openstack.org/#/c/522574/ __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][infra] Functional job failure rate at 100%
Some more info added to Jakub's excellent report :)

A new kernel, Ubuntu-4.4.0-89.112, was tagged 9 days ago (07/31/2017) [0]. From a quick look, the only commit around this function is [1].

[0] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=64de31ed97a03ec1b86fd4f76e445506dce55b02
[1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=2ad4caea651e1cc0fc86111ece9f9d74de825b78

On Wed, Aug 9, 2017 at 3:29 PM, Jakub Libosvar wrote:
> Daniel Alvarez and I spent some time looking at it and the culprit was
> finally found.
>
> tl;dr
>
> We updated the kernel on the machines to one containing a bug in the
> creation of conntrack entries, which makes functional tests get stuck.
> More info at [4].
>
> For now, I sent a patch [5] to temporarily disable the jobs that create
> conntrack entries manually; it needs an update of the commit message.
> Once it merges, we can switch the functional job back to voting to
> avoid regressions.
>
> Is it possible to switch the image used for the jenkins machines back
> to the older version? Any other ideas on how to deal with the kernel
> bug?
>
> Thanks
> Jakub
>
> [5] https://review.openstack.org/#/c/492068/1
>
> On 07/08/2017 11:52, Jakub Libosvar wrote:
> > Hi all,
> >
> > as per grafana [1] the functional job is broken. Looking at logstash [2]
> > it started happening consistently since 2017-08-03 16:27. I didn't find
> > any particular patch in Neutron that could cause it.
> >
> > The culprit is that ovsdb starts misbehaving [3] and then we retry calls
> > indefinitely. We still use 2.5.2 openvswitch as we had before. I opened
> > a bug [4] and started an investigation; I'll update my findings there.
> >
> > I think at this point there is no reason to run "recheck" on your
> > patches.
> >
> > Thanks,
> > Jakub
> >
> > [1] http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen
> > [2] http://bit.ly/2vdKMwy
> > [3] http://logs.openstack.org/14/488914/8/check/gate-neutron-dsvm-functional-ubuntu-xenial/75d7482/logs/openvswitch/ovsdb-server.txt.gz
> > [4] https://bugs.launchpad.net/neutron/+bug/1709032

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [networking-ovn] metadata agent implementation
Hi folks,

Now that the metadata proposal looks more refined [0], I'd like to get your feedback on the driver implementation. The ovn-metadata-agent in networking-ovn will be responsible for creating the namespaces, spawning haproxies and so on. But it must also implement most of the "old" neutron-metadata-agent functionality, which listens on a UNIX socket, receives requests from haproxy, adds some headers and forwards them to Nova. This means that we can import/reuse a big part of the neutron code.

I wonder what you think about depending on the neutron tree for the agent implementation, given that we can benefit from a lot of code reuse. On the other hand, if we want to get rid of this dependency, we could probably write the agent "from scratch" in C (what about having C code in the networking-ovn repo?) and, at the same time, it should buy us a performance boost (probably not very noticeable, since it will respond to requests from local VMs involving a few lookups and the processing of simple HTTP requests; talking to Nova would take most of the time, and this only happens at boot time).

I would probably aim for a Python implementation reusing/importing code from the neutron tree, but I'm not sure how we want to deal with changes in the neutron codebase (we're actually importing code now).

Looking forward to reading your thoughts :)

Thanks,
Daniel

[0] https://review.openstack.org/#/c/452811/

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
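For context, the proxy role described above can be sketched in a few dozen lines of Python. This is illustrative only — NOT the actual networking-ovn agent: the instance lookup and Nova forwarding are stubbed, and the `X-OVN`-style plumbing is omitted; the `X-Instance-ID` / `X-Tenant-ID` / `X-Instance-ID-Signature` headers are the ones Nova's metadata API actually verifies:

```python
# Answer HTTP requests arriving over a UNIX socket from haproxy, resolve
# the caller to an instance, add the signed headers Nova expects, forward.

import hmac
import hashlib
import socketserver
from http.server import BaseHTTPRequestHandler

# In reality this comes from config (metadata_proxy_shared_secret).
SHARED_SECRET = b"example-shared-secret"

def sign_instance_id(instance_id):
    # Nova verifies this HMAC-SHA256 signature to trust X-Instance-ID.
    return hmac.new(SHARED_SECRET, instance_id.encode(),
                    hashlib.sha256).hexdigest()

class MetadataProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # haproxy adds the originating VM's address when proxying.
        vm_ip = self.headers.get("X-Forwarded-For")
        instance_id, project_id = self.lookup_instance(vm_ip)
        fwd_headers = {
            "X-Instance-ID": instance_id,
            "X-Tenant-ID": project_id,
            "X-Instance-ID-Signature": sign_instance_id(instance_id),
        }
        body = self.forward_to_nova(self.path, fwd_headers)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def lookup_instance(self, vm_ip):
        # Stub: a real agent would resolve this via the OVN SB DB/Neutron.
        return "instance-id", "project-id"

    def forward_to_nova(self, path, headers):
        # Stub: a real agent would issue the HTTP request to Nova here.
        return b"stubbed response"

class UnixHTTPServer(socketserver.UnixStreamServer):
    """Serve the HTTP handler over a UNIX socket instead of TCP."""

# server = UnixHTTPServer("/var/run/metadata_proxy.sock", MetadataProxyHandler)
# server.serve_forever()
```

Most of this (the request handling and header signing) is exactly the part that could be imported from the neutron tree rather than rewritten.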
Re: [openstack-dev] [neutron] - Team photo
+1 On Mon, Feb 20, 2017 at 7:20 PM, Bhatia, Manjeet S < manjeet.s.bha...@intel.com> wrote: > +1 > > > > *From:* Kevin Benton [mailto:ke...@benton.pub] > *Sent:* Friday, February 17, 2017 3:08 PM > *To:* openstack-dev@lists.openstack.org > *Subject:* [openstack-dev] [neutron] - Team photo > > > > Hello! > > > > Is everyone free Thursday at 11:20AM (right before lunch break) for 10 > minutes for a group photo? > > > > Cheers, > Kevin Benton > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Some findings while profiling instances boot
Awesome work, Kevin!

For the DHCP notification, in my profiling I got only 10% of the CPU time [0], without taking the waiting times into account, which is probably what you also measured. Your patch seems like a neat and great optimization :)

Also, since "get_devices_details_list_and_failed_devices()" takes quite a long time, does it make sense to trigger this request asynchronously (the same approach you took for the OVO notifier) and continue executing the iteration? This would not result in a huge improvement but, in the case I showed in the diagram, both 'get_device_details' calls could be issued at the same time instead of one after the other, probably freeing the iteration for further processing on the agent side. Thoughts on this?

Regarding the time spent on SQL queries, it looks like the server spends a significant amount of time building them, and reducing that time would result in a nice improvement. Mike's outstanding analysis looks promising, and maybe it's worth discussing it.

[0] http://imgur.com/lDikZ0I

On Thu, Feb 16, 2017 at 8:23 AM, Kevin Benton wrote:
> Thanks for the stats and the nice diagram. I did some profiling and I'm
> sure it's the RPC handler on the Neutron server-side behaving like
> garbage.
>
> There are several causes that I have a string of patches up to address,
> mainly stemming from the fact that l2pop requires multiple port status
> updates to function correctly:
>
> * The DHCP notifier will trigger a notification to the DHCP agents on the
> network on a port status update. This wouldn't be too problematic on its
> own, but it does several queries for networks and segments to determine
> which agents it should talk to. Patch to address it here:
> https://review.openstack.org/#/c/434677/
>
> * The OVO notifier will also generate a notification on any port data
> model change, including the status. This is ultimately the desired
> behavior, but until we eliminate the frivolous status flipping, it's
> going to incur a performance hit.
Patch here to put it asynced into the > background so it doesn't block the port update process: > https://review.openstack.org/#/c/434678/ > > * A wasteful DB query in the ML2 PortContext: https://review.op > enstack.org/#/c/434679/ > > * More unnecessary queries for the status update case in the ML2 > PortContext: https://review.openstack.org/#/c/434680/ > > * Bulking up the DB queries rather than retrieving port details one by > one. > https://review.openstack.org/#/c/434681/ https://review.open > stack.org/#/c/434682/ > > The top two accounted for more than 60% of the overhead in my profiling > and they are pretty simple, so we may be able to get them into Ocata for RC > depending on how other cores feel. If not, they should be good candidates > for back-porting later. Some of the others start to get more invasive so we > may be stuck. > > Cheers, > Kevin Benton > > On Wed, Feb 15, 2017 at 12:25 PM, Jay Pipes wrote: > >> On 02/15/2017 12:46 PM, Daniel Alvarez Sanchez wrote: >> >>> Hi there, >>> >>> We're trying to figure out why, sometimes, rpc_loop takes over 10 >>> seconds to process an iteration when booting instances. So we deployed >>> devstack on a 8GB, 4vCPU VM and did some profiling on the following >>> command: >>> >>> nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --nic >>> net-name=private --min-count 8 instance >>> >> >> Hi Daniel, thanks for posting the information here. Quick request of you, >> though... can you try re-running the test but doing 8 separate calls to >> nova boot instead of using the --min-count 8 parameter? I'm curious to see >> if you notice any difference in contention/performance. 
>> >> Best, >> -jay >> >> >> __ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib >> e >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
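The "put the notifier in the background" pattern discussed in this thread can be illustrated in a few lines — a sketch of the general pattern only, not the actual Neutron patch (`notify_dhcp_agents` here just stands in for the agent/segment queries plus the RPC cast):

```python
# Move a slow notification off the port-update critical path by handing
# it to a background worker instead of emitting it inline.

import time
from concurrent.futures import ThreadPoolExecutor

_notifier_pool = ThreadPoolExecutor(max_workers=1)

def notify_dhcp_agents(port):
    time.sleep(0.05)  # stands in for DB queries + RPC to the agents
    return port["id"]

def update_port_status_sync(port):
    port["status"] = "ACTIVE"
    notify_dhcp_agents(port)  # caller pays the full notification cost
    return port

def update_port_status_async(port):
    port["status"] = "ACTIVE"
    _notifier_pool.submit(notify_dhcp_agents, port)  # fire and forget
    return port

start = time.monotonic()
update_port_status_async({"id": "p1"})
# The update path returns immediately; the notification runs behind it.
assert time.monotonic() - start < 0.05
```

The trade-off, as Kevin notes, is ordering: until the frivolous status flipping is eliminated, deferring the notification only hides the cost rather than removing it.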
[openstack-dev] [neutron] Some findings while profiling instances boot
Hi there,

We're trying to figure out why rpc_loop sometimes takes over 10 seconds to process an iteration when booting instances. So we deployed devstack on an 8GB, 4vCPU VM and did some profiling on the following command:

nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --nic net-name=private --min-count 8 instance

(network private has port_security_enabled set to False to avoid the overhead of setting up security groups)

Logs showed that sometimes the network-vif-plugged event was sent by the server ~12 seconds after the vif was detected by the ovsdb monitor. Usually the first and second events come faster, while the rest take longer. Further analysis showed that rpc_loop iterations take several seconds to complete, so if a vif is detected while iteration X is running, it won't be processed until iteration X+1.

As an example, I've attached a simplified sequence diagram [0] showing what happened in a particular iteration of my debug session (I have full logs and pstat files for this session for those interested). In this example, iteration 76 is going to process two ports while some of the previously spawned machines are being managed by libvirt and so on. At the beginning of iteration 76, a new vif is detected by the ovsdb monitor, but it won't be processed until 12 seconds later, in iteration 77.

Profiling files show that the aggregated CPU time for the Neutron workers is 97 seconds, while the CPU time for the OVS agent is 2.1 seconds. Most of the agent's time is spent waiting for RPC, so there is apparently some room for optimization and multiprocessing here. According to dstat.log, CPU is at ~90% and there's ~1GB of free RAM. I can't tell whether the hypervisor was swapping or not since I didn't have access to it.
----system---- ----total-cpu-usage---- ------memory-usage----- -net/total-
     time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send
05-02 14:22:50| 89  11   0   0   0   0|5553M    0 1151M 1119M|1808B 1462B
05-02 14:22:51| 90  10   0   0   0   0|5567M    0 1151M 1106M|1588B  836B
05-02 14:22:52| 89  11   1   0   0   0|5581M    0 1151M 1092M|3233B 2346B
05-02 14:22:53| 89  10   0   0   0   0|5598M    0 1151M 1075M|2676B 2038B
05-02 14:22:54| 90  10   0   0   0   0|5600M    0 1151M 1073M|  20k   14k
05-02 14:22:55| 90   9   0   0   0   0|5601M    0 1151M 1072M|  22k   16k

Also, while having a look at the server profiling, around 33% of the time was spent building SQL queries [1]. Mike Bayer went through this and suggested having a look at baked queries; he also submitted a sketch of his proposal [2].

I wanted to share these findings with you (probably most of you knew them already, but I'm quite new to OpenStack, so it's been a really nice exercise for me to better understand how things work) and gather your feedback about how things can be improved. I'll also be happy to share the results and discuss further during the PTG next week if you think it's worthwhile.

Thanks a lot for reading and apologies for such a long email!

Cheers,
Daniel
IRC: dalvarez

[0] http://imgur.com/WQqaiYQ
[1] http://imgur.com/6KrfJUC
[2] https://review.openstack.org/430973

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
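The "detected in iteration X, processed in iteration X+1" behavior described in this email has a simple consequence for latency that can be modeled in a few lines — a toy model only, not the real OVS agent loop (`process_latency` is a hypothetical helper, and real iterations are not fixed-length):

```python
# Why a VIF detected while iteration X runs is only handled in X+1: a
# polling loop drains its event set at iteration boundaries, so the
# worst-case plugging latency approaches one full iteration time.

import math

def process_latency(event_time, iteration_duration):
    """Time until an event is handled, for a loop starting at t=0."""
    # The event is picked up at the start of the first iteration
    # boundary at or after event_time.
    next_start = math.ceil(event_time / iteration_duration) * iteration_duration
    return next_start - event_time

# With ~12-second iterations (as observed in the logs), an event arriving
# just after an iteration begins waits almost the whole 12 seconds.
assert process_latency(event_time=0.5, iteration_duration=12) == 11.5
assert process_latency(event_time=6, iteration_duration=12) == 6
```

This is why shrinking the iteration time (or processing events as they arrive) matters more for vif-plug latency than raw agent CPU time, which was only 2.1 seconds here.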
Re: [openstack-dev] [neutron] "Setup firewall filters only for required ports" bug
On Wed, Jan 18, 2017 at 10:45 PM, Bernard Cafarelli wrote:
> Hi neutrinos,
>
> I would like your feedback on the changeset mentioned in the title [1]
> (yes, added since Liberty).
>
> With this patch, we (should) skip ports with port_security_enabled=False
> or with an empty list of security groups when processing added ports [2].
> But we found multiple problems here:
>
> * Ports created with port_security_enabled=False
>
> This is the original bug that started this mail: if the FORWARD iptables
> chain has a REJECT default policy/last rule, the traffic is still
> blocked [3]. There is also a launchpad bug with similar details [4].
> The problem here: these ports must not be skipped, as we add specific
> firewall rules to allow all traffic. These iptables rules have the
> following comment:
> "/* Accept all packets when port security is disabled. */"
>
> With the current code, any port created with port security disabled will
> not have these rules (and updates do not work).
> I initially sent a patch to process these ports again [5], but there is
> more (as detailed by some in the launchpad bug).
>
> * Ports with no security groups, current code
>
> There is a bug in the current agent code [6]: even with no security
> groups, the check will return true, as the security_groups key exists
> in the port details (with value "[]"). So the port will not be skipped.
>
> * Ports with no security groups, updated code
>
> Next step was to update the checks (security groups list not empty, port
> security True or None), and test again.
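The key-presence bug in the "current code" case above can be reproduced with a toy check (function names are mine, not the agent's; the real check lives in ovs_neutron_agent.py [6]):

```python
def has_security_groups_buggy(port_details):
    # Mimics the reported bug: the 'security_groups' key is always
    # present in the port details, so a key-presence test returns True
    # even when the value is an empty list, and the port is never
    # skipped.
    return 'security_groups' in port_details

def has_security_groups_fixed(port_details):
    # An empty list of security groups should count as "none": test the
    # list's contents, not the key's presence.
    return bool(port_details.get('security_groups'))

# A port with no security groups at all, as delivered by the server.
port = {'port_security_enabled': True, 'security_groups': []}
```

As the thread goes on to show, fixing this check alone is not enough, since correctly skipping the port then exposes the "port filter which is not filtered" behaviour.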
> The port this time was skipped, but this showed up in
> openvswitch-agent.log:
> 2017-01-18 16:19:56.780 7458 INFO neutron.agent.linux.iptables_firewall
> [req-c49ca24f-1df8-40d7-8c48-6aab842ba34a - - - - -] Attempted to update
> port filter which is not filtered c2c58f8f-3b76-4c00-b792-f1726b28d2fc
> 2017-01-18 16:19:56.853 7458 INFO
> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
> [req-c49ca24f-1df8-40d7-8c48-6aab842ba34a - - - - -] Configuration for
> devices up [u'c2c58f8f-3b76-4c00-b792-f1726b28d2fc'] and devices down []
> completed.
>
> Which is the kind of log we saw in the first bug report. So as an
> additional test, I tried to update this port, adding a security group.
> New log entries:
> 2017-01-18 17:36:53.164 7458 INFO neutron.agent.securitygroups_rpc
> [req-c49ca24f-1df8-40d7-8c48-6aab842ba34a - - - - -] Refresh firewall
> rules
> 2017-01-18 17:36:55.873 7458 INFO neutron.agent.linux.iptables_firewall
> [req-c49ca24f-1df8-40d7-8c48-6aab842ba34a - - - - -] Attempted to update
> port filter which is not filtered 0f2eea88-0e6a-4ea9-819c-e26eb692cb25
> 2017-01-18 17:36:58.587 7458 INFO
> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
> [req-c49ca24f-1df8-40d7-8c48-6aab842ba34a - - - - -] Configuration for
> devices up [u'0f2eea88-0e6a-4ea9-819c-e26eb692cb25'] and devices down []
> completed.
>
> And the iptables configuration did not change to show the newly allowed
> ports.
>
> So with a fixed check, we end up back in the same buggy situation as the
> first one.
>
> * Feedback
>
> So which course of action should we take? After checking these 3 cases
> out, I am in favour of reverting this commit entirely, as in its current
> state it does not help for ports without security groups, and breaks
> ports with port security disabled.

After having gone through the code and debugged the situation, I'm also in favor of reverting the patch.
We should explicitly set up a rule which allows traffic for that tap device, exactly as we do when port_security_enabled is switched from True to False. We can't rely on traffic being implicitly allowed.

> Also, on the tests side, should we add more tests only using create
> calls (port_security tests mostly update an existing port)? How to make
> sure these iptables rules are correctly applied (the ping tests are not
> enough, especially if the host system does not reject packets by
> default)?

Tests are incomplete, so we should add either functional or fullstack/tempest tests that validate these cases (ports created with port_security_enabled set to False, ports created with no security groups, etc.). I can try to do that.

> [1] https://review.openstack.org/#/c/210321/
> [2] https://github.com/openstack/neutron/blob/a66c27193573ce015c6c1234b0f2a1d86fb85a22/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1640
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1406263
> [4] https://bugs.launchpad.net/neutron/+bug/1549443
> [5] https://review.openstack.org/#/c/421832/
> [6] https://github.com/openstack/neutron/blob/a66c27193573ce015c6c1234b0f2a1d86fb85a22/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1521
>
> Thanks!
>
> --
> Bernard Cafarelli
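For context on the explicit allow rule discussed in this thread: its comment string is the one quoted earlier ("Accept all packets when port security is disabled."). A hypothetical sketch of how such a per-device rule could be rendered (the chain/device naming scheme here is illustrative, not the real iptables firewall driver's):

```python
def allow_all_rule(port_id):
    """Build the explicit accept rule for a port whose port security is
    disabled (illustrative chain/device naming, not the real driver's).
    """
    # Per-port "incoming" chain, named from a prefix of the port UUID.
    chain = 'neutron-openvswi-i' + port_id[:10]
    return ('-A {chain} -m comment '
            '--comment "Accept all packets when port security is disabled." '
            '-j ACCEPT'.format(chain=chain))

rule = allow_all_rule('c2c58f8f-3b76-4c00-b792-f1726b28d2fc')
```

The point of the sketch is just that the accept must be stated explicitly in the port's chain; whether the FORWARD chain happens to accept by default cannot be relied upon.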
Re: [openstack-dev] [Neutron] Neutron team social event in Barcelona
Hi,

+1 here

On Mon, Oct 17, 2016 at 10:01 AM, Korzeniewski, Artur <artur.korzeniew...@intel.com> wrote:
> +1
>
> *From:* Oleg Bondarev [mailto:obonda...@mirantis.com]
> *Sent:* Monday, October 17, 2016 9:52 AM
> *To:* OpenStack Development Mailing List (not for usage questions)
> <openstack-dev@lists.openstack.org>
> *Subject:* Re: [openstack-dev] [Neutron] Neutron team social event in
> Barcelona
>
> +1
>
> On Mon, Oct 17, 2016 at 10:23 AM, Jakub Libosvar wrote:
> +1
>
> On 14/10/2016 20:30, Miguel Lavalle wrote:
> Dear Neutrinos,
>
> I am organizing a social event for the team on Thursday 27th at 19:30.
> After doing some Google research, I am proposing Raco de la Vila, which
> is located in Poblenou: http://www.racodelavila.com/en/index.htm. The
> menu is here: http://www.racodelavila.com/en/carta-racodelavila.htm
>
> It is easy to get there by subway from the Summit venue:
> https://goo.gl/maps/HjaTEcBbDUR2. I made a reservation for 25 people
> under 'Neutron' or "Miguel Lavalle". Please confirm your attendance so
> we can get a final count.
> Here's some reviews:
> https://www.tripadvisor.com/Restaurant_Review-g187497-d1682057-Reviews-Raco_De_La_Vila-Barcelona_Catalonia.html
>
> Cheers
>
> Miguel