[openstack-dev] Experiences of using Neutron in large scale
Hi,

We are considering running OpenStack Neutron in a large-scale deployment, and I would like to hear the community's experience and suggestions.

To gauge its quality I have been going through the Neutron bugs (I assume that is the best way to judge quality). Some of them are really concerning, such as these:

https://bugs.launchpad.net/neutron/+bug/1211915
https://bugs.launchpad.net/neutron/+bug/1230407
https://bugs.launchpad.net/neutron/+bug/121

Bug 1211915 was raised against simple Tempest tests; what about huge deployments? I am told that even vendor Neutron plugins have similar issues when we create tens of instances with a single click in Horizon, and that people see many connection timeouts in the quantum service logs with vendor plugins as well.

I was told that some were stuck with nova-network because there is no support yet for migrating to Neutron, so they could not take advantage of the new network services.

I would like to know what the community thinks about this. Please note that I am not concerned about fix availability.

Thanks,
-Kumar
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Experiences of using Neutron in large scale
Hi Kumar,

some comments on your questions inline. I am afraid I am unable to provide thorough answers; hopefully my thoughts will at least provide more context.

Salvatore

On 2 October 2013 19:04, Kumar chvs...@gmail.com wrote:

> Hi, We are considering running OpenStack Neutron in a large-scale deployment, and I would like to hear the community's experience and suggestions.
>
> To gauge its quality I have been going through the Neutron bugs (I assume that is the best way to judge quality). Some of them are really concerning, such as these:
>
> https://bugs.launchpad.net/neutron/+bug/1211915
> https://bugs.launchpad.net/neutron/+bug/1230407
> https://bugs.launchpad.net/neutron/+bug/121
>
> Bug 1211915 was raised against simple Tempest tests; what about huge deployments? I am told that even vendor Neutron plugins have similar issues when we create tens of instances with a single click in Horizon, and that people see many connection timeouts in the quantum service logs with vendor plugins as well.

Preamble: the next paragraph is not meant to downplay the issues on the gate.

During each release cycle, new features are added. In particular, this time Neutron added VPN and Firewall services. This means that there is a lot of code churn, both in neutron-server and in python-neutronclient. It is not infrequent that critical bugs like the ones above (and you also left out bug 1240001) are in the code base up to a few days before the release. For vendor plugins this might even be different, as they are not regulated by the same QA process as the plugin used by the gate (one might say it should not be like this, but this is probably out of the scope of this thread). I have to agree that during this release cycle Neutron has caused quite a few gate-blocking issues; on the other hand, I don't think that flakiness during the release cycle is enough of a reason to label a project as immature, unstable, or unable to scale.

> I was told that some were stuck with nova-network because there is no support yet for migrating to Neutron, so they could not take advantage of the new network services.

This is correct. The migration process unfortunately is not easy, because you need to rearrange your cloud networking at different layers. I wish it were as easy as doing a db migration, but unfortunately it's nothing like that. I don't feel I have the authority or the competence to provide any migration advice, but in my opinion the current best bet is to run parallel OpenStack installations with nova-network and Neutron, and then progressively allocate new networks on the Neutron installation until there are no more instances deployed on the nova-network installation. But please take the previous statement as nothing more than 'thinking aloud'.

> I would like to know what the community thinks about this. Please note that I am not concerned about fix availability.

From my side I can tell you that I use, on a daily basis, an OpenStack installation with a Neutron vendor plugin. We had our fair share of issues, but we're now fairly stable and happy performance-wise on a Grizzly installation, and already working on the Havana upgrade. However, since I am one of the developers of said plugin, this probably doesn't count. On the other hand, I've also been given a chance to test some production or beta OpenStack clouds entirely based on open-source components, and I've been completely satisfied with the user experience; but my point of view here is limited again, because I don't have the perspective of the cloud admin in this case.
Re: [openstack-dev] Experiences of using Neutron in large scale
Hi Salvatore,

Please see my responses.

On Wed, Oct 2, 2013 at 11:03 AM, Salvatore Orlando sorla...@nicira.com wrote:

> I have to agree that during this release cycle Neutron has caused quite a few gate-blocking issues; on the other hand, I don't think that flakiness during the release cycle is enough of a reason to label a project as immature, unstable, or unable to scale.

Kumar: I did get a chance to meet folks using vendor plugins, and they expressed the same concern. Be it Folsom, Grizzly, or Havana, they have consistently seen issues, whether it is tuning the DB connection pool size and the like, or the Neutron plugin being so busy talking to its OpenFlow controllers/quantum agents that Neutron client requests from nova time out.

I am with Neutron and am pushing for it, and I am sure it brings more flexibility to our deployments. But I need the fuel to answer any questions. In production a small issue can cost us, so we need to take a cautious step.

Most importantly, I have seen bugs whose fixes are proposed for future versions, with no backport to old releases. This is a concern, as deployments like ours would not migrate to new releases because certifying them takes a lot of time and effort.
Re: [openstack-dev] Experiences of using Neutron in large scale
Kumar,

How large a deployment are you considering? We've run Neutron in a fairly large environment (10k+ nodes) for a year now and have learned some interesting lessons. We use a modified Open vSwitch plugin and as such have no experience with the Nicira plugin.

I think the largest single problem we have with scalability is the race conditions in neutron-server. Allocating IPs, networks, ports, etc. tends to be racy. I feel that many of these issues are being addressed by the Neutron developers, but Neutron is also very viable for large-scale production today. For instance, most of the race conditions I mention can be averted if you aren't writing to the database concurrently. You could designate ONE neutron-server as the write server and the rest as read servers. It's a little tricky to do because you have to put a router in front of them all or reroute requests, but the API set is not very large, so it is a very doable task.

That being said, in our environment we use a single neutron-server with another standing by as backup. It's not as performant as we'd like it to be, but it hasn't stopped us from growing so far.

-Mike Wilson

P.S. Jun Park and I gave a presentation at the Portland summit in which we talk about some of the issues around scale, although Neutron (quantum at the time) is a smaller part of the talk: http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment
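A minimal sketch of the write/read split described in the message above, assuming the router in front of the neutron-server instances can choose a backend from the HTTP method alone. The host names, the `ReadWriteRouter` class, and `pick_backend` are hypothetical illustrations and are not part of Neutron or of the deployment described in this thread; 9696 is Neutron's default API port.

```python
from itertools import cycle

# Hypothetical backend addresses, for illustration only.
WRITE_SERVER = "http://neutron-writer:9696"
READ_SERVERS = ["http://neutron-reader-1:9696", "http://neutron-reader-2:9696"]

# HTTP methods that only read state and can be served by any instance.
READ_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


class ReadWriteRouter:
    """Pick a backend per request: one writer, round-robin readers."""

    def __init__(self, write_server, read_servers):
        if not read_servers:
            raise ValueError("at least one read server is required")
        self._write_server = write_server
        self._readers = cycle(read_servers)

    def pick_backend(self, method):
        """Return the backend URL that should handle the given HTTP method."""
        if method.upper() in READ_METHODS:
            return next(self._readers)
        # POST, PUT, DELETE, and anything unrecognized go to the designated
        # writer, so a single neutron-server handles all database writes.
        return self._write_server


router = ReadWriteRouter(WRITE_SERVER, READ_SERVERS)
for verb in ("POST", "GET", "GET", "DELETE"):
    print(verb, "->", router.pick_backend(verb))
```

In practice this decision would more likely live in a load-balancer rule than in application code; the sketch only illustrates that the request method is enough to separate writes from reads.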