[openstack-dev] Experiences of using Neutron in large scale

2013-10-02 Thread Kumar
Hi,
  We are considering running OpenStack Neutron in a large-scale deployment.
I would like to hear the community's experiences and suggestions.

To gauge its quality I am going through the Neutron bugs (I assume that
is the best way to judge quality).
Some of them are really concerning, like the bugs below:
https://bugs.launchpad.net/neutron/+bug/1211915
https://bugs.launchpad.net/neutron/+bug/1230407
https://bugs.launchpad.net/neutron/+bug/121

Bug 1211915 was raised for simple Tempest tests; what about huge
deployments?
I am told that even vendor Neutron plugins have similar issues when
creating tens of instances in a single click in Horizon, and that people
see too many connection timeouts in the quantum service logs with vendor
plugins as well.

I was told that some were stuck with nova-network, as there is no support
yet for migrating to Neutron, and so they could not take advantage of the
new network services.

I would like to know the community's thinking on this. Please note that I
am not concerned about fix availability.

Thanks,
-Kumar
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Experiences of using Neutron in large scale

2013-10-02 Thread Salvatore Orlando
Hi Kumar,

Some comments on your questions inline.
I am afraid I am unable to provide thorough answers; hopefully my thoughts
will at least provide more context.

Salvatore


On 2 October 2013 19:04, Kumar chvs...@gmail.com wrote:

 Hi,
   We are considering running OpenStack Neutron in a large-scale deployment.
 I would like to hear the community's experiences and suggestions.

 To gauge its quality I am going through the Neutron bugs (I assume that
 is the best way to judge quality).
 Some of them are really concerning, like the bugs below:
 https://bugs.launchpad.net/neutron/+bug/1211915
 https://bugs.launchpad.net/neutron/+bug/1230407
 https://bugs.launchpad.net/neutron/+bug/121

 Bug 1211915 was raised for simple Tempest tests; what about huge
 deployments?
 I am told that even vendor Neutron plugins have similar issues when
 creating tens of instances in a single click in Horizon, and that people
 see too many connection timeouts in the quantum service logs with vendor
 plugins as well.


Preamble: the next paragraph is not meant to downplay the issues on the
gate.
During each release cycle, new features are added. In particular, this time
Neutron added VPN and Firewall services. This means that there is a lot of
code churn, both in neutron-server and in python-neutronclient. It is not
infrequent for critical bugs like the ones above (and you also left out
bug 1240001) to be in the code base up to a few days before the release. For
vendor plugins this might even be different, as they're not regulated by
the same QA process as the plugin used by the gate (one might say it should
not be like this, but that is probably out of the scope of this thread). I
have to agree that during this release cycle Neutron has caused quite a few
gate-blocking issues; on the other hand, I don't think that flakiness during
the release cycle is enough of a reason to label a project as immature,
unstable, or unable to scale.


 I was told that some were stuck with nova-network, as there is no support
 yet for migrating to Neutron, and so they could not take advantage of the
 new network services.


This is correct. The migration process unfortunately is not easy, because
you need to rearrange your cloud networking at different layers. I wish it
were as easy as doing a DB migration, but unfortunately it's nothing like
that. I don't feel I have the authority or the competence to provide any
migration advice, but in my opinion the current best bet is to run
parallel OpenStack installations with nova-network and Neutron, and then
progressively allocate new networks on the Neutron installation until there
are no more instances deployed on the nova-network installation. But please
take the previous statement as nothing more than 'thinking aloud'.



 I would like to know the community's thinking on this. Please note that I
 am not concerned about fix availability.


From my side I can tell you that I use an OpenStack installation with a
Neutron vendor plugin on a daily basis. We had our fair share of issues,
but we're now fairly stable and happy performance-wise on a Grizzly
installation, and already working on the Havana upgrade. However, since I
am one of the developers of said plugin, this probably doesn't count.
On the other hand, I've also been given a chance to test some production or
beta OpenStack clouds entirely based on open source components, and I've
been completely satisfied with the user experience. My point of view
here is limited again, though, because I don't have the perspective of the
cloud admin in this case.





Re: [openstack-dev] Experiences of using Neutron in large scale

2013-10-02 Thread Kumar
Hi Salvatore,
  Please see my responses.


On Wed, Oct 2, 2013 at 11:03 AM, Salvatore Orlando sorla...@nicira.com wrote:


 Preamble: the next paragraph is not meant to downplay the issues on the
 gate.
 During each release cycle, new features are added. In particular, this time
 Neutron added VPN and Firewall services. This means that there is a lot of
 code churn, both in neutron-server and in python-neutronclient. It is not
 infrequent for critical bugs like the ones above (and you also left out
 bug 1240001) to be in the code base up to a few days before the release. For
 vendor plugins this might even be different, as they're not regulated by
 the same QA process as the plugin used by the gate (one might say it should
 not be like this, but that is probably out of the scope of this thread). I
 have to agree that during this release cycle Neutron has caused quite a few
 gate-blocking issues; on the other hand, I don't think that flakiness during
 the release cycle is enough of a reason to label a project as immature,
 unstable, or unable to scale.


Kumar: I did get a chance to meet people using vendor plugins, and they
expressed the same concern.

Be it Folsom, Grizzly or Havana, they have seen persistent behavior issues:
either the DB connection pool size and similar settings need tuning, or the
Neutron plugin is so busy talking to its OpenFlow controllers/Quantum agents
that Neutron client requests from Nova time out.

I am in favor of Neutron and am pushing for it, and I am sure it will bring
more flexibility to our deployments. But I need ammunition to answer any
questions. In production even a small issue can cost us, so we need to step
cautiously.

Most importantly, I have seen bugs targeted for a fix in future versions
with no backport to old releases. This is a concern, as deployments like
ours would not migrate to new releases readily: certifying a new release
consumes a lot of time and effort.




Re: [openstack-dev] Experiences of using Neutron in large scale

2013-10-02 Thread Mike Wilson
Kumar,

How large a deployment are you considering it for? We've run Neutron in
a fairly large environment (10k+ nodes) for a year now and have learned
some interesting lessons. We use a modified Open vSwitch plugin and as such
have no experience with the Nicira plugin. I think the largest single
problem that we have as it pertains to scalability is the race conditions
in neutron-server. Allocating IPs, networks, ports, etc. tends to have some
racy behaviors. I feel like many of these issues are being addressed by
Neutron developers, but Neutron is also very viable for large-scale
production today. For instance, most of the race conditions that I mention
can be averted if you aren't writing to the database concurrently. You
could designate ONE neutron-server as the write server and the rest as
read servers. It's a little tricky to do because you have to have a router
in front of them all or reroute requests, but the API set is not very large,
so it is a very doable task. That being said, in our environment we use a
single neutron-server with another standing by as backup. It's not as
performant as we'd like it to be, but it hasn't stopped us from growing so
far.
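
A minimal sketch of the write/read split described above, assuming the
router decides purely on the HTTP method: GET and HEAD go to the read
servers in round-robin order, everything else goes to the single write
server. The function names and the backend URLs in the usage note are
hypothetical and are not part of Neutron; this only illustrates the routing
decision, not a full proxy and not the setup actually deployed.

```python
from itertools import cycle

# HTTP methods assumed to be read-only for routing purposes.
READ_METHODS = frozenset({"GET", "HEAD"})


def make_router(write_server, read_servers):
    """Build a function that picks a backend for a given HTTP method.

    write_server: the ONE neutron-server allowed to write to the database.
    read_servers: the remaining neutron-servers, used round-robin for reads.
    """
    read_pool = cycle(read_servers)

    def pick_backend(method):
        # Reads are spread across the read servers.
        if method.upper() in READ_METHODS:
            return next(read_pool)
        # POST, PUT, DELETE and anything else is serialized through
        # the single write server to avoid concurrent database writes.
        return write_server

    return pick_backend
```

For example, a router built with make_router("http://neutron-write.example:9696",
["http://neutron-read-1.example:9696", "http://neutron-read-2.example:9696"])
would send every POST, PUT and DELETE to the write server while alternating
GET requests between the two read servers.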

-Mike Wilson

P.S. There is a presentation from the Portland summit that Jun Park and I
gave. In it we talk about some of the issues around scale, although
Neutron (Quantum at the time) is a smaller part of the talk:
http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment

