Re: [openstack-dev] [neutron] Neutron scaling datapoints?
And for another recent one that came out yesterday: interesting to read for those who are using mongodb + openstack... https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads -Josh

Joshua Harlow wrote:

Joshua Harlow wrote:

Kevin Benton wrote: Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry are another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend-specific, and tooz supports varying backends.

Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc?

Of course... we tried to make it 'completely' transparent, but in reality certain backends (zookeeper, which uses a paxos-like algorithm, and redis with sentinel support...) are better (more resilient, more consistent, handle partitions/restarts better...) than others (memcached is, after all, just a distributed cache). This is just the nature of the game...

And for some more reading fun:

https://aphyr.com/posts/315-call-me-maybe-rabbitmq
https://aphyr.com/posts/291-call-me-maybe-zookeeper
https://aphyr.com/posts/283-call-me-maybe-redis
https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul

... (aphyr.com has a lot of these neat posts)...

The Nova implementation of Tooz seemed pretty straight-forward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz.

Sounds good to me.

On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote:

Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision?

Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry are another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend-specific, and tooz supports varying backends.

However, before (what I assume is) the large code change to implement tooz, I would like to quantify whether the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors.

Sure, always good to avoid prematurely optimizing things... Although this is relevant for you I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter).
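For concreteness, the key/value-with-expiry approach mentioned above might look like this (a minimal, untested sketch; assumes redis-py and a reachable redis server, and the key prefix, agent id and TTL are invented for illustration):

    import time
    import redis  # redis-py; assumes a redis server on localhost

    r = redis.StrictRedis(host='localhost', port=6379)

    AGENT_ID = 'l3-agent-host-42'  # hypothetical agent identifier
    TTL = 30                       # seconds before redis expires the key

    def heartbeat():
        # Agent side: refresh the key; redis expires it by itself if we die.
        r.setex('agent-alive:%s' % AGENT_ID, TTL, int(time.time()))

    def is_alive(agent_id):
        # Scheduler side: an agent is alive iff its key has not expired yet.
        return bool(r.exists('agent-alive:%s' % agent_id))

The nice property here is that expiry is enforced by the backend, so the scheduler never has to compare timestamps itself.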
[1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes

Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?

Put each agent in a tooz[1] group; have each agent periodically heartbeat[2]; have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback); profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html

[1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
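That group/heartbeat pattern, spelled out as a minimal sketch (not Neutron code; the backend URL, group name and member id are illustrative, API as of roughly tooz 0.13):

    from tooz import coordination

    # One coordinator per agent process; the backend is pluggable via the
    # URL (zookeeper://, redis://, memcached://, ...).
    coord = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'l2-agent-host-1')
    coord.start()

    group = b'neutron-agents'
    try:
        coord.create_group(group).get()
    except coordination.GroupAlreadyExist:
        pass
    coord.join_group(group).get()

    # Agent side: call this periodically (e.g. from a looping task).
    coord.heartbeat()

    # Scheduler side: the live agents are simply the current group members.
    members = coord.get_members(group).get()

With a zookeeper backend the membership is ephemeral-node based, so a dead agent drops out of the group without anyone polling timestamps.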
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Attila, addressing only agent status/liveness management is not enough for Neutron scalability. The concurrent dynamic load at large scale (for example, 100k managed nodes with dynamic load such as security group rule updates, routers_updated, etc.) should also be taken into account. So even if agent status/liveness management is improved in Neutron, that doesn't mean the scalability issue is fully addressed.

On the other hand, Nova already supports several segregation concepts, for example Cells and Availability Zones... If there are 100k nodes to be managed by one OpenStack instance, it's impossible to work without hardware resource segregation. It's weird to put the agent liveness manager in availability zone (AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is powered off, then all agents in AZ 2 lose management.

The benchmark is already here in the test report for million-port scalability of Neutron: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers. The cascading may not be perfect, but at least it provides a feasible way if we really want scalability.

I am also working, based on cascading, to evolve OpenStack toward a world where there is no need to worry about OpenStack scalability: tenant-level virtual OpenStack service over hybrid, federated or multiple OpenStack-based clouds. There are lots of OpenStack-based clouds; each tenant will be allocated one cascading OpenStack as its virtual OpenStack service, with a single OpenStack API endpoint served for this tenant. The tenant's resources can be distributed or dynamically scaled across multiple OpenStack-based clouds; these clouds may be federated with KeyStone, or use a shared KeyStone, or even be OpenStack clouds built on AWS or Azure, or VMWare vSphere. Under this deployment scenario, unlimited scalability in a cloud can be achieved: there is no unified cascading layer, and tenant-level resource orchestration among multi-OpenStack clouds is fully distributed (even geographically). The database and load for one cascading OpenStack are very small, easy for disaster recovery or backup. Multiple tenants may share one cascading OpenStack to reduce resource waste, but the principle is to keep the cascading OpenStack as thin as possible. You can find the information here: https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case

Best Regards Chaoyi Huang ( joehuang )

-----Original Message----- From: Attila Fazekas [mailto:afaze...@redhat.com] Sent: Thursday, April 16, 2015 3:06 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 3:46:24 AM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

As Kevin was talking about agents, I want to remind that in the TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e. ports range from 0 to 65535, supporting a maximum of 64k port numbers. So above 100k managed nodes means more than 100k L2 agents/L3 agents... alive under Neutron. I want to know the detailed design for how to support, with 99.9% certainty, scaling Neutron in this way; a PoC and tests would be good support for this idea.
Would you consider as a PoC something which uses the technology in a similar way, with a similar port-security problem, but with a lower-level API than Neutron currently uses?

Is it an acceptable flaw that if you kill -9 the q-svc once at the `right` millisecond, the rabbitmq memory usage increases by ~1MiB? (Rabbit usually eats ~10GiB under pressure.) The memory can be freed without a broker restart; it also gets freed on agent restart.

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.

Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you can enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Friday, April 17, 2015 9:46:12 AM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi, Attila, addressing only agent status/liveness management is not enough for Neutron scalability. The concurrent dynamic load at large scale (for example, 100k managed nodes with dynamic load such as security group rule updates, routers_updated, etc.) should also be taken into account. So even if agent status/liveness management is improved in Neutron, that doesn't mean the scalability issue is fully addressed.

This story is not about the heartbeat. https://bugs.launchpad.net/neutron/+bug/1438159

What I am looking for is managing a lot of nodes with minimal `controller` resources. The rate of actually required system changes per second (for example, around VM boot) is relatively low, even if you have many nodes and VMs - consider the instances' average lifetime. The `bug` is about the resources the agents are related to and query many times. BTW: I am thinking about several alternatives and other variants.

In the Neutron case, a `system change` can affect multiple agents, like a security group rule change. It seems possible to have all agents `query` a resource only once, and be notified of any subsequent change `for free` (IP, sec group rule, new neighbor). This is the scenario where the message brokers can shine and scale, and it also offloads a lot of work from the DB.

On the other hand, Nova already supports several segregation concepts, for example Cells and Availability Zones... If there are 100k nodes to be managed by one OpenStack instance, it's impossible to work without hardware resource segregation. It's weird to put the agent liveness manager in availability zone (AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is powered off, then all agents in AZ 2 lose management.

The benchmark is already here in the test report for million-port scalability of Neutron: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers. The cascading may not be perfect, but at least it provides a feasible way if we really want scalability.

I am also working, based on cascading, to evolve OpenStack toward a world where there is no need to worry about OpenStack scalability: tenant-level virtual OpenStack service over hybrid, federated or multiple OpenStack-based clouds. There are lots of OpenStack-based clouds; each tenant will be allocated one cascading OpenStack as its virtual OpenStack service, with a single OpenStack API endpoint served for this tenant. The tenant's resources can be distributed or dynamically scaled across multiple OpenStack-based clouds; these clouds may be federated with KeyStone, or use a shared KeyStone, or even be OpenStack clouds built on AWS or Azure, or VMWare vSphere. Under this deployment scenario, unlimited scalability in a cloud can be achieved: there is no unified cascading layer, and tenant-level resource orchestration among multi-OpenStack clouds is fully distributed (even geographically). The database and load for one cascading OpenStack are very small, easy for disaster recovery or backup. Multiple tenants may share one cascading OpenStack to reduce resource waste, but the principle is to keep the cascading OpenStack as thin as possible.
You can find the information here: https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case

Best Regards Chaoyi Huang ( joehuang )

-----Original Message----- From: Attila Fazekas [mailto:afaze...@redhat.com] Sent: Thursday, April 16, 2015 3:06 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 3:46:24 AM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

As Kevin was talking about agents, I want to remind that in the TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e. ports range from 0 to 65535, supporting a maximum of 64k port numbers. So above 100k managed nodes means more than 100k L2 agents/L3 agents... alive under Neutron. I want to know the detailed design for how to support, with 99.9% certainty, scaling Neutron in this way; a PoC and tests would be good support for this idea.

Would you consider as a PoC something which uses the technology in a similar way, with a similar port-security problem, but with a lower-level API than Neutron currently uses?

Is it an acceptable flaw that if you kill -9 the q-svc once at the `right` millisecond, the rabbitmq memory usage increases by ~1MiB?
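Attila's `query once, be notified for free` idea above maps onto the fanout messaging OpenStack already uses. A minimal, untested sketch with oslo.messaging follows; the transport URL, topic and method name are invented for illustration and are not Neutron's actual RPC plumbing:

    from oslo_config import cfg
    import oslo_messaging as messaging

    transport = messaging.get_transport(
        cfg.CONF, 'rabbit://guest:guest@localhost:5672/')

    # Server side: fan the change out once; every subscribed agent gets it.
    target = messaging.Target(topic='q-agent-notifier', fanout=True)
    client = messaging.RPCClient(transport, target)

    # The payload carries the full rule, so agents do not have to turn
    # around and ask the server (AMQP -> SQL) after every notification.
    client.cast({}, 'security_group_rule_updated',
                security_group='sg-1',
                rule={'protocol': 'tcp',
                      'port_range_min': 22, 'port_range_max': 22})

The broker does the per-agent duplication, which is exactly the work it is good at, and the DB is queried only once to build the payload.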
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 3:46:24 AM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

As Kevin was talking about agents, I want to remind that in the TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e. ports range from 0 to 65535, supporting a maximum of 64k port numbers. So above 100k managed nodes means more than 100k L2 agents/L3 agents... alive under Neutron. I want to know the detailed design for how to support, with 99.9% certainty, scaling Neutron in this way; a PoC and tests would be good support for this idea.

Would you consider as a PoC something which uses the technology in a similar way, with a similar port-security problem, but with a lower-level API than Neutron currently uses?

Is it an acceptable flaw that if you kill -9 the q-svc once at the `right` millisecond, the rabbitmq memory usage increases by ~1MiB? (Rabbit usually eats ~10GiB under pressure.) The memory can be freed without a broker restart; it also gets freed on agent restart.

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.

Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you can enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools SQL/AMQP/(zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159

Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification?

- Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, April 9, 2015 5:01:45 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable.
If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.)

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

Neutron cascading also brings an extra benefit: one cascading Neutron can
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Thanks Joe, I really appreciate these numbers. For an individual (cascaded) Neutron, then, your testing showed that it could happily handle 1000 compute hosts. Apart from the cascading on the northbound side, was that otherwise unmodified vanilla OpenStack? Do you recall any particular config settings that were needed to achieve that? (e.g. api_workers and rpc_workers) Regards, Neil

On 16/04/15 03:03, joehuang wrote:

In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus of course the liveness reporting from all agents.

In the test report [1], which shows that Neutron can support up to a million ports and 100k physical hosts, the scalability is achieved by one cascading Neutron managing 100 cascaded Neutrons through the current Neutron RESTful API. In normal Neutron, each compute node hosts an L2 agent/OVS and an L3 agent/DVR. In the cascading Neutron layer, the L2 agent is modified to interact with the corresponding cascaded Neutron instead of OVS, and the L3 agent (DVR) is modified to interact with the corresponding cascaded Neutron instead of the Linux routing table. That's why we say the cascaded Neutron is the backend of Neutron. Therefore, only 100 compute nodes (or, say, agents) are required in the cascading layer; each compute node manages one cascaded Neutron. Each cascaded Neutron can manage up to 1000 nodes (there are already reports, deployments and lab tests supporting this). That's the scalability to 100k nodes. Because the cloud is split into two layers (100 nodes in the cascading layer, 1000 nodes in each cascaded layer), even the current mechanism can meet the demand for sync_routers and liveness reporting from all agents, or L2 population, DVR router updates, etc.

The test report [1] at least proves that the layered architecture idea is feasible for Neutron scalability, even up to a million ports and 100k nodes. An extra benefit of the layered architecture is that each cascaded Neutron can leverage a different backend technology implementation; for example, one can be ML2+OVS, another OVN or ODL or Calico...

[1] test report for million-port scalability of Neutron: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Best Regards Chaoyi Huang ( Joe Huang )

-----Original Message----- From: Neil Jerram [mailto:neil.jer...@metaswitch.com] Sent: Wednesday, April 15, 2015 9:46 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote: Hi, Neil, See inline comments. Best Regards Chaoyi Huang

From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm?
Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios.

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus of course the liveness reporting from all agents.

(I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.)

[[joehuang]] Yes, control plane here.

Thanks for confirming that.

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

It was very interesting, thanks. And by following through your
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Neil, The api_wokers / rpc-worker configuration for the cascading layer can be found in the test report, it's based on community Juno version, and some issues found and listed in the end of the report. Simulator is used for the cascaded OpenStackNo configuration in the test. For configuration of api_worker/rpc_works for one OpenStack Neutron to support 1152 nodes, you can refer to the article http://www.openstack.cn/p2932.html or http://www.csdn.net/article/2014-12-19/2823077, but unfortunately, it was written in Chinese, and no detail number of workers. Best Regards Chaoyi Huang ( Joe Huang ) -Original Message- From: Neil Jerram [mailto:neil.jer...@metaswitch.com] Sent: Thursday, April 16, 2015 5:15 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Thanks Joe, I really appreciate these numbers. For an individual (cascaded) Neutron, then, your testing showed that it could happily handle 1000 compute hosts. Apart from the cascading on the northbound side, was that otherwise unmodified from vanilla OpenStack? Do you recall any particular config settings that were needed to achieve that? (e.g. api_workers and rpc_workers) Regards, Neil On 16/04/15 03:03, joehuang wrote: In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in other part of this thread. Plus of course the liveness reporting from all agents. In the test report [1], which shows Neutron can supports up to million level ports and 100k level physical hosts, the scalability is done by one cascading Neutron to manage 100 cascaded Neutrons through current Neutron restful API. For normal Neutron, each compute node will host L2 agent/OVS, L3 agent/DVR. In the cascading Neutron layer, the L2 agent is modified to interact with regarding cascaded Neutron but not OVS, the L3 agent(DVR) is modified to interact with regarding cascaded Neutron but not linux route. That's why we call the cascaded Neutron is the backend of Neutron. Therefore, there are only 100 compute nodes (or say agent ) required in the cascading layer, each compute node will manage one cascaded Neutron. Each cascaded Neutron can manage up to 1000 nodes (there is already report and deployment and lab test can support this). That's the scalability to 100k nodes. Because the cloud is splited into two layer (100 nodes in the cascading layer, 1000 nodes in each cascaded layer ), even current mechanism can meet the demand for sync_routers and liveness reporting from all agents, or L2 population, DVR router update...etc. The test report [1] at least prove that the layered architecture idea is feasible for Neutron scalability, even up to million level ports and 100k level nodes. The extra benefit for the layered architecture is that each cascaded Neutron can leverage different backend technology implementation, for example, one is ML2+OVS, another is OVN or ODL or Calico... [1]test report for million ports scalability of Neutron http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascadi ng-solution-to-support-1-million-v-ms-in-100-data-centers Best Regards Chaoyi Huang ( Joe Huang ) -Original Message- From: Neil Jerram [mailto:neil.jer...@metaswitch.com] Sent: Wednesday, April 15, 2015 9:46 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi again Joe, (+ list) On 11/04/15 02:00, joehuang wrote: Hi, Neil, See inline comments. 
Best Regards Chaoyi Huang

From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios.

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread.
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote: Hi, Neil, See inline comments. Best Regards Chaoyi Huang

From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios.

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus of course the liveness reporting from all agents.

(I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.)

[[joehuang]] Yes, control plane here.

Thanks for confirming that.

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

It was very interesting, thanks. And by following through your links I also learned more about Nova cells, and about how some people question whether we need any kind of partitioning at all, and should instead solve scaling/performance problems in other ways... It will be interesting to see how this plays out.

I'd still like to see more information, though, about how far people have scaled OpenStack - and in particular Neutron - as it exists today. Surely having a consensus set of current limits is an important input into any discussion of future scaling work. For example, Kevin mentioned benchmarking where the Neutron server processed a liveness update in 50ms and a sync_routers in 300ms. Suppose the liveness update time was 50ms (since I don't know in detail what that means) and agents report liveness every 30s. Does that mean that a single Neutron server can only support 600 agents?

I'm also especially interested in the DHCP agent, because in Calico we have one of those on every compute host. We've just run tests which appeared to be hitting trouble from just 50 compute hosts onwards, apparently because of DHCP agent communications. We need to continue looking into that and report findings properly, but if anyone already has any insights, they would be much appreciated.
Many thanks, Neil
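As a quick sanity check on Neil's 600-agent question above, the back-of-envelope arithmetic (assuming, pessimistically, that heartbeats are processed strictly serially by one worker) goes like this:

    # If one heartbeat takes ~50 ms of server time and each agent reports
    # every 30 s, a single fully-serial worker saturates at:
    report_interval = 30.0     # seconds between reports, per agent
    processing_time = 0.050    # seconds of server time per heartbeat
    max_agents = report_interval / processing_time
    print(max_agents)          # 600.0 agents per serial worker

Multiple api/rpc workers multiply that ceiling roughly linearly, which is one reason the api_workers/rpc_workers settings asked about elsewhere in this thread matter so much.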
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Neil Jerram wrote:

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote: Hi, Neil, See inline comments. Best Regards Chaoyi Huang

From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios.

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus of course the liveness reporting from all agents.

(I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.)

[[joehuang]] Yes, control plane here.

Thanks for confirming that.

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

It was very interesting, thanks. And by following through your links I also learned more about Nova cells, and about how some people question whether we need any kind of partitioning at all, and should instead solve scaling/performance problems in other ways... It will be interesting to see how this plays out.

I'd still like to see more information, though, about how far people have scaled OpenStack - and in particular Neutron - as it exists today. Surely having a consensus set of current limits is an important input into any discussion of future scaling work.

+2 to this... Shooting for the moon (although nice in theory) is not so useful when you can't even get up a hill ;)

For example, Kevin mentioned benchmarking where the Neutron server processed a liveness update in 50ms and a sync_routers in 300ms. Suppose the liveness update time was 50ms (since I don't know in detail what that means) and agents report liveness every 30s. Does that mean that a single Neutron server can only support 600 agents?

I'm also especially interested in the DHCP agent, because in Calico we have one of those on every compute host. We've just run tests which appeared to be hitting trouble from just 50 compute hosts onwards, apparently because of DHCP agent communications.
We need to continue looking into that and report findings properly, but if anyone already has any insights, they would be much appreciated. Many thanks, Neil
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus of course the liveness reporting from all agents.

In the test report [1], which shows that Neutron can support up to a million ports and 100k physical hosts, the scalability is achieved by one cascading Neutron managing 100 cascaded Neutrons through the current Neutron RESTful API. In normal Neutron, each compute node hosts an L2 agent/OVS and an L3 agent/DVR. In the cascading Neutron layer, the L2 agent is modified to interact with the corresponding cascaded Neutron instead of OVS, and the L3 agent (DVR) is modified to interact with the corresponding cascaded Neutron instead of the Linux routing table. That's why we say the cascaded Neutron is the backend of Neutron. Therefore, only 100 compute nodes (or, say, agents) are required in the cascading layer; each compute node manages one cascaded Neutron. Each cascaded Neutron can manage up to 1000 nodes (there are already reports, deployments and lab tests supporting this). That's the scalability to 100k nodes. Because the cloud is split into two layers (100 nodes in the cascading layer, 1000 nodes in each cascaded layer), even the current mechanism can meet the demand for sync_routers and liveness reporting from all agents, or L2 population, DVR router updates, etc.

The test report [1] at least proves that the layered architecture idea is feasible for Neutron scalability, even up to a million ports and 100k nodes. An extra benefit of the layered architecture is that each cascaded Neutron can leverage a different backend technology implementation; for example, one can be ML2+OVS, another OVN or ODL or Calico...

[1] test report for million-port scalability of Neutron: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Best Regards Chaoyi Huang ( Joe Huang )

-----Original Message----- From: Neil Jerram [mailto:neil.jer...@metaswitch.com] Sent: Wednesday, April 15, 2015 9:46 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote: Hi, Neil, See inline comments. Best Regards Chaoyi Huang

From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe, Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each such host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron.

I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios.

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread.
Plus of course the liveness reporting from all agents.

(I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.)

[[joehuang]] Yes, control plane here.

Thanks for confirming that.

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

It was very interesting, thanks. And by following through your links I also learned more about Nova cells, and about how some people question whether we need any kind of partitioning at all, and should instead solve scaling/performance problems in other ways... It will be interesting to see how this plays out.

I'd still like to see more information, though, about how far people have scaled OpenStack - and in particular Neutron - as it exists today.
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Joshua, this is a long discussion thread; may we come back to the scalability topic?

As you confirmed, Tooz only addresses the issue of agent status management; it does not solve the concurrent dynamic load impact at large scale (for example, 100k managed nodes with dynamic load such as security group rule updates, routers_updated, etc.). So even if Tooz is implemented in Neutron, that doesn't mean the scalability issue is fully addressed. So what are the goal and the whole picture for addressing Neutron scalability? Tooz would then help complete that picture.

Best Regards Chaoyi Huang ( Joe Huang )

-----Original Message----- From: Joshua Harlow [mailto:harlo...@outlook.com] Sent: Tuesday, April 14, 2015 11:33 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Daniel Comnea wrote: Joshua, those are old and have been fixed/documented on the Consul side. As for ZK, I have nothing against it, just wish you good luck running it in a multi cross-DC setup :)

Totally fair, although I start to question a cross-DC setup of things, and why that's needed in this (and/or any) architecture, but to each their own ;)

Dani

On Mon, Apr 13, 2015 at 11:37 PM, Joshua Harlow harlo...@outlook.com wrote:

Did the following get addressed? https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul Seems like quite a few things got raised in that post about etcd/consul. Maybe they are fixed, idk... https://aphyr.com/posts/291-call-me-maybe-zookeeper though worked as expected (and without issue)... I quote:

''' Recommendations: Use Zookeeper. It's mature, well-designed, and battle-tested. Because the consequences of its connection model and linearizability properties are subtle, you should, wherever possible, take advantage of tested recipes and client libraries like Curator, which do their best to correctly handle the complex state transitions associated with session and connection loss. '''

Daniel Comnea wrote: My 2 cents: I like the 3rd-party backend idea, however instead of ZK wouldn't Consul [1] fit better, being lighter and multi-DC aware out of the box? Dani [1] Consul - https://www.consul.io/

On Mon, Apr 13, 2015 at 9:51 AM, Wangbibo wangb...@huawei.com wrote:

Hi Kevin, totally agree with you that the heartbeat from each agent is something that we cannot eliminate currently. Agent status depends on it, and scheduling and HA in turn depend on agent status.

I proposed a Liberty spec for introducing an open framework / pluggable agent status drivers.[1][2] It allows us to use some other 3rd-party backend to monitor agent status, such as zookeeper or memcached. Meanwhile, it guarantees backward compatibility so that users could still use the db-based status monitoring mechanism as their default choice. Based on that, we may do further optimization on the issues Attila and you mentioned. Thanks.
[1] BP - https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
[2] Liberty spec proposed - https://review.openstack.org/#/c/168921/

Best, Robin

From: Kevin Benton [mailto:blak...@gmail.com] Sent: 11 April 2015 12:35 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you can enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent.
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Tooz provides a mechanism for grouping agents and for agent status/liveness management; multiple coordinator services may be required in a large-scale deployment, especially at the 100k-node level. We can't assume that one coordinator service is enough to manage all nodes; that means tooz may need to support multiple coordination backends. And Nova already supports several segregation concepts, for example Cells, Availability Zones and Host Aggregates. Where will the coordination backend reside? How do we group agents? It's weird to put the coordinator in availability zone (AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is powered off, then all agents in AZ 2 lose management. Do we need a segregation concept for agents, or reuse the Nova concept, or build a mapping between them - especially if multiple coordination backends will work under one Neutron?

Best Regards Chaoyi Huang ( Joe Huang )

-----Original Message----- From: Joshua Harlow [mailto:harlo...@outlook.com] Sent: Monday, April 13, 2015 11:11 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

joehuang wrote: Hi, Kevin and Joshua, as my understanding, Tooz only addresses the issue of agent status management; but how does it solve the concurrent dynamic load impact at large scale (for example, 100k managed nodes with dynamic load such as security group rule updates, routers_updated, etc.)?

Yes, that is correct; let's not confuse status/liveness management with updates... since IMHO they are two very different things (the latter can be eventually consistent IMHO, while the liveness 'question' probably should not be...).

And one more question: if we have 100k managed nodes, how do we do the partitioning? Or will all nodes be managed by one Tooz service, like Zookeeper? Can Zookeeper manage the status of 100k nodes?

I can get you some data/numbers from some studies I've seen, but what you are talking about is highly specific as to what you are doing with zookeeper... There is no one solution for all the things IMHO; choose what's best from your tool-belt for each problem...

Best Regards Chaoyi Huang ( Joe Huang )

From: Kevin Benton [mailto:blak...@gmail.com] Sent: Monday, April 13, 2015 3:52 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry are another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend-specific, and tooz supports varying backends.

Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc? The Nova implementation of Tooz seemed pretty straight-forward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz.

On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote:

Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead.
Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision?

Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry are another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend-specific, and tooz supports varying backends.

However, before (what I assume is) the large code change to implement tooz, I would like to quantify whether the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors.

Sure, always good to avoid prematurely optimizing things... Although this is relevant for you I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter).

[1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes
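On joehuang's partitioning question above: one simple (purely illustrative, untested) way to shard agents across several coordination backends is to hash the agent id over a static list of coordinator URLs; the URLs below are placeholders, not anything tooz ships with:

    import hashlib

    # Hypothetical list of independent coordination clusters (one per AZ,
    # say); each agent consistently lands on exactly one of them.
    COORDINATOR_URLS = [
        'zookeeper://zk-az1:2181',
        'zookeeper://zk-az2:2181',
        'zookeeper://zk-az3:2181',
    ]

    def backend_for(agent_id):
        # Stable hash, so every process picks the same backend per agent.
        digest = hashlib.md5(agent_id.encode('utf-8')).hexdigest()
        return COORDINATOR_URLS[int(digest, 16) % len(COORDINATOR_URLS)]

A scheduler then reads group membership from each shard; losing one shard loses liveness data only for the agents hashed onto it, which is one possible answer to the AZ-segregation concern.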
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
-----Original Message----- From: Attila Fazekas [mailto:afaze...@redhat.com] Sent: Monday, April 13, 2015 3:19 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 1:20:48 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi, Kevin, I assumed that all agents connect to the same IP address of RabbitMQ; then the connections would exceed the port range limitation.

https://news.ycombinator.com/item?id=1571300 TCP connections are identified by the (src ip, src port, dest ip, dest port) tuple. The server doesn't need multiple IPs to handle 65535 connections. All the server connections to a given IP are to the same port. For a given client, the unique key for an http connection is (client-ip, PORT, server-ip, 80). The only number that can vary is PORT, and that's a value on the client. So, the client is limited to 65535 connections to the server. But, a second client could also have another 65K connections to the same server-ip:port.

[[joehuang]] Sorry, it has been a long time since I wrote a socket-based app; I may have made a mistake in assuming the HTTP server spawns a thread to handle each new connection. I'll check again.

For a RabbitMQ cluster, the client can for sure connect to any member of the cluster, but in this case the client has to be designed in a fail-safe manner: the client should be aware of a cluster member failure and reconnect to another surviving member. No such mechanism has been implemented yet. The other way is to use LVS or a DNS-based load balancer, or something else. If you put one load balancer in front of a cluster, then we have to take care of the port number limitation: there are so many agents requiring connections concurrently - 100k level - and the requests cannot be rejected.

Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 12 April 2015 9:59 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

The TCP/IP stack keeps track of connections as a combination of IP + TCP port. The two-byte port limit doesn't matter unless all of the agents are connecting from the same IP address, which shouldn't be the case unless compute nodes connect to the rabbitmq server via one IP address running port address translation. Either way, the agents don't connect directly to the Neutron server; they connect to the RabbitMQ cluster. Since as many Neutron server processes can be launched as necessary, the bottlenecks will likely show up at the messaging or DB layer.

On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote:

As Kevin was talking about agents, I want to remind that in the TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e. ports range from 0 to 65535, supporting a maximum of 64k port numbers. So above 100k managed nodes means more than 100k L2 agents/L3 agents... alive under Neutron. I want to know the detailed design for how to support, with 99.9% certainty, scaling Neutron in this way; a PoC and tests would be good support for this idea.

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.
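The (src ip, src port, dst ip, dst port) point quoted above is easy to demonstrate with a tiny self-contained loopback test (illustrative only; the port number is arbitrary):

    import socket

    # One listening socket on a single port backs many connections; each
    # accepted connection is distinguished by the client's ephemeral port
    # in the 4-tuple, not by a separate server-side port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 15672))
    server.listen(5)

    clients = [socket.create_connection(('127.0.0.1', 15672))
               for _ in range(3)]
    for _ in clients:
        conn, addr = server.accept()
        # Same server endpoint every time; only the client port varies.
        print(conn.getsockname(), '<-', addr)

The 64k limit therefore applies per (client ip -> server ip:port) pair, so 100k agents on distinct hosts do not exhaust anything on the broker side.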
Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you can enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools SQL/AMQP/(zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and needs to do a full sync.
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
- Original Message - From: joehuang joehu...@huawei.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 1:20:48 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi, Kevin, I assumed that all agents connect to the same IP address of RabbitMQ; then the connections would exceed the port range limitation.

https://news.ycombinator.com/item?id=1571300 TCP connections are identified by the (src ip, src port, dest ip, dest port) tuple. The server doesn't need multiple IPs to handle 65535 connections. All the server connections to a given IP are to the same port. For a given client, the unique key for an http connection is (client-ip, PORT, server-ip, 80). The only number that can vary is PORT, and that's a value on the client. So, the client is limited to 65535 connections to the server. But, a second client could also have another 65K connections to the same server-ip:port.

For a RabbitMQ cluster, the client can for sure connect to any member of the cluster, but in this case the client has to be designed in a fail-safe manner: the client should be aware of a cluster member failure and reconnect to another surviving member. No such mechanism has been implemented yet. The other way is to use LVS or a DNS-based load balancer, or something else. If you put one load balancer in front of a cluster, then we have to take care of the port number limitation: there are so many agents requiring connections concurrently - 100k level - and the requests cannot be rejected.

Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 12 April 2015 9:59 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

The TCP/IP stack keeps track of connections as a combination of IP + TCP port. The two-byte port limit doesn't matter unless all of the agents are connecting from the same IP address, which shouldn't be the case unless compute nodes connect to the rabbitmq server via one IP address running port address translation. Either way, the agents don't connect directly to the Neutron server; they connect to the RabbitMQ cluster. Since as many Neutron server processes can be launched as necessary, the bottlenecks will likely show up at the messaging or DB layer.

On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote:

As Kevin was talking about agents, I want to remind that in the TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e. ports range from 0 to 65535, supporting a maximum of 64k port numbers. So above 100k managed nodes means more than 100k L2 agents/L3 agents... alive under Neutron. I want to know the detailed design for how to support, with 99.9% certainty, scaling Neutron in this way; a PoC and tests would be good support for this idea.

I'm 99.9% sure, for scaling above 100k managed nodes, we do not really need to split the OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.

Best Regards Chaoyi Huang ( joehuang )

From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate?
One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification? - Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions)
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
- Original Message - From: Kevin Benton blak...@gmail.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, April 12, 2015 4:17:29 AM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? You might find interesting the proposed solution in this bug: https://bugs.launchpad.net/nova/+bug/1437199 However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Kevin Benton wrote: Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc? Of course... we tried to make it 'completely' transparent, but in reality certain backends (zookeeper, which uses a paxos-like algorithm, and redis with sentinel support...) are better (more resilient, more consistent, handle partitions/restarts better...) than others (memcached is, after all, just a distributed cache). This is just the nature of the game... The Nova implementation of Tooz seemed pretty straightforward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further, I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz. Sounds good to me. On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote: Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Sure, always good to avoid prematurely optimizing things... Although this is relevant for u I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter). [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?
Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
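To make that recipe concrete, the agent side is only a few lines. A minimal untested sketch against the tooz 0.13.x API linked above (the group and member names are invented):

    import time

    from tooz import coordination

    # The backend is just a URL; zookeeper://, redis://, memcached://, etc.
    coord = coordination.get_coordinator('zookeeper://127.0.0.1:2181',
                                         b'l3-agent-host-1')
    coord.start()

    try:
        coord.create_group(b'neutron-l3-agents').get()
    except coordination.GroupAlreadyExist:
        pass
    coord.join_group(b'neutron-l3-agents').get()

    # Keep telling the backend we are alive; whoever schedules just reads
    # the active members instead of scanning heartbeat timestamps in the DB.
    while True:
        coord.heartbeat()
        time.sleep(1)

The scheduler side is then just coord.get_members(b'neutron-l3-agents').get() instead of a timestamp query.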
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
joehuang wrote: Hi, Kevin and Joshua, As my understanding, Tooz only addresses the issue of agent status management; but how to solve the concurrent dynamic load impact on large scale ( for example 100k managed nodes with dynamic load like security group rule updates, routers_updated, etc )? Yes, that is correct; let's not confuse status/liveness management with updates... since IMHO they are two very different things (the latter can be eventually consistent IMHO, while the liveness 'question' probably should not be...). And one more question is: if we have 100k managed nodes, how to do the partition? Or will all nodes be managed by one Tooz service, like Zookeeper? Can Zookeeper manage the status of 100k nodes? I can get u some data/numbers from some studies I've seen, but what u are talking about is highly specific as to what u are doing with zookeeper... There is no one solution for all the things IMHO; choose what's best from your tool-belt for each problem... Best Regards Chaoyi Huang ( Joe Huang ) From: Kevin Benton [mailto:blak...@gmail.com] Sent: Monday, April 13, 2015 3:52 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc? The Nova implementation of Tooz seemed pretty straightforward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further, I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz. On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote: Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Sure, always good to avoid prematurely optimizing things...
Although this is relevant for u I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter). [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
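For the zookeeper backend specifically, [1] is what makes this cheap: the liveness record is an ephemeral znode tied to the client session. With kazoo directly it would look something like this (a sketch; the paths are invented):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # An ephemeral znode lives exactly as long as this client's session,
    # so agent death (or a partition outliving the session timeout) removes
    # it automatically -- no timestamp rows to poll and expire.
    zk.create('/neutron/agents/l3-agent-host-1', b'alive',
              ephemeral=True, makepath=True)

    print(zk.get_children('/neutron/agents'))  # the live-agent list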
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Kevin and Joshua, As my understanding, Tooz only addresses the issue of agent status management; but how to solve the concurrent dynamic load impact on large scale ( for example 100k managed nodes with dynamic load like security group rule updates, routers_updated, etc )? And one more question is: if we have 100k managed nodes, how to do the partition? Or will all nodes be managed by one Tooz service, like Zookeeper? Can Zookeeper manage the status of 100k nodes? Best Regards Chaoyi Huang ( Joe Huang ) From: Kevin Benton [mailto:blak...@gmail.com] Sent: Monday, April 13, 2015 3:52 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc? The Nova implementation of Tooz seemed pretty straightforward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further, I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz. On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote: Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Sure, always good to avoid prematurely optimizing things... Although this is relevant for u I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter). [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?
Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
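On the partitioning question above: nothing in this recipe forces a single group, or even a single coordination cluster; a stable hash of the agent id spreads agents over N groups or backends. A sketch of the idea (the partition count and naming are invented):

    import zlib

    NUM_PARTITIONS = 16  # hypothetical; size to what one backend handles well

    def group_for(agent_id):
        # Stable bucketing: the same agent always lands in the same group,
        # and each group (or backend cluster) sees only ~1/16th of the load.
        bucket = zlib.crc32(agent_id.encode()) % NUM_PARTITIONS
        return ('neutron-agents-%02d' % bucket).encode()

    print(group_for('l3-agent-host-12345'))  # e.g. b'neutron-agents-07'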
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Kevin, I assumed that all agents are connected to the same IP address of RabbitMQ, so the connections would exceed the port range limitation. For a RabbitMQ cluster, the client can for sure connect to any member of the cluster, but in that case the client has to be designed in a fail-safe manner: the client should be aware of cluster member failure and reconnect to another surviving member. No such mechanism has been implemented yet. Another way is to use an LVS- or DNS-based load balancer, or something else. If you put one load balancer ahead of a cluster, then we have to take care of the port number limitation: very many agents (100k level) will require connections concurrently, and the requests cannot be rejected. Best Regards Chaoyi Huang ( joehuang ) From: Kevin Benton [blak...@gmail.com] Sent: 12 April 2015 9:59 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? The TCP/IP stack keeps track of connections as a combination of IP + TCP port. The two-byte port limit doesn't matter unless all of the agents are connecting from the same IP address, which shouldn't be the case unless compute nodes connect to the rabbitmq server via one IP address running port address translation. Either way, the agents don't connect directly to the Neutron server; they connect to the RabbitMQ cluster. Since as many Neutron server processes can be launched as necessary, the bottlenecks will likely show up at the messaging or DB layer. On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote: As Kevin is talking about agents, I want to remind that in the TCP/IP stack, a port ( not a Neutron Port ) is a two-byte field, i.e. ports range from 0 to 65535, for a maximum of 64k port numbers. Above 100k managed nodes means more than 100k L2/L3 agents... will be alive under Neutron. I want to know the detailed design of how to support, with 99.9% probability, scaling Neutron in this way; a PoC and tests would be good support for this idea. I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. Best Regards Chaoyi Huang ( joehuang ) From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed.
The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification? - Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, April 9, 2015 5:01:45 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm?
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Sure, always good to avoid prematurely optimizing things... Although this is relevant for u I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter). [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another (letting memcache or redis expire entries using their own internal algorithms); using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. Very cool. Is the backend completely transparent, so a deployer could choose a service they are comfortable maintaining, or will that change the properties WRT resiliency of state on node restarts, partitions, etc? The Nova implementation of Tooz seemed pretty straightforward, although it looked like it had pluggable drivers for service management already. Before I dig into it much further, I'll file a spec on the Neutron side to see if I can get some other cores onboard to do the review work if I push a change to tooz. On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote: Kevin Benton wrote: So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? Timestamps are just one way (and likely the most primitive); using redis (or memcache) key/value and expiry is another; using zookeeper ephemeral nodes[1] is another... The point being that it's backend specific, and tooz supports varying backends. However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Sure, always good to avoid prematurely optimizing things... Although this is relevant for u I think anyway: https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)... https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter). [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit...
Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes -- Kevin Benton
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
I assumed that all agents are connected to the same IP address of RabbitMQ, so the connections would exceed the port range limitation. Only if the clients are all using the same IP address. If connections weren't scoped by source IP, busy servers would be completely unreliable, because clients would keep having source port collisions. For example, the following is netstat output from a server with two connections to a service running on port 4000, with both clients using source port 5: http://paste.openstack.org/show/203211/ the client should be aware of cluster member failure and reconnect to another surviving member. No such mechanism has been implemented yet. If I understand what you are suggesting, it already has been implemented that way. The neutron agents and servers can be configured with multiple rabbitmq servers, and they will cycle through the list whenever there is a failure. The only downside to that approach is that every neutron agent and server has to be configured with every rabbitmq server address. This gets tedious to manage if you want to add cluster members dynamically, so using a load balancer can help relieve that. Hi, Kevin, I assumed that all agents are connected to the same IP address of RabbitMQ, so the connections would exceed the port range limitation. For a RabbitMQ cluster, the client can for sure connect to any member of the cluster, but in that case the client has to be designed in a fail-safe manner: the client should be aware of cluster member failure and reconnect to another surviving member. No such mechanism has been implemented yet. Another way is to use an LVS- or DNS-based load balancer, or something else. If you put one load balancer ahead of a cluster, then we have to take care of the port number limitation: very many agents (100k level) will require connections concurrently, and the requests cannot be rejected. Best Regards Chaoyi Huang ( joehuang ) -- From: Kevin Benton [blak...@gmail.com] Sent: 12 April 2015 9:59 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? The TCP/IP stack keeps track of connections as a combination of IP + TCP port. The two-byte port limit doesn't matter unless all of the agents are connecting from the same IP address, which shouldn't be the case unless compute nodes connect to the rabbitmq server via one IP address running port address translation. Either way, the agents don't connect directly to the Neutron server; they connect to the RabbitMQ cluster. Since as many Neutron server processes can be launched as necessary, the bottlenecks will likely show up at the messaging or DB layer. On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote: As Kevin is talking about agents, I want to remind that in the TCP/IP stack, a port ( not a Neutron Port ) is a two-byte field, i.e. ports range from 0 to 65535, for a maximum of 64k port numbers. Above 100k managed nodes means more than 100k L2/L3 agents... will be alive under Neutron. I want to know the detailed design of how to support, with 99.9% probability, scaling Neutron in this way; a PoC and tests would be good support for this idea. I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.
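For reference, the multi-server cycling Kevin describes is plain oslo.messaging configuration in neutron.conf; roughly as below (host names invented; on older releases the options live in [DEFAULT] instead):

    [oslo_messaging_rabbit]
    # Agents and servers cycle through this list on connection failure.
    rabbit_hosts = rabbit1.example.com:5672,rabbit2.example.com:5672
    rabbit_retry_interval = 1
    rabbit_retry_backoff = 2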
Best Regards Chaoyi Huang ( joehuang ) -- From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification?
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
As Kevin is talking about agents, I want to remind that in the TCP/IP stack, a port ( not a Neutron Port ) is a two-byte field, i.e. ports range from 0 to 65535, for a maximum of 64k port numbers. Above 100k managed nodes means more than 100k L2/L3 agents... will be alive under Neutron. I want to know the detailed design of how to support, with 99.9% probability, scaling Neutron in this way; a PoC and tests would be good support for this idea. I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. Best Regards Chaoyi Huang ( joehuang ) From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification? - Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, April 9, 2015 5:01:45 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right?
Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.) We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture 'Neutron cascading', Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers Many thanks, I will take a look at this. Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers, maybe one ODL, the other OpenContrail.

                 -------- Cascading Neutron --------
                /                                   \
      --cascaded Neutron--                 --cascaded Neutron--
               |                                    |
             -ODL-                           -OpenContrail-

And furthermore, if using Neutron cascading in multiple data centers, the DCI controller (data center inter-connection controller) can also be used under the cascading Neutron, to provide NaaS ( network as a service ) across data centers.

                 ---------- Cascading Neutron ----------
                /                  |                    \
      --cascaded Neutron--   -DCI controller-   --cascaded Neutron--
               |                   |                     |
             -ODL-                 |              -OpenContrail-
      --(Data center 1)--  --(DCI networking)--  --(Data center 2)--
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
The TCP/IP stack keeps track of connections as a combination of IP + TCP port. The two-byte port limit doesn't matter unless all of the agents are connecting from the same IP address, which shouldn't be the case unless compute nodes connect to the rabbitmq server via one IP address running port address translation. Either way, the agents don't connect directly to the Neutron server; they connect to the RabbitMQ cluster. Since as many Neutron server processes can be launched as necessary, the bottlenecks will likely show up at the messaging or DB layer. On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote: As Kevin is talking about agents, I want to remind that in the TCP/IP stack, a port ( not a Neutron Port ) is a two-byte field, i.e. ports range from 0 to 65535, for a maximum of 64k port numbers. Above 100k managed nodes means more than 100k L2/L3 agents... will be alive under Neutron. I want to know the detailed design of how to support, with 99.9% probability, scaling Neutron in this way; a PoC and tests would be good support for this idea. I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. Best Regards Chaoyi Huang ( joehuang ) -- From: Kevin Benton [blak...@gmail.com] Sent: 11 April 2015 12:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification? - Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, April 9, 2015 5:01:45 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable.
If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.) We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture 'Neutron cascading', Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers Many thanks, I will take a look at this. Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers, maybe one ODL, the other OpenContrail.
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
So IIUC tooz would be handling the liveness detection for the agents. That would be nice, to get rid of that logic in Neutron and just register callbacks for rescheduling the dead. Where does it store that state? Does it persist timestamps to the DB like Neutron does? If so, how would that scale better? If not, who does a given node ask to know if an agent is online or offline when making a scheduling decision? However, before (what I assume is) the large code change to implement tooz, I would like to quantify that the heartbeats are actually a bottleneck. When I was doing some profiling of them on the master branch a few months ago, processing a heartbeat took an order of magnitude less time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A few query optimizations might buy us a lot more headroom before we have to fall back to large refactors. Kevin Benton wrote: One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? Put each agent in a tooz[1] group; have each agent periodically heartbeat[2], have whoever needs to schedule read the active members of that group (or use [3] to get notified via a callback), profit... Pick from your favorite (supporting) driver at: http://docs.openstack.org/developer/tooz/compatibility.html [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
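The callback flavour from [3] is about as small on the scheduler side; an untested sketch (group and member names invented) of reacting to dead agents instead of polling timestamps:

    import time

    from tooz import coordination

    coord = coordination.get_coordinator('zookeeper://127.0.0.1:2181',
                                         b'neutron-server-1')
    coord.start()

    def on_leave(event):
        # event.member_id is the departed agent; reschedule its routers here.
        print('agent %s left group %s' % (event.member_id, event.group_id))

    coord.watch_leave_group(b'neutron-l3-agents', on_leave)

    while True:
        coord.run_watchers()  # fires join/leave callbacks as members change
        time.sleep(1)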
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well. One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine if the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive? On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote: I'm 99.9% sure that for scaling above 100k managed nodes we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines. The problem is OpenStack using the right tools (SQL/AMQP/zk), but in the wrong way. For example: periodic updates can be avoided in almost all cases; the new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and then needs to do a full sync. https://bugs.launchpad.net/neutron/+bug/1438159 Also, when the agents get some notification, they start asking for details via AMQP -> SQL. Why do they not know it already, or get it with the notification? - Original Message - From: Neil Jerram neil.jer...@metaswitch.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, April 9, 2015 5:01:45 PM Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.) We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture 'Neutron cascading', Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers Many thanks, I will take a look at this. Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers, maybe one ODL, the other OpenContrail.
                 -------- Cascading Neutron --------
                /                                   \
      --cascaded Neutron--                 --cascaded Neutron--
               |                                    |
             -ODL-                           -OpenContrail-

And furthermore, if using Neutron cascading in multiple data centers, the DCI controller (data center inter-connection controller) can also be used under the cascading Neutron, to provide NaaS ( network as a service ) across data centers.

                 ---------- Cascading Neutron ----------
                /                  |                    \
      --cascaded Neutron--   -DCI controller-   --cascaded Neutron--
               |                   |                     |
             -ODL-                 |              -OpenContrail-
      --(Data center 1)--  --(DCI networking)--  --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit? Most certainly, yes. I will be there from mid Monday afternoon through to the end of Friday. But it will be my first summit, so I have no idea yet as to how I might run into you - please can you suggest! Best Regards Chaoyi Huang ( Joe Huang ) Regards, Neil
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Neil, See inline comments. Best Regards Chaoyi Huang From: Neil Jerram [neil.jer...@metaswitch.com] Sent: 09 April 2015 23:01 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints? Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? [[joehuang]] For example: L2 population, security group rule updates, DVR route updates. Both directions, in different scenarios. (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.) [[joehuang]] Yes, control plane here. We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture 'Neutron cascading', Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers Many thanks, I will take a look at this. Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers, maybe one ODL, the other OpenContrail.

                 -------- Cascading Neutron --------
                /                                   \
      --cascaded Neutron--                 --cascaded Neutron--
               |                                    |
             -ODL-                           -OpenContrail-

And furthermore, if using Neutron cascading in multiple data centers, the DCI controller (data center inter-connection controller) can also be used under the cascading Neutron, to provide NaaS ( network as a service ) across data centers.

                 ---------- Cascading Neutron ----------
                /                  |                    \
      --cascaded Neutron--   -DCI controller-   --cascaded Neutron--
               |                   |                     |
             -ODL-                 |              -OpenContrail-
      --(Data center 1)--  --(DCI networking)--  --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit? Most certainly, yes. I will be there from mid Monday afternoon through to the end of Friday. But it will be my first summit, so I have no idea yet as to how I might run into you - please can you suggest! I will also attend the summit the whole week, sometimes in the OPNFV parts, sometimes in the OpenStack parts. Let me see how to meet. Best Regards Chaoyi Huang ( Joe Huang ) Regards, Neil
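For readers wondering what these 'broadcasts' are mechanically: each is an oslo.messaging fanout cast, i.e. one logical event delivered to every listening agent. A rough sketch of the pattern (the topic and method names here are invented; the real Neutron notifier topics differ per agent type and operation):

    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_transport(cfg.CONF)
    target = oslo_messaging.Target(topic='q-agent-notifier-secgroup-update',
                                   fanout=True)
    client = oslo_messaging.RPCClient(transport, target)

    # One control-plane event becomes N messages, one per listening agent,
    # which is why concurrent updates hurt as the host count grows.
    client.cast({}, 'security_groups_rule_updated',
                security_groups=['sg-1234'])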
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi Joe, Many thanks for your reply! On 09/04/15 03:34, joehuang wrote: Hi, Neil, In theory, Neutron is like a broadcast domain; for example, enforcement of DVR and security groups has to touch each host where a VM of the project resides. Even using an SDN controller, touching each host is inevitable. If there are plenty of physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for scalability of Neutron. I think I understand that in general terms - but can you be more specific about the broadcast storm? Is there one particular message exchange that involves broadcasting? Is it only from the server to agents, or are there 'broadcasts' in other directions as well? (I presume you are talking about control plane messages here, i.e. between Neutron components. Is that right? Obviously there can also be broadcast storm problems in the data plane - but I don't think that's what you are talking about here.) We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture 'Neutron cascading', Neutron can support up to a million ports and 100k physical hosts. You can find the report here: http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers Many thanks, I will take a look at this. Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers, maybe one ODL, the other OpenContrail.

                 -------- Cascading Neutron --------
                /                                   \
      --cascaded Neutron--                 --cascaded Neutron--
               |                                    |
             -ODL-                           -OpenContrail-

And furthermore, if using Neutron cascading in multiple data centers, the DCI controller (data center inter-connection controller) can also be used under the cascading Neutron, to provide NaaS ( network as a service ) across data centers.

                 ---------- Cascading Neutron ----------
                /                  |                    \
      --cascaded Neutron--   -DCI controller-   --cascaded Neutron--
               |                   |                     |
             -ODL-                 |              -OpenContrail-
      --(Data center 1)--  --(DCI networking)--  --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit? Most certainly, yes. I will be there from mid Monday afternoon through to the end of Friday. But it will be my first summit, so I have no idea yet as to how I might run into you - please can you suggest! Best Regards Chaoyi Huang ( Joe Huang ) Regards, Neil
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi Mike,

Many thanks for your reply!

On 08/04/15 17:56, Mike Spreitzer wrote:

Are you looking at scaling the numbers of tenants, Neutron routers, and tenant networks as you scale hosts and guests? I think this is a plausible way to grow. The compartmentalization that comes with growing those things may make a difference in results.

Are you thinking of control plane or data plane limits? In my email I was thinking of control plane points, such as:

- how many compute host agents can communicate with the Neutron server
- how many Neutron server instances or threads are needed
- whether there are any limits associated with the Neutron DB (unlikely, I guess).

Does the use of tenant networks and routers affect those points, in your experience? That would be less obvious to me than simply how many compute hosts or Neutron servers there are.

On the data plane side - if that was more what you meant - I can certainly see the limits there and how they are alleviated by using tenant networks and routers, in the L2 model. FWIW, my project Calico [1] tries to avoid those by not providing an L2 domain at all - which can make sense for workloads that only require or provide IP services - and instead routing data through the fabric.

To answer your question, then: no, I wasn't thinking of scaling tenant networks and routers, per your suggestion, because Calico doesn't do things that way (or alternatively because Calico already routes everywhere), and because I didn't think that would be relevant to the control plane scaling that I had in mind. But I may be missing something, so please do say if so.

Many thanks,
Neil

[1] http://www.projectcalico.org/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Are you looking at scaling the numbers of tenants, Neutron routers, and tenant networks as you scale hosts and guests? I think this is a plausible way to grow. The compartmentalization that comes with growing those things may make a difference in results.

Thanks,
Mike

From: Neil Jerram neil.jer...@metaswitch.com
To: openstack-dev@lists.openstack.org
Date: 04/08/2015 12:29 PM
Subject: [openstack-dev] [neutron] Neutron scaling datapoints?

My team is working on experiments looking at how far the Neutron server will scale, with increasing numbers of compute hosts and VMs. Does anyone have any datapoints on this that they can share? Or any clever hints?

I'm already aware of the following ones:

https://javacruft.wordpress.com/2014/06/18/168k-instances/
- Icehouse
- 118 compute hosts
- 80 Neutron server processes (10 per core on each of 8 cores, on the controller node)
- 27,000 VMs - but only after disabling all security/iptables

http://www.opencontrail.org/openstack-neutron-at-scale/
- 1000 hosts
- 5000 VMs
- 3 Neutron servers (via a load balancer)
- But it doesn't describe whether any specific configuration is needed for this. (Other than using OpenContrail! :-))

Many thanks!
Neil

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi, Neil,

In theory, Neutron is like a broadcast domain: for example, enforcement of DVR and security groups has to touch every host where a VM of the project in question resides. Even with an SDN controller, touching those hosts is inevitable. If there are many physical hosts, for example 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under concurrent operation; that's the bottleneck for the scalability of Neutron.

We need a layered architecture in Neutron to solve the broadcast domain bottleneck of scalability. The test report from OpenStack cascading shows that through the layered architecture of Neutron cascading, Neutron can support on the order of a million ports and 100k physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Neutron cascading also brings an extra benefit: one cascading Neutron can have many cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers - maybe one is ODL, the other OpenContrail.

                ---Cascading Neutron---
               /                       \
    --cascaded Neutron--       --cascaded Neutron--
              |                         |
           --ODL--               --OpenContrail--

And furthermore, if Neutron cascading is used across multiple data centers, a DCI controller (data center interconnection controller) can also be used under the cascading Neutron, to provide NaaS (network as a service) across data centers.

              --------Cascading Neutron--------
             /                |                 \
    --cascaded Neutron--  --DCI controller--  --cascaded Neutron--
              |               |                        |
           --ODL--            |                --OpenContrail--
    --(Data center 1)--  --(DCI networking)--  --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit?

Best Regards
Chaoyi Huang ( Joe Huang )

-----Original Message-----
From: Neil Jerram [mailto:neil.jer...@metaswitch.com]
Sent: Thursday, April 09, 2015 12:27 AM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [neutron] Neutron scaling datapoints?

My team is working on experiments looking at how far the Neutron server will scale, with increasing numbers of compute hosts and VMs. Does anyone have any datapoints on this that they can share? Or any clever hints?

I'm already aware of the following ones:

https://javacruft.wordpress.com/2014/06/18/168k-instances/
- Icehouse
- 118 compute hosts
- 80 Neutron server processes (10 per core on each of 8 cores, on the controller node)
- 27,000 VMs - but only after disabling all security/iptables

http://www.opencontrail.org/openstack-neutron-at-scale/
- 1000 hosts
- 5000 VMs
- 3 Neutron servers (via a load balancer)
- But it doesn't describe whether any specific configuration is needed for this. (Other than using OpenContrail! :-))

Many thanks!
Neil

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
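[For anyone reproducing these scaling experiments: agent liveness is easy to watch from the API while ramping up compute hosts. Below is a hedged sketch using python-neutronclient's v2.0 client - credentials are read from the usual OS_* environment variables, which are assumed to be set; it is an illustration, not part of any of the setups above.]

    # Sketch: poll the agents API and report how many agents are alive.
    # As host counts grow, dead or flapping entries here are an early sign
    # that heartbeat processing on the server side is falling behind.
    import os
    from neutronclient.v2_0 import client

    neutron = client.Client(username=os.environ['OS_USERNAME'],
                            password=os.environ['OS_PASSWORD'],
                            tenant_name=os.environ['OS_TENANT_NAME'],
                            auth_url=os.environ['OS_AUTH_URL'])

    agents = neutron.list_agents()['agents']
    dead = [a for a in agents if not a['alive']]
    print('%d/%d agents alive' % (len(agents) - len(dead), len(agents)))
    for a in dead:
        print('  dead: %(agent_type)s on %(host)s '
              '(last heartbeat %(heartbeat_timestamp)s)' % a)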