Re: [openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?
Wangbibo wrote:

Hi Kevin and Joshua,

Thanks for the review. Glad to see that oslo now includes distributed coordination in its scope. Based on out-of-date information [1] (oslo doesn't do it, so each project should do it separately), the spec [2] includes handling of specific backends (zk/memcached), as nova ServiceGroup did. Now that we have tooz, that part should be moved out of AgentGroup so that tooz can take it over. The Neutron AgentGroup spec needs an update, in line with what the nova ServiceGroup refactor is doing [3].

Per spec [3], tooz does not intend to eliminate or replace ServiceGroup completely. The two are integrated and work together to provide the nova ServiceGroup functionality. That may answer the question from Kevin and Kyle about the relationship between AgentGroup and tooz.

Let's jump into [3][4]:

1) ServiceGroup still exists;
2) A tooz driver is added for ServiceGroup, to take over the zk/redis/… backends;
3) The db-based ServiceGroup driver is retained. The db driver was introduced for backward compatibility (with the db-based liveness monitoring that existed for a long time before ServiceGroup was added). Since this driver uses tables and a data model that are intrinsically tied to the internals of nova, tooz cannot take it over;
4) The zk/memcached ServiceGroup drivers are temporarily retained, but will be deprecated in the future;
5) Eventually, there would be two ServiceGroup drivers: the db driver and the tooz driver.

Actually, things are the same for neutron, except that we don't need to consider zk/memcached driver deprecation.

I would like to refine the current spec and propose an "Agent Group and using tooz" spec, following the outline above. What do you think, Kevin and Joshua?

Thanks. ☺

Sounds great to me :)

Best,
Robin

[1] https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat
[2] https://review.openstack.org/#/c/168921/
[3] https://review.openstack.org/#/c/138607/11/specs/liberty/approved/service-group-using-tooz.rst
[4] ServiceGroup refactor code: https://review.openstack.org/#/c/172502/

From: Wangbibo [mailto:wangb...@huawei.com]
Sent: April 13, 2015 16:52
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Hi Kevin,

Totally agree with you that the heartbeat from each agent is something that we cannot eliminate currently. Agent status depends on it, and the scheduler and HA in turn depend on agent status.

I proposed a Liberty spec for introducing an open framework of pluggable agent status drivers [1][2]. It allows us to use a third-party backend, such as zookeeper or memcached, to monitor agent status. Meanwhile, it guarantees backward compatibility, so users can still use the db-based status monitoring mechanism as their default choice.

Based on that, we may do further optimization on the issues Attila and you mentioned.

Thanks.

[1] BP - https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
[2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/

Best,
Robin

From: Kevin Benton [mailto:blak...@gmail.com]
Sent: April 11, 2015 12:35
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few remaining ones I can think of is sync_routers, but it would be great if you could enumerate the ones you observed, because eliminating overhead in agents is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think we can eliminate them, because they are used to determine whether the agents are still alive for scheduling purposes. Did you have something else in mind to determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:

I'm 99.9% sure that, for scaling above 100k managed nodes, we do not really need to split OpenStack into multiple smaller OpenStacks, or use a significant number of extra controller machines.

The problem is that OpenStack is using the right tools, SQL/AMQP/(zk), but in the wrong way.

For example: periodic updates can be avoided in almost all cases.

The new data can be pushed to the agent just when it is needed. The agent can know when the AMQP connection becomes unreliable (queue or connection loss), and that it needs to do a full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also, when the agents get a notification, they start asking for details via AMQP - SQL. Why do they not know it already, or get it with the notification?

- Original Message -
From: Neil Jerram neil.jer...@metaswitch.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Sent: Thursday, April 9, 2015 5:01:45 PM
Subject: Re: [openstack-dev]
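
Attila's suggestion in this thread (push the changed data with the notification, and fall back to a full sync only when the agent notices that messages were lost) can be sketched roughly as follows. This is a minimal illustration and not neutron code: the `Server` and `Agent` classes, the per-agent sequence number, and the `fetch_snapshot` callback are all assumptions made for the example, and the thread does not say how message loss would actually be detected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# (latest sequence number, complete state); a hypothetical snapshot format.
Snapshot = Tuple[int, Dict[str, str]]


class Server:
    """Hypothetical server side: numbers every change it pushes out."""

    def __init__(self) -> None:
        self.seq = 0
        self.state: Dict[str, str] = {}

    def change(self, changes: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
        """Apply a change and return the notification (seq, changed data)."""
        self.state.update(changes)
        self.seq += 1
        return self.seq, dict(changes)

    def snapshot(self) -> Snapshot:
        """Return the full state, as a full sync would."""
        return self.seq, dict(self.state)


@dataclass
class Agent:
    """Hypothetical agent: applies pushed data, resyncs only on a gap."""

    fetch_snapshot: Callable[[], Snapshot]
    state: Dict[str, str] = field(default_factory=dict)
    last_seq: int = 0
    full_syncs: int = 0

    def full_sync(self) -> None:
        self.last_seq, snapshot = self.fetch_snapshot()
        self.state = dict(snapshot)
        self.full_syncs += 1

    def on_notification(self, seq: int, changes: Dict[str, str]) -> None:
        if seq <= self.last_seq:
            return  # duplicate or stale notification: nothing to do
        if seq != self.last_seq + 1:
            # At least one notification was lost, so local state is suspect.
            self.full_sync()
            return
        # The notification carries the data itself: no follow-up query needed.
        self.state.update(changes)
        self.last_seq = seq


server = Server()
agent = Agent(fetch_snapshot=server.snapshot)
agent.on_notification(*server.change({"port-1": "up"}))    # applied directly
server.change({"port-2": "up"})                            # notification lost
agent.on_notification(*server.change({"port-1": "down"}))  # gap: full sync
```

In this sketch the agent never polls: it does one full sync per detected gap, and every other update costs a single pushed message.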
Re: [openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?
Hi Robin,

The idea sounds good to me too. I am working on refactoring the ServiceGroup code [1]. Tooz has a nice compatibility matrix [2], which you might find useful.

-Vilobh

[1] ServiceGroup code refactoring: https://review.openstack.org/#/c/172502/
[2] Tooz compatibility matrix: http://docs.openstack.org/developer/tooz/compatibility.html

On Tue, Apr 14, 2015 at 6:07 AM, Wangbibo wangb...@huawei.com wrote:

I would like to refine the current spec and propose an "Agent Group and using tooz" spec, following the outline above. What do you think, Kevin and Joshua?
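
The driver split that Robin outlines in this thread (a db-style driver kept for backward compatibility, plus a driver that delegates to a coordination backend) can be pictured with the sketch below. It is only an illustration under stated assumptions: `AgentStatusDriver`, `TimestampDriver`, `MembershipDriver` and `schedulable` are hypothetical names, not the interface from the neutron spec or from tooz, the 75-second timeout is an arbitrary example value, and the "backend" here is a plain in-memory set standing in for a real coordination service.

```python
import abc
import time
from typing import Callable, Dict, Iterable, List, Set


class AgentStatusDriver(abc.ABC):
    """Hypothetical pluggable interface for agent liveness."""

    @abc.abstractmethod
    def report(self, agent_id: str) -> None:
        """Record that the agent is alive right now."""

    @abc.abstractmethod
    def is_alive(self, agent_id: str) -> bool:
        """Tell the scheduler whether the agent can be used."""


class TimestampDriver(AgentStatusDriver):
    """Mimics db-based monitoring: keep the last heartbeat time per agent."""

    def __init__(self, down_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._down_time = down_time
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def report(self, agent_id: str) -> None:
        self._last_seen[agent_id] = self._clock()

    def is_alive(self, agent_id: str) -> bool:
        last = self._last_seen.get(agent_id)
        return last is not None and self._clock() - last <= self._down_time


class MembershipDriver(AgentStatusDriver):
    """Stand-in for a coordination backend: group membership means alive."""

    def __init__(self) -> None:
        self._members: Set[str] = set()

    def report(self, agent_id: str) -> None:
        self._members.add(agent_id)

    def leave(self, agent_id: str) -> None:
        self._members.discard(agent_id)

    def is_alive(self, agent_id: str) -> bool:
        return agent_id in self._members


def schedulable(driver: AgentStatusDriver, agent_ids: Iterable[str]) -> List[str]:
    """The scheduler only depends on the interface, not on the backend."""
    return [agent_id for agent_id in agent_ids if driver.is_alive(agent_id)]


now = [0.0]  # fake clock so the example does not need to sleep
db_style = TimestampDriver(down_time=75, clock=lambda: now[0])
db_style.report("agent-a")
now[0] = 50.0
db_style.report("agent-b")
now[0] = 100.0
alive = schedulable(db_style, ["agent-a", "agent-b", "agent-c"])
```

Because the scheduler calls only `is_alive`, swapping the timestamp-based driver for a membership-based one does not change the scheduling code, which is the property the pluggable-driver proposal relies on.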