Re: [openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Joshua Harlow

Wangbibo wrote:

Hi Kevin and Joshua,

Thanks for the review. Glad to see that oslo now puts distributed
coordination in its scope. Based on out-of-date info [1] (oslo doesn’t
do it, so each project should do it separately), spec [2] includes
backend-specific (zk/memcached) handling, as nova ServiceGroup did.
Now that we have tooz, that part should be moved out of AgentGroup and
handed over to tooz. The Neutron AgentGroup spec needs an update, as
the nova ServiceGroup refactor is doing. [3]

Per spec [3], tooz doesn’t intend to eliminate or replace ServiceGroup
completely. The two are integrated and work together to provide nova
ServiceGroup functionality. That may answer the question from Kevin and
Kyle about the relationship between AgentGroup and tooz. Let’s look at
[3][4]:

1) ServiceGroup still exists;

2) Add a tooz driver for ServiceGroup, to take over the zk/redis/…
backends;

3) The db-based ServiceGroup driver is retained. The db driver was
introduced for backward compatibility (with the db-based liveness
monitoring that existed for a long time before ServiceGroup was added).
Since this driver uses tables and a data model that are intrinsically
tied to nova internals, tooz cannot take it over.

4) The zk/memcached ServiceGroup drivers are temporarily retained, but
will be deprecated in the future;

5) Eventually, there would be two ServiceGroup drivers: the db driver
and the tooz driver;

Actually, things are the same for neutron, except that we don’t need to
consider zk/memcached driver deprecation. I would like to refine the
current spec and propose an “Agent Group and using tooz” spec, following
the outline above. What do you think, Kevin and Joshua? Thanks. :)


Sounds great to me :)



Best,

Robin

[1] https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat

[2] https://review.openstack.org/#/c/168921/

[3]
https://review.openstack.org/#/c/138607/11/specs/liberty/approved/service-group-using-tooz.rst

[4] ServiceGroup refactor code: https://review.openstack.org/#/c/172502/

*From:* Wangbibo [mailto:wangb...@huawei.com]
*Sent:* April 13, 2015 16:52
*To:* OpenStack Development Mailing List (not for usage questions)
*Subject:* [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Hi Kevin,

I totally agree with you that the heartbeat from each agent is something
that we cannot eliminate currently. Agent status depends on it, and in
turn the scheduler and HA depend on agent status.

I proposed a Liberty spec introducing an open framework with pluggable
agent status drivers. [1][2] It allows us to use a third-party backend,
such as zookeeper or memcached, to monitor agent status. Meanwhile, it
guarantees backward compatibility, so users can still use the db-based
status monitoring mechanism as their default choice.
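
To make the pluggable-driver idea concrete, here is a minimal sketch under stated assumptions. None of the names below (AgentStatusDriver, DbStatusDriver, load_driver, alive_agents) come from the spec; they are hypothetical. A plain dict stands in for the database table, and the 75-second timeout is only an illustrative value.

```python
import abc
import time


class AgentStatusDriver(abc.ABC):
    """Hypothetical driver interface; names are not taken from the spec."""

    @abc.abstractmethod
    def report(self, agent_id, now=None):
        """Record a heartbeat from agent_id."""

    @abc.abstractmethod
    def is_up(self, agent_id, now=None):
        """Return True if agent_id is considered alive."""


class DbStatusDriver(AgentStatusDriver):
    """Timestamp-based liveness; a dict stands in for the database table."""

    def __init__(self, agent_down_time=75.0):
        self.agent_down_time = agent_down_time  # illustrative timeout, seconds
        self._last_seen = {}

    def report(self, agent_id, now=None):
        self._last_seen[agent_id] = time.time() if now is None else now

    def is_up(self, agent_id, now=None):
        now = time.time() if now is None else now
        last = self._last_seen.get(agent_id)
        return last is not None and now - last <= self.agent_down_time


# Registry of available drivers; a zookeeper or memcached driver would
# register here under its own name.
DRIVERS = {"db": DbStatusDriver}


def load_driver(name="db", **kwargs):
    """Instantiate the configured driver, defaulting to the db-based one."""
    return DRIVERS[name](**kwargs)


def alive_agents(driver, agent_ids, now=None):
    """Filter agents the way a scheduler might before placing work."""
    return [a for a in agent_ids if driver.is_up(a, now=now)]
```

The scheduler side only ever calls is_up (here through alive_agents), so swapping the backend would not change the callers; that is the backward-compatibility property described above.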

Based on that, we may further optimize the issues that Attila and you
mentioned. Thanks.

[1] BP -
https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers

[2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/

Best,

Robin

*From:* Kevin Benton [mailto:blak...@gmail.com]
*Sent:* April 11, 2015 12:35
*To:* OpenStack Development Mailing List (not for usage questions)
*Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few
remaining ones I can think of is sync_routers but it would be great if
you can enumerate the ones you observed because eliminating overhead in
agents is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I
don't think we can eliminate them because they are used to determine
if the agents are still alive for scheduling purposes. Did you have
something else in mind to determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com
wrote:

I'm 99.9% sure that, for scaling above 100k managed nodes,
we do not really need to split the openstack into multiple smaller openstacks,
or use a significant number of extra controller machines.

The problem is that openstack is using the right tools, SQL/AMQP/(zk),
but in the wrong way.

For example:
Periodic updates can be avoided in almost all cases.

The new data can be pushed to the agent just when it is needed.
The agent can know when the AMQP connection becomes unreliable (queue or
connection loss),
and that it needs to do a full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also, when the agents get a notification, they start asking for
details via
AMQP -> SQL. Why do they not know it already, or get it with the
notification?
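
The two points above (push data with the notification, and fall back to a full sync only after a connection loss) can be sketched as follows. This is a toy model, not neutron agent code: the Agent class, its method names, and the fetch_all callable are all hypothetical, and no real AMQP library is involved.

```python
class Agent:
    """Toy agent: applies pushed state, resyncs only after a connection loss."""

    def __init__(self, fetch_all):
        # fetch_all is a hypothetical callable returning {resource_id: data};
        # it stands in for the expensive full query against the server.
        self._fetch_all = fetch_all
        self.state = {}
        self.full_syncs = 0
        self._needs_full_sync = True  # nothing is known at startup

    def on_connection_lost(self):
        # Notifications may have been missed, so local state is suspect.
        self._needs_full_sync = True

    def on_notification(self, resource_id, data):
        # The notification carries the data itself, so no follow-up query
        # is needed; a pending resync is done first to avoid stale state.
        self.sync()
        self.state[resource_id] = data

    def sync(self):
        # Full sync happens only when flagged, never on a timer.
        if self._needs_full_sync:
            self.state = dict(self._fetch_all())
            self.full_syncs += 1
            self._needs_full_sync = False
```

In this sketch the number of full syncs grows with the number of connection losses rather than with elapsed time, which is the behaviour being argued for.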


- Original Message -

 From: Neil Jerram neil.jer...@metaswitch.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org

 Sent: Thursday, April 9, 2015 5:01:45 PM
 Subject: Re: [openstack-dev] 

Re: [openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Vilobh Meshram
Hi Robin,

The idea sounds good to me too. I am working on refactoring the
ServiceGroup code [1]. Tooz has a nice compatibility matrix [2], which
you might find useful.

-Vilobh

[1] Servicegroup code refactoring : https://review.openstack.org/#/c/172502/
[2] Tooz compatibility matrix :
http://docs.openstack.org/developer/tooz/compatibility.html


On Tue, Apr 14, 2015 at 6:07 AM, Wangbibo wangb...@huawei.com wrote:

  Hi Kevin and Joshua,



 Thanks for the review. Glad to see that oslo now puts distributed
 coordination in its scope. Based on out-of-date info [1] (oslo doesn’t do
 it, so each project should do it separately), spec [2] includes
 backend-specific (zk/memcached) handling, as nova ServiceGroup did.
 Now that we have tooz, that part should be moved out of AgentGroup and
 handed over to tooz. The Neutron AgentGroup spec needs an update, as the
 nova ServiceGroup refactor is doing. [3]



 Per spec [3], tooz doesn’t intend to eliminate or replace ServiceGroup
 completely. The two are integrated and work together to provide nova
 ServiceGroup functionality. That may answer the question from Kevin and
 Kyle about the relationship between AgentGroup and tooz. Let’s look at
 [3][4]:

 1)  ServiceGroup still exists;

 2)  Add a tooz driver for ServiceGroup, to take over the zk/redis/…
 backends;

 3)  The db-based ServiceGroup driver is retained. The db driver was
 introduced for backward compatibility (with the db-based liveness
 monitoring that existed for a long time before ServiceGroup was added).
 Since this driver uses tables and a data model that are intrinsically
 tied to nova internals, tooz cannot take it over.

 4)  The zk/memcached ServiceGroup drivers are temporarily retained, but
 will be deprecated in the future;

 5)  Eventually, there would be two ServiceGroup drivers: the db driver
 and the tooz driver;



 Actually, things are the same for neutron, except that we don’t need to
 consider zk/memcached driver deprecation. I would like to refine the
 current spec and propose an “Agent Group and using tooz” spec, following
 the outline above. What do you think, Kevin and Joshua? Thanks. :)



 Best,

 Robin



 [1] https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat

 [2] https://review.openstack.org/#/c/168921/

 [3]
 https://review.openstack.org/#/c/138607/11/specs/liberty/approved/service-group-using-tooz.rst

 [4] ServiceGroup refactor code: https://review.openstack.org/#/c/172502/



 *From:* Wangbibo [mailto:wangb...@huawei.com]
 *Sent:* April 13, 2015 16:52
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* [openstack-dev] Re: [neutron] Neutron scaling datapoints?



 Hi Kevin,



 I totally agree with you that the heartbeat from each agent is something
 that we cannot eliminate currently. Agent status depends on it, and in
 turn the scheduler and HA depend on agent status.



 I proposed a Liberty spec introducing an open framework with pluggable
 agent status drivers. [1][2] It allows us to use a third-party backend,
 such as zookeeper or memcached, to monitor agent status. Meanwhile, it
 guarantees backward compatibility, so users can still use the db-based
 status monitoring mechanism as their default choice.



 Based on that, we may further optimize the issues that Attila and you
 mentioned. Thanks.



 [1] BP  -
 https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers

 [2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/



 Best,

 Robin



 *From:* Kevin Benton [mailto:blak...@gmail.com]
 *Sent:* April 11, 2015 12:35
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?



 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.



 One of the most common is the heartbeat from each agent. However, I don't
 think we can eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?



 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com
 wrote:

 I'm 99.9% sure that, for scaling above 100k managed nodes,
 we do not really need to split the openstack into multiple smaller openstacks,
 or use a significant number of extra controller machines.

 The problem is that openstack is using the right tools, SQL/AMQP/(zk),
 but in the wrong way.

 For example:
 Periodic updates can be avoided in almost all cases.

 The new data can be pushed to the agent just when it is needed.
 The agent can know when the AMQP connection becomes unreliable (queue or
 connection loss),
 and that it needs to do a full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159

 Also the agents when gets some notification, they start 

[openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Wangbibo
Hi Kevin and Joshua,

Thanks for the review. Glad to see that oslo now puts distributed coordination
in its scope. Based on out-of-date info [1] (oslo doesn’t do it, so each
project should do it separately), spec [2] includes backend-specific
(zk/memcached) handling, as nova ServiceGroup did. Now that we have tooz,
that part should be moved out of AgentGroup and handed over to tooz. The
Neutron AgentGroup spec needs an update, as the nova ServiceGroup refactor
is doing. [3]

Per spec [3], tooz doesn’t intend to eliminate or replace ServiceGroup
completely. The two are integrated and work together to provide nova
ServiceGroup functionality. That may answer the question from Kevin and Kyle
about the relationship between AgentGroup and tooz. Let’s look at [3][4]:

1)  ServiceGroup still exists;

2)  Add a tooz driver for ServiceGroup, to take over the zk/redis/… backends;

3)  The db-based ServiceGroup driver is retained. The db driver was introduced
for backward compatibility (with the db-based liveness monitoring that existed
for a long time before ServiceGroup was added). Since this driver uses tables
and a data model that are intrinsically tied to nova internals, tooz cannot
take it over.

4)  The zk/memcached ServiceGroup drivers are temporarily retained, but will be
deprecated in the future;

5)  Eventually, there would be two ServiceGroup drivers: the db driver and the
tooz driver;

Actually, things are the same for neutron, except that we don’t need to
consider zk/memcached driver deprecation. I would like to refine the current
spec and propose an “Agent Group and using tooz” spec, following the outline
above. What do you think, Kevin and Joshua? Thanks. ☺

Best,
Robin

[1] https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat
[2] https://review.openstack.org/#/c/168921/
[3] 
https://review.openstack.org/#/c/138607/11/specs/liberty/approved/service-group-using-tooz.rst
[4] ServiceGroup refactor code: https://review.openstack.org/#/c/172502/


From: Wangbibo [mailto:wangb...@huawei.com]
Sent: April 13, 2015 16:52
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Hi Kevin,

I totally agree with you that the heartbeat from each agent is something that
we cannot eliminate currently. Agent status depends on it, and in turn the
scheduler and HA depend on agent status.

I proposed a Liberty spec introducing an open framework with pluggable agent
status drivers. [1][2] It allows us to use a third-party backend, such as
zookeeper or memcached, to monitor agent status. Meanwhile, it guarantees
backward compatibility, so users can still use the db-based status monitoring
mechanism as their default choice.

Based on that, we may further optimize the issues that Attila and you
mentioned. Thanks.

[1] BP  -  
https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
[2] Liberty Spec proposed - https://review.openstack.org/#/c/168921/

Best,
Robin


From: Kevin Benton [mailto:blak...@gmail.com]
Sent: April 11, 2015 12:35
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few 
remaining ones I can think of is sync_routers but it would be great if you can 
enumerate the ones you observed because eliminating overhead in agents is 
something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think
we can eliminate them because they are used to determine if the agents are
still alive for scheduling purposes. Did you have something else in mind to
determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:
I'm 99.9% sure that, for scaling above 100k managed nodes,
we do not really need to split the openstack into multiple smaller openstacks,
or use a significant number of extra controller machines.

The problem is that openstack is using the right tools, SQL/AMQP/(zk),
but in the wrong way.

For example:
Periodic updates can be avoided in almost all cases.

The new data can be pushed to the agent just when it is needed.
The agent can know when the AMQP connection becomes unreliable (queue or
connection loss),
and that it needs to do a full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also, when the agents get a notification, they start asking for details via
AMQP -> SQL. Why do they not know it already, or get it with the notification?


- Original Message -
 From: Neil Jerram neil.jer...@metaswitch.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org
 Sent: Thursday, April 9, 2015 5:01:45 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang