[openstack-dev] Re: Re: Re: [neutron] Neutron scaling datapoints?

2015-04-14 Thread Wangbibo
Hi Vilobh,

Thanks a lot for the info. I'll refine the previous spec and propose a new one soon.

Actually the AgentGroup code, as well as the db-based AgentGroup driver, is 
almost done. It works well in testing on my OpenStack setup. I will also post 
the code, which may help show why AgentGroup and tooz are useful for 
neutron scalability and performance. I hope to get more comments from the 
neutron team. Thanks.

Best,
Robin

From: Vilobh Meshram [mailto:vilobhmeshram.openst...@gmail.com]
Sent: April 15, 2015 2:50
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Re: Re: [neutron] Neutron scaling datapoints?

Hi Robin,

The idea sounds good to me too. I am working on refactoring the ServiceGroup 
code [1]. Tooz has a nice compatibility matrix [2] which you might find useful.

-Vilobh

[1] ServiceGroup code refactoring: https://review.openstack.org/#/c/172502/
[2] Tooz compatibility matrix: http://docs.openstack.org/developer/tooz/compatibility.html


On Tue, Apr 14, 2015 at 6:07 AM, Wangbibo wangb...@huawei.com wrote:
Hi Kevin and Joshua,

Thanks for the review.  Glad to see that oslo now puts distributed coordination 
into its scope.  Based on out-of-date info [1] (that oslo would not handle it, 
and each project should do it separately), manipulation of a specific backend 
(zk/memcached) was included in the spec [2], as nova's ServiceGroup did.  Now 
that we have tooz, that part should be moved out of AgentGroup and handed over 
to tooz. The neutron AgentGroup spec needs an update along the same lines as 
the nova ServiceGroup refactor [3].

Per spec [3], tooz is not intended to eliminate or replace ServiceGroup 
completely. The two are integrated and work together to provide the nova 
ServiceGroup functionality. That may answer the question from Kevin and Kyle 
about the relationship between AgentGroup and tooz. Let's look at [3][4]:

1)  ServiceGroup still exists;

2)  A tooz driver is added for ServiceGroup, to take over the zk/redis/… backends;

3)  The db-based ServiceGroup driver is retained.  The db driver was introduced 
for backward compatibility (with the db-based liveness monitoring that existed 
for a long time before ServiceGroup was added). Since this driver uses tables 
and a data model that are intrinsically tied to the internals of nova, tooz 
cannot take it over.

4)  The zk/memcached ServiceGroup drivers are temporarily retained, but will be 
deprecated in the future;

5)  Eventually, there will be two ServiceGroup drivers: the db driver and the 
tooz driver (a sketch of the tooz side follows below);
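
To make that concrete, here is a minimal sketch of the backend handling that 
tooz takes over, based on tooz's public coordination API. The backend URL, 
member id and group name below are placeholders of mine, not anything from 
the specs:

    from tooz import coordination

    # 'zake://' is tooz's in-memory test backend; swapping the URL for
    # zookeeper://... or memcached://... drives a different backend with
    # the same code -- exactly the part moved out of AgentGroup.
    coordinator = coordination.get_coordinator('zake://', b'agent-host-1')
    coordinator.start()

    try:
        coordinator.create_group(b'neutron-agents').get()
    except coordination.GroupAlreadyExist:
        pass
    coordinator.join_group(b'neutron-agents').get()

    coordinator.heartbeat()   # periodic liveness signal to the backend
    live = coordinator.get_members(b'neutron-agents').get()

    coordinator.leave_group(b'neutron-agents').get()
    coordinator.stop()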

Actually, things are the same for neutron, except that we don't need to 
consider zk/memcached driver deprecation. I would like to refine the current 
spec and propose an "AgentGroup using tooz" spec, following the outline above. 
What do you think, Kevin and Joshua? Thanks. ☺

Best,
Robin

[1] https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat
[2] https://review.openstack.org/#/c/168921/
[3] https://review.openstack.org/#/c/138607/11/specs/liberty/approved/service-group-using-tooz.rst
[4] ServiceGroup refactor code: https://review.openstack.org/#/c/172502/





From: Wangbibo [mailto:wangb...@huawei.com]
Sent: April 13, 2015 16:52
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Hi Kevin,

I totally agree with you that the heartbeat from each agent is something we 
cannot eliminate currently. Agent status depends on it, and in turn the 
scheduler and HA depend on agent status.

I proposed a Liberty spec for introducing an open framework of pluggable agent 
status drivers [1][2].  It allows us to use other third-party backends to 
monitor agent status, such as ZooKeeper or memcached. Meanwhile, it guarantees 
backward compatibility, so users can still use the db-based status monitoring 
mechanism as their default choice.
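
Roughly, the driver interface I have in mind looks like the sketch below; the 
class and method names here are mine for illustration, not necessarily what 
the spec will finally define:

    import abc

    class AgentStatusDriver(abc.ABC):
        # Hypothetical sketch of the pluggable interface.

        @abc.abstractmethod
        def report_state(self, agent_id):
            """Record a heartbeat/liveness report from an agent."""

        @abc.abstractmethod
        def is_agent_down(self, agent_id):
            """Return True if the agent missed its reporting deadline."""

    # The default db-based driver and any zookeeper/memcached driver would
    # each implement this same interface, so operators switch backends by
    # configuration without touching the callers.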

Based on that, we may do further optimization on the issues Attila and you 
mentioned. Thanks.

[1] BP: https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers
[2] Liberty spec proposed: https://review.openstack.org/#/c/168921/

Best,
Robin




From: Kevin Benton [mailto:blak...@gmail.com]
Sent: April 11, 2015 12:35
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few 
remaining ones I can think of is sync_routers, but it would be great if you 
could enumerate the ones you observed, because eliminating overhead in agents 
is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think 
we can eliminate them, because they are used to determine whether the agents 
are still alive for scheduling purposes. Did you have something else in mind to 
determine if an agent is alive?
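
(For context, the liveness decision itself is just a timestamp comparison. A 
simplified sketch of the db-based check; agent_down_time mirrors neutron's 
config option of that name, the rest is illustrative:)

    from datetime import datetime, timedelta

    AGENT_DOWN_TIME = 75  # seconds; neutron's agent_down_time default

    def is_agent_down(last_heartbeat):
        # An agent whose last heartbeat is older than the grace period is
        # treated as dead for scheduling purposes.
        return last_heartbeat < datetime.utcnow() - timedelta(
            seconds=AGENT_DOWN_TIME)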

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:
I'm 99.9% sure that for scaling above 100k managed nodes,
we do not really need to split OpenStack into multiple smaller OpenStacks,
or use a significant number of extra controller machines.

The problem is that OpenStack is using the right tools (SQL/AMQP/(zk)),
but in the wrong way.

For example: periodic updates can be avoided in almost all cases.

The new data can be pushed to the agent just when it is needed.
The agent can know when the AMQP connection becomes unreliable (queue or
connection loss), and then it needs to do a full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also, when the agents get a notification, they start asking for details via
AMQP -> SQL. Why do they not know it already, or get it with the notification?
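
A rough sketch of what I mean on the agent side (the names are illustrative,
not neutron code):

    class PortCache(object):
        """Agent-side cache fed purely by pushed notifications."""

        def __init__(self, full_sync):
            self._full_sync = full_sync   # callback: refetch everything
            self._ports = {}
            self._dirty = False

        def on_notification(self, event):
            # The payload carries the complete new state, so no follow-up
            # AMQP -> SQL round trip is needed.
            self._ports[event['port_id']] = event['port']

        def on_connection_lost(self):
            # Updates may have been missed while the queue was gone.
            self._dirty = True

        def on_connection_restored(self):
            if self._dirty:
                self._ports = self._full_sync()  # one full sync, then push
                self._dirty = False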


----- Original Message -----
 From: Neil Jerram neil.jer...@metaswitch.com
 To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
 Sent: Thursday, April 9, 2015 5:01:45 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang wrote:
  Hi, Neil,
 
  In theory, Neutron is like a broadcast domain: for example,
  enforcement of DVR and security groups has to touch every host where
  a VM of this project resides. Even using an SDN controller, touching
  those hosts is inevitable. If there are plenty of physical hosts, for
  example 10k, inside one Neutron, it's very hard to overcome the
  broadcast storm issue under concurrent operation; that's the
  bottleneck for the scalability of Neutron.

 I think I understand that in general terms - but can you be more
 specific about the broadcast storm?  Is there one particular message
 exchange that involves broadcasting?  Is it only from the server to
 agents, or are there 'broadcasts' in other directions as well?

 (I presume you are talking about control plane messages here, i.e.
 between Neutron components.  Is that right?  Obviously there can also be
 broadcast storm problems in the data plane - but I don't think that's
 what you are talking about here.)

  We need a layered architecture in Neutron to solve the broadcast-domain
  bottleneck of scalability. The test report for OpenStack cascading shows
  that, through the layered architecture of Neutron cascading, Neutron can
  support up to million-level ports and 100k-level physical hosts. You can
  find the report here:
  http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

 Many thanks, I will take a look at this.

  Neutron cascading also brings an extra benefit: one cascading Neutron can
  have many cascaded Neutrons, and different cascaded Neutrons can leverage
  different SDN controllers; maybe one is ODL, the other one OpenContrail.
 
                ---Cascading Neutron---
               /                       \
      --cascaded Neutron--    --cascaded Neutron--
               |                       |
            --ODL--            --OpenContrail--
 
 
  And furthermore, if using Neutron cascading in multiple data centers, the
  DCI controller (data center inter-connection controller) can also be used
  under the cascading Neutron, to provide NaaS (network as a service) across
  data