Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-22 Thread Joshua Harlow

And for another recent one that came out yesterday:

Interesting to read for those who are using mongodb + openstack...

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

-Josh

Joshua Harlow wrote:

Joshua Harlow wrote:

Kevin Benton wrote:

Timestamps are just one way (and likely the most primitive), using
redis (or memcache) key/value and expiry are another (and letting
memcache or redis expire using its own internal algorithms), using
zookeeper ephemeral nodes[1] are another... The point being that its
backend specific and tooz supports varying backends.

Very cool. Is the backend completely transparent so a deployer could
choose a service they are comfortable maintaining, or will that change
the properties WRT to resiliency of state on node restarts,
partitions, etc?


Of course... we tried to make it 'completely' transparent, but in
reality certain backends (zookeeper which uses a paxos-like algorithm
and redis with sentinel support...) are better (more resilient, more
consistent, handle partitions/restarts better...) than others (memcached
is after all just a distributed cache). This is just the nature of the
game...
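
For illustration, a minimal sketch (not Neutron code; the backend URL and member
name are made up) of how the same liveness call sites stay unchanged while the
deployer swaps the tooz backend purely through configuration:

# Sketch only: the deployer picks the driver via the connection URL; tooz
# hides whether liveness ends up as an ephemeral znode, an expiring key, etc.
from tooz import coordination

BACKEND_URL = 'zookeeper://127.0.0.1:2181'   # or 'memcached://...', 'redis://...'

coordinator = coordination.get_coordinator(BACKEND_URL, b'l2-agent-host-1')
coordinator.start()

# Call periodically so the chosen backend keeps considering us alive.
coordinator.heartbeat()

coordinator.stop()

The resiliency properties do differ per backend, as noted above, even though the
calling code does not change.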



And for some more reading fun:

https://aphyr.com/posts/315-call-me-maybe-rabbitmq

https://aphyr.com/posts/291-call-me-maybe-zookeeper

https://aphyr.com/posts/283-call-me-maybe-redis

https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul

... (aphyr.com has a lot of these neat posts)...



The Nova implementation of Tooz seemed pretty straight-forward, although
it looked like it had pluggable drivers for service management already.
Before I dig into it much further I'll file a spec on the Neutron side
to see if I can get some other cores onboard to do the review work if I
push a change to tooz.


Sounds good to me.




On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote:

Kevin Benton wrote:

So IIUC tooz would be handling the liveness detection for the
agents.
That would be nice to get rid of that logic in Neutron and just
register callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not,
who does
a given node ask to know if an agent is online or offline when
making a
scheduling decision?


Timestamps are just one way (and likely the most primitive), using
redis (or memcache) key/value and expiry are another (and letting
memcache or redis expire using its own internal algorithms), using
zookeeper ephemeral nodes[1] are another... The point being that its
backend specific and tooz supports varying backends.
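
As an illustration of the raw "key/value with expiry" variant mentioned above, a
hedged sketch using redis-py directly (the key naming scheme and TTL are invented
for the example; tooz would hide these details behind its own API):

import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)

# Agent side: refresh a key that redis expires by itself if the agent stops.
def report_alive(agent_id, ttl=30):
    r.set('agent-alive:%s' % agent_id, '1', ex=ttl)

# Scheduler side: an agent counts as "up" only while its key still exists.
def is_alive(agent_id):
    return bool(r.exists('agent-alive:%s' % agent_id))

report_alive('l3-agent-host-7')
print(is_alive('l3-agent-host-7'))   # True until ~30s after the last refresh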


However, before (what I assume is) the large code change to
implement
tooz, I would like to quantify that the heartbeats are actually a
bottleneck. When I was doing some profiling of them on the
master branch
a few months ago, processing a heartbeat took an order of
magnitude less
time (50ms) than the 'sync routers' task of the l3 agent
(~300ms). A
few query optimizations might buy us a lot more headroom before
we have
to fall back to large refactors.


Sure, always good to avoid prematurely optimizing things...

Although this is relevant for u I think anyway:

https://review.openstack.org/#/c/138607/ (same thing/nearly same
in nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of
the latter).

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes
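
For readers unfamiliar with the ephemeral-node mechanism in [1], a small hedged
sketch using the kazoo client directly (the ZooKeeper address and znode paths are
illustrative; a tooz driver would manage the equivalent for you):

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

# An ephemeral znode lives only as long as this client's session; if the agent
# process dies or loses connectivity, ZooKeeper deletes the node on its own.
zk.create('/neutron/agents/l2-agent-host-1', b'alive',
          ephemeral=True, makepath=True)

# Whoever schedules can treat the children of /neutron/agents as the live set.
print(zk.get_children('/neutron/agents'))

zk.stop()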




Kevin Benton wrote:


One of the most common is the heartbeat from each agent.
However, I
don't think we can't eliminate them because they are used
to determine
if the agents are still alive for scheduling purposes. Did
you have
something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active
members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1]
http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2]
https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
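
A minimal sketch of the group/heartbeat/watch flow described above, using the
tooz calls from the links (the backend URL, group name, and member id are
illustrative assumptions, not the proposed Neutron integration):

from tooz import coordination

coord = coordination.get_coordinator('zookeeper://127.0.0.1:2181',
                                     b'l2-agent-host-1')
coord.start()

# Put this agent into a well-known group (create it if it does not exist yet).
try:
    coord.create_group(b'neutron-l2-agents').get()
except coordination.GroupAlreadyExist:
    pass
coord.join_group(b'neutron-l2-agents').get()

# Heartbeat periodically so the backend keeps us in the live member set.
coord.heartbeat()

# Scheduler side: read the active members, or register callbacks instead.
print(coord.get_members(b'neutron-l2-agents').get())

def on_join(event):
    print('member joined:', event.member_id)

coord.watch_join_group(b'neutron-l2-agents', on_join)
coord.run_watchers()   # must be called periodically for callbacks to fire

coord.leave_group(b'neutron-l2-agents').get()
coord.stop()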



Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-17 Thread joehuang
Hi, Attila,

Addressing only agent status/liveness management is not enough for Neutron 
scalability. The concurrent dynamic load at large scale (for example, 100k 
managed nodes with dynamic load like security group rule updates, 
routers_updated, etc.) should also be taken into account. So even if agent 
status/liveness management is improved in Neutron, that doesn't mean the 
scalability issue is totally addressed.

And on the other hand, Nova already supports several segregation concepts, for 
example Cells and Availability Zones. If there are 100k nodes to be managed by 
one OpenStack instance, it's impossible to work without hardware resource 
segregation. It's also odd to put the agent liveness manager in availability 
zone (AZ) 1 while all managed agents are in AZ 2: if AZ 1 is powered off, all 
agents in AZ 2 lose management.

The benchmark is already available: the scalability test report for million-port 
scalability of Neutron is at 
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

The cascading approach may not be perfect, but at least it provides a feasible 
way if we really want scalability.

I am also working, based on cascading, to evolve OpenStack towards a world with 
no need to worry about the OpenStack scalability issue:

Tenant-level virtual OpenStack service over hybrid, federated, or multiple 
OpenStack-based clouds:

There are lots of OpenStack-based clouds; each tenant will be allocated one 
cascading OpenStack as its virtual OpenStack service, with a single OpenStack 
API endpoint served for this tenant. The tenant's resources can be distributed 
or dynamically scaled across multiple OpenStack-based clouds; these clouds may 
be federated with Keystone, use a shared Keystone, or even be OpenStack clouds 
built on AWS, Azure, or VMware vSphere.

Under this deployment scenario, unlimited scalability in a cloud can be 
achieved: there is no unified cascading layer, and tenant-level resource 
orchestration among multiple OpenStack clouds is fully distributed (even 
geographically). The database and load for one cascading OpenStack are very 
small, which makes disaster recovery and backup easy. Multiple tenants may 
share one cascading OpenStack to reduce resource waste, but the principle is to 
keep the cascading OpenStack as thin as possible.

You can find the information here:
https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case

Best Regards
Chaoyi Huang ( joehuang )

-Original Message-
From: Attila Fazekas [mailto:afaze...@redhat.com] 
Sent: Thursday, April 16, 2015 3:06 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?





- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 3:46:24 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, 
 port ( not Neutron Port ) is a two bytes field, i.e. port ranges from 
 0 ~ 65535, supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 
 agents... will be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for 
 scaling Neutron in this way, and PoC and test would be a good support for 
 this idea.
 

Would you consider as a PoC something which uses the technology in a similar 
way, with a similar port/security problem, but with a lower-level API than 
Neutron currently uses?

Is this an acceptable flaw:
If you kill -9 the q-svc 1 times at the `right` millisecond, the rabbitmq 
memory usage increases by ~1MiB? (Rabbit usually eats ~10GiB under pressure.) 
The memory can be freed without a broker restart; it also gets freed on agent 
restart.


 
 
 I'm 99.9% sure, for scaling above 100k managed node, we do not really 
 need to split the openstack to multiple smaller openstack, or use 
 significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the 
 few remaining ones I can think of is sync_routers but it would be 
 great if you can enumerate the ones you observed because eliminating 
 overhead in agents is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I 
 don't think we can't eliminate them because they are used to determine 
 if the agents are still alive for scheduling purposes. Did you have 
 something else in mind to determine if an agent is alive?
 
 On Fri, Apr 10

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-17 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, April 17, 2015 9:46:12 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Hi, Attila,
 
 only address the issue of agent status/liveness management is not enough for
 Neutron scalability. The concurrent dynamic load impact on large scale ( for
 example 100k managed nodes with the dynamic load like security group rule
 update, routers_updated, etc ) should also be taken into account too. So
 even if is agent status/liveness management improved in Neutron, that
 doesn't mean the scalability issue totally being addressed.
 

This story is not about the heartbeat.
https://bugs.launchpad.net/neutron/+bug/1438159

What I am looking for is managing a lot of nodes with minimal `controller` 
resources.

The rate of actually required system changes per second (for example, related 
to VM boot) is relatively low, even if you have many nodes and VMs - consider 
the instances' average lifetime.

The `bug` is about the resources that the agents are related to and query many 
times. BTW: I am thinking about several alternatives and other variants.

In the Neutron case a `system change`, like a security group rule change, can 
affect multiple agents.

It seems possible to have all agents `query` a resource only once, and be 
notified of any subsequent change `for free` (IP, security group rule, new 
neighbor).

This is the scenario where message brokers can shine and scale, and it also 
offloads a lot of work from the DB.
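
As a rough illustration of that pattern (not Neutron's actual RPC code; the
exchange name and payload are invented), a broker-side fanout lets the server
publish a change once, with enough detail that no agent has to go back to the
DB for it:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = conn.channel()
channel.exchange_declare(exchange='secgroup-updates', exchange_type='fanout')

# Agent side: a private queue bound to the fanout exchange receives every
# subsequent change after the agent's initial full query of the resource.
result = channel.queue_declare(queue='', exclusive=True)
channel.queue_bind(exchange='secgroup-updates', queue=result.method.queue)

# Server side: publish the full rule change once; the broker copies it to
# every bound agent queue.
channel.basic_publish(exchange='secgroup-updates', routing_key='',
                      body='{"group": "default", "rule": "allow tcp/22"}')

conn.close()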


 And on the other hand, Nova already supports several segregation concepts,
 for example, Cells, Availability Zone... If there are 100k nodes to be
 managed by one OpenStack instances, it's impossible to work without hardware
 resources segregation. It's weird to put agent liveness manager in
 availability zone(AZ in short) 1, but all managed agents in AZ 2. If AZ 1 is
 power off, then all agents in AZ2 lost management.
 

 The benchmark is already here for scalability test report for million ports
 scalability of Neutron 
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
 
 The cascading may be not perfect, but at least it provides a feasible way if
 we really want scalability.
 
 I am also working to evolve OpenStack to a world no need to worry about
 OpenStack Scalability Issue based on cascading:
 
 Tenant level virtual OpenStack service over hybrid or federated or multiple
 OpenStack based clouds:
 
 There are lots of OpenStack based clouds, each tenant will be allocated with
 one cascading OpenStack as the virtual OpenStack service, and single
 OpenStack API endpoint served for this tenant. The tenant's resources can be
 distributed or dynamically scaled to multi-OpenStack based clouds, these
 clouds may be federated with KeyStone, or using shared KeyStone, or  even
 some OpenStack clouds built in AWS or Azure, or VMWare vSphere.

 
 Under this deployment scenario, unlimited scalability in a cloud can be
 achieved, no unified cascading layer, tenant level resources orchestration
 among multi-OpenStack clouds fully distributed(even geographically). The
 database and load for one casacding OpenStack is very very small, easy for
 disaster recovery or backup. Multiple tenant may share one cascading
 OpenStack to reduce resource waste, but the principle is to keep the
 cascading OpenStack as thin as possible.

 You can find the information here:
 https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case
 
 Best Regards
 Chaoyi Huang ( joehuang )
 
 -Original Message-
 From: Attila Fazekas [mailto:afaze...@redhat.com]
 Sent: Thursday, April 16, 2015 3:06 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 
 
 - Original Message -
  From: joehuang joehu...@huawei.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Sent: Sunday, April 12, 2015 3:46:24 AM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
  
  
  
  As Kevin talking about agents, I want to remind that in TCP/IP stack,
  port ( not Neutron Port ) is a two bytes field, i.e. port ranges from
  0 ~ 65535, supports maximum 64k port number.
  
  
  
   above 100k managed node  means more than 100k L2 agents/L3
  agents... will be alive under Neutron.
  
  
  
  Want to know the detail design how to support 99.9% possibility for
  scaling Neutron in this way, and PoC and test would be a good support for
  this idea.
  
 
 Would you consider something as PoC which uses the technology in similar way,
 with a similar port - security problem, but with a lower level API than
 neutron using currently ?
 
 Is it an acceptable flaw:
 If you kill -9 the q-svc 1 times

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-16 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 3:46:24 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
 not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
 supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 agents... will
 be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for scaling
 Neutron in this way, and PoC and test would be a good support for this idea.
 

Would you consider as a PoC something which uses the technology in a similar way,
with a similar port/security problem, but with a lower-level API
than Neutron currently uses?

Is this an acceptable flaw:
If you kill -9 the q-svc 1 times at the `right` millisecond, the rabbitmq
memory usage increases by ~1MiB? (Rabbit usually eats ~10GiB under pressure.)
The memory can be freed without a broker restart; it also gets freed on
agent restart.


 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I don't
 think we can't eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 wrote:
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 The problem is that openstack is using the right tools SQL/AMQP/(zk),
 but in a wrong way.
 
 For example:
 Periodic updates can be avoided in almost all cases.
 
 The new data can be pushed to the agent just when it is needed.
 The agent can know when the AMQP connection becomes unreliable (queue or
 connection loss), and then needs to do a full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159
 
 Also, when the agents get some notification, they start asking for details
 via AMQP - SQL. Why do they not know it already, or get it with the
 notification?
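
A hedged sketch of that pattern outside of Neutron (the exchange name and helper
functions are invented): push incremental updates over AMQP, and only fall back
to a full sync when the connection or queue has been lost:

import pika

def full_sync():
    print('resyncing full state from the server')

def apply_update(body):
    print('applying incremental update:', body)

params = pika.ConnectionParameters('localhost')
while True:
    try:
        conn = pika.BlockingConnection(params)
        channel = conn.channel()
        channel.exchange_declare(exchange='agent-updates',
                                 exchange_type='fanout')
        queue = channel.queue_declare(queue='', exclusive=True).method.queue
        channel.queue_bind(exchange='agent-updates', queue=queue)

        # Messages may have been missed while disconnected, so resync once...
        full_sync()
        # ...then rely purely on pushed notifications until the link breaks.
        for method, properties, body in channel.consume(queue):
            apply_update(body)
            channel.basic_ack(method.delivery_tag)
    except pika.exceptions.AMQPConnectionError:
        continue   # connection lost: reconnect and full_sync again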
 
 
 - Original Message -
  From: Neil Jerram  neil.jer...@metaswitch.com 
  To: OpenStack Development Mailing List (not for usage questions) 
  openstack-dev@lists.openstack.org 
  Sent: Thursday, April 9, 2015 5:01:45 PM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
  
  Hi Joe,
  
  Many thanks for your reply!
  
  On 09/04/15 03:34, joehuang wrote:
   Hi, Neil,
   
   From theoretic, Neutron is like a broadcast domain, for example,
   enforcement of DVR and security group has to touch each regarding host
   where there is VM of this project resides. Even using SDN controller, the
   touch to regarding host is inevitable. If there are plenty of physical
   hosts, for example, 10k, inside one Neutron, it's very hard to overcome
   the broadcast storm issue under concurrent operation, that's the
   bottleneck for scalability of Neutron.
  
  I think I understand that in general terms - but can you be more
  specific about the broadcast storm? Is there one particular message
  exchange that involves broadcasting? Is it only from the server to
  agents, or are there 'broadcasts' in other directions as well?
  
  (I presume you are talking about control plane messages here, i.e.
  between Neutron components. Is that right? Obviously there can also be
  broadcast storm problems in the data plane - but I don't think that's
  what you are talking about here.)
  
   We need layered architecture in Neutron to solve the broadcast domain
   bottleneck of scalability. The test report from OpenStack cascading shows
   that through layered architecture Neutron cascading, Neutron can
   supports up to million level ports and 100k level physical hosts. You can
   find the report here:
   http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
  
  Many thanks, I will take a look at this.
  
   Neutron cascading also brings extra benefit: One cascading Neutron can

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-16 Thread Neil Jerram

Thanks Joe, I really appreciate these numbers.

For an individual (cascaded) Neutron, then, your testing showed that it 
could happily handle 1000 compute hosts.  Apart from the cascading on 
the northbound side, was that otherwise unmodified from vanilla 
OpenStack?  Do you recall any particular config settings that were 
needed to achieve that?  (e.g. api_workers and rpc_workers)


Regards,
Neil


On 16/04/15 03:03, joehuang wrote:

In case it's helpful to see all the cases together, sync_routers (from the L3 
agent) was also mentioned in other part of this thread.  Plus of course the liveness 
reporting from all agents.

In the test report [1], which shows Neutron can support up to million-level 
ports and 100k-level physical hosts, the scalability is achieved by one 
cascading Neutron managing 100 cascaded Neutrons through the current Neutron 
RESTful API. In normal Neutron, each compute node hosts the L2 agent/OVS and 
the L3 agent/DVR. In the cascading Neutron layer, the L2 agent is modified to 
interact with the corresponding cascaded Neutron instead of OVS, and the L3 
agent (DVR) is modified to interact with the corresponding cascaded Neutron 
instead of the Linux routing stack. That's why we call the cascaded Neutron 
the backend of Neutron.

Therefore, only 100 compute nodes (or rather, agents) are required in the 
cascading layer; each compute node manages one cascaded Neutron. Each cascaded 
Neutron can manage up to 1000 nodes (there are already reports, deployments, 
and lab tests supporting this). That's the scalability to 100k nodes.
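
A back-of-envelope check of those numbers (the ports-per-node figure is an
assumption used only to reach the order of magnitude in the report title):

cascading_agents   = 100     # one agent per cascaded Neutron
nodes_per_cascaded = 1000    # what one cascaded Neutron is reported to handle

total_nodes = cascading_agents * nodes_per_cascaded
print(total_nodes)                  # 100000 -> the "100k nodes" figure

ports_per_node = 10                 # assumed density, for illustration only
print(total_nodes * ports_per_node) # 1000000 -> roughly "million level ports"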

Because the cloud is split into two layers (100 nodes in the cascading layer, 
1000 nodes in each cascaded layer), even the current mechanism can meet the 
demand for sync_routers and liveness reporting from all agents, as well as L2 
population, DVR router updates, etc.

The test report [1] at least proves that the layered architecture idea is 
feasible for Neutron scalability, even up to million-level ports and 100k-level 
nodes. The extra benefit of the layered architecture is that each cascaded 
Neutron can leverage a different backend technology implementation; for 
example, one can be ML2+OVS and another OVN, ODL, or Calico...

[1]test report for million ports scalability of Neutron 
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Best Regards
Chaoyi Huang ( Joe Huang )

-Original Message-
From: Neil Jerram [mailto:neil.jer...@metaswitch.com]
Sent: Wednesday, April 15, 2015 9:46 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote:

Hi, Neil,

See inline comments.

Best Regards

Chaoyi Huang


From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:

Hi, Neil,

   From theoretic, Neutron is like a broadcast domain, for example, enforcement of DVR and 
security group has to touch each regarding host where there is VM of this project resides. Even using SDN 
controller, the touch to regarding host is inevitable. If there are plenty of physical hosts, for 
example, 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under 
concurrent operation, that's the bottleneck for scalability of Neutron.


I think I understand that in general terms - but can you be more
specific about the broadcast storm?  Is there one particular message
exchange that involves broadcasting?  Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] for example, L2 population, Security group rule update, DVR route 
update. Both direction in different scenario.


Thanks.  In case it's helpful to see all the cases together, sync_routers (from 
the L3 agent) was also mentioned in other part of this thread.  Plus of course 
the liveness reporting from all agents.


(I presume you are talking about control plane messages here, i.e.
between Neutron components.  Is that right?  Obviously there can also
be broadcast storm problems in the data plane - but I don't think
that's what you are talking about here.)

[[joehuang]] Yes, controll plane here.


Thanks for confirming that.


We need layered architecture in Neutron to solve the broadcast
domain bottleneck of scalability. The test report from OpenStack
cascading shows that through layered architecture Neutron
cascading, Neutron can supports up to million level ports and 100k
level physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascad
ing-solution-to-support-1-million-v-ms-in-100-data-centers


Many thanks, I will take a look at this.


It was very interesting, thanks.  And by following through your

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-16 Thread joehuang
Hi, Neil,

The api_workers / rpc_workers configuration for the cascading layer can be found 
in the test report; it's based on the community Juno version, and some issues 
found are listed at the end of the report.

A simulator is used for the cascaded OpenStacks, so there is no such 
configuration in the test. For the api_workers/rpc_workers configuration of one 
OpenStack Neutron supporting 1152 nodes, you can refer to the articles 
http://www.openstack.cn/p2932.html or http://www.csdn.net/article/2014-12-19/2823077, 
but unfortunately they were written in Chinese, with no detailed worker numbers.

Best Regards
Chaoyi Huang ( Joe Huang )


-Original Message-
From: Neil Jerram [mailto:neil.jer...@metaswitch.com] 
Sent: Thursday, April 16, 2015 5:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Thanks Joe, I really appreciate these numbers.

For an individual (cascaded) Neutron, then, your testing showed that it could 
happily handle 1000 compute hosts.  Apart from the cascading on the northbound 
side, was that otherwise unmodified from vanilla OpenStack?  Do you recall any 
particular config settings that were needed to achieve that?  (e.g. api_workers 
and rpc_workers)

Regards,
Neil


On 16/04/15 03:03, joehuang wrote:
 In case it's helpful to see all the cases together, sync_routers (from the 
 L3 agent) was also mentioned in other part of this thread.  Plus of course 
 the liveness reporting from all agents.

 In the test report [1], which shows Neutron can supports up to million level 
 ports and 100k level physical hosts, the scalability is done by one cascading 
 Neutron to manage 100 cascaded Neutrons through current Neutron restful API. 
 For normal Neutron, each compute node will host L2 agent/OVS, L3 agent/DVR. 
 In the cascading Neutron layer, the L2 agent is modified to interact with 
 regarding cascaded Neutron but not OVS, the L3 agent(DVR) is modified to 
 interact with regarding cascaded Neutron but not linux route. That's why we 
 call the cascaded Neutron is the backend of Neutron.

 Therefore, there are only 100 compute nodes (or say agent ) required in the 
 cascading layer, each compute node will manage one cascaded Neutron. Each 
 cascaded Neutron can manage up to 1000 nodes (there is already report and 
 deployment and lab test can support this). That's the scalability to 100k 
 nodes.

 Because the cloud is splited into two layer (100 nodes in the cascading 
 layer, 1000 nodes in each cascaded layer ), even current mechanism can meet 
 the demand for sync_routers and liveness reporting from all agents, or L2 
 population, DVR router update...etc.

 The test report [1] at least prove that the layered architecture idea is 
 feasible for Neutron scalability, even up to million level ports and 100k 
 level nodes. The extra benefit for the layered architecture is that each 
 cascaded Neutron can leverage different backend technology implementation, 
 for example, one is ML2+OVS, another is OVN or ODL or Calico...

 [1]test report for million ports scalability of Neutron 
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascadi
 ng-solution-to-support-1-million-v-ms-in-100-data-centers

 Best Regards
 Chaoyi Huang ( Joe Huang )

 -Original Message-
 From: Neil Jerram [mailto:neil.jer...@metaswitch.com]
 Sent: Wednesday, April 15, 2015 9:46 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi again Joe, (+ list)

 On 11/04/15 02:00, joehuang wrote:
 Hi, Neil,

 See inline comments.

 Best Regards

 Chaoyi Huang

 
 From: Neil Jerram [neil.jer...@metaswitch.com]
 Sent: 09 April 2015 23:01
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang wrote:
 Hi, Neil,

From theoretic, Neutron is like a broadcast domain, for example, 
 enforcement of DVR and security group has to touch each regarding host 
 where there is VM of this project resides. Even using SDN controller, the 
 touch to regarding host is inevitable. If there are plenty of physical 
 hosts, for example, 10k, inside one Neutron, it's very hard to overcome the 
 broadcast storm issue under concurrent operation, that's the bottleneck 
 for scalability of Neutron.

 I think I understand that in general terms - but can you be more 
 specific about the broadcast storm?  Is there one particular message 
 exchange that involves broadcasting?  Is it only from the server to 
 agents, or are there 'broadcasts' in other directions as well?

 [[joehuang]] for example, L2 population, Security group rule update, DVR 
 route update. Both direction in different scenario.

 Thanks.  In case it's helpful to see all the cases together, sync_routers 
 (from the L3 agent) was also mentioned in other

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-15 Thread Neil Jerram

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote:

Hi, Neil,

See inline comments.

Best Regards

Chaoyi Huang


From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:

Hi, Neil,

  From theoretic, Neutron is like a broadcast domain, for example, enforcement of DVR and 
security group has to touch each regarding host where there is VM of this project resides. Even using SDN 
controller, the touch to regarding host is inevitable. If there are plenty of physical hosts, for 
example, 10k, inside one Neutron, it's very hard to overcome the broadcast storm issue under 
concurrent operation, that's the bottleneck for scalability of Neutron.


I think I understand that in general terms - but can you be more
specific about the broadcast storm?  Is there one particular message
exchange that involves broadcasting?  Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] for example, L2 population, Security group rule update, DVR route 
update. Both direction in different scenario.


Thanks.  In case it's helpful to see all the cases together, 
sync_routers (from the L3 agent) was also mentioned in other part of 
this thread.  Plus of course the liveness reporting from all agents.



(I presume you are talking about control plane messages here, i.e.
between Neutron components.  Is that right?  Obviously there can also be
broadcast storm problems in the data plane - but I don't think that's
what you are talking about here.)

[[joehuang]] Yes, controll plane here.


Thanks for confirming that.


We need layered architecture in Neutron to solve the broadcast domain bottleneck of 
scalability. The test report from OpenStack cascading shows that through layered architecture 
Neutron cascading, Neutron can supports up to million level ports and 100k level 
physical hosts. You can find the report here: 
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers


Many thanks, I will take a look at this.


It was very interesting, thanks.  And by following through your links I 
also learned more about Nova cells, and about how some people question 
whether we need any kind of partitioning at all, and should instead 
solve scaling/performance problems in other ways...  It will be 
interesting to see how this plays out.


I'd still like to see more information, though, about how far people 
have scaled OpenStack - and in particular Neutron - as it exists today. 
 Surely having a consensus set of current limits is an important input 
into any discussion of future scaling work.


For example, Kevin mentioned benchmarking where the Neutron server 
processed a liveness update in 50ms and a sync_routers in 300ms. 
Suppose, the liveness update time was 50ms (since I don't know in detail 
what that  means) and agents report liveness every 30s.  Does that mean 
that a single Neutron server can only support 600 agents?
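
For what it's worth, the arithmetic behind that estimate, under the pessimistic
assumption that a single server process handles liveness reports strictly one
at a time:

report_interval = 30.0    # seconds between liveness reports from each agent
processing_time = 0.05    # 50 ms to process one report

print(report_interval / processing_time)   # 600.0 agents per serial worker

With N api/rpc workers handling reports in parallel, the ceiling would scale to
roughly N * 600, until the DB or the message bus becomes the limit instead.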


I'm also especially interested in the DHCP agent, because in Calico we 
have one of those on every compute host.  We've just run tests which 
appeared to be hitting trouble from just 50 compute hosts onwards, and 
apparently because of DHCP agent communications.  We need to continue 
looking into that and report findings properly, but if anyone already 
has any insights, they would be much appreciated.


Many thanks,
Neil



Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-15 Thread Joshua Harlow

Neil Jerram wrote:

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote:

Hi, Neil,

See inline comments.

Best Regards

Chaoyi Huang


From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:

Hi, Neil,

From theoretic, Neutron is like a broadcast domain, for example,
enforcement of DVR and security group has to touch each regarding
host where there is VM of this project resides. Even using SDN
controller, the touch to regarding host is inevitable. If there are
plenty of physical hosts, for example, 10k, inside one Neutron, it's
very hard to overcome the broadcast storm issue under concurrent
operation, that's the bottleneck for scalability of Neutron.


I think I understand that in general terms - but can you be more
specific about the broadcast storm? Is there one particular message
exchange that involves broadcasting? Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] for example, L2 population, Security group rule update,
DVR route update. Both direction in different scenario.


Thanks. In case it's helpful to see all the cases together, sync_routers
(from the L3 agent) was also mentioned in other part of this thread.
Plus of course the liveness reporting from all agents.


(I presume you are talking about control plane messages here, i.e.
between Neutron components. Is that right? Obviously there can also be
broadcast storm problems in the data plane - but I don't think that's
what you are talking about here.)

[[joehuang]] Yes, controll plane here.


Thanks for confirming that.


We need layered architecture in Neutron to solve the broadcast
domain bottleneck of scalability. The test report from OpenStack
cascading shows that through layered architecture Neutron
cascading, Neutron can supports up to million level ports and 100k
level physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers



Many thanks, I will take a look at this.


It was very interesting, thanks. And by following through your links I
also learned more about Nova cells, and about how some people question
whether we need any kind of partitioning at all, and should instead
solve scaling/performance problems in other ways... It will be
interesting to see how this plays out.

I'd still like to see more information, though, about how far people
have scaled OpenStack - and in particular Neutron - as it exists today.
Surely having a consensus set of current limits is an important input
into any discussion of future scaling work.


+2 to this...

Shooting for the moon (although nice in theory) is not so useful when 
you can't even get up a hill ;)




For example, Kevin mentioned benchmarking where the Neutron server
processed a liveness update in 50ms and a sync_routers in 300ms.
Suppose, the liveness update time was 50ms (since I don't know in detail
what that  means) and agents report liveness every 30s. Does that mean
that a single Neutron server can only support 600 agents?

I'm also especially interested in the DHCP agent, because in Calico we
have one of those on every compute host. We've just run tests which
appeared to be hitting trouble from just 50 compute hosts onwards, and
apparently because of DHCP agent communications. We need to continue
looking into that and report findings properly, but if anyone already
has any insights, they would be much appreciated.

Many thanks,
Neil



Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-15 Thread joehuang
In case it's helpful to see all the cases together, sync_routers (from the L3 
agent) was also mentioned in other part of this thread.  Plus of course the 
liveness reporting from all agents.

In the test report [1], which shows Neutron can supports up to million level 
ports and 100k level physical hosts, the scalability is done by one cascading 
Neutron to manage 100 cascaded Neutrons through current Neutron restful API. 
For normal Neutron, each compute node will host L2 agent/OVS, L3 agent/DVR. In 
the cascading Neutron layer, the L2 agent is modified to interact with 
regarding cascaded Neutron but not OVS, the L3 agent(DVR) is modified to 
interact with regarding cascaded Neutron but not linux route. That's why we 
call the cascaded Neutron is the backend of Neutron. 

Therefore, there are only 100 compute nodes (or say agent ) required in the 
cascading layer, each compute node will manage one cascaded Neutron. Each 
cascaded Neutron can manage up to 1000 nodes (there is already report and 
deployment and lab test can support this). That's the scalability to 100k nodes.

Because the cloud is splited into two layer (100 nodes in the cascading layer, 
1000 nodes in each cascaded layer ), even current mechanism can meet the demand 
for sync_routers and liveness reporting from all agents, or L2 population, DVR 
router update...etc. 

The test report [1] at least prove that the layered architecture idea is 
feasible for Neutron scalability, even up to million level ports and 100k level 
nodes. The extra benefit for the layered architecture is that each cascaded 
Neutron can leverage different backend technology implementation, for example, 
one is ML2+OVS, another is OVN or ODL or Calico...

[1]test report for million ports scalability of Neutron 
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Best Regards
Chaoyi Huang ( Joe Huang )

-Original Message-
From: Neil Jerram [mailto:neil.jer...@metaswitch.com] 
Sent: Wednesday, April 15, 2015 9:46 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote:
 Hi, Neil,

 See inline comments.

 Best Regards

 Chaoyi Huang

 
 From: Neil Jerram [neil.jer...@metaswitch.com]
 Sent: 09 April 2015 23:01
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang wrote:
 Hi, Neil,

   From theoretic, Neutron is like a broadcast domain, for example, 
 enforcement of DVR and security group has to touch each regarding host where 
 there is VM of this project resides. Even using SDN controller, the touch 
 to regarding host is inevitable. If there are plenty of physical hosts, for 
 example, 10k, inside one Neutron, it's very hard to overcome the broadcast 
 storm issue under concurrent operation, that's the bottleneck for 
 scalability of Neutron.

 I think I understand that in general terms - but can you be more 
 specific about the broadcast storm?  Is there one particular message 
 exchange that involves broadcasting?  Is it only from the server to 
 agents, or are there 'broadcasts' in other directions as well?

 [[joehuang]] for example, L2 population, Security group rule update, DVR 
 route update. Both direction in different scenario.

Thanks.  In case it's helpful to see all the cases together, sync_routers (from 
the L3 agent) was also mentioned in other part of this thread.  Plus of course 
the liveness reporting from all agents.

 (I presume you are talking about control plane messages here, i.e.
 between Neutron components.  Is that right?  Obviously there can also 
 be broadcast storm problems in the data plane - but I don't think 
 that's what you are talking about here.)

 [[joehuang]] Yes, controll plane here.

Thanks for confirming that.

 We need layered architecture in Neutron to solve the broadcast 
 domain bottleneck of scalability. The test report from OpenStack 
 cascading shows that through layered architecture Neutron 
 cascading, Neutron can supports up to million level ports and 100k 
 level physical hosts. You can find the report here: 
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascad
 ing-solution-to-support-1-million-v-ms-in-100-data-centers

 Many thanks, I will take a look at this.

It was very interesting, thanks.  And by following through your links I also 
learned more about Nova cells, and about how some people question whether we 
need any kind of partitioning at all, and should instead solve 
scaling/performance problems in other ways...  It will be interesting to see 
how this plays out.

I'd still like to see more information, though, about how far people have 
scaled OpenStack - and in particular Neutron - as it exists today

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-15 Thread joehuang
Hi, Joshua,

This is a long discussion thread, may we come back to the scalability topic? 

As you confirmed, Tooz only addresses the issue of agent status management; it 
does not solve the concurrent dynamic load impact at large scale (for example, 
100k managed nodes with dynamic load like security group rule updates, 
routers_updated, etc.).

So even if Tooz is implemented in Neutron, that doesn't mean the scalability 
issue is totally addressed.

So what's the goal and the whole picture for addressing Neutron scalability? 
Tooz would then help complete that picture.
 
Best Regards
Chaoyi Huang ( Joe Huang )

-Original Message-
From: Joshua Harlow [mailto:harlo...@outlook.com] 
Sent: Tuesday, April 14, 2015 11:33 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Re: [neutron] Neutron scaling datapoints?

Daniel Comnea wrote:
 Joshua,

 those are old and have been fixed/ documented on Consul side.
 As for ZK, i have nothing against it, just wish you good luck running 
 it in a multi cross-DC setup :)

Totally fair, although I start to question a cross-DC setup of things, and why 
that's needed in this (and/or any) architecture, but to each their own ;)


 Dani

 On Mon, Apr 13, 2015 at 11:37 PM, Joshua Harlow harlo...@outlook.com wrote:

 Did the following get addressed?

 https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul

 Seems like quite a few things got raised in that post about etcd/consul.

 Maybe they are fixed, idk...

 https://aphyr.com/posts/291-call-me-maybe-zookeeper though worked
 as expected (and without issue)...

 I quote:

 '''
 Recommendations

 Use Zookeeper. It’s mature, well-designed, and battle-tested.
 Because the consequences of its connection model and linearizability
 properties are subtle, you should, wherever possible, take advantage
 of tested recipes and client libraries like Curator, which do their
 best to correctly handle the complex state transitions associated
 with session and connection loss.
 '''

 Daniel Comnea wrote:

 My $2 cents:

 I like the 3rd party backend however instead of ZK wouldn't
 Consul [1]
 fit better due to lighter/ out of box multi DC awareness?

 Dani

 [1] Consul - https://www.consul.io/


 On Mon, Apr 13, 2015 at 9:51 AM, Wangbibo wangb...@huawei.com wrote:

  Hi Kevin,


  Totally agree with you that heartbeat from each agent is something
  that we cannot eliminate currently. Agent status depends on it, and
  further scheduler and HA depend on agent status.


  I proposed a Liberty spec for introducing an open framework for
  pluggable agent status drivers.[1][2] It allows us to use some other
  3rd-party backend to monitor agent status, such as zookeeper or
  memcached. Meanwhile, it guarantees backward compatibility so that
  users could still use the db-based status monitoring mechanism as
  their default choice.


  Based on that, we may do further optimization on the issues Attila
  and you mentioned. Thanks.


  [1] BP -
 https://blueprints.launchpad.net/neutron/+spec/agent-group-and-status-drivers

  [2] Liberty Spec proposed -
 https://review.openstack.org/#/c/168921/


  Best,

  Robin


  From: Kevin Benton [mailto:blak...@gmail.com]
  Sent: 11 April 2015 12:35
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?


  Which periodic updates did you have in mind to eliminate?
 One of the
  few remaining ones I can think of is sync_routers but it
 would be
  great if you can enumerate the ones you observed because
 eliminating
  overhead in agents is something I've been working on as
 well.


  One of the most

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread joehuang
Tooz provides a mechanism for grouping agents and for agent status/liveness 
management. Multiple coordinator services may be required in a large-scale 
deployment, especially at the 100k-node level. We can't assume that only one 
coordinator service is enough to manage all nodes, which means tooz may need to 
support multiple coordination backends.

And Nova already supports several segregation concepts, for example Cells, 
Availability Zones, and Host Aggregates. Where will the coordination backend 
reside? How should agents be grouped? It's odd to put the coordinator in 
availability zone (AZ) 1 while all managed agents are in AZ 2: if AZ 1 is 
powered off, all agents in AZ 2 lose management. Do we need a segregation 
concept for agents, reuse the Nova concepts, or build a mapping between them? 
Especially if multiple coordination backends will work under one Neutron.

Best Regards
Chaoyi Huang ( Joe Huang )

-Original Message-
From: Joshua Harlow [mailto:harlo...@outlook.com] 
Sent: Monday, April 13, 2015 11:11 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

joehuang wrote:
 Hi, Kevin and Joshua,

 As my understanding, Tooz only addresses the issue of agent status 
 management, but how to solve the concurrent dynamic load impact on 
 large scale ( for example 100k managed nodes with the dynamic load 
 like security goup rule update, routers_updated, etc )

Yes, that is correct, let's not confuse status/liveness management with 
updates... since IMHO they are two very different things (the latter can be 
eventually consistent IMHO, while the liveness 'question' probably should not 
be...).


 And one more question is, if we have 100k managed nodes, how to do the 
 partition? Or all nodes will be managed by one Tooz service, like 
 Zookeeper? Can Zookeeper manage 100k nodes status?

I can get u some data/numbers from some studies I've seen, but what u are 
talking about is highly specific as to what u are doing with zookeeper... There 
is no one solution for all the things IMHO; choose what's best from your 
tool-belt for each problem...


 Best Regards

 Chaoyi Huang ( Joe Huang )

 *From:*Kevin Benton [mailto:blak...@gmail.com]
 *Sent:* Monday, April 13, 2015 3:52 AM
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Timestamps are just one way (and likely the most primitive), using 
redis
 (or memcache) key/value and expiry are another (and letting memcache 
 or redis expire using its own internal algorithms), using zookeeper 
 ephemeral nodes[1] are another... The point being that its backend 
 specific and tooz supports varying backends.

 Very cool. Is the backend completely transparent so a deployer could 
 choose a service they are comfortable maintaining, or will that change 
 the properties WRT to resiliency of state on node restarts, partitions, etc?

 The Nova implementation of Tooz seemed pretty straight-forward, 
 although it looked like it had pluggable drivers for service management 
 already.
 Before I dig into it much further I'll file a spec on the Neutron side 
 to see if I can get some other cores onboard to do the review work if 
 I push a change to tooz.

 On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote:

 Kevin Benton wrote:

 So IIUC tooz would be handling the liveness detection for the agents.
 That would be nice to get ride of that logic in Neutron and just 
 register callbacks for rescheduling the dead.

 Where does it store that state, does it persist timestamps to the DB 
 like Neutron does? If so, how would that scale better? If not, who 
 does a given node ask to know if an agent is online or offline when 
 making a scheduling decision?


 Timestamps are just one way (and likely the most primitive), using 
 redis (or memcache) key/value and expiry are another (and letting 
 memcache or redis expire using its own internal algorithms), using 
 zookeeper ephemeral nodes[1] are another... The point being that its 
 backend specific and tooz supports varying backends.


 However, before (what I assume is) the large code change to implement 
 tooz, I would like to quantify that the heartbeats are actually a 
 bottleneck. When I was doing some profiling of them on the master 
 branch a few months ago, processing a heartbeat took an order of 
 magnitude less time (50ms) than the 'sync routers' task of the l3 
 agent (~300ms). A few query optimizations might buy us a lot more 
 headroom before we have to fall back to large refactors.


 Sure, always good to avoid prematurely optimizing things...

 Although this is relevant for u I think anyway:

 https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)...

 https://review.openstack.org/#/c/172502/ (a WIP implementation of the 
 latter).

 [1]
 https://zookeeper.apache.org/doc/trunk

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread joehuang

-Original Message-
From: Attila Fazekas [mailto:afaze...@redhat.com] 
Sent: Monday, April 13, 2015 3:19 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?


- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 1:20:48 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 Hi, Kevin,
 
 
 
 I assumed that all agents are connected to same IP address of 
 RabbitMQ, then the connection will exceed the port ranges limitation.
 
https://news.ycombinator.com/item?id=1571300

TCP connections are identified by the (src ip, src port, dest ip, dest port) 
tuple.

The server doesn't need multiple IPs to handle  65535 connections. All the 
server connections to a given IP are to the same port. For a given client, the 
unique key for an http connection is (client-ip, PORT, server-ip, 80). The only 
number that can vary is PORT, and that's a value on the client. So, the client 
is limited to 65535 connections to the server. But, a second client could also 
have another 65K connections to the same server-ip:port.


[[joehuang]] Sorry, it has been a long time since I wrote socket-based apps; I 
may have made a mistake about the HTTP server spawning a thread to handle a new 
connection. I'll check again.

 
 For a RabbitMQ cluster, the client can for sure connect to any member of the 
 cluster, but in this case the client has to be designed in a fail-safe 
 manner: the client should be aware of a cluster member failure and 
 reconnect to another surviving member. No such mechanism has been 
 implemented yet.
 
 
 
 Another way is to use an LVS- or DNS-based load balancer, or something else.
 If you put one load balancer in front of a cluster, then we have to take 
 care of the port number limitation: there are so many agents that will 
 require connections concurrently, at the 100k level, and the requests cannot 
 be rejected.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 12 April 2015 9:59
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 The TCP/IP stack keeps track of connections as a combination of IP + 
 TCP port. The two byte port limit doesn't matter unless all of the 
 agents are connecting from the same IP address, which shouldn't be the 
 case unless compute nodes connect to the rabbitmq server via one IP 
 address running port address translation.
 
 Either way, the agents don't connect directly to the Neutron server, 
 they connect to the rabbit MQ cluster. Since as many Neutron server 
 processes can be launched as necessary, the bottlenecks will likely 
 show up at the messaging or DB layer.
 
 On Sat, Apr 11, 2015 at 6:46 PM, joehuang  joehu...@huawei.com  wrote:
 
 
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, 
 port ( not Neutron Port ) is a two bytes field, i.e. port ranges from 
 0 ~ 65535, supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 
 agents... will be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for 
 scaling Neutron in this way, and PoC and test would be a good support for 
 this idea.
 
 
 
 I'm 99.9% sure, for scaling above 100k managed node, we do not really 
 need to split the openstack to multiple smaller openstack, or use 
 significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [ blak...@gmail.com ]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the 
 few remaining ones I can think of is sync_routers but it would be 
 great if you can enumerate the ones you observed because eliminating 
 overhead in agents is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I 
 don't think we can eliminate them, because they are used to determine 
 if the agents are still alive for scheduling purposes. Did you have 
 something else in mind to determine if an agent is alive?
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 
 wrote:
 
 
 I'm 99.9% sure, for scaling above 100k managed node, we do not really 
 need to split the openstack to multiple smaller openstack, or use 
 significant number of extra controller machine.
 
 The problem is that OpenStack is using the right tools (SQL/AMQP/(zk)), but in a 
 wrong way.
 
 For example:
 Periodic updates can be avoided in almost all cases.
 
 The new data can be pushed to the agent just when it is needed.
 The agent can know when the AMQP connection

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread Attila Fazekas




- Original Message -
 From: joehuang joehu...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 1:20:48 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 Hi, Kevin,
 
 
 
 I assumed that all agents are connected to same IP address of RabbitMQ, then
 the connection will exceed the port ranges limitation.
 
https://news.ycombinator.com/item?id=1571300

TCP connections are identified by the (src ip, src port, dest ip, dest port) 
tuple.

The server doesn't need multiple IPs to handle > 65535 connections. All the 
server connections to a given IP are to the same port. For a given client, the 
unique key for an http connection is (client-ip, PORT, server-ip, 80). The only 
number that can vary is PORT, and that's a value on the client. So, the client 
is limited to 65535 connections to the server. But, a second client could also 
have another 65K connections to the same server-ip:port.

 
 For a RabbitMQ cluster, for sure the client can connect to any one of member
 in the cluster, but in this case, the client has to be designed in fail-safe
 manner: the client should be aware of the cluster member failure, and
 reconnect to other survive member. No such mechnism has been implemented
 yet.
 
 
 
 Other way is to use LVS or DNS based like load balancer, or something else.
 If you put one load balancer ahead of a cluster, then we have to take care
 of the port number limitation, there are so many agents will require
 connection concurrently, 100k level, and the requests can not be rejected.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [blak...@gmail.com]
 Sent: 12 April 2015 9:59
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 The TCP/IP stack keeps track of connections as a combination of IP + TCP
 port. The two byte port limit doesn't matter unless all of the agents are
 connecting from the same IP address, which shouldn't be the case unless
 compute nodes connect to the rabbitmq server via one IP address running port
 address translation.
 
 Either way, the agents don't connect directly to the Neutron server, they
 connect to the rabbit MQ cluster. Since as many Neutron server processes can
 be launched as necessary, the bottlenecks will likely show up at the
 messaging or DB layer.
 
 On Sat, Apr 11, 2015 at 6:46 PM, joehuang  joehu...@huawei.com  wrote:
 
 
 
 
 
 As Kevin talking about agents, I want to remind that in TCP/IP stack, port (
 not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535,
 supports maximum 64k port number.
 
 
 
  above 100k managed node  means more than 100k L2 agents/L3 agents... will
 be alive under Neutron.
 
 
 
 Want to know the detail design how to support 99.9% possibility for scaling
 Neutron in this way, and PoC and test would be a good support for this idea.
 
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 
 
 Best Regards
 
 
 
 Chaoyi Huang ( joehuang )
 
 
 
 From: Kevin Benton [ blak...@gmail.com ]
 Sent: 11 April 2015 12:34
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 Which periodic updates did you have in mind to eliminate? One of the few
 remaining ones I can think of is sync_routers but it would be great if you
 can enumerate the ones you observed because eliminating overhead in agents
 is something I've been working on as well.
 
 One of the most common is the heartbeat from each agent. However, I don't
 think we can't eliminate them because they are used to determine if the
 agents are still alive for scheduling purposes. Did you have something else
 in mind to determine if an agent is alive?
 
 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas  afaze...@redhat.com 
 wrote:
 
 
 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.
 
 The problem is that OpenStack is using the right tools (SQL/AMQP/(zk)),
 but in a wrong way.
 
 For example:
 Periodic updates can be avoided in almost all cases.
 
 The new data can be pushed to the agent just when it is needed.
 The agent can know when the AMQP connection becomes unreliable (queue or
 connection lost),
 and needs to do a full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159
 
 Also, when the agents get some notification, they start asking for the details
 via AMQP -> SQL. Why do they not know it already, or get it with the
 notification?
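
A rough sketch of the "send the data with the notification" idea, assuming an oslo.messaging transport and an illustrative fanout topic and method name (this is not Neutron's actual RPC interface):

from oslo_config import cfg
import oslo_messaging

# Server side: fan the full object out to the agents, so they do not
# have to call back over AMQP -> SQL for the details.
transport = oslo_messaging.get_transport(cfg.CONF)
target = oslo_messaging.Target(topic='agent-updates', fanout=True)  # illustrative topic
client = oslo_messaging.RPCClient(transport, target)

def push_port_update(context, port):
    # 'port' is the complete, serialized port dict; the agent can apply it
    # directly instead of issuing a follow-up details query.
    client.cast(context, 'port_update', port=port)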
 
 
 - Original Message -
  From: Neil Jerram  neil.jer...@metaswitch.com 
  To: OpenStack Development Mailing List (not for usage questions

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-13 Thread Attila Fazekas




- Original Message -
 From: Kevin Benton blak...@gmail.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, April 12, 2015 4:17:29 AM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
 
 
 So IIUC tooz would be handling the liveness detection for the agents. That
 would be nice to get rid of that logic in Neutron and just register
 callbacks for rescheduling the dead.
 
 Where does it store that state, does it persist timestamps to the DB like
 Neutron does? If so, how would that scale better? If not, who does a given
 node ask to know if an agent is online or offline when making a scheduling
 decision?
 
You might find interesting the proposed solution in this bug:
https://bugs.launchpad.net/nova/+bug/1437199

 However, before (what I assume is) the large code change to implement tooz, I
 would like to quantify that the heartbeats are actually a bottleneck. When I
 was doing some profiling of them on the master branch a few months ago,
 processing a heartbeat took an order of magnitude less time (50ms) than the
 'sync routers' task of the l3 agent (~300ms). A few query optimizations
 might buy us a lot more headroom before we have to fall back to large
 refactors.
 Kevin Benton wrote:
 
 
 
 One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine
 if the agents are still alive for scheduling purposes. Did you have
 something else in mind to determine if an agent is alive?
 
 Put each agent in a tooz[1] group; have each agent periodically heartbeat[2],
 have whoever needs to schedule read the active members of that group (or use
 [3] to get notified via a callback), profit...
 
 Pick from your favorite (supporting) driver at:
 
 http://docs.openstack.org/developer/tooz/compatibility.html
 
 [1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping
 [2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
 [3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes
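
 For reference, a minimal sketch of that pattern against the tooz API (the backend URL, member id, and group name below are illustrative, not from the thread):

 from tooz import coordination

 # One coordinator per agent; any supported driver URL would do here.
 coord = coordination.get_coordinator('zookeeper://127.0.0.1:2181',
                                      b'l2-agent-compute1')
 coord.start()

 try:
     coord.create_group(b'neutron-agents').get()
 except coordination.GroupAlreadyExist:
     pass
 coord.join_group(b'neutron-agents').get()

 # Agent side: call this periodically (e.g. from a looping task).
 coord.heartbeat()

 # Scheduler side: read the currently-live members instead of DB timestamps.
 alive = coord.get_members(b'neutron-agents').get()
 print(alive)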
 
 
 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Joshua Harlow

Kevin Benton wrote:

 Timestamps are just one way (and likely the most primitive), using
redis (or memcache) key/value and expiry are another (and letting
memcache or redis expire using its own internal algorithms), using
zookeeper ephemeral nodes[1] are another... The point being that its
backend specific and tooz supports varying backends.

Very cool. Is the backend completely transparent so a deployer could
choose a service they are comfortable maintaining, or will that change
the properties WRT to resiliency of state on node restarts, partitions, etc?


Of course... we tried to make it 'completely' transparent, but in 
reality certain backends (zookeeper which uses a paxos-like algorithm 
and redis with sentinel support...) are better (more resilient, more 
consistent, handle partitions/restarts better...) than others (memcached 
is after all just a distributed cache). This is just the nature of the 
game...




The Nova implementation of Tooz seemed pretty straight-forward, although
it looked like it had pluggable drivers for service management already.
Before I dig into it much further I'll file a spec on the Neutron side
to see if I can get some other cores onboard to do the review work if I
push a change to tooz.


Sounds good to me.




On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com
mailto:harlo...@outlook.com wrote:

Kevin Benton wrote:

So IIUC tooz would be handling the liveness detection for the
agents.
That would be nice to get ride of that logic in Neutron and just
register callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not,
who does
a given node ask to know if an agent is online or offline when
making a
scheduling decision?


Timestamps are just one way (and likely the most primitive), using
redis (or memcache) key/value and expiry are another (and letting
memcache or redis expire using its own internal algorithms), using
zookeeper ephemeral nodes[1] are another... The point being that its
backend specific and tooz supports varying backends.


However, before (what I assume is) the large code change to
implement
tooz, I would like to quantify that the heartbeats are actually a
bottleneck. When I was doing some profiling of them on the
master branch
a few months ago, processing a heartbeat took an order of
magnitude less
time (50ms) than the 'sync routers' task of the l3 agent
(~300ms). A
few query optimizations might buy us a lot more headroom before
we have
to fall back to large refactors.


Sure, always good to avoid prematurely optimizing things...

Although this is relevant for u I think anyway:

https://review.openstack.org/#/c/138607/ (same thing/nearly same
in nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of
the latter).

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes


Kevin Benton wrote:


 One of the most common is the heartbeat from each agent.
However, I
 don't think we can't eliminate them because they are used
to determine
 if the agents are still alive for scheduling purposes. Did
you have
 something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active
members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1]
http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2]
https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Joshua Harlow

joehuang wrote:

Hi, Kevin and Joshua,

As my understanding goes, Tooz only addresses the issue of agent status
management; but how do we solve the concurrent dynamic load impact at large
scale (for example 100k managed nodes with dynamic load like
security group rule update, routers_updated, etc.)?


Yes, that is correct; let's not confuse status/liveness management with 
updates... since IMHO they are two very different things (the latter can 
be eventually consistent, while the liveness 'question' probably 
should not be...).




And one more question is, if we have 100k managed nodes, how to do the
partition? Or all nodes will be managed by one Tooz service, like
Zookeeper? Can Zookeeper manage 100k nodes status?


I can get u some data/numbers from some studies I've seen, but what u 
are talking about is highly specific as to what u are doing with 
zookeeper... There is no one solution for all the things IMHO; choose 
what's best from your tool-belt for each problem...
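
For the ephemeral-node flavour of the same idea, a minimal sketch using the kazoo client (the ensemble address and znode paths are illustrative assumptions): each agent holds an ephemeral znode while its session is alive, and the scheduler just lists the children.

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')  # illustrative ensemble address
zk.start()

# Agent side: the znode disappears automatically when the session dies,
# so liveness comes for free -- no periodic DB timestamp writes.
zk.create('/neutron/agents/l2-agent-compute1', b'alive',
          ephemeral=True, makepath=True)

# Scheduler side: list the currently-alive agents.
live_agents = zk.get_children('/neutron/agents')
print('live agents: %s' % live_agents)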




Best Regards

Chaoyi Huang ( Joe Huang )

*From:*Kevin Benton [mailto:blak...@gmail.com]
*Sent:* Monday, April 13, 2015 3:52 AM
*To:* OpenStack Development Mailing List (not for usage questions)
*Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?


Timestamps are just one way (and likely the most primitive), using redis

(or memcache) key/value and expiry are another (and letting memcache or
redis expire using its own internal algorithms), using zookeeper
ephemeral nodes[1] are another... The point being that its backend
specific and tooz supports varying backends.

Very cool. Is the backend completely transparent so a deployer could
choose a service they are comfortable maintaining, or will that change
the properties WRT to resiliency of state on node restarts, partitions, etc?

The Nova implementation of Tooz seemed pretty straight-forward, although
it looked like it had pluggable drivers for service management already.
Before I dig into it much further I'll file a spec on the Neutron side
to see if I can get some other cores onboard to do the review work if I
push a change to tooz.

On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com
mailto:harlo...@outlook.com wrote:

Kevin Benton wrote:

So IIUC tooz would be handling the liveness detection for the agents.
That would be nice to get ride of that logic in Neutron and just
register callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not, who does
a given node ask to know if an agent is online or offline when making a
scheduling decision?


Timestamps are just one way (and likely the most primitive), using redis
(or memcache) key/value and expiry are another (and letting memcache or
redis expire using its own internal algorithms), using zookeeper
ephemeral nodes[1] are another... The point being that its backend
specific and tooz supports varying backends.


However, before (what I assume is) the large code change to implement
tooz, I would like to quantify that the heartbeats are actually a
bottleneck. When I was doing some profiling of them on the master branch
a few months ago, processing a heartbeat took an order of magnitude less
time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A
few query optimizations might buy us a lot more headroom before we have
to fall back to large refactors.


Sure, always good to avoid prematurely optimizing things...

Although this is relevant for u I think anyway:

https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of the
latter).

[1]
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes


Kevin Benton wrote:


One of the most common is the heartbeat from each agent. However, I
don't think we can't eliminate them because they are used to determine
if the agents are still alive for scheduling purposes. Did you have
something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1]
http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2]
https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3]
http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread joehuang
Hi, Kevin and Joshua,

As my understanding goes, Tooz only addresses the issue of agent status management; 
but how do we solve the concurrent dynamic load impact at large scale (for 
example 100k managed nodes with dynamic load like security group rule 
update, routers_updated, etc.)?

And one more question: if we have 100k managed nodes, how do we do the 
partitioning? Or will all nodes be managed by one Tooz service, like Zookeeper? 
Can Zookeeper manage the status of 100k nodes?

Best Regards
Chaoyi Huang ( Joe Huang )

From: Kevin Benton [mailto:blak...@gmail.com]
Sent: Monday, April 13, 2015 3:52 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Timestamps are just one way (and likely the most primitive), using redis (or 
memcache) key/value and expiry are another (and letting memcache or redis 
expire using its own internal algorithms), using zookeeper ephemeral nodes[1] 
are another... The point being that its backend specific and tooz supports 
varying backends.

Very cool. Is the backend completely transparent so a deployer could choose a 
service they are comfortable maintaining, or will that change the properties 
WRT to resiliency of state on node restarts, partitions, etc?

The Nova implementation of Tooz seemed pretty straight-forward, although it 
looked like it had pluggable drivers for service management already. Before I 
dig into it much further I'll file a spec on the Neutron side to see if I can 
get some other cores onboard to do the review work if I push a change to tooz.


On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow 
harlo...@outlook.com wrote:
Kevin Benton wrote:
So IIUC tooz would be handling the liveness detection for the agents.
That would be nice to get ride of that logic in Neutron and just
register callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not, who does
a given node ask to know if an agent is online or offline when making a
scheduling decision?

Timestamps are just one way (and likely the most primitive), using redis (or 
memcache) key/value and expiry are another (and letting memcache or redis 
expire using its own internal algorithms), using zookeeper ephemeral nodes[1] 
are another... The point being that its backend specific and tooz supports 
varying backends.

However, before (what I assume is) the large code change to implement
tooz, I would like to quantify that the heartbeats are actually a
bottleneck. When I was doing some profiling of them on the master branch
a few months ago, processing a heartbeat took an order of magnitude less
time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A
few query optimizations might buy us a lot more headroom before we have
to fall back to large refactors.

Sure, always good to avoid prematurely optimizing things...

Although this is relevant for u I think anyway:

https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter).

[1] 
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes

Kevin Benton wrote:


One of the most common is the heartbeat from each agent. However, I
don't think we can't eliminate them because they are used to determine
if the agents are still alive for scheduling purposes. Did you have
something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1]
http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2]
https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3]
http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread joehuang
Hi, Kevin,



I assumed that all agents are connected to the same IP address of RabbitMQ; then 
the connections will exceed the port range limitation.



For a RabbitMQ cluster, for sure the client can connect to any one member of 
the cluster, but in this case the client has to be designed in a fail-safe 
manner: the client should be aware of cluster member failures, and reconnect 
to another surviving member. No such mechanism has been implemented yet.



Another way is to use an LVS- or DNS-based load balancer, or something else. If 
you put one load balancer in front of a cluster, then we have to take care of the 
port number limitation: so many agents (on the order of 100k) will require connections 
concurrently, and the requests cannot be rejected.



Best Regards



Chaoyi Huang ( joehuang )




From: Kevin Benton [blak...@gmail.com]
Sent: 12 April 2015 9:59
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

The TCP/IP stack keeps track of connections as a combination of IP + TCP port. 
The two byte port limit doesn't matter unless all of the agents are connecting 
from the same IP address, which shouldn't be the case unless compute nodes 
connect to the rabbitmq server via one IP address running port address 
translation.

Either way, the agents don't connect directly to the Neutron server, they 
connect to the rabbit MQ cluster. Since as many Neutron server processes can be 
launched as necessary, the bottlenecks will likely show up at the messaging or 
DB layer.

On Sat, Apr 11, 2015 at 6:46 PM, joehuang 
joehu...@huawei.com wrote:

As Kevin talking about agents, I want to remind that in TCP/IP stack, port ( 
not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~ 65535, 
supports maximum 64k port number.



 above 100k managed node  means more than 100k L2 agents/L3 agents... will be 
alive under Neutron.



Want to know the detail design how to support 99.9% possibility for scaling 
Neutron in this way, and PoC and test would be a good support for this idea.



I'm 99.9% sure, for scaling above 100k managed node,
we do not really need to split the openstack to multiple smaller openstack,
or use significant number of extra controller machine.



Best Regards



Chaoyi Huang ( joehuang )




From: Kevin Benton [blak...@gmail.com]
Sent: 11 April 2015 12:34
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few 
remaining ones I can think of is sync_routers but it would be great if you can 
enumerate the ones you observed because eliminating overhead in agents is 
something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think 
we can't eliminate them because they are used to determine if the agents are 
still alive for scheduling purposes. Did you have something else in mind to 
determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas 
afaze...@redhat.com wrote:
I'm 99.9% sure, for scaling above 100k managed node,
we do not really need to split the openstack to multiple smaller openstack,
or use significant number of extra controller machine.

The problem is openstack using the right tools SQL/AMQP/(zk),
but in a wrong way.

For example.:
Periodic updates can be avoided almost in all cases

The new data can be pushed to the agent just when it needed.
The agent can know when the AMQP connection become unreliable (queue or 
connection loose),
and needs to do full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also the agents when gets some notification, they start asking for details via 
the
AMQP - SQL. Why they do not know it already or get it with the notification ?


- Original Message -
 From: Neil Jerram 
 neil.jer...@metaswitch.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, April 9, 2015 5:01:45 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang wrote:
  Hi, Neil,
 
   From theoretic, Neutron is like a broadcast domain, for example,
   enforcement of DVR and security group has to touch each regarding host
   where there is VM of this project resides. Even using SDN controller, the
   touch to regarding host is inevitable. If there are plenty of physical
   hosts, for example, 10k, inside one Neutron, it's very hard to overcome
   the broadcast storm issue under concurrent operation, that's the
   bottleneck for scalability of Neutron.

 I think I understand that in general terms - but can you

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Joshua Harlow

Kevin Benton wrote:

So IIUC tooz would be handling the liveness detection for the agents.
That would be nice to get ride of that logic in Neutron and just
register callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not, who does
a given node ask to know if an agent is online or offline when making a
scheduling decision?


Timestamps are just one way (and likely the most primitive), using redis 
(or memcache) key/value and expiry are another (and letting memcache or 
redis expire using its own internal algorithms), using zookeeper 
ephemeral nodes[1] are another... The point being that its backend 
specific and tooz supports varying backends.




However, before (what I assume is) the large code change to implement
tooz, I would like to quantify that the heartbeats are actually a
bottleneck. When I was doing some profiling of them on the master branch
a few months ago, processing a heartbeat took an order of magnitude less
time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A
few query optimizations might buy us a lot more headroom before we have
to fall back to large refactors.


Sure, always good to avoid prematurely optimizing things...

Although this is relevant for u I think anyway:

https://review.openstack.org/#/c/138607/ (same thing/nearly same in nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of the 
latter).


[1] 
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes




Kevin Benton wrote:


One of the most common is the heartbeat from each agent. However, I
don't think we can't eliminate them because they are used to determine
if the agents are still alive for scheduling purposes. Did you have
something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1]
http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2]
https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3]
http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Kevin Benton
Timestamps are just one way (and likely the most primitive), using redis
(or memcache) key/value and expiry are another (and letting memcache or
redis expire using its own internal algorithms), using zookeeper ephemeral
nodes[1] are another... The point being that its backend specific and tooz
supports varying backends.

Very cool. Is the backend completely transparent so a deployer could choose
a service they are comfortable maintaining, or will that change the
properties WRT to resiliency of state on node restarts, partitions, etc?

The Nova implementation of Tooz seemed pretty straight-forward, although it
looked like it had pluggable drivers for service management already. Before
I dig into it much further I'll file a spec on the Neutron side to see if I
can get some other cores onboard to do the review work if I push a change
to tooz.


On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow harlo...@outlook.com wrote:

 Kevin Benton wrote:

 So IIUC tooz would be handling the liveness detection for the agents.
 That would be nice to get ride of that logic in Neutron and just
 register callbacks for rescheduling the dead.

 Where does it store that state, does it persist timestamps to the DB
 like Neutron does? If so, how would that scale better? If not, who does
 a given node ask to know if an agent is online or offline when making a
 scheduling decision?


 Timestamps are just one way (and likely the most primitive), using redis
 (or memcache) key/value and expiry are another (and letting memcache or
 redis expire using its own internal algorithms), using zookeeper ephemeral
 nodes[1] are another... The point being that its backend specific and tooz
 supports varying backends.


 However, before (what I assume is) the large code change to implement
 tooz, I would like to quantify that the heartbeats are actually a
 bottleneck. When I was doing some profiling of them on the master branch
 a few months ago, processing a heartbeat took an order of magnitude less
 time (50ms) than the 'sync routers' task of the l3 agent (~300ms). A
 few query optimizations might buy us a lot more headroom before we have
 to fall back to large refactors.


 Sure, always good to avoid prematurely optimizing things...

 Although this is relevant for u I think anyway:

 https://review.openstack.org/#/c/138607/ (same thing/nearly same in
 nova)...

 https://review.openstack.org/#/c/172502/ (a WIP implementation of the
 latter).

 [1] https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#
 Ephemeral+Nodes


 Kevin Benton wrote:


 One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine
 if the agents are still alive for scheduling purposes. Did you have
 something else in mind to determine if an agent is alive?


 Put each agent in a tooz[1] group; have each agent periodically
 heartbeat[2], have whoever needs to schedule read the active members of
 that group (or use [3] to get notified via a callback), profit...

 Pick from your favorite (supporting) driver at:

 http://docs.openstack.org/developer/tooz/compatibility.html

 [1]
 http://docs.openstack.org/developer/tooz/compatibility.html#grouping
 [2]
 https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
 [3]
 http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes


 

 
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
 unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




-- 
Kevin Benton
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-12 Thread Kevin Benton
I assumed that all agents are connected to the same IP address of RabbitMQ,
then the connections will exceed the port range limitation.

Only if the clients are all using the same IP address. If connections
weren't scoped by source IP, busy servers would be completely unreliable
because clients would keep having source port collisions.

For example, the following is a netstat output from a server with two
connections to a service running on port 4000 with both clients using
source port 5: http://paste.openstack.org/show/203211/

the client should be aware of cluster member failures, and reconnect to
another surviving member. No such mechanism has been implemented yet.

If I understand what you are suggesting, it already has been implemented
that way. The neutron agents and servers can be configured with multiple
rabbitmq servers and they will cycle through the list whenever there is a
failure.

The only downside to that approach is that every neutron agent and server
has to be configured with every rabbitmq server address. This gets tedious
to manage if you want to add cluster members dynamically so using a load
balancer can help relieve that.
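
For reference, a rough sketch of that multi-host configuration as it looked with kilo-era oslo.messaging (section and option names are those of the rabbit driver of that time; older releases read the same options from [DEFAULT], and the host names below are placeholders):

[oslo_messaging_rabbit]
# Comma-separated list of cluster members; clients cycle through the list on failure.
rabbit_hosts = rabbit1.example.com:5672,rabbit2.example.com:5672,rabbit3.example.com:5672
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
# Mirror queues across the cluster so a dead member does not take its queues with it.
rabbit_ha_queues = true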

Hi, Kevin,



I assumed that all agents are connected to same IP address of RabbitMQ,
then the connection will exceed the port ranges limitation.



For a RabbitMQ cluster, for sure the client can connect to any one of
member in the cluster, but in this case, the client has to be designed in
fail-safe manner: the client should be aware of the cluster member failure,
and reconnect to other survive member. No such mechnism has
been implemented yet.



Other way is to use LVS or DNS based like load balancer, or something else.
If you put one load balancer ahead of a cluster, then we have to take care
of the port number limitation, there are so many agents  will require
connection concurrently, 100k level, and the requests can not be rejected.



Best Regards



Chaoyi Huang ( joehuang )


 --
*From:* Kevin Benton [blak...@gmail.com]
*Sent:* 12 April 2015 9:59
*To:* OpenStack Development Mailing List (not for usage questions)
*Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?

  The TCP/IP stack keeps track of connections as a combination of IP + TCP
port. The two byte port limit doesn't matter unless all of the agents are
connecting from the same IP address, which shouldn't be the case unless
compute nodes connect to the rabbitmq server via one IP address running
port address translation.

 Either way, the agents don't connect directly to the Neutron server, they
connect to the rabbit MQ cluster. Since as many Neutron server processes
can be launched as necessary, the bottlenecks will likely show up at the
messaging or DB layer.

On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote:

  As Kevin talking about agents, I want to remind that in TCP/IP stack,
 port ( not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~
 65535, supports maximum 64k port number.



  above 100k managed node  means more than 100k L2 agents/L3 agents...
 will be alive under Neutron.



 Want to know the detail design how to support 99.9% possibility for
 scaling Neutron in this way, and PoC and test would be a good support for
 this idea.



 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.



 Best Regards



 Chaoyi Huang ( joehuang )


  --
 *From:* Kevin Benton [blak...@gmail.com]
 *Sent:* 11 April 2015 12:34
 *To:* OpenStack Development Mailing List (not for usage questions)
  *Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the
 few remaining ones I can think of is sync_routers but it would be great if
 you can enumerate the ones you observed because eliminating overhead in
 agents is something I've been working on as well.

  One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine if
 the agents are still alive for scheduling purposes. Did you have something
 else in mind to determine if an agent is alive?

 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com
 wrote:

 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller
 openstack,
 or use significant number of extra controller machine.

 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.

 For example.:
 Periodic updates can be avoided almost in all cases

 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159

 Also the agents when gets some

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-11 Thread joehuang
As Kevin was talking about agents, I want to point out that in the TCP/IP stack a port ( 
not a Neutron port ) is a two-byte field, i.e. ports range from 0 ~ 65535, 
supporting a maximum of 64k port numbers.



"above 100k managed node" means more than 100k L2 agents/L3 agents... will be 
alive under Neutron.



I want to know the detailed design of how to support, with 99.9% possibility, scaling 
Neutron in this way; a PoC and tests would be good support for this idea.



I'm 99.9% sure, for scaling above 100k managed node,
we do not really need to split the openstack to multiple smaller openstack,
or use significant number of extra controller machine.



Best Regards



Chaoyi Huang ( joehuang )




From: Kevin Benton [blak...@gmail.com]
Sent: 11 April 2015 12:34
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Which periodic updates did you have in mind to eliminate? One of the few 
remaining ones I can think of is sync_routers but it would be great if you can 
enumerate the ones you observed because eliminating overhead in agents is 
something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't think 
we can't eliminate them because they are used to determine if the agents are 
still alive for scheduling purposes. Did you have something else in mind to 
determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas 
afaze...@redhat.com wrote:
I'm 99.9% sure, for scaling above 100k managed node,
we do not really need to split the openstack to multiple smaller openstack,
or use significant number of extra controller machine.

The problem is openstack using the right tools SQL/AMQP/(zk),
but in a wrong way.

For example.:
Periodic updates can be avoided almost in all cases

The new data can be pushed to the agent just when it needed.
The agent can know when the AMQP connection become unreliable (queue or 
connection loose),
and needs to do full sync.
https://bugs.launchpad.net/neutron/+bug/1438159

Also the agents when gets some notification, they start asking for details via 
the
AMQP - SQL. Why they do not know it already or get it with the notification ?


- Original Message -
 From: Neil Jerram 
 neil.jer...@metaswitch.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, April 9, 2015 5:01:45 PM
 Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

 Hi Joe,

 Many thanks for your reply!

 On 09/04/15 03:34, joehuang wrote:
  Hi, Neil,
 
   From theoretic, Neutron is like a broadcast domain, for example,
   enforcement of DVR and security group has to touch each regarding host
   where there is VM of this project resides. Even using SDN controller, the
   touch to regarding host is inevitable. If there are plenty of physical
   hosts, for example, 10k, inside one Neutron, it's very hard to overcome
   the broadcast storm issue under concurrent operation, that's the
   bottleneck for scalability of Neutron.

 I think I understand that in general terms - but can you be more
 specific about the broadcast storm?  Is there one particular message
 exchange that involves broadcasting?  Is it only from the server to
 agents, or are there 'broadcasts' in other directions as well?

 (I presume you are talking about control plane messages here, i.e.
 between Neutron components.  Is that right?  Obviously there can also be
 broadcast storm problems in the data plane - but I don't think that's
 what you are talking about here.)

  We need layered architecture in Neutron to solve the broadcast domain
  bottleneck of scalability. The test report from OpenStack cascading shows
  that through layered architecture Neutron cascading, Neutron can
  supports up to million level ports and 100k level physical hosts. You can
  find the report here:
  http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

 Many thanks, I will take a look at this.

  Neutron cascading also brings extra benefit: One cascading Neutron can
  have many cascaded Neutrons, and different cascaded Neutron can leverage
  different SDN controller, maybe one is ODL, the other one is OpenContrail.
 
  Cascading Neutron---
   / \
  --cascaded Neutron--   --cascaded Neutron-
  |  |
  -ODL--   OpenContrail
 
 
  And furthermore, if using Neutron cascading in multiple data centers, the
  DCI controller (Data center inter-connection controller) can also be used
  under cascading Neutron, to provide NaaS ( network as a service ) across
  data centers.
 
  ---Cascading Neutron

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-11 Thread Kevin Benton
The TCP/IP stack keeps track of connections as a combination of IP + TCP
port. The two byte port limit doesn't matter unless all of the agents are
connecting from the same IP address, which shouldn't be the case unless
compute nodes connect to the rabbitmq server via one IP address running
port address translation.

Either way, the agents don't connect directly to the Neutron server, they
connect to the rabbit MQ cluster. Since as many Neutron server processes
can be launched as necessary, the bottlenecks will likely show up at the
messaging or DB layer.

On Sat, Apr 11, 2015 at 6:46 PM, joehuang joehu...@huawei.com wrote:

  As Kevin talking about agents, I want to remind that in TCP/IP stack,
 port ( not Neutron Port ) is a two bytes field, i.e. port ranges from 0 ~
 65535, supports maximum 64k port number.



  above 100k managed node  means more than 100k L2 agents/L3 agents...
 will be alive under Neutron.



 Want to know the detail design how to support 99.9% possibility for
 scaling Neutron in this way, and PoC and test would be a good support for
 this idea.



 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.



 Best Regards



 Chaoyi Huang ( joehuang )


  --
 *From:* Kevin Benton [blak...@gmail.com]
 *Sent:* 11 April 2015 12:34
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [neutron] Neutron scaling datapoints?

   Which periodic updates did you have in mind to eliminate? One of the
 few remaining ones I can think of is sync_routers but it would be great if
 you can enumerate the ones you observed because eliminating overhead in
 agents is something I've been working on as well.

  One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine if
 the agents are still alive for scheduling purposes. Did you have something
 else in mind to determine if an agent is alive?

 On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com
 wrote:

 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller
 openstack,
 or use significant number of extra controller machine.

 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.

 For example.:
 Periodic updates can be avoided almost in all cases

 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159

 Also the agents when gets some notification, they start asking for
 details via the
 AMQP - SQL. Why they do not know it already or get it with the
 notification ?


 - Original Message -
  From: Neil Jerram neil.jer...@metaswitch.com
   To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
  Sent: Thursday, April 9, 2015 5:01:45 PM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
  Hi Joe,
 
  Many thanks for your reply!
 
  On 09/04/15 03:34, joehuang wrote:
   Hi, Neil,
  
From theoretic, Neutron is like a broadcast domain, for example,
enforcement of DVR and security group has to touch each regarding
 host
where there is VM of this project resides. Even using SDN
 controller, the
touch to regarding host is inevitable. If there are plenty of
 physical
hosts, for example, 10k, inside one Neutron, it's very hard to
 overcome
the broadcast storm issue under concurrent operation, that's the
bottleneck for scalability of Neutron.
 
  I think I understand that in general terms - but can you be more
  specific about the broadcast storm?  Is there one particular message
  exchange that involves broadcasting?  Is it only from the server to
  agents, or are there 'broadcasts' in other directions as well?
 
  (I presume you are talking about control plane messages here, i.e.
  between Neutron components.  Is that right?  Obviously there can also be
  broadcast storm problems in the data plane - but I don't think that's
  what you are talking about here.)
 
   We need layered architecture in Neutron to solve the broadcast
 domain
   bottleneck of scalability. The test report from OpenStack cascading
 shows
   that through layered architecture Neutron cascading, Neutron can
   supports up to million level ports and 100k level physical hosts. You
 can
   find the report here:
  
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
 
  Many thanks, I will take a look at this.
 
   Neutron cascading also brings extra benefit: One cascading Neutron
 can
   have many cascaded Neutrons, and different cascaded Neutron can
 leverage

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-11 Thread Kevin Benton
So IIUC tooz would be handling the liveness detection for the agents. That
would be nice to get rid of that logic in Neutron and just register
callbacks for rescheduling the dead.

Where does it store that state, does it persist timestamps to the DB like
Neutron does? If so, how would that scale better? If not, who does a given
node ask to know if an agent is online or offline when making a scheduling
decision?

However, before (what I assume is) the large code change to implement tooz,
I would like to quantify that the heartbeats are actually a bottleneck.
When I was doing some profiling of them on the master branch a few months
ago, processing a heartbeat took an order of magnitude less time (50ms)
than the 'sync routers' task of the l3 agent (~300ms). A few query
optimizations might buy us a lot more headroom before we have to fall back
to large refactors.
Kevin Benton wrote:


 One of the most common is the heartbeat from each agent. However, I
 don't think we can't eliminate them because they are used to determine
 if the agents are still alive for scheduling purposes. Did you have
 something else in mind to determine if an agent is alive?


Put each agent in a tooz[1] group; have each agent periodically
heartbeat[2], have whoever needs to schedule read the active members of
that group (or use [3] to get notified via a callback), profit...

Pick from your favorite (supporting) driver at:

http://docs.openstack.org/developer/tooz/compatibility.html

[1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3] http://docs.openstack.org/developer/tooz/tutorial/group_
membership.html#watching-group-changes


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-10 Thread Kevin Benton
Which periodic updates did you have in mind to eliminate? One of the few
remaining ones I can think of is sync_routers but it would be great if you
can enumerate the ones you observed because eliminating overhead in agents
is something I've been working on as well.

One of the most common is the heartbeat from each agent. However, I don't
think we can eliminate them, because they are used to determine if the
agents are still alive for scheduling purposes. Did you have something else
in mind to determine if an agent is alive?

On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas afaze...@redhat.com wrote:

 I'm 99.9% sure, for scaling above 100k managed node,
 we do not really need to split the openstack to multiple smaller openstack,
 or use significant number of extra controller machine.

 The problem is openstack using the right tools SQL/AMQP/(zk),
 but in a wrong way.

 For example.:
 Periodic updates can be avoided almost in all cases

 The new data can be pushed to the agent just when it needed.
 The agent can know when the AMQP connection become unreliable (queue or
 connection loose),
 and needs to do full sync.
 https://bugs.launchpad.net/neutron/+bug/1438159

 Also the agents when gets some notification, they start asking for details
 via the
 AMQP - SQL. Why they do not know it already or get it with the
 notification ?


 - Original Message -
  From: Neil Jerram neil.jer...@metaswitch.com
  To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
  Sent: Thursday, April 9, 2015 5:01:45 PM
  Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
 
  Hi Joe,
 
  Many thanks for your reply!
 
  On 09/04/15 03:34, joehuang wrote:
   Hi, Neil,
  
From theoretic, Neutron is like a broadcast domain, for example,
enforcement of DVR and security group has to touch each regarding host
where there is VM of this project resides. Even using SDN controller,
 the
touch to regarding host is inevitable. If there are plenty of
 physical
hosts, for example, 10k, inside one Neutron, it's very hard to
 overcome
the broadcast storm issue under concurrent operation, that's the
bottleneck for scalability of Neutron.
 
  I think I understand that in general terms - but can you be more
  specific about the broadcast storm?  Is there one particular message
  exchange that involves broadcasting?  Is it only from the server to
  agents, or are there 'broadcasts' in other directions as well?
 
  (I presume you are talking about control plane messages here, i.e.
  between Neutron components.  Is that right?  Obviously there can also be
  broadcast storm problems in the data plane - but I don't think that's
  what you are talking about here.)
 
   We need a layered architecture in Neutron to solve the broadcast-domain
   bottleneck of scalability. The test report from OpenStack cascading shows
   that, through the layered architecture of Neutron cascading, Neutron can
   support up to a million ports and 100k physical hosts. You can find the
   report here:
   http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
 
  Many thanks, I will take a look at this.
 
   Neutron cascading also brings an extra benefit: one cascading Neutron can
   have many cascaded Neutrons, and different cascaded Neutrons can leverage
   different SDN controllers, maybe one ODL, the other OpenContrail.

             ------ Cascading Neutron ------
            /                              \
     --cascaded Neutron--          --cascaded Neutron--
              |                             |
            -ODL-                    -OpenContrail-

   Furthermore, if Neutron cascading is used across multiple data centers,
   a DCI controller (data center inter-connection controller) can also be
   used under the cascading Neutron, to provide NaaS (network as a service)
   across data centers.

             ------------ Cascading Neutron ------------
            /                     |                     \
     --cascaded Neutron--   -DCI controller-    --cascaded Neutron--
              |                   |                      |
            -ODL-                 |               -OpenContrail-
     --(Data center 1)--   --(DCI networking)--   --(Data center 2)--

   Is it possible for us to discuss this at the OpenStack Vancouver summit?
 
  Most certainly, yes.  I will be there from mid Monday afternoon through
  to end Friday.  But it will be my first summit, so I have no idea yet as
  to how I might run into you - please can you suggest!
 
   Best Regards
   Chaoyi Huang ( Joe Huang )
 
  Regards,
Neil
 
 
 __
  OpenStack Development Mailing List (not for usage questions)
  Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
  http://lists.openstack.org/cgi-bin/mailman/listinfo

Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-10 Thread joehuang
Hi, Neil, 

See inline comments.

Best Regards

Chaoyi Huang


From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:
 Hi, Neil,

  In theory, Neutron is like a broadcast domain: for example, enforcement of
 DVR and security groups has to touch every host where a VM of the project
 resides. Even when using an SDN controller, touching those hosts is
 inevitable. If there are many physical hosts, for example 10k, inside one
 Neutron, it is very hard to overcome the broadcast storm issue under
 concurrent operation; that is the bottleneck for Neutron's scalability.

I think I understand that in general terms - but can you be more
specific about the broadcast storm?  Is there one particular message
exchange that involves broadcasting?  Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example: L2 population, security group rule updates, and DVR
route updates. These flow in both directions, depending on the scenario.
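
A rough back-of-envelope calculation (with purely illustrative numbers, and
pessimistically assuming every agent receives every update) shows why such
fan-outs dominate at large host counts:

    # Illustrative numbers only - not measurements.
    hosts = 10000                    # compute hosts, i.e. agents listening
    sg_updates_per_min = 60          # assumed rate of security-group changes
    l2pop_updates_per_min = 120      # assumed rate of port add/move/delete events

    msgs_per_min = hosts * (sg_updates_per_min + l2pop_updates_per_min)
    print(msgs_per_min)              # 1,800,000 fan-out messages per minute
    print(msgs_per_min / 60.0)       # i.e. 30,000 messages per second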

(I presume you are talking about control plane messages here, i.e.
between Neutron components.  Is that right?  Obviously there can also be
broadcast storm problems in the data plane - but I don't think that's
what you are talking about here.)

[[joehuang]] Yes, control plane here.


 We need a layered architecture in Neutron to solve the broadcast-domain
 bottleneck of scalability. The test report from OpenStack cascading shows
 that, through the layered architecture of Neutron cascading, Neutron can
 support up to a million ports and 100k physical hosts. You can find the
 report here:
 http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

 Neutron cascading also brings an extra benefit: one cascading Neutron can
 have many cascaded Neutrons, and different cascaded Neutrons can leverage
 different SDN controllers, maybe one ODL, the other OpenContrail.

           ------ Cascading Neutron ------
          /                              \
   --cascaded Neutron--          --cascaded Neutron--
            |                             |
          -ODL-                    -OpenContrail-

 Furthermore, if Neutron cascading is used across multiple data centers, a
 DCI controller (data center inter-connection controller) can also be used
 under the cascading Neutron, to provide NaaS (network as a service) across
 data centers.

           ------------ Cascading Neutron ------------
          /                     |                     \
   --cascaded Neutron--   -DCI controller-    --cascaded Neutron--
            |                   |                      |
          -ODL-                 |               -OpenContrail-
   --(Data center 1)--   --(DCI networking)--   --(Data center 2)--

 Is it possible for us to discuss this at the OpenStack Vancouver summit?

Most certainly, yes.  I will be there from mid Monday afternoon through
to end Friday.  But it will be my first summit, so I have no idea yet as
to how I might run into you - please can you suggest!

I will also attend the summit for the whole week, sometimes in the OPNFV
sessions, sometimes in the OpenStack ones. Let's see how we can arrange to meet.

 Best Regards
 Chaoyi Huang ( Joe Huang )

Regards,
Neil

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-09 Thread Neil Jerram

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:

Hi, Neil,

In theory, Neutron is like a broadcast domain: for example, enforcement of DVR and security
groups has to touch every host where a VM of the project resides. Even when using an SDN
controller, touching those hosts is inevitable. If there are many physical hosts, for example
10k, inside one Neutron, it is very hard to overcome the broadcast storm issue under concurrent
operation; that is the bottleneck for Neutron's scalability.


I think I understand that in general terms - but can you be more 
specific about the broadcast storm?  Is there one particular message 
exchange that involves broadcasting?  Is it only from the server to 
agents, or are there 'broadcasts' in other directions as well?


(I presume you are talking about control plane messages here, i.e. 
between Neutron components.  Is that right?  Obviously there can also be 
broadcast storm problems in the data plane - but I don't think that's 
what you are talking about here.)



We need a layered architecture in Neutron to solve the broadcast-domain bottleneck of
scalability. The test report from OpenStack cascading shows that, through the layered
architecture of Neutron cascading, Neutron can support up to a million ports and 100k
physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers


Many thanks, I will take a look at this.


Neutron cascading also brings an extra benefit: one cascading Neutron can have many
cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers,
maybe one ODL, the other OpenContrail.

          ------ Cascading Neutron ------
         /                              \
  --cascaded Neutron--          --cascaded Neutron--
           |                             |
         -ODL-                    -OpenContrail-

Furthermore, if Neutron cascading is used across multiple data centers, a DCI controller
(data center inter-connection controller) can also be used under the cascading Neutron,
to provide NaaS (network as a service) across data centers.

          ------------ Cascading Neutron ------------
         /                     |                     \
  --cascaded Neutron--   -DCI controller-    --cascaded Neutron--
           |                   |                      |
         -ODL-                 |               -OpenContrail-
  --(Data center 1)--   --(DCI networking)--   --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit?


Most certainly, yes.  I will be there from mid Monday afternoon through 
to end Friday.  But it will be my first summit, so I have no idea yet as 
to how I might run into you - please can you suggest!



Best Regards
Chaoyi Huang ( Joe Huang )


Regards,
Neil

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-09 Thread Neil Jerram

Hi Mike,

Many thanks for your reply!

On 08/04/15 17:56, Mike Spreitzer wrote:

Are you looking at scaling the numbers of tenants, Neutron routers, and
tenant networks as you scale hosts and guests?  I think this is a
plausible way to grow.  The compartmentalization that comes with
growing those things may make a difference in the results.


Are you thinking of control plane or data plane limits?  In my email I 
was thinking of control plane points, such as


- how many compute host agents can communicate with the Neutron server

- how many Neutron server instances or threads are needed

- whether there are any limits associated with the Neutron DB (unlikely 
I guess).
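
To put a rough (and purely illustrative) number on the first two points: just
the periodic agent state reports scale linearly with the host count. The host
and agent counts below are assumptions; only report_interval reflects a typical
default of around 30 seconds.

    # Illustrative only: periodic state-report (heartbeat) load vs. host count.
    hosts = 3000                 # assumed number of compute hosts
    agents_per_host = 2          # assumed, e.g. an L2 agent plus an L3/DHCP agent
    report_interval = 30.0       # seconds; a typical Neutron default

    reports_per_sec = hosts * agents_per_host / report_interval
    print(reports_per_sec)       # 200 state reports per second

Each of those reports is roughly one RPC handled by a neutron-server worker
plus a small DB write, which is one way of framing the first two bullet points.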


Does the use of tenant networks and routers affect those points, in your 
experience?  That would be less obvious to me than simply how many 
compute hosts or Neutron servers there are.


On the data plane side - if that was more what you meant - I can 
certainly see the limits there and how they are alleviated by using 
tenant networks and routers, in the L2 model.  FWIW, my project Calico 
[1] tries to avoid those by not providing an L2 domain at all - which can 
make sense for workloads that only require or provide IP services - and 
instead routing data through the fabric.


To answer your question, then, no, I wasn't thinking of scaling tenant 
networks and routers, per your suggestion, because Calico doesn't do 
things that way (or alternatively because Calico already routes 
everywhere), and because I didn't think that would be relevant to the 
control plane scaling that I had in mind.  But I may be missing 
something, so please do say if so.


Many thanks,
Neil


[1] http://www.projectcalico.org/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-08 Thread Mike Spreitzer
Are you looking at scaling the numbers of tenants, Neutron routers, and 
tenant networks as you scale hosts and guests?  I think this is a 
plausible way to grow.  The compartmentalization that comes with growing 
those things may make a difference in the results.

Thanks,
Mike



From:   Neil Jerram neil.jer...@metaswitch.com
To: openstack-dev@lists.openstack.org
Date:   04/08/2015 12:29 PM
Subject:[openstack-dev] [neutron] Neutron scaling datapoints?



My team is working on experiments looking at how far the Neutron server
will scale, with increasing numbers of compute hosts and VMs.  Does
anyone have any datapoints on this that they can share?  Or any clever
hints?

I'm already aware of the following ones:

https://javacruft.wordpress.com/2014/06/18/168k-instances/
 Icehouse
 118 compute hosts
 80 Neutron server processes (10 per core on each of 8 cores, on the
 controller node)
 27,000 VMs - but only after disabling all security/iptables

http://www.opencontrail.org/openstack-neutron-at-scale/
 1000 hosts
 5000 VMs
 3 Neutron servers (via a load balancer)
 But it doesn't describe whether any specific configuration is needed for this.
 (Other than using OpenContrail! :-))

Many thanks!
 Neil



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron scaling datapoints?

2015-04-08 Thread joehuang
Hi, Neil,

In theory, Neutron is like a broadcast domain: for example, enforcement of DVR and
security groups has to touch every host where a VM of the project resides. Even when
using an SDN controller, touching those hosts is inevitable. If there are many physical
hosts, for example 10k, inside one Neutron, it is very hard to overcome the broadcast
storm issue under concurrent operation; that is the bottleneck for Neutron's scalability.

We need a layered architecture in Neutron to solve the broadcast-domain bottleneck of
scalability. The test report from OpenStack cascading shows that, through the layered
architecture of Neutron cascading, Neutron can support up to a million ports and 100k
physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Neutron cascading also brings an extra benefit: one cascading Neutron can have many
cascaded Neutrons, and different cascaded Neutrons can leverage different SDN controllers,
maybe one ODL, the other OpenContrail.

          ------ Cascading Neutron ------
         /                              \
  --cascaded Neutron--          --cascaded Neutron--
           |                             |
         -ODL-                    -OpenContrail-

Furthermore, if Neutron cascading is used across multiple data centers, a DCI controller
(data center inter-connection controller) can also be used under the cascading Neutron,
to provide NaaS (network as a service) across data centers.

          ------------ Cascading Neutron ------------
         /                     |                     \
  --cascaded Neutron--   -DCI controller-    --cascaded Neutron--
           |                   |                      |
         -ODL-                 |               -OpenContrail-
  --(Data center 1)--   --(DCI networking)--   --(Data center 2)--

Is it possible for us to discuss this at the OpenStack Vancouver summit?

Best Regards
Chaoyi Huang ( Joe Huang )


-Original Message-
From: Neil Jerram [mailto:neil.jer...@metaswitch.com] 
Sent: Thursday, April 09, 2015 12:27 AM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [neutron] Neutron scaling datapoints?

My team is working on experiments looking at how far the Neutron server will 
scale, with increasing numbers of compute hosts and VMs.  Does anyone have any 
datapoints on this that they can share?  Or any clever hints?

I'm already aware of the following ones:

https://javacruft.wordpress.com/2014/06/18/168k-instances/
 Icehouse
 118 compute hosts
 80 Neutron server processes (10 per core on each of 8 cores, on the
 controller node)
 27,000 VMs - but only after disabling all security/iptables

http://www.opencontrail.org/openstack-neutron-at-scale/
 1000 hosts
 5000 VMs
 3 Neutron servers (via a load balancer)
 But it doesn't describe whether any specific configuration is needed for this.
 (Other than using OpenContrail! :-))

Many thanks!
 Neil


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev