Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-30 Thread Andrew Laski
I have written up some points on an etherpad to use during the summit 
session https://etherpad.openstack.org/p/kilo-nova-cells . Please read 
this over if possible before the session.  An alternate approach to this 
work has been proposed, and I expect we'll spend some time discussing it.


If anyone would like to discuss it before then please reply here.


On 10/20/2014 02:00 PM, Andrew Laski wrote:
One of the big goals for the Kilo cycle by users and developers of the 
cells functionality within Nova is to get it to a point where it can 
be considered a first class citizen of Nova.  Ultimately I think this 
comes down to getting it tested by default in Nova jobs, and making it 
easy for developers to work with.  But there's a lot of work to get 
there.  In order to raise awareness of this effort, and get the 
conversation started on a few things, I've summarized a little bit 
about cells and this effort below.



Goals:

Testing of a single cell setup in the gate.
Feature parity.
Make cells the default implementation.  Developers write code once and 
it works for cells.


Ultimately the goal is to improve maintainability of a large feature 
within the Nova code base.



Feature gaps:

Host aggregates
Security groups
Server groups


Shortcomings:

Flavor syncing
This needs to be addressed now.

Cells scheduling/rescheduling
Instances can not currently move between cells
These two won't affect the default one cell setup so they will be 
addressed later.



What does cells do:

Schedule an instance to a cell based on flavor slots available (see the 
sketch after this list).
Proxy API requests to the proper cell.
Keep a copy of instance data at the global level for quick retrieval.
Sync data up from a child cell to keep the global level up to date.

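To make the first item concrete, here is a toy sketch of slot-based cell
selection; the structures and names are invented for illustration and are
not the actual nova-cells scheduler:

    # Toy sketch of slot-based cell selection.  CellState and the slot
    # accounting are hypothetical stand-ins for the real cells state.
    class CellState(object):
        def __init__(self, name, flavor_slots):
            self.name = name
            # flavor id -> number of instances of that flavor that still fit
            self.flavor_slots = flavor_slots

    def pick_cell(cells, flavor_id):
        """Pick the child cell with the most free slots for this flavor."""
        candidates = [c for c in cells if c.flavor_slots.get(flavor_id, 0) > 0]
        if not candidates:
            raise RuntimeError('no cell can fit flavor %s' % flavor_id)
        return max(candidates, key=lambda c: c.flavor_slots[flavor_id])

    cells = [CellState('cell1', {'m1.small': 10, 'm1.large': 0}),
             CellState('cell2', {'m1.small': 3, 'm1.large': 5})]
    print(pick_cell(cells, 'm1.large').name)  # -> cell2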

Simplifying assumptions:

Cells will be treated as a two level tree structure.


Plan:

Fix flavor breakage in child cell which causes boot tests to fail. 
Currently the libvirt driver needs flavor.extra_specs which is not 
synced to the child cell.  Some options are to sync flavor and extra 
specs to child cell db, or pass full data with the request. 
https://review.openstack.org/#/c/126620/1 offers a means of passing 
full data with the request.
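For illustration only, the "pass full data with the request" option could
look roughly like the sketch below; the field names are made up, and the
review above is the authoritative version:

    # Sketch: serialize the flavor, extra_specs included, into the build
    # request payload so the child cell never has to look the flavor up
    # in its own database.  Field names here are illustrative.
    def build_request_payload(instance_uuid, flavor):
        return {
            'instance_uuid': instance_uuid,
            'flavor': {
                'flavorid': flavor['flavorid'],
                'memory_mb': flavor['memory_mb'],
                'vcpus': flavor['vcpus'],
                'root_gb': flavor['root_gb'],
                'extra_specs': dict(flavor.get('extra_specs', {})),
            },
        }

    payload = build_request_payload(
        'some-uuid', {'flavorid': '42', 'memory_mb': 2048, 'vcpus': 2,
                      'root_gb': 20, 'extra_specs': {'somekey': 'someval'}})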


Determine proper switches to turn off Tempest tests for features that 
don't work with the goal of getting a voting job.  Once this is in 
place we can move towards feature parity and work on internal 
refactorings.


Work towards adding parity for host aggregates, security groups, and 
server groups.  They should be made to work in a single cell setup, 
but the solution should not preclude them from being used in multiple 
cells.  There needs to be some discussion as to whether a host 
aggregate or server group is a global concept or per cell concept.


Work towards merging compute/api.py and compute/cells_api.py so that 
developers only need to make changes/additions in one place.  The goal 
goal is for as much as possible to be hidden by the RPC layer, which 
will determine whether a call goes to a compute/conductor/cell.
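A minimal sketch of that single-entry-point idea; the class and method
names are invented for illustration, not the proposed Nova interfaces:

    class RPCRouter(object):
        """Decides where a call lands so callers never branch on cells."""
        def __init__(self, cells_enabled):
            self.cells_enabled = cells_enabled

        def cast(self, method, **kwargs):
            # With cells on, nova-cells proxies to the right child;
            # otherwise the call goes straight to compute/conductor.
            target = 'cells' if self.cells_enabled else 'compute'
            print('cast %s to %s: %s' % (method, target, kwargs))

    class ComputeAPI(object):
        """One implementation; no separate cells_api.py needed."""
        def __init__(self, router):
            self.router = router

        def reboot(self, instance_uuid):
            self.router.cast('reboot_instance', instance_uuid=instance_uuid)

    ComputeAPI(RPCRouter(cells_enabled=True)).reboot('some-uuid')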


For syncing data between cells, look at using objects to handle the 
logic of writing data to the cell/parent and then syncing the data to 
the other.
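Sketched very roughly, an object save() that does both steps might look
like this; the dict "database" and queue are stand-ins for the real DB
API and the cells RPC upcall, and all names are hypothetical:

    class InstanceObject(object):
        def __init__(self, uuid, db, upward_queue):
            self.uuid = uuid
            self.db = db
            self.upward_queue = upward_queue
            self.fields = {}

        def save(self):
            # 1. Persist in this cell, as a normal object save would.
            self.db[self.uuid] = dict(self.fields)
            # 2. Sync the change up so the parent's copy stays current.
            self.upward_queue.append((self.uuid, dict(self.fields)))

    db, queue = {}, []
    inst = InstanceObject('some-uuid', db, queue)
    inst.fields['vm_state'] = 'active'
    inst.save()
    print(db, queue)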


A potential migration scenario is to consider a non-cells setup to be 
a child cell; converting to cells will then mean setting up a parent 
cell and linking them.  There are periodic tasks in place to sync data 
up from a child already, but a manual kick-off mechanism will need to 
be added.
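The manual kick-off might be as simple as walking the child's instances
and pushing each one through the same path the periodic sync uses; a
hypothetical sketch, with send_up standing in for the cells RPC upcall:

    # Hypothetical manual kick-off: push every instance in the (former
    # non-cells, now child) database up to the newly created parent.
    def kick_off_sync(child_db, send_up):
        synced = 0
        for uuid, instance in child_db.items():
            send_up(uuid, instance)   # stand-in for the upward sync call
            synced += 1
        return synced

    child_db = {'uuid-a': {'vm_state': 'active'},
                'uuid-b': {'vm_state': 'stopped'}}
    sent = []
    print(kick_off_sync(child_db, lambda u, i: sent.append((u, i))))  # -> 2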



Future plans:

Something that has been considered, but is out of scope for now, is 
that the parent/api cell doesn't need the same data model as the child 
cell.  Since the majority of what it does is act as a cache for API 
requests, it does not need all the data that a cell needs and what 
data it does need could be stored in a form that's optimized for reads.
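To make the idea concrete, the parent's store could be a flat,
denormalized view per instance, ready to serve API GETs without joins;
this is purely illustrative, not a proposed schema:

    # Purely illustrative: a denormalized per-instance document in the
    # parent, built at sync time and optimized for reads.
    parent_view = {}

    def cache_instance(uuid, instance, flavor):
        parent_view[uuid] = {
            'uuid': uuid,
            'state': instance['vm_state'],
            'flavor_name': flavor['name'],    # denormalized at write time
            'memory_mb': flavor['memory_mb'],
        }

    cache_instance('some-uuid', {'vm_state': 'active'},
                   {'name': 'm1.small', 'memory_mb': 2048})
    print(parent_view['some-uuid'])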



Thoughts?



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-30 Thread Vineet Menon
Hi Andrew,

Since you have mentioned one approach for getting the flavor metadata to
driver.spawn, I want to draw your attention to this code review as well:
https://review.openstack.org/#/c/108238/.

I didn't want to edit your etherpad notes, hence this email.


Regards,

Vineet Menon




Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-23 Thread Tim Bell
 -Original Message-
 From: Andrew Laski [mailto:andrew.la...@rackspace.com]
 Sent: 22 October 2014 21:12
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] Cells conversation starter
 
 
 On 10/22/2014 12:52 AM, Michael Still wrote:
  Thanks for this.
 
  It would be interesting to see how much of this work you think is
  achievable in Kilo. How long do you see this process taking? In line
  with that, is it just you currently working on this? Would calling for
  volunteers to help be meaningful?
 
 I think that getting a single cell setup tested in the gate is achievable.
 I think feature parity might be a stretch but could be achievable with
 enough hands to work on it.  Honestly I think that making cells the default
 implementation is going to take more than a cycle.  But I think we can get
 some specifics worked out as to the direction and may be able to get to a
 point where the remaining work is mostly mechanical.
 

I think getting to feature parity would be a good Kilo objective.  Moving to
default is another step which would need migration scripts from the non-cells
setups, which would need heavy testing.  Aiming for L for that would seem
reasonable given that we are not drowning in volunteers.

 At the moment it is mainly me working on this with some support from a
 couple of people.  Volunteers would certainly be welcomed on this effort
 though.  If it would be useful perhaps we could even have a cells subgroup
 to track progress and direction of this effort.
 

CERN and BARC (Bhabha Atomic Research Centre in Mumbai) would be interested 
in helping to close the gap.

Tim


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-23 Thread Andrew Laski


On 10/22/2014 08:11 PM, Sam Morrison wrote:

On 23 Oct 2014, at 5:55 am, Andrew Laski andrew.la...@rackspace.com wrote:


While I agree that N is a bit interesting, I have seen N=3 in production

[central API]--[state/region1]--[state/region DC1]
             |               \--[state/region DC2]
             |--[state/region2 DC]
             |--[state/region3 DC]
             \--[state/region4 DC]

I would be curious to hear any information about how this is working out.  Does 
everything that works for N=2 work when N=3?  Are there fixes that needed to be 
added to make this work?  Why do it this way rather than bring [state/region 
DC1] and [state/region DC2] up a level?

We (NeCTAR) have 3 tiers: our current setup has one parent and 6 children, 
and 3 of the children have 2 grandchildren each. All compute nodes are at 
the lowest level.

Everything works fine and we haven’t needed to do any modifications.

We run in a 3 tier system because it matches how our infrastructure is 
logically laid out, but I don’t see a problem in just having a 2 tier system 
and getting rid of the middle man.


There's no reason an N-tier system where N > 2 shouldn't be feasible, 
but it's not going to be tested in this initial effort. So while we will 
try not to break it, it's hard to guarantee that. That's why my 
preference would be to remove that code and build up an N-tier system in 
conjunction with testing later.  But with a clear user of this 
functionality I don't think that's an option.




Sam




Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Vineet Menon
On 22 October 2014 06:24, Tom Fifield t...@openstack.org wrote:

 On 22/10/14 03:07, Andrew Laski wrote:
 
  On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
  On 10/20/2014 08:00 PM, Andrew Laski wrote:
  One of the big goals for the Kilo cycle by users and developers of the
  cells functionality within Nova is to get it to a point where it can be
  considered a first class citizen of Nova.  Ultimately I think this
 comes
  down to getting it tested by default in Nova jobs, and making it easy
  for developers to work with.  But there's a lot of work to get there.
  In order to raise awareness of this effort, and get the conversation
  started on a few things, I've summarized a little bit about cells and
  this effort below.
 
 
  Goals:
 
  Testing of a single cell setup in the gate.
  Feature parity.
  Make cells the default implementation.  Developers write code once and
  it works for cells.
 
  Ultimately the goal is to improve maintainability of a large feature
  within the Nova code base.
 
  Thanks for the write-up Andrew! Some thoughts/questions below. Looking
  forward to the discussion on some of these topics, and would be happy to
  review the code once we get to that point.
 
  Feature gaps:
 
  Host aggregates
  Security groups
  Server groups
 
 
  Shortcomings:
 
  Flavor syncing
   This needs to be addressed now.
 
  Cells scheduling/rescheduling
  Instances can not currently move between cells
   These two won't affect the default one cell setup so they will be
  addressed later.
 
 
  What does cells do:
 
  Schedule an instance to a cell based on flavor slots available.
  Proxy API requests to the proper cell.
  Keep a copy of instance data at the global level for quick retrieval.
  Sync data up from a child cell to keep the global level up to date.
 
 
  Simplifying assumptions:
 
  Cells will be treated as a two level tree structure.
 
  Are we thinking of making this official by removing code that actually
  allows cells to be an actual tree of depth N? I am not sure if doing so
  would be a win, although it does complicate the RPC/Messaging/State code
  a bit, but if it's not being used, even though a nice generalization,
  why keep it around?
 
  My preference would be to remove that code since I don't envision anyone
  writing tests to ensure that functionality works and/or doesn't
  regress.  But there's the challenge of not knowing if anyone is actually
  relying on that behavior.  So initially I'm not creating a specific work
  item to remove it.  But I think it needs to be made clear that it's not
  officially supported and may get removed unless a case is made for
  keeping it and work is put into testing it.

 While I agree that N is a bit interesting, I have seen N=3 in production

 [central API]--[state/region1]--[state/region DC1]
\-[state/region DC2]
   --[state/region2 DC]
   --[state/region3 DC]
   --[state/region4 DC]

 I'm curious.
What are the use cases for this deployment? Presumably the root node runs
n-api along with horizon, key management etc. What components are deployed in
tiers 2 and 3?
And AFAIK, an OpenStack cells deployment currently isn't even a tree but a
DAG, since one cell can have multiple parents. Has anyone come up with any
such requirement?


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Dheeraj Gupta
Thanks Andrew for this (very) exhaustive list.

As you have pointed out, for all the missing features (I think flavors
can also be a part of that list) the community needs to decide where
the info lives primarily (API or compute cells) and how it is
propagated (Synced, sent with the request, asked on demand etc.)

With regard to flavors, I think the attention has shifted to getting
extra_specs to sync with child cells, which isn't going to help much
because even instance_type isn't synced yet. And since instance_type
relies on auto-generated IDs, syncing would be a major headache (e.g. if
one cell is down when a new flavor is created or deleted). Storing
extra_specs along with instance_system_metadata is a good alternative,
but if we assume API cells to be authoritative about flavors, then we
can simply pass flavor information with the boot request to the child
cell (this would also clean up non-cell Nova code, which currently
performs multiple DB lookups for flavors based on the underlying virt
driver).
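
A rough sketch of that system_metadata alternative, with the key format
loosely modeled on the instance_type_* keys Nova already stashes there;
the helper names are illustrative only:

    # Flatten flavor fields (and extra_specs) into prefixed keys carried
    # on the instance itself, so the child cell never consults a flavor
    # table.  Key names are illustrative.
    def flavor_to_sysmeta(flavor):
        sysmeta = {}
        for key in ('flavorid', 'memory_mb', 'vcpus', 'root_gb'):
            sysmeta['instance_type_%s' % key] = str(flavor[key])
        for spec, value in flavor.get('extra_specs', {}).items():
            sysmeta['instance_type_extra_%s' % spec] = value
        return sysmeta

    def sysmeta_to_extra_specs(sysmeta):
        prefix = 'instance_type_extra_'
        return dict((k[len(prefix):], v) for k, v in sysmeta.items()
                    if k.startswith(prefix))

    sm = flavor_to_sysmeta({'flavorid': '42', 'memory_mb': 2048, 'vcpus': 2,
                            'root_gb': 20,
                            'extra_specs': {'somekey': 'someval'}})
    print(sysmeta_to_extra_specs(sm))  # -> {'somekey': 'someval'}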

Any potential solutions for all these should also probably be
evaluated based on their compatibility with existing cell setups.

Maybe we should create a thread for each of these missing features to
discuss their solutions individually, or start the discussion in the
bug reports themselves.

Regards,
Dheeraj



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Andrew Laski


On 10/22/2014 12:24 AM, Tom Fifield wrote:

On 22/10/14 03:07, Andrew Laski wrote:

On 10/21/2014 04:31 AM, Nikola Đipanov wrote:

On 10/20/2014 08:00 PM, Andrew Laski wrote:

One of the big goals for the Kilo cycle by users and developers of the
cells functionality within Nova is to get it to a point where it can be
considered a first class citizen of Nova.  Ultimately I think this comes
down to getting it tested by default in Nova jobs, and making it easy
for developers to work with.  But there's a lot of work to get there.
In order to raise awareness of this effort, and get the conversation
started on a few things, I've summarized a little bit about cells and
this effort below.


Goals:

Testing of a single cell setup in the gate.
Feature parity.
Make cells the default implementation.  Developers write code once and
it works for cells.

Ultimately the goal is to improve maintainability of a large feature
within the Nova code base.


Thanks for the write-up Andrew! Some thoughts/questions below. Looking
forward to the discussion on some of these topics, and would be happy to
review the code once we get to that point.


Feature gaps:

Host aggregates
Security groups
Server groups


Shortcomings:

Flavor syncing
  This needs to be addressed now.

Cells scheduling/rescheduling
Instances can not currently move between cells
  These two won't affect the default one cell setup so they will be
addressed later.


What does cells do:

Schedule an instance to a cell based on flavor slots available.
Proxy API requests to the proper cell.
Keep a copy of instance data at the global level for quick retrieval.
Sync data up from a child cell to keep the global level up to date.


Simplifying assumptions:

Cells will be treated as a two level tree structure.


Are we thinking of making this official by removing code that actually
allows cells to be an actual tree of depth N? I am not sure if doing so
would be a win, although it does complicate the RPC/Messaging/State code
a bit, but if it's not being used, even though a nice generalization,
why keep it around?

My preference would be to remove that code since I don't envision anyone
writing tests to ensure that functionality works and/or doesn't
regress.  But there's the challenge of not knowing if anyone is actually
relying on that behavior.  So initially I'm not creating a specific work
item to remove it.  But I think it needs to be made clear that it's not
officially supported and may get removed unless a case is made for
keeping it and work is put into testing it.

While I agree that N is a bit interesting, I have seen N=3 in production

[central API]--[state/region1]--[state/region DC1]
             |               \--[state/region DC2]
             |--[state/region2 DC]
             |--[state/region3 DC]
             \--[state/region4 DC]


I would be curious to hear any information about how this is working 
out.  Does everything that works for N=2 work when N=3?  Are there fixes 
that needed to be added to make this work?  Why do it this way rather 
than bring [state/region DC1] and [state/region DC2] up a level?


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Andrew Laski


On 10/22/2014 03:42 AM, Vineet Menon wrote:


On 22 October 2014 06:24, Tom Fifield t...@openstack.org wrote:


On 22/10/14 03:07, Andrew Laski wrote:

 On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
 On 10/20/2014 08:00 PM, Andrew Laski wrote:
 One of the big goals for the Kilo cycle by users and
developers of the
 cells functionality within Nova is to get it to a point where
it can be
 considered a first class citizen of Nova.  Ultimately I think
this comes
 down to getting it tested by default in Nova jobs, and making
it easy
 for developers to work with.  But there's a lot of work to get
there.
 In order to raise awareness of this effort, and get the
conversation
 started on a few things, I've summarized a little bit about
cells and
 this effort below.


 Goals:

 Testing of a single cell setup in the gate.
 Feature parity.
 Make cells the default implementation.  Developers write code once and
 it works for cells.

 Ultimately the goal is to improve maintainability of a large
feature
 within the Nova code base.

 Thanks for the write-up Andrew! Some thoughts/questions below.
Looking
 forward to the discussion on some of these topics, and would be
happy to
 review the code once we get to that point.

 Feature gaps:

 Host aggregates
 Security groups
 Server groups


 Shortcomings:

 Flavor syncing
  This needs to be addressed now.

 Cells scheduling/rescheduling
 Instances can not currently move between cells
  These two won't affect the default one cell setup so they
will be
 addressed later.


 What does cells do:

 Schedule an instance to a cell based on flavor slots available.
 Proxy API requests to the proper cell.
 Keep a copy of instance data at the global level for quick
retrieval.
 Sync data up from a child cell to keep the global level up to
date.


 Simplifying assumptions:

 Cells will be treated as a two level tree structure.

 Are we thinking of making this official by removing code that
actually
 allows cells to be an actual tree of depth N? I am not sure if
doing so
 would be a win, although it does complicate the
RPC/Messaging/State code
 a bit, but if it's not being used, even though a nice
generalization,
 why keep it around?

 My preference would be to remove that code since I don't
envision anyone
 writing tests to ensure that functionality works and/or doesn't
 regress.  But there's the challenge of not knowing if anyone is
actually
 relying on that behavior.  So initially I'm not creating a
specific work
 item to remove it.  But I think it needs to be made clear that
it's not
 officially supported and may get removed unless a case is made for
 keeping it and work is put into testing it.

While I agree that N is a bit interesting, I have seen N=3 in
production

[central API]--[state/region1]--[state/region DC1]
             |               \--[state/region DC2]
             |--[state/region2 DC]
             |--[state/region3 DC]
             \--[state/region4 DC]

I'm curious.
What are the use cases for this deployment? Presumably the root node runs 
n-api along with horizon, key management etc. What components are 
deployed in tiers 2 and 3?
And AFAIK, an OpenStack cells deployment currently isn't even a tree but 
a DAG, since one cell can have multiple parents. Has anyone come up with 
any such requirement?





While there's nothing to prevent a cell from having multiple parents I 
would be curious to know if this would actually work in practice, since 
I can imagine a number of cases that might cause problems. And is there 
a practical use for this?


Maybe we should start logging a warning when this is set up, stating that 
this is an unsupported (i.e. untested) configuration, to start to codify 
the design as that of a tree.  At least for the initial scope of work I 
think this makes sense, and if a case is made for a DAG setup that can 
be done independently.
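
A minimal sketch of such a warning at cell-state load time; the data
structures are invented for illustration:

    import logging

    LOG = logging.getLogger(__name__)

    def check_topology(cells):
        """cells: mapping of cell name -> list of parent cell names."""
        for name, parents in cells.items():
            if len(parents) > 1:
                LOG.warning('cell %s has %d parents; multiple parents are '
                            'an unsupported (untested) configuration',
                            name, len(parents))

    logging.basicConfig()
    check_topology({'cell1': ['api'], 'cell2': ['api', 'region1']})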


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Andrew Laski


On 10/22/2014 12:52 AM, Michael Still wrote:

Thanks for this.

It would be interesting to see how much of this work you think is
achievable in Kilo. How long do you see this process taking? In line
with that, is it just you currently working on this? Would calling for
volunteers to help be meaningful?


I think that getting a single cell setup tested in the gate is 
achievable.  I think feature parity might be a stretch but could be 
achievable with enough hands to work on it.  Honestly I think that 
making cells the default implementation is going to take more than a 
cycle. But I think we can get some specifics worked out as to the 
direction and may be able to get to a point where the remaining work is 
mostly mechanical.


At the moment it is mainly me working on this with some support from a 
couple of people.  Volunteers would certainly be welcomed on this effort 
though.  If it would be useful perhaps we could even have a cells 
subgroup to track progress and direction of this effort.



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread Sam Morrison

 On 23 Oct 2014, at 5:55 am, Andrew Laski andrew.la...@rackspace.com wrote:
 
 While I agree that N is a bit interesting, I have seen N=3 in production
 
 [central API]--[state/region1]--[state/region DC1]
              |               \--[state/region DC2]
              |--[state/region2 DC]
              |--[state/region3 DC]
              \--[state/region4 DC]
 
 I would be curious to hear any information about how this is working out.  
 Does everything that works for N=2 work when N=3?  Are there fixes that 
 needed to be added to make this work?  Why do it this way rather than bring 
 [state/region DC1] and [state/region DC2] up a level?

We (NeCTAR) have 3 tiers: our current setup has one parent and 6 children, 
and 3 of the children have 2 grandchildren each. All compute nodes are at 
the lowest level.

Everything works fine and we haven’t needed to do any modifications. 

We run in a 3 tier system because it matches how our infrastructure is 
logically laid out, but I don’t see a problem in just having a 2 tier system 
and getting rid of the middle man.

Sam




Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-22 Thread loy wolfe
I doubt whether cells by itself can ease large-scale deployment across
multiple DCs. Although OpenStack is modularized as separate projects, a
top-level solution that spans projects is needed when considering issues
such as scaling, especially coordination between Nova and the
networking/metering services.

Personally, +1 to Steve: when deciding on future plans for scaling we
should explore all the similar proposals, such as Cascading and
Federation. In fact networking and metering face the same challenge as
Nova, or an even bigger one with l2pop and DVR. Both proposals bring a
cross-project solution beyond Nova, each with its own emphasis from the
viewpoint of intra- versus inter-admin-domain deployments.



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Nikola Đipanov
On 10/20/2014 08:00 PM, Andrew Laski wrote:
 One of the big goals for the Kilo cycle by users and developers of the
 cells functionality within Nova is to get it to a point where it can be
 considered a first class citizen of Nova.  Ultimately I think this comes
 down to getting it tested by default in Nova jobs, and making it easy
 for developers to work with.  But there's a lot of work to get there. 
 In order to raise awareness of this effort, and get the conversation
 started on a few things, I've summarized a little bit about cells and
 this effort below.
 
 
 Goals:
 
 Testing of a single cell setup in the gate.
 Feature parity.
 Make cells the default implementation.  Developers write code once and
 it works for cells.
 
 Ultimately the goal is to improve maintainability of a large feature
 within the Nova code base.


Thanks for the write-up Andrew! Some thoughts/questions below. Looking
forward to the discussion on some of these topics, and would be happy to
review the code once we get to that point.

 
 Feature gaps:
 
 Host aggregates
 Security groups
 Server groups
 
 
 Shortcomings:
 
 Flavor syncing
 This needs to be addressed now.
 
 Cells scheduling/rescheduling
 Instances can not currently move between cells
 These two won't affect the default one cell setup so they will be
 addressed later.
 
 
 What does cells do:
 
 Schedule an instance to a cell based on flavor slots available.
 Proxy API requests to the proper cell.
 Keep a copy of instance data at the global level for quick retrieval.
 Sync data up from a child cell to keep the global level up to date.
 
 
 Simplifying assumptions:
 
 Cells will be treated as a two level tree structure.
 

Are we thinking of making this official by removing code that actually
allows cells to be an actual tree of depth N? I am not sure if doing so
would be a win, although it does complicate the RPC/Messaging/State code
a bit, but if it's not being used, even though a nice generalization,
why keep it around?

 
 Plan:
 
 Fix flavor breakage in child cell which causes boot tests to fail.
 Currently the libvirt driver needs flavor.extra_specs which is not
 synced to the child cell.  Some options are to sync flavor and extra
 specs to child cell db, or pass full data with the request.
 https://review.openstack.org/#/c/126620/1 offers a means of passing full
 data with the request.
 
 Determine proper switches to turn off Tempest tests for features that
 don't work with the goal of getting a voting job.  Once this is in place
 we can move towards feature parity and work on internal refactorings.
 
 Work towards adding parity for host aggregates, security groups, and
 server groups.  They should be made to work in a single cell setup, but
 the solution should not preclude them from being used in multiple
 cells.  There needs to be some discussion as to whether a host aggregate
 or server group is a global concept or per cell concept.
 

Have there been any previous discussions on this topic? If so I'd really
like to read up on those to make sure I understand the pros and cons
before the summit session.

 Work towards merging compute/api.py and compute/cells_api.py so that
 developers only need to make changes/additions in one place.  The goal
 is for as much as possible to be hidden by the RPC layer, which will
 determine whether a call goes to a compute/conductor/cell.
 
 For syncing data between cells, look at using objects to handle the
 logic of writing data to the cell/parent and then syncing the data to
 the other.
 

Some of that work has been done already, although in a somewhat ad-hoc
fashion. Were you thinking of extending objects to support this natively
(whatever that means), or do we continue to inline the code in the
existing object methods?

 A potential migration scenario is to consider a non cells setup to be a
 child cell and converting to cells will mean setting up a parent cell
 and linking them.  There are periodic tasks in place to sync data up
 from a child already, but a manual kick off mechanism will need to be
 added.
 
 
 Future plans:
 
 Something that has been considered, but is out of scope for now, is that
 the parent/api cell doesn't need the same data model as the child cell. 
 Since the majority of what it does is act as a cache for API requests,
 it does not need all the data that a cell needs and what data it does
 need could be stored in a form that's optimized for reads.
 
 
 Thoughts?
 


Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Belmiro Moreira
Hi,
to help the discussion,
a small compilation about the bugs and previous attempts to fix the
missing functionality in cells.


Aggregates
https://bugs.launchpad.net/nova/+bug/1161208
https://blueprints.launchpad.net/nova/+spec/cells-aggregate-support
https://review.openstack.org/#/c/25813/

Server Groups
https://bugs.launchpad.net/nova/+bug/1369518

Security Groups
https://bugs.launchpad.net/nova/+bug/1274325


Belmiro



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Andrew Laski


On 10/21/2014 04:31 AM, Nikola Đipanov wrote:

On 10/20/2014 08:00 PM, Andrew Laski wrote:

One of the big goals for the Kilo cycle by users and developers of the
cells functionality within Nova is to get it to a point where it can be
considered a first class citizen of Nova.  Ultimately I think this comes
down to getting it tested by default in Nova jobs, and making it easy
for developers to work with.  But there's a lot of work to get there.
In order to raise awareness of this effort, and get the conversation
started on a few things, I've summarized a little bit about cells and
this effort below.


Goals:

Testing of a single cell setup in the gate.
Feature parity.
Make cells the default implementation.  Developers write code once and
it works for cells.

Ultimately the goal is to improve maintainability of a large feature
within the Nova code base.


Thanks for the write-up Andrew! Some thoughts/questions below. Looking
forward to the discussion on some of these topics, and would be happy to
review the code once we get to that point.


Feature gaps:

Host aggregates
Security groups
Server groups


Shortcomings:

Flavor syncing
 This needs to be addressed now.

Cells scheduling/rescheduling
Instances can not currently move between cells
 These two won't affect the default one cell setup so they will be
addressed later.


What does cells do:

Schedule an instance to a cell based on flavor slots available.
Proxy API requests to the proper cell.
Keep a copy of instance data at the global level for quick retrieval.
Sync data up from a child cell to keep the global level up to date.


Simplifying assumptions:

Cells will be treated as a two level tree structure.


Are we thinking of making this official by removing code that actually
allows cells to be an actual tree of depth N? I am not sure if doing so
would be a win, although it does complicate the RPC/Messaging/State code
a bit, but if it's not being used, even though a nice generalization,
why keep it around?


My preference would be to remove that code since I don't envision anyone 
writing tests to ensure that functionality works and/or doesn't 
regress.  But there's the challenge of not knowing if anyone is actually 
relying on that behavior.  So initially I'm not creating a specific work 
item to remove it.  But I think it needs to be made clear that it's not 
officially supported and may get removed unless a case is made for 
keeping it and work is put into testing it.





Plan:

Fix flavor breakage in child cell which causes boot tests to fail.
Currently the libvirt driver needs flavor.extra_specs which is not
synced to the child cell.  Some options are to sync flavor and extra
specs to child cell db, or pass full data with the request.
https://review.openstack.org/#/c/126620/1 offers a means of passing full
data with the request.

Determine proper switches to turn off Tempest tests for features that
don't work with the goal of getting a voting job.  Once this is in place
we can move towards feature parity and work on internal refactorings.

Work towards adding parity for host aggregates, security groups, and
server groups.  They should be made to work in a single cell setup, but
the solution should not preclude them from being used in multiple
cells.  There needs to be some discussion as to whether a host aggregate
or server group is a global concept or per cell concept.


Have there been any previous discussions on this topic? If so I'd really
like to read up on those to make sure I understand the pros and cons
before the summit session.


The only discussion I'm aware of is some comments on 
https://review.openstack.org/#/c/59101/ , though they mention a 
discussion at the Utah mid-cycle.


The main con I'm aware of for defining these as global concepts is that 
there is no rescheduling capability in the cells scheduler.  So if a 
build is sent to a cell with a host aggregate that can't fit that 
instance the build will fail even though there may be space in that host 
aggregate from a global perspective.  That should be somewhat 
straightforward to address though.
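
A toy sketch of that fix, retrying the next candidate cell when a child
reports it cannot fit the build; this is entirely illustrative:

    # Toy retry loop: try_build stands in for the RPC that asks a child
    # cell to build; on failure, fall through to the next candidate.
    class NoValidCell(Exception):
        pass

    def schedule_with_retry(candidate_cells, try_build):
        for cell in candidate_cells:
            if try_build(cell):
                return cell
        raise NoValidCell('all candidate cells rejected the build')

    # Here cell1's aggregate is "full", so the build lands in cell2.
    print(schedule_with_retry(['cell1', 'cell2'], lambda c: c == 'cell2'))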


I think it makes sense to define these as global concepts.  But these 
are features that aren't used with cells yet so I haven't put a lot of 
thought into potential arguments or cases for doing this one way or another.




Work towards merging compute/api.py and compute/cells_api.py so that
developers only need to make changes/additions in one place.  The goal
is for as much as possible to be hidden by the RPC layer, which will
determine whether a call goes to a compute/conductor/cell.

For syncing data between cells, look at using objects to handle the
logic of writing data to the cell/parent and then syncing the data to
the other.


Some of that work has been done already, although in a somewhat ad-hoc
fashion. Were you thinking of extending objects to support this natively
(whatever that means), or do we continue to inline the code in the
existing object methods?



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Steve Gordon
- Original Message -
 From: Andrew Laski andrew.la...@rackspace.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 
 Future plans:
 
 Something that has been considered, but is out of scope for now, is that
 the parent/api cell doesn't need the same data model as the child cell.
 Since the majority of what it does is act as a cache for API requests,
 it does not need all the data that a cell needs and what data it does
 need could be stored in a form that's optimized for reads.

In terms of future plans I'd like to also explore how continued iteration on 
the Cells concept might line up against the use cases presented in the recent 
threads on OpenStack cascading:

* http://lists.openstack.org/pipermail/openstack-dev/2014-September/047470.html
* http://lists.openstack.org/pipermail/openstack-dev/2014-October/047526.html

Thanks,

Steve



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Tom Fifield
On 22/10/14 03:07, Andrew Laski wrote:
 
 On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
 On 10/20/2014 08:00 PM, Andrew Laski wrote:
 [...]

 Thanks for the write-up Andrew! Some thoughts/questions below. Looking
 forward to the discussion on some of these topics, and would be happy to
 review the code once we get to that point.

 [...]


 Simplifying assumptions:

 Cells will be treated as a two level tree structure.

 Are we thinking of making this official by removing the code that
 actually allows cells to be a tree of depth N? I am not sure removing it
 would be a win on its own, but the generality does complicate the
 RPC/Messaging/State code a bit, and if it's not being used, nice as it
 is, why keep it around?
 
 My preference would be to remove that code since I don't envision anyone
 writing tests to ensure that functionality works and/or doesn't
 regress.  But there's the challenge of not knowing if anyone is actually
 relying on that behavior.  So initially I'm not creating a specific work
 item to remove it.  But I think it needs to be made clear that it's not
 officially supported and may get removed unless a case is made for
 keeping it and work is put into testing it.

While I agree that arbitrary depth N is a bit of a corner case, I have
seen N=3 in production:

[central API]--[state/region1]--[state/region DC1]
             |               \--[state/region DC2]
             |--[state/region2 DC]
             |--[state/region3 DC]
             \--[state/region4 DC]





Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-21 Thread Michael Still
Thanks for this.

It would be interesting to see how much of this work you think is
achievable in Kilo. How long do you see this process taking? In line
with that, is it just you currently working on this? Would calling for
volunteers to help be meaningful?

Michael

On Tue, Oct 21, 2014 at 5:00 AM, Andrew Laski
andrew.la...@rackspace.com wrote:
 [...]



-- 
Rackspace Australia



[openstack-dev] [Nova] Cells conversation starter

2014-10-20 Thread Andrew Laski
One of the big goals for the Kilo cycle by users and developers of the 
cells functionality within Nova is to get it to a point where it can be 
considered a first class citizen of Nova.  Ultimately I think this comes 
down to getting it tested by default in Nova jobs, and making it easy 
for developers to work with.  But there's a lot of work to get there.  
In order to raise awareness of this effort, and get the conversation 
started on a few things, I've summarized a little bit about cells and 
this effort below.



Goals:

Testing of a single cell setup in the gate.
Feature parity.
Make cells the default implementation.  Developers write code once and 
it works for cells.


Ultimately the goal is to improve maintainability of a large feature 
within the Nova code base.



Feature gaps:

Host aggregates
Security groups
Server groups


Shortcomings:

Flavor syncing
This needs to be addressed now.

Cells scheduling/rescheduling
Instances cannot currently move between cells
These two won't affect the default one cell setup so they will be 
addressed later.



What does cells do:

Schedule an instance to a cell based on flavor slots available.
Proxy API requests to the proper cell.
Keep a copy of instance data at the global level for quick retrieval.
Sync data up from a child cell to keep the global level up to date.
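
As a rough illustration of the first point, "flavor slots" scheduling 
amounts to picking the cell that reports the most free slots for the 
requested flavor (a sketch with hypothetical names, not the real cells 
scheduler):

    def pick_cell(cells, flavor_id):
        # cells: iterable of objects exposing free_slots(flavor_id) -> int.
        best = max(cells, key=lambda c: c.free_slots(flavor_id))
        if best.free_slots(flavor_id) <= 0:
            raise RuntimeError('no cell has free slots for flavor %s'
                               % flavor_id)
        return best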


Simplifying assumptions:

Cells will be treated as a two level tree structure.


Plan:

Fix flavor breakage in child cell which causes boot tests to fail. 
Currently the libvirt driver needs flavor.extra_specs which is not 
synced to the child cell.  Some options are to sync flavor and extra 
specs to child cell db, or pass full data with the request. 
https://review.openstack.org/#/c/126620/1 offers a means of passing full 
data with the request.
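
The second option amounts to serializing the flavor, extra specs 
included, into the request payload. A minimal sketch, with illustrative 
names rather than Nova's actual ones:

    def serialize_flavor(flavor):
        # Flatten the flavor into a dict that is safe to send over RPC.
        return {
            'id': flavor['id'],
            'name': flavor['name'],
            'memory_mb': flavor['memory_mb'],
            'vcpus': flavor['vcpus'],
            'root_gb': flavor['root_gb'],
            'extra_specs': dict(flavor.get('extra_specs', {})),
        }

    def build_request_for_child_cell(instance, flavor):
        # The child cell's libvirt driver can read extra_specs from this
        # payload instead of querying a flavor table that was never
        # synced down.
        return {'instance': instance, 'flavor': serialize_flavor(flavor)}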


Determine proper switches to turn off Tempest tests for features that 
don't work, with the goal of getting a voting job.  Once this is in place 
we can move towards feature parity and work on internal refactorings.


Work towards adding parity for host aggregates, security groups, and 
server groups.  They should be made to work in a single cell setup, but 
the solution should not preclude them from being used in multiple 
cells.  There needs to be some discussion as to whether a host aggregate 
or server group is a global concept or per cell concept.


Work towards merging compute/api.py and compute/cells_api.py so that 
developers only need to make changes/additions in one place.  The goal 
is for as much as possible to be hidden by the RPC layer, which will 
determine whether a call goes to a compute/conductor/cell.
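
In sketch form (hypothetical names, not the actual classes), the idea is 
one entry point whose RPC client picks the target:

    class ComputeRPCRouter(object):
        def __init__(self, cells_enabled, cells_rpcapi, compute_rpcapi):
            self._cells_enabled = cells_enabled
            self._cells = cells_rpcapi
            self._compute = compute_rpcapi

        def cast(self, context, method, instance, **kwargs):
            # Callers invoke one method; whether it lands on a compute
            # host directly or is proxied into a cell is decided here.
            if self._cells_enabled:
                target = self._cells
            else:
                target = self._compute
            getattr(target, method)(context, instance=instance, **kwargs)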


For syncing data between cells, look at using objects to handle the 
logic of writing data to the cell/parent and then syncing the data to 
the other.


A potential migration scenario is to treat a non-cells setup as a 
child cell: converting to cells would then mean setting up a parent cell 
and linking them.  There are periodic tasks in place to sync data up 
from a child already, but a manual kick-off mechanism will need to be added.
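
A minimal sketch of the periodic sync plus the manual kick-off it would 
need (illustrative names only, not the existing cells code):

    import time

    class CellDataSync(object):
        def __init__(self, sync_interval=600):
            self._interval = sync_interval
            self._last_run = 0.0

        def _sync_instances_up(self, context):
            """Push recently-updated instance state to the parent cell."""
            # ... query instances updated since the last run and send
            # each one upward over RPC ...

        def periodic_task(self, context):
            # Runs on the existing periodic-task cadence.
            if time.time() - self._last_run >= self._interval:
                self._sync_instances_up(context)
                self._last_run = time.time()

        def kick(self, context):
            # Manual trigger for an operator converting an existing
            # non-cells deployment into a child cell.
            self._sync_instances_up(context)
            self._last_run = time.time()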



Future plans:

Something that has been considered, but is out of scope for now, is that 
the parent/api cell doesn't need the same data model as the child cell.  
Since the majority of what it does is act as a cache for API requests, 
it does not need all the data that a cell needs, and what data it does 
need could be stored in a form that's optimized for reads.



Thoughts?



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-20 Thread Mathieu Gagné

On 2014-10-20 2:00 PM, Andrew Laski wrote:

One of the big goals for the Kilo cycle by users and developers of the
cells functionality within Nova is to get it to a point where it can be
considered a first class citizen of Nova.


[...]


Shortcomings:

Flavor syncing
 This needs to be addressed now.


What does cells do:

Schedule an instance to a cell based on flavor slots available.


=)


Thoughts?



I'm pleased to see concrete efforts at making Nova cells a first class 
citizen. I'm looking forward to it. Thanks!


--
Mathieu



Re: [openstack-dev] [Nova] Cells conversation starter

2014-10-20 Thread Belmiro Moreira
Hi Andrew,

great that you have started the “cells” discussion.
Looking forward to seeing cells as the default setup in Kilo.

The feature gap is really painful for current cells users. We have been
looking at these features for some time, and the main concern is really
where these concepts should live.

cheers,

Belmiro

On Mon, Oct 20, 2014 at 8:14 PM, Mathieu Gagné mga...@iweb.com wrote:

 [...]
