On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
On 10/20/2014 08:00 PM, Andrew Laski wrote:
One of the big goals for the Kilo cycle by users and developers of the
cells functionality within Nova is to get it to a point where it can be
considered a first-class citizen of Nova.  Ultimately I think this comes
down to getting it tested by default in Nova jobs, and making it easy
for developers to work with.  But there's a lot of work to get there.
In order to raise awareness of this effort, and get the conversation
started on a few things, I've summarized a little bit about cells and
this effort below.

Goals:

Testing of a single-cell setup in the gate.
Feature parity.
Make cells the default implementation.  Developers write code once and
it works for cells.

Ultimately the goal is to improve maintainability of a large feature
within the Nova code base.

Thanks for the write-up, Andrew! Some thoughts/questions below. Looking
forward to the discussion on some of these topics, and would be happy to
review the code once we get to that point.

Feature gaps:

Host aggregates
Security groups
Server groups


Flavor syncing
     This needs to be addressed now.

Cells scheduling/rescheduling
Instances cannot currently move between cells
     These two won't affect the default one-cell setup, so they will be
addressed later.

What does cells do:

Schedule an instance to a cell based on flavor slots available.
Proxy API requests to the proper cell.
Keep a copy of instance data at the global level for quick retrieval.
Sync data up from a child cell to keep the global level up to date.
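
As a rough illustration of the first item, here is a minimal Python sketch of picking a cell by free flavor slots. The `Cell` class, its `free_slots` mapping and the "most headroom wins" rule are hypothetical, not Nova's actual cells scheduler; the sketch only assumes each child cell reports how many more instances of each flavor it can fit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cell:
    """Hypothetical capacity report sent up by a child cell."""
    name: str
    # flavor name -> how many more instances of that flavor the cell can fit
    free_slots: Dict[str, int] = field(default_factory=dict)


def schedule_to_cell(cells: List[Cell], flavor: str) -> Optional[Cell]:
    """Return the cell with the most free slots for `flavor`, or None."""
    candidates = [c for c in cells if c.free_slots.get(flavor, 0) > 0]
    if not candidates:
        return None
    # Most headroom wins; max() keeps the first cell listed on a tie.
    return max(candidates, key=lambda c: c.free_slots[flavor])


cells = [
    Cell("cell1", {"m1.small": 3, "m1.large": 0}),
    Cell("cell2", {"m1.small": 1, "m1.large": 2}),
]
print(schedule_to_cell(cells, "m1.large").name)  # cell2
print(schedule_to_cell(cells, "m1.small").name)  # cell1
print(schedule_to_cell(cells, "m1.xlarge"))      # None
```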

Simplifying assumptions:

Cells will be treated as a two-level tree structure.

Are we thinking of making this official by removing the code that
allows cells to be a tree of depth N? I am not sure whether doing so
would be a win. That code does complicate the RPC/messaging/state code
a bit, though, and if it's not being used, nice generalization or not,
why keep it around?

My preference would be to remove that code since I don't envision anyone writing tests to ensure that functionality works and/or doesn't regress. But there's the challenge of not knowing if anyone is actually relying on that behavior. So initially I'm not creating a specific work item to remove it. But I think it needs to be made clear that it's not officially supported and may get removed unless a case is made for keeping it and work is put into testing it.


Fix flavor breakage in child cell which causes boot tests to fail.
Currently the libvirt driver needs flavor.extra_specs which is not
synced to the child cell.  Some options are to sync the flavor and extra
specs to the child cell DB, or to pass the full data with the request.
https://review.openstack.org/#/c/126620/1 offers a means of passing full
data with the request.
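
To make the second option concrete: instead of the child cell looking the flavor up in its own database, where extra_specs may be missing, the parent serializes the whole flavor, extra specs included, into the build request. This is a minimal sketch; the message layout and both function names are hypothetical and not taken from the linked review.

```python
import json
from typing import Any, Dict


def build_request(instance_uuid: str, flavor: Dict[str, Any]) -> str:
    """Parent (API) cell side: embed the whole flavor, extra_specs included."""
    return json.dumps({"instance_uuid": instance_uuid, "flavor": flavor})


def extra_specs_for_build(message: str) -> Dict[str, str]:
    """Child cell side: take extra_specs from the request, not the local DB.

    This works even if the child cell's flavor tables were never synced.
    """
    request = json.loads(message)
    return request["flavor"].get("extra_specs", {})


flavor = {
    "name": "m1.small",
    "vcpus": 1,
    "memory_mb": 2048,
    "extra_specs": {"quota:cpu_shares": "1024"},  # example extra spec
}
msg = build_request("example-instance-uuid", flavor)
print(extra_specs_for_build(msg))  # {'quota:cpu_shares': '1024'}
```

The trade-off is a slightly larger message in exchange for not keeping flavor tables in sync across cells; anything in the child cell that still reads flavors from its own DB would not see the extra specs.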

Determine the proper switches to turn off Tempest tests for features that
don't work yet, with the goal of getting a voting job.  Once this is in place
we can move towards feature parity and work on internal refactorings.

Work towards adding parity for host aggregates, security groups, and
server groups.  They should be made to work in a single cell setup, but
the solution should not preclude them from being used in multiple
cells.  There needs to be some discussion as to whether a host aggregate
or server group is a global or a per-cell concept.

Have there been any previous discussions on this topic? If so I'd really
like to read up on those to make sure I understand the pros and cons
before the summit session.

The only discussion I'm aware of is some comments on https://review.openstack.org/#/c/59101/ , though they mention a discussion at the Utah mid-cycle.

The main con I'm aware of for defining these as global concepts is that there is no rescheduling capability in the cells scheduler. So if a build is sent to a cell with a host aggregate that can't fit that instance, the build will fail even though there may be space in that host aggregate from a global perspective. That should be somewhat straightforward to address though.
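
The missing piece is roughly a retry loop at the cells scheduler level: if the chosen cell can't fit the instance, try the next candidate instead of failing the build. A minimal sketch, assuming a hypothetical `try_build(cell)` callable that raises a locally defined `CellFull` error when a cell has no room; neither name is Nova's.

```python
from typing import Callable, Iterable, List


class CellFull(Exception):
    """Hypothetical error: the chosen cell cannot fit the instance."""


def build_with_reschedule(cells: Iterable[str],
                          try_build: Callable[[str], str],
                          max_attempts: int = 3) -> str:
    """Try candidate cells in order, moving on when one reports it is full."""
    tried: List[str] = []
    for cell in cells:
        if len(tried) >= max_attempts:
            break
        tried.append(cell)
        try:
            return try_build(cell)
        except CellFull:
            continue  # this cell's share of the aggregate is full
    raise CellFull("no cell could fit the instance; tried %s" % tried)


# Toy capacity: the aggregate spans two cells, only cell2 has room left.
capacity = {"cell1": 0, "cell2": 1}


def try_build(cell: str) -> str:
    if capacity[cell] == 0:
        raise CellFull(cell)
    capacity[cell] -= 1
    return cell


print(build_with_reschedule(["cell1", "cell2"], try_build))  # cell2
```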

I think it makes sense to define these as global concepts. But these are features that aren't used with cells yet so I haven't put a lot of thought into potential arguments or cases for doing this one way or another.

Work towards merging compute/api.py and compute/cells_api.py so that
developers only need to make changes/additions in one place.  The goal
is for as much as possible to be hidden by the RPC layer, which will
determine whether a call goes to a compute/conductor/cell.
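
One way to picture this: API code is written once against a single interface, and the only cells-specific decision is which transport carries the call. The class names and the `cells_enabled` flag below are hypothetical stand-ins, and a conductor target is left out for brevity; in Nova this choice would presumably sit behind the existing RPC API classes rather than replace them.

```python
from typing import Protocol


class Backend(Protocol):
    def reboot(self, instance_uuid: str) -> str: ...


class ComputeRPC:
    """Stand-in for a direct compute RPC client (no cells)."""
    def reboot(self, instance_uuid: str) -> str:
        return "compute: reboot %s" % instance_uuid


class CellsRPC:
    """Stand-in for a client that proxies the call to the owning cell."""
    def reboot(self, instance_uuid: str) -> str:
        return "cells: reboot %s" % instance_uuid


class ComputeAPI:
    """One API class, written once, whether or not cells is enabled."""

    def __init__(self, cells_enabled: bool) -> None:
        # The only cells-specific decision: which transport to use.
        self._backend: Backend = CellsRPC() if cells_enabled else ComputeRPC()

    def reboot(self, instance_uuid: str) -> str:
        # Shared logic (policy checks, task_state updates) would live here once.
        return self._backend.reboot(instance_uuid)


print(ComputeAPI(cells_enabled=False).reboot("abc"))  # compute: reboot abc
print(ComputeAPI(cells_enabled=True).reboot("abc"))   # cells: reboot abc
```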

For syncing data between cells, look at using objects to handle the
logic of writing data to the cell/parent and then syncing the data to
the other.

Some of that work has been done already, although in a somewhat ad-hoc
fashion. Were you thinking of extending objects to support this natively
(whatever that means), or do we continue to inline the code in the
existing object methods?

I would prefer to have some native support for this. In general, data is considered authoritative at either the global level or the cell level. For example, instance data is synced down from the global level to a cell (except for a few fields which are synced up), but migration data would be synced up. I could imagine decorators that would specify how data should be synced and handle that as transparently as possible.

A potential migration scenario is to consider a non-cells setup to be a
child cell, and converting to cells will mean setting up a parent cell
and linking them.  There are periodic tasks in place to sync data up
from a child already, but a manual kick-off mechanism will need to be
added.
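
The periodic sync and a manual kick-off could share one code path. A minimal sketch, assuming hypothetical `list_instances` and `send_to_parent` callables; it is not how Nova's existing periodic tasks are written, and how an operator would trigger it (for example an admin command) is left open.

```python
from typing import Callable, Dict, Iterable, List, Optional

Instance = Dict[str, str]


def sync_instances_up(list_instances: Callable[[], Iterable[Instance]],
                      send_to_parent: Callable[[Instance], None],
                      uuids: Optional[List[str]] = None) -> int:
    """Push instance records from this child cell up to its parent.

    A periodic task could call this with uuids=None; an operator-triggered
    kick-off could call it right after linking a new parent cell, optionally
    limited to specific instances. Returns the number of records sent.
    """
    wanted = set(uuids) if uuids is not None else None
    sent = 0
    for inst in list_instances():
        if wanted is not None and inst["uuid"] not in wanted:
            continue
        send_to_parent(inst)
        sent += 1
    return sent


local = [{"uuid": "a", "vm_state": "active"},
         {"uuid": "b", "vm_state": "stopped"}]
received: List[Instance] = []
print(sync_instances_up(lambda: local, received.append))              # 2
print(sync_instances_up(lambda: local, received.append, uuids=["b"]))  # 1
```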

Future plans:

Something that has been considered, but is out of scope for now, is that
the parent/api cell doesn't need the same data model as the child cell.
Since the majority of what it does is act as a cache for API requests,
it does not need all the data that a cell needs and what data it does
need could be stored in a form that's optimized for reads.
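
To illustrate what "optimized for reads" might mean: the API cell could keep one flat, denormalized row per instance holding just what list/show requests and proxying need, indexed for the common lookups. All field and class names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InstanceSummary:
    """Hypothetical flat row: only what list/show requests need."""
    uuid: str
    project_id: str
    cell_name: str     # where to proxy actions for this instance
    display_name: str
    vm_state: str
    flavor_name: str   # copied in, so no join against flavor tables


class ApiCellCache:
    """Read-optimized store keyed for the two most common lookups."""

    def __init__(self) -> None:
        self._by_uuid: Dict[str, InstanceSummary] = {}
        self._by_project: Dict[str, List[str]] = {}

    def upsert(self, row: InstanceSummary) -> None:
        """Apply an update synced up from a child cell."""
        old = self._by_uuid.get(row.uuid)
        if old is not None and old.project_id != row.project_id:
            self._by_project[old.project_id].remove(row.uuid)
            old = None
        if old is None:
            self._by_project.setdefault(row.project_id, []).append(row.uuid)
        self._by_uuid[row.uuid] = row

    def get(self, uuid: str) -> InstanceSummary:
        return self._by_uuid[uuid]

    def list_for_project(self, project_id: str) -> List[InstanceSummary]:
        return [self._by_uuid[u] for u in self._by_project.get(project_id, [])]


cache = ApiCellCache()
cache.upsert(InstanceSummary("a", "p1", "cell1", "web1", "building", "m1.small"))
cache.upsert(InstanceSummary("a", "p1", "cell1", "web1", "active", "m1.small"))
print([r.vm_state for r in cache.list_for_project("p1")])  # ['active']
```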


OpenStack-dev mailing list