Re: [openstack-dev] [Nova] Cells conversation starter
I have written up some points on an etherpad to use during the summit session: https://etherpad.openstack.org/p/kilo-nova-cells . Please read this over if possible before the session. There is an alternate approach to this work proposed, and I expect we'll spend some time discussing it. If anyone would like to discuss it before then, please reply here.

On 10/20/2014 02:00 PM, Andrew Laski wrote:

> One of the big goals for the Kilo cycle by users and developers of the cells functionality within Nova is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs, and making it easy for developers to work with. But there's a lot of work to get there. In order to raise awareness of this effort, and get the conversation started on a few things, I've summarized a little bit about cells and this effort below.
>
> Goals:
> - Testing of a single cell setup in the gate.
> - Feature parity.
> - Make cells the default implementation.
> - Developers write code once and it works for cells.
>
> Ultimately the goal is to improve maintainability of a large feature within the Nova code base.
>
> Feature gaps:
> - Host aggregates
> - Security groups
> - Server groups
>
> Shortcomings:
> - Flavor syncing: this needs to be addressed now.
> - Cells scheduling/rescheduling; instances cannot currently move between cells. These two won't affect the default one-cell setup, so they will be addressed later.
>
> What does cells do:
> - Schedule an instance to a cell based on the flavor slots available.
> - Proxy API requests to the proper cell.
> - Keep a copy of instance data at the global level for quick retrieval.
> - Sync data up from a child cell to keep the global level up to date.
>
> Simplifying assumptions:
> - Cells will be treated as a two-level tree structure.
>
> Plan:
> - Fix the flavor breakage in the child cell which causes boot tests to fail. Currently the libvirt driver needs flavor.extra_specs, which is not synced to the child cell. Some options are to sync the flavor and extra specs to the child cell db, or to pass the full data with the request. https://review.openstack.org/#/c/126620/1 offers a means of passing the full data with the request.
> - Determine the proper switches to turn off Tempest tests for features that don't work, with the goal of getting a voting job. Once this is in place we can move towards feature parity and work on internal refactorings.
> - Work towards adding parity for host aggregates, security groups, and server groups. They should be made to work in a single cell setup, but the solution should not preclude them from being used in multiple cells. There needs to be some discussion as to whether a host aggregate or server group is a global concept or a per-cell concept.
> - Work towards merging compute/api.py and compute/cells_api.py so that developers only need to make changes/additions in one place. The goal is for as much as possible to be hidden by the RPC layer, which will determine whether a call goes to a compute/conductor/cell.
> - For syncing data between cells, look at using objects to handle the logic of writing data to the cell/parent and then syncing the data to the other.
> - A potential migration scenario is to consider a non-cells setup to be a child cell; converting to cells will then mean setting up a parent cell and linking them. There are periodic tasks in place to sync data up from a child already, but a manual kick-off mechanism will need to be added.
>
> Future plans:
> Something that has been considered, but is out of scope for now, is that the parent/api cell doesn't need the same data model as the child cell. Since the majority of what it does is act as a cache for API requests, it does not need all the data that a cell needs, and what data it does need could be stored in a form that's optimized for reads.
>
> Thoughts?
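The "pass full data with the request" option in the plan could look roughly like the following sketch. All names and the payload shape here are invented for illustration; this is not the actual implementation proposed in review 126620.

```python
# Hypothetical sketch: instead of relying on the child cell's DB having the
# flavor row, the API cell serializes the full flavor (including extra_specs)
# into the build request it sends down to the child cell.

def serialize_flavor(flavor):
    """Flatten a flavor dict into a plain structure safe to put on the wire."""
    return {
        'flavorid': flavor['flavorid'],
        'name': flavor['name'],
        'memory_mb': flavor['memory_mb'],
        'vcpus': flavor['vcpus'],
        'root_gb': flavor['root_gb'],
        'extra_specs': dict(flavor.get('extra_specs', {})),
    }

def build_request_payload(instance_uuid, flavor):
    """Build the message an API cell could send to a child cell."""
    return {
        'instance_uuid': instance_uuid,
        # The full flavor travels with the request, so the child cell (and
        # the libvirt driver's flavor.extra_specs lookups) never need to hit
        # the child cell's own flavor tables.
        'flavor': serialize_flavor(flavor),
    }

flavor = {'flavorid': 'm1.small', 'name': 'm1.small', 'memory_mb': 2048,
          'vcpus': 1, 'root_gb': 20,
          'extra_specs': {'hw:cpu_policy': 'dedicated'}}
payload = build_request_payload('uuid-1234', flavor)
```

The design trade-off is the one named in the plan: the request gets larger, but no cross-cell flavor synchronization is required at all.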
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Cells conversation starter
Hi Andrew,

Since you have mentioned one approach for passing flavor metadata through to driver.spawn, I want to draw your attention to this code review thread as well: https://review.openstack.org/#/c/108238/. I didn't want to edit your etherpad notes, hence this email.

Regards,
Vineet Menon

On 30 October 2014 22:08, Andrew Laski andrew.la...@rackspace.com wrote: [...]
Re: [openstack-dev] [Nova] Cells conversation starter
-----Original Message-----
From: Andrew Laski [mailto:andrew.la...@rackspace.com]
Sent: 22 October 2014 21:12
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Cells conversation starter

> On 10/22/2014 12:52 AM, Michael Still wrote:
>> Thanks for this. It would be interesting to see how much of this work you think is achievable in Kilo. How long do you see this process taking? In line with that, is it just you currently working on this? Would calling for volunteers to help be meaningful?
>
> I think that getting a single cell setup tested in the gate is achievable. I think feature parity might be a stretch but could be achievable with enough hands to work on it. Honestly I think that making cells the default implementation is going to take more than a cycle. But I think we can get some specifics worked out as to the direction and may be able to get to a point where the remaining work is mostly mechanical.

I think getting to feature parity would be a good Kilo objective. Moving to default is another step, which would need migration scripts from the non-cells setups, and those would need heavy testing. Aiming for L for that would seem reasonable given that we are not drowning in volunteers.

> At the moment it is mainly me working on this with some support from a couple of people. Volunteers would certainly be welcomed on this effort though. If it would be useful perhaps we could even have a cells subgroup to track progress and direction of this effort.

CERN and BARC (Bhabha Atomic Research Centre in Mumbai) would be interested in helping to close the gap.

Tim

> Michael
>
> On Tue, Oct 21, 2014 at 5:00 AM, Andrew Laski andrew.la...@rackspace.com wrote: [...]
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/22/2014 08:11 PM, Sam Morrison wrote:

> On 23 Oct 2014, at 5:55 am, Andrew Laski andrew.la...@rackspace.com wrote:
>>> While I agree that N is a bit interesting, I have seen N=3 in production
>>>
>>>   [central API]--[state/region1]--[state/region DC1]
>>>                                \-[state/region DC2]
>>>                --[state/region2 DC]
>>>                --[state/region3 DC]
>>>                --[state/region4 DC]
>>
>> I would be curious to hear any information about how this is working out. Does everything that works for N=2 work when N=3? Are there fixes that needed to be added to make this work? Why do it this way rather than bring [state/region DC1] and [state/region DC2] up a level?
>
> We (NeCTAR) have 3 tiers; our current setup has one parent, 6 children, then 3 of the children have 2 grandchildren each. All compute nodes are at the lowest level. Everything works fine and we haven't needed to do any modifications. We run in a 3 tier system because it matches how our infrastructure is logically laid out, but I don't see a problem in just having a 2 tier system and getting rid of the middle man.

There's no reason an N-tier system where N > 2 shouldn't be feasible, but it's not going to be tested in this initial effort. So while we will try not to break it, it's hard to guarantee that. That's why my preference would be to remove that code and build up an N-tier system in conjunction with testing later. But with a clear user of this functionality I don't think that's an option.

> Sam
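To make the tiering discussion concrete, here is a small sketch (a hypothetical helper, not Nova code) that computes the depth of a cell topology expressed as a parent-to-children mapping, such as the NeCTAR layout described above:

```python
# Hypothetical sketch: compute cell-tree depth so a deployment can check
# whether it fits the officially tested two-level assumption or is an
# N > 2 layout like NeCTAR's. Cell names below are made up.

def cell_tree_depth(children, root):
    """Return the number of levels in the cell tree rooted at `root`."""
    kids = children.get(root, [])
    if not kids:
        return 1
    return 1 + max(cell_tree_depth(children, kid) for kid in kids)

# NeCTAR-style layout: one parent, six children, three of which have
# two grandchildren each; all compute nodes live at the lowest level.
children = {'api': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6'],
            'c1': ['g1', 'g2'], 'c2': ['g3', 'g4'], 'c3': ['g5', 'g6']}
depth = cell_tree_depth(children, 'api')   # three levels
```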
Re: [openstack-dev] [Nova] Cells conversation starter
On 22 October 2014 06:24, Tom Fifield t...@openstack.org wrote:

> On 22/10/14 03:07, Andrew Laski wrote:
>> On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
>>> On 10/20/2014 08:00 PM, Andrew Laski wrote: [...]
>>>
>>> Thanks for the write-up Andrew! Some thoughts/questions below. Looking forward to the discussion on some of these topics, and would be happy to review the code once we get to that point.
>>>
>>> [...] Are we thinking of making this official by removing the code that actually allows cells to be an actual tree of depth N? I am not sure if doing so would be a win; although it does complicate the RPC/Messaging/State code a bit, if it's not being used, even though a nice generalization, why keep it around?
>>
>> My preference would be to remove that code since I don't envision anyone writing tests to ensure that functionality works and/or doesn't regress. But there's the challenge of not knowing if anyone is actually relying on that behavior. So initially I'm not creating a specific work item to remove it. But I think it needs to be made clear that it's not officially supported and may get removed unless a case is made for keeping it and work is put into testing it.
>
> While I agree that N is a bit interesting, I have seen N=3 in production
>
>   [central API]--[state/region1]--[state/region DC1]
>                                \-[state/region DC2]
>                --[state/region2 DC]
>                --[state/region3 DC]
>                --[state/region4 DC]

I'm curious. What are the use cases for this deployment? Agreeably, the root node runs n-api along with horizon, key management, etc. What components are deployed in tiers 2 and 3?

And AFAIK, currently, an OpenStack cell deployment isn't even a tree but a DAG, since one cell can have multiple parents. Has anyone come up with any such requirement?

> [...] The main con I'm aware of for defining these as global concepts is that there is no rescheduling capability in the cells scheduler. So if a build is sent to a cell with a host aggregate that can't fit that instance, the build will fail even though there may be space in that host aggregate from a global perspective. That should be somewhat straightforward to [...]
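The rescheduling gap described in the quoted text can be sketched as follows. This is a toy model with invented names, not the cells scheduler's actual API: if a build fails in the first cell hosting the (globally defined) aggregate, the scheduler falls back to other cells containing hosts from that aggregate instead of failing the build outright.

```python
# Hypothetical sketch of the missing cross-cell reschedule behaviour.

class NoValidCell(Exception):
    pass

def schedule_with_reschedule(cells_for_aggregate, try_build):
    """Try each candidate cell in turn; try_build(cell) returns True on
    success and False if the cell cannot fit the instance."""
    for cell in cells_for_aggregate:
        if try_build(cell):
            return cell
    raise NoValidCell('no cell in the aggregate could fit the instance')

# Toy capacity model: cell1 is full, cell2 has two free slots.
capacity = {'cell1': 0, 'cell2': 2}

def try_build(cell):
    if capacity[cell] > 0:
        capacity[cell] -= 1
        return True
    return False

chosen = schedule_with_reschedule(['cell1', 'cell2'], try_build)
```

Without the fallback loop, the equivalent of `try_build('cell1')` failing would end the build, which is exactly the con raised against making aggregates a global concept.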
Re: [openstack-dev] [Nova] Cells conversation starter
Thanks Andrew for this (very) exhaustive list.

As you have pointed out, for all the missing features (I think flavors can also be a part of that list) the community needs to decide where the info lives primarily (API or compute cells) and how it is propagated (synced, sent with the request, asked for on demand, etc.).

With regards to flavors, I think the attention has shifted to getting extra_specs to sync with child cells, which isn't going to help much because even instance_type isn't synced yet. And since instance_type relies on auto-generated IDs, syncing would be a major headache (one cell is down when a new flavor is created or deleted). Storing extra_specs along with instance_system_metadata is a good alternative, but if we assume API cells to be authoritative about flavors, then we can simply pass flavor information with the boot request to the child cell (this would also clean up non-cell Nova code, which currently performs multiple DB lookups for flavors based on the underlying virt driver).

Any potential solutions for all these should also probably be evaluated based on their compatibility with existing cell setups. Maybe we should create a thread for each of these missing features to discuss its solution individually, or start the discussion in the bug reports themselves.

Regards,
Dheeraj
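The "store extra_specs along with instance_system_metadata" alternative mentioned above can be sketched like this. The key prefix and helpers are invented for illustration; Nova's actual system-metadata format may differ. The idea is to flatten flavor fields into prefixed metadata keys at boot time and rebuild the flavor from them later, so no cross-cell flavor sync (and no fragile matching on auto-generated IDs) is needed.

```python
# Hypothetical sketch: round-trip flavor data through flat, string-valued
# system-metadata keys attached to the instance. Key names are made up.

PREFIX = 'flavor:'
SPEC_PREFIX = PREFIX + 'extra_specs:'

def flavor_to_sysmeta(flavor):
    """Flatten a flavor dict into prefixed metadata keys (values become
    strings, as metadata stores are typically string-valued)."""
    sysmeta = {}
    for key, value in flavor.items():
        if key == 'extra_specs':
            for spec, val in value.items():
                sysmeta[SPEC_PREFIX + spec] = val
        else:
            sysmeta[PREFIX + key] = str(value)
    return sysmeta

def sysmeta_to_flavor(sysmeta):
    """Rebuild a flavor dict from the prefixed metadata keys."""
    flavor = {'extra_specs': {}}
    for key, value in sysmeta.items():
        if key.startswith(SPEC_PREFIX):
            flavor['extra_specs'][key[len(SPEC_PREFIX):]] = value
        elif key.startswith(PREFIX):
            flavor[key[len(PREFIX):]] = value
    return flavor

original = {'name': 'm1.small', 'memory_mb': 2048,
            'extra_specs': {'hw:numa_nodes': '1'}}
restored = sysmeta_to_flavor(flavor_to_sysmeta(original))
```

Note that numeric fields come back as strings in this naive round-trip; a real implementation would need typed (de)serialization, which is part of why passing a structured flavor with the boot request is attractive.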
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/22/2014 12:24 AM, Tom Fifield wrote:

> On 22/10/14 03:07, Andrew Laski wrote:
>> On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
>>> [...] Are we thinking of making this official by removing the code that actually allows cells to be an actual tree of depth N? I am not sure if doing so would be a win; although it does complicate the RPC/Messaging/State code a bit, if it's not being used, even though a nice generalization, why keep it around?
>>
>> My preference would be to remove that code since I don't envision anyone writing tests to ensure that functionality works and/or doesn't regress. But there's the challenge of not knowing if anyone is actually relying on that behavior. So initially I'm not creating a specific work item to remove it. But I think it needs to be made clear that it's not officially supported and may get removed unless a case is made for keeping it and work is put into testing it.
>
> While I agree that N is a bit interesting, I have seen N=3 in production
>
>   [central API]--[state/region1]--[state/region DC1]
>                                \-[state/region DC2]
>                --[state/region2 DC]
>                --[state/region3 DC]
>                --[state/region4 DC]

I would be curious to hear any information about how this is working out. Does everything that works for N=2 work when N=3? Are there fixes that needed to be added to make this work? Why do it this way rather than bring [state/region DC1] and [state/region DC2] up a level?

> [...] There needs to be some discussion as to whether a host aggregate or server group is a global concept or a per-cell concept.
>
> Have there been any previous discussions on this topic? If so I'd really like to read up on those to make sure I understand the pros and cons before the summit session.

The only discussion I'm aware of is some comments on https://review.openstack.org/#/c/59101/ , though they mention a discussion at the Utah mid-cycle. The main con I'm aware of for defining these as global concepts is that there is no rescheduling capability in the cells scheduler. So if a build is sent to a cell with a host aggregate that can't fit that instance, the build will fail even though there may be space in that host aggregate from a global perspective. That should be somewhat straightforward to address though.

I think it makes sense to define these as global concepts. But these are features that aren't used with cells yet, so I haven't put a lot of thought into potential arguments or cases for doing this one way or another.

> [...]
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/22/2014 03:42 AM, Vineet Menon wrote:

> On 22 October 2014 06:24, Tom Fifield t...@openstack.org wrote:
>> [...] While I agree that N is a bit interesting, I have seen N=3 in production
>>
>>   [central API]--[state/region1]--[state/region DC1]
>>                                \-[state/region DC2]
>>                --[state/region2 DC]
>>                --[state/region3 DC]
>>                --[state/region4 DC]
>
> I'm curious. What are the use cases for this deployment? Agreeably, the root node runs n-api along with horizon, key management, etc. What components are deployed in tiers 2 and 3?
>
> And AFAIK, currently, an OpenStack cell deployment isn't even a tree but a DAG, since one cell can have multiple parents. Has anyone come up with any such requirement?

While there's nothing to prevent a cell from having multiple parents, I would be curious to know if this would actually work in practice, since I can imagine a number of cases that might cause problems. And is there a practical use for this? Maybe we should start logging a warning when this is set up, stating that this is an unsupported (i.e. untested) configuration, to start to codify the design as that of a tree. At least for the initial scope of work I think this makes sense, and if a case is made for a DAG setup that can be done independently.

> [...]
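The warning suggested for multiple-parent configurations might look roughly like this. It is a sketch only: the function and the topology mapping are made up, not Nova's actual config handling.

```python
# Hypothetical sketch: while loading the cell topology, log a warning for
# any cell configured with more than one parent, since a DAG layout is an
# unsupported (i.e. untested) configuration.

import logging

LOG = logging.getLogger(__name__)

def check_single_parent(cell_parents):
    """cell_parents maps child cell name -> list of parent cell names.
    Returns the list of cells that violate the tree assumption."""
    flagged = []
    for cell, parents in cell_parents.items():
        if len(parents) > 1:
            LOG.warning('cell %s has %d parents (%s); multiple parents are '
                        'an unsupported, untested configuration',
                        cell, len(parents), ', '.join(parents))
            flagged.append(cell)
    return flagged

# Made-up topology: cell2 is (mis)configured with two parents.
flagged = check_single_parent({'cell1': ['api'],
                               'cell2': ['api', 'region-b']})
```

Logging rather than refusing to start keeps existing DAG deployments running while codifying the tree design, which matches the incremental approach discussed above.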
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/22/2014 12:52 AM, Michael Still wrote:

Thanks for this. It would be interesting to see how much of this work you think is achievable in Kilo. How long do you see this process taking? In line with that, is it just you currently working on this? Would calling for volunteers to help be meaningful?

I think that getting a single cell setup tested in the gate is achievable. I think feature parity might be a stretch, but it could be achievable with enough hands to work on it. Honestly, I think that making cells the default implementation is going to take more than a cycle. But I think we can get some specifics worked out as to the direction, and we may be able to get to a point where the remaining work is mostly mechanical.

At the moment it is mainly me working on this, with some support from a couple of people. Volunteers would certainly be welcomed on this effort though. If it would be useful, perhaps we could even have a cells subgroup to track the progress and direction of this effort.

Michael

On Tue, Oct 21, 2014 at 5:00 AM, Andrew Laski andrew.la...@rackspace.com wrote:

One of the big goals for the Kilo cycle by users and developers of the cells functionality within Nova is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs, and making it easy for developers to work with. But there's a lot of work to get there. In order to raise awareness of this effort, and get the conversation started on a few things, I've summarized a little bit about cells and this effort below.

Goals:
- Testing of a single cell setup in the gate.
- Feature parity.
- Make cells the default implementation. Developers write code once and it works for cells.

Ultimately the goal is to improve maintainability of a large feature within the Nova code base.

___ OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
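The "pass full data with the request" option discussed above can be illustrated with a toy example. This is a hedged sketch, not Nova's actual code: `Flavor`, `build_request`, and `handle_request` are hypothetical stand-ins. It shows only the core idea, that serializing the flavor (including `extra_specs`) into the boot request removes the need to sync flavors into the child cell's database.

```python
# Sketch of passing full flavor data down to a child cell.
# Illustrative names only; not Nova's real classes or RPC payloads.
import json


class Flavor:
    def __init__(self, flavorid, memory_mb, vcpus, extra_specs):
        self.flavorid = flavorid
        self.memory_mb = memory_mb
        self.vcpus = vcpus
        self.extra_specs = extra_specs  # the piece the libvirt driver needs

    def to_primitive(self):
        # Serialize everything the child cell needs, so nothing has to be
        # looked up in (or synced to) the child cell's database.
        return {
            "flavorid": self.flavorid,
            "memory_mb": self.memory_mb,
            "vcpus": self.vcpus,
            "extra_specs": self.extra_specs,
        }

    @classmethod
    def from_primitive(cls, data):
        return cls(**data)


def build_request(flavor):
    """What an API cell might send down with a boot request."""
    return json.dumps({"instance_type": flavor.to_primitive()})


def handle_request(payload):
    """Child-cell side: rebuild the flavor without touching its own DB."""
    data = json.loads(payload)
    return Flavor.from_primitive(data["instance_type"])


flavor = Flavor("m1.tiny", 512, 1, {"hw:cpu_policy": "dedicated"})
rebuilt = handle_request(build_request(flavor))
assert rebuilt.extra_specs == {"hw:cpu_policy": "dedicated"}
```

The alternative option, syncing flavors and extra specs into each child cell's database, avoids larger request payloads but introduces the staleness problems the flavor-syncing shortcoming describes.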
Re: [openstack-dev] [Nova] Cells conversation starter
On 23 Oct 2014, at 5:55 am, Andrew Laski andrew.la...@rackspace.com wrote:

While I agree that N is a bit interesting, I have seen N=3 in production

  [central API]--[state/region1]--[state/region DC1]
                                \-[state/region DC2]
               --[state/region2 DC]
               --[state/region3 DC]
               --[state/region4 DC]

I would be curious to hear any information about how this is working out. Does everything that works for N=2 work when N=3? Are there fixes that needed to be added to make this work? Why do it this way rather than bring [state/region DC1] and [state/region DC2] up a level?

We (NeCTAR) have 3 tiers; our current setup has one parent and 6 children, and 3 of the children have 2 grandchildren each. All compute nodes are at the lowest level. Everything works fine and we haven’t needed to do any modifications. We run a 3 tier system because it matches how our infrastructure is logically laid out, but I don’t see a problem in just having a 2 tier system and getting rid of the middle man.

Sam
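The NeCTAR layout described above can be modeled as a small tree to make the N=2 vs N=3 distinction concrete. This is an illustrative toy only; `Cell` here is not Nova's cells code, just a sketch of the tree shape the thread is discussing.

```python
# Toy model of a cells tree: one parent, 6 children, 3 of which
# have 2 grandchildren each, with compute at the lowest level.
class Cell:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def depth(self):
        # A lone cell has depth 1; each tier adds one level.
        if not self.children:
            return 1
        return 1 + max(c.depth() for c in self.children)

    def leaves(self):
        # The leaf cells are where compute nodes live in this layout.
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


def make_children():
    children = []
    for i in range(6):
        if i < 3:
            kids = [Cell("gc%d-%d" % (i, j)) for j in range(2)]
        else:
            kids = []
        children.append(Cell("child%d" % i, kids))
    return children


root = Cell("api", make_children())
assert root.depth() == 3          # three tiers, i.e. N=3
assert len(root.leaves()) == 9    # 6 grandchildren + 3 leaf children
```

Flattening to two tiers (the "getting rid of the middle man" option) would simply re-parent all nine leaf cells directly under the api cell.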
Re: [openstack-dev] [Nova] Cells conversation starter
I wonder whether cells by itself can ease large scale deployment across multiple DCs. Although OpenStack is modularized into separate projects, a top level solution that spans projects is needed when considering issues such as scaling, especially coordination between nova and networking / metering.

Personally, +1 to Steve that in terms of future plans for scaling we should explore all similar proposals, such as Cascading and Federation. In fact, networking and metering face the same challenge as nova, or even more so with l2pop and DVR. Those two proposals both bring a solution that spans projects beyond nova, each with its own emphasis from the point of view of intra- vs. inter-admin domains.

On Thu, Oct 23, 2014 at 8:11 AM, Sam Morrison sorri...@gmail.com wrote:

We (NeCTAR) have 3 tiers; our current setup has one parent and 6 children, and 3 of the children have 2 grandchildren each. All compute nodes are at the lowest level. Everything works fine and we haven’t needed to do any modifications. We run a 3 tier system because it matches how our infrastructure is logically laid out, but I don’t see a problem in just having a 2 tier system and getting rid of the middle man.
Sam
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/20/2014 08:00 PM, Andrew Laski wrote:

One of the big goals for the Kilo cycle by users and developers of the cells functionality within Nova is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs, and making it easy for developers to work with. But there's a lot of work to get there. In order to raise awareness of this effort, and get the conversation started on a few things, I've summarized a little bit about cells and this effort below.

Goals:
- Testing of a single cell setup in the gate.
- Feature parity.
- Make cells the default implementation. Developers write code once and it works for cells.

Ultimately the goal is to improve maintainability of a large feature within the Nova code base.

Thanks for the write-up Andrew! Some thoughts/questions below. Looking forward to the discussion on some of these topics, and I would be happy to review the code once we get to that point.

Simplifying assumptions:
- Cells will be treated as a two level tree structure.

Are we thinking of making this official by removing the code that actually allows cells to be a tree of depth N? I am not sure whether doing so would be a win: supporting depth N does complicate the RPC/messaging/state code a bit, but if it's not being used, even though it is a nice generalization, why keep it around?

Plan:
- Work towards adding parity for host aggregates, security groups, and server groups. They should be made to work in a single cell setup, but the solution should not preclude them from being used in multiple cells. There needs to be some discussion as to whether a host aggregate or server group is a global concept or a per cell concept.

Have there been any previous discussions on this topic? If so, I'd really like to read up on those to make sure I understand the pros and cons before the summit session.

- Work towards merging compute/api.py and compute/cells_api.py so that developers only need to make changes/additions in one place. The goal is for as much as possible to be hidden by the RPC layer, which will determine whether a call goes to a compute/conductor/cell.
- For syncing data between cells, look at using objects to handle the logic of writing data to the cell/parent and then syncing the data to the other.

Some of that work has been done already, although in a somewhat ad-hoc fashion. Were you thinking of extending objects to support this natively (whatever that means), or do we continue to inline the code in the existing object methods?
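The "use objects to handle writing data to the cell/parent and then syncing the data to the other" idea can be sketched roughly as follows. All names here are hypothetical; Nova's real objects framework and cells messaging are far more involved. The point is only the shape of the design: a `save()` that writes locally and also queues the same change for the other level.

```python
# Sketch: an object whose save() hides the child->parent sync logic,
# so callers never deal with cells explicitly. Illustrative names only.
child_db = {}            # stand-in for the child cell's database
updates_for_parent = []  # stand-in for the queue of upward sync messages


class Instance:
    """Toy stand-in for an object that knows how to sync upward."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.status = None

    def save(self):
        # Write to the local (child cell) database first...
        child_db[self.uuid] = {"status": self.status}
        # ...then record the same change to be synced up to the parent,
        # which keeps the global copy of instance data up to date.
        updates_for_parent.append((self.uuid, {"status": self.status}))


inst = Instance("abc")
inst.status = "ACTIVE"
inst.save()
assert child_db["abc"] == {"status": "ACTIVE"}
assert updates_for_parent == [("abc", {"status": "ACTIVE"})]
```

A manual "kick off" sync, as mentioned for the migration scenario, would amount to replaying every local row onto that same upward queue.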
Re: [openstack-dev] [Nova] Cells conversation starter
Hi, to help the discussion, a small compilation of the bugs and previous attempts to fix the missing functionality in cells.

Aggregates:
https://bugs.launchpad.net/nova/+bug/1161208
https://blueprints.launchpad.net/nova/+spec/cells-aggregate-support
https://review.openstack.org/#/c/25813/

Server Groups:
https://bugs.launchpad.net/nova/+bug/1369518

Security Groups:
https://bugs.launchpad.net/nova/+bug/1274325

Belmiro

On Tue, Oct 21, 2014 at 10:31 AM, Nikola Đipanov ndipa...@redhat.com wrote:

On 10/20/2014 08:00 PM, Andrew Laski wrote:

One of the big goals for the Kilo cycle by users and developers of the cells functionality within Nova is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs, and making it easy for developers to work with. But there's a lot of work to get there. In order to raise awareness of this effort, and get the conversation started on a few things, I've summarized a little bit about cells and this effort below.

Goals:
- Testing of a single cell setup in the gate.
- Feature parity.
- Make cells the default implementation. Developers write code once and it works for cells.

Ultimately the goal is to improve maintainability of a large feature within the Nova code base.

Thanks for the write-up Andrew! Some thoughts/questions below. Looking forward to the discussion on some of these topics, and would be happy to review the code once we get to that point.

Feature gaps:
- Host aggregates
- Security groups
- Server groups

Shortcomings:
- Flavor syncing. This needs to be addressed now.
- Cells scheduling/rescheduling.
- Instances cannot currently move between cells.

The last two won't affect the default one cell setup so they will be addressed later.

What does cells do:
- Schedule an instance to a cell based on flavor slots available.
- Proxy API requests to the proper cell.
- Keep a copy of instance data at the global level for quick retrieval.
Re: [openstack-dev] [Nova] Cells conversation starter
On 10/21/2014 04:31 AM, Nikola Đipanov wrote:

On 10/20/2014 08:00 PM, Andrew Laski wrote:

Simplifying assumptions:
- Cells will be treated as a two level tree structure.

Are we thinking of making this official by removing the code that actually allows cells to be a tree of depth N? I am not sure whether doing so would be a win: supporting depth N does complicate the RPC/messaging/state code a bit, but if it's not being used, even though it is a nice generalization, why keep it around?

My preference would be to remove that code, since I don't envision anyone writing tests to ensure that functionality works and/or doesn't regress. But there's the challenge of not knowing whether anyone is actually relying on that behavior, so initially I'm not creating a specific work item to remove it. I do think it needs to be made clear that it's not officially supported and may get removed unless a case is made for keeping it and work is put into testing it.

Plan:
- Work towards adding parity for host aggregates, security groups, and server groups. They should be made to work in a single cell setup, but the solution should not preclude them from being used in multiple cells. There needs to be some discussion as to whether a host aggregate or server group is a global concept or a per cell concept.

Have there been any previous discussions on this topic? If so, I'd really like to read up on those to make sure I understand the pros and cons before the summit session.

The only discussion I'm aware of is some comments on https://review.openstack.org/#/c/59101/ , though they mention a discussion at the Utah mid-cycle. The main con I'm aware of for defining these as global concepts is that there is no rescheduling capability in the cells scheduler. So if a build is sent to a cell with a host aggregate that can't fit that instance, the build will fail even though there may be space in that host aggregate from a global perspective. That should be somewhat straightforward to address, though. I think it makes sense to define these as global concepts. But these are features that aren't used with cells yet, so I haven't put a lot of thought into potential arguments or cases for doing this one way or another.

- Work towards merging compute/api.py and compute/cells_api.py so that developers only need to make changes/additions in one place. The goal is for as much as possible to be hidden by the RPC layer, which will determine whether a call goes to a compute/conductor/cell.
- For syncing data between cells, look at using objects to handle the logic of writing data to the cell/parent and then syncing the data to the other.

Some of that work has been done already, although in a somewhat ad-hoc fashion. Were you thinking of extending objects to support this natively (whatever that means), or do we continue to inline the code in the existing object methods?
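The proposed merge of compute/api.py and compute/cells_api.py, with the RPC layer deciding the target, can be sketched in miniature. This is an assumption-laden toy rather than Nova's actual API code: the classes below exist only to illustrate "developers write code once", with routing hidden below the API surface.

```python
# Sketch: one API class; whether a call lands on a compute service or a
# cell is decided by the RPC client it was constructed with.
# Hypothetical names throughout; not Nova's real interfaces.
class ComputeRPC:
    """Stand-in for the non-cells RPC client."""

    def start_instance(self, uuid):
        return "compute started %s" % uuid


class CellsRPC:
    """Stand-in for the cells RPC client, which proxies to a cell."""

    def __init__(self, cell_name):
        self.cell_name = cell_name

    def start_instance(self, uuid):
        return "cell %s started %s" % (self.cell_name, uuid)


class API:
    """One compute API; callers never know whether cells are in use."""

    def __init__(self, use_cells, cell_name="cell1"):
        self.rpc = CellsRPC(cell_name) if use_cells else ComputeRPC()

    def start(self, uuid):
        # Developers write this method once; routing lives below the
        # RPC layer, as proposed for the api.py / cells_api.py merge.
        return self.rpc.start_instance(uuid)


assert API(use_cells=False).start("abc") == "compute started abc"
assert API(use_cells=True).start("abc") == "cell cell1 started abc"
```

The same construction-time choice would apply to conductor targets, which is why the plan phrases it as "compute/conductor/cell".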
Re: [openstack-dev] [Nova] Cells conversation starter
- Original Message -
From: Andrew Laski andrew.la...@rackspace.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org

Future plans: Something that has been considered, but is out of scope for now, is that the parent/api cell doesn't need the same data model as the child cell. Since the majority of what it does is act as a cache for API requests, it does not need all the data that a cell needs and what data it does need could be stored in a form that's optimized for reads.
In terms of future plans I'd like to also explore how continued iteration on the Cells concept might line up against the use cases presented in the recent threads on OpenStack cascading:

* http://lists.openstack.org/pipermail/openstack-dev/2014-September/047470.html
* http://lists.openstack.org/pipermail/openstack-dev/2014-October/047526.html

Thanks,
Steve
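The quoted "future plans" idea, that the parent/api cell could keep a read-optimized cache rather than the full child-cell schema, might look roughly like this. Field names and helpers are invented for illustration; the only point is that the parent stores one denormalized row per instance, pre-joined for API reads.

```python
# Sketch: child-cell data flattened into a single read-optimized record
# at the api cell. Hypothetical fields; not Nova's data model.
def to_api_view(instance, flavor):
    """Flatten instance + flavor into one denormalized row."""
    return {
        "uuid": instance["uuid"],
        "status": instance["status"],
        "flavor_name": flavor["name"],
        "ram_mb": flavor["memory_mb"],
    }


cache = {}  # stand-in for the api cell's read-optimized store


def sync_up(instance, flavor):
    # What a child cell would push up on every change: no joins are
    # needed at read time, because the row is already pre-joined.
    cache[instance["uuid"]] = to_api_view(instance, flavor)


sync_up({"uuid": "abc", "status": "ACTIVE"},
        {"name": "m1.small", "memory_mb": 2048})
assert cache["abc"]["flavor_name"] == "m1.small"
```

The trade-off is the usual one for materialized views: faster API reads in exchange for write amplification on every upward sync.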
Re: [openstack-dev] [Nova] Cells conversation starter
On 22/10/14 03:07, Andrew Laski wrote:

On 10/21/2014 04:31 AM, Nikola Đipanov wrote:

Simplifying assumptions:
- Cells will be treated as a two level tree structure.

Are we thinking of making this official by removing code that actually allows cells to be an actual tree of depth N? I am not sure if doing so would be a win, although it does complicate the RPC/Messaging/State code a bit, but if it's not being used, even though a nice generalization, why keep it around?

My preference would be to remove that code since I don't envision anyone writing tests to ensure that functionality works and/or doesn't regress. But there's the challenge of not knowing if anyone is actually relying on that behavior. So initially I'm not creating a specific work item to remove it. But I think it needs to be made clear that it's not officially supported and may get removed unless a case is made for keeping it and work is put into testing it.

While I agree that N is a bit interesting, I have seen N=3 in production

  [central API]--[state/region1]--[state/region DC1]
                                \-[state/region DC2]
               --[state/region2 DC]
               --[state/region3 DC]
               --[state/region4 DC]
Re: [openstack-dev] [Nova] Cells conversation starter
Thanks for this. It would be interesting to see how much of this work you think is achievable in Kilo. How long do you see this process taking? In line with that, is it just you currently working on this? Would calling for volunteers to help be meaningful?

Michael

On Tue, Oct 21, 2014 at 5:00 AM, Andrew Laski andrew.la...@rackspace.com wrote:

One of the big goals for the Kilo cycle by users and developers of the cells functionality within Nova is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs, and making it easy for developers to work with. But there's a lot of work to get there.

--
Rackspace Australia
[openstack-dev] [Nova] Cells conversation starter
One of the big goals for the Kilo cycle, for both users and developers of the cells functionality within Nova, is to get it to a point where it can be considered a first class citizen of Nova. Ultimately I think this comes down to getting it tested by default in Nova jobs and making it easy for developers to work with, but there's a lot of work to get there. In order to raise awareness of this effort, and to get the conversation started on a few things, I've summarized a bit about cells and this effort below.

Goals:
- Testing of a single-cell setup in the gate.
- Feature parity.
- Make cells the default implementation.
- Developers write code once and it works for cells.

Ultimately the goal is to improve maintainability of a large feature within the Nova code base.

Feature gaps:
- Host aggregates
- Security groups
- Server groups

Shortcomings:
- Flavor syncing: this needs to be addressed now.
- Cells scheduling/rescheduling.
- Instances cannot currently move between cells.
  (These last two won't affect the default one-cell setup, so they will be addressed later.)

What cells does:
- Schedule an instance to a cell based on flavor slots available.
- Proxy API requests to the proper cell.
- Keep a copy of instance data at the global level for quick retrieval.
- Sync data up from a child cell to keep the global level up to date.

Simplifying assumptions:
- Cells will be treated as a two-level tree structure.

Plan:
- Fix the flavor breakage in the child cell which causes boot tests to fail. Currently the libvirt driver needs flavor.extra_specs, which is not synced to the child cell. Some options are to sync the flavor and extra specs to the child cell db, or to pass full data with the request. https://review.openstack.org/#/c/126620/1 offers a means of passing full data with the request.
- Determine the proper switches to turn off Tempest tests for features that don't work, with the goal of getting a voting job. Once this is in place we can move towards feature parity and work on internal refactorings.
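[Editor's note] The "pass full data with the request" option above can be pictured as serializing the complete flavor, extra_specs included, into the boot request so the child cell's libvirt driver never needs a local flavor lookup. A rough sketch under that assumption (function and field names are illustrative, not Nova's actual object API):

```python
# Hypothetical sketch of the "pass full data with the request" option:
# the parent flattens the flavor into a plain dict that travels over RPC,
# so the child cell does not depend on a (possibly unsynced) flavor table.

def serialize_flavor(flavor):
    """Flatten a flavor, including extra_specs, into a plain dict."""
    return {
        "flavorid": flavor["flavorid"],
        "memory_mb": flavor["memory_mb"],
        "vcpus": flavor["vcpus"],
        "extra_specs": dict(flavor.get("extra_specs", {})),
    }

def build_boot_request(instance_uuid, flavor):
    # The child cell's driver reads extra_specs directly from the
    # request payload instead of querying its own database.
    return {
        "instance_uuid": instance_uuid,
        "flavor": serialize_flavor(flavor),
    }
```

The trade-off versus syncing flavors to the child cell db is that requests carry more data, but there is no window where the child's copy is stale.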
- Work towards adding parity for host aggregates, security groups, and server groups. They should be made to work in a single-cell setup, but the solution should not preclude them from being used in multiple cells. There needs to be some discussion as to whether a host aggregate or server group is a global concept or a per-cell concept.
- Work towards merging compute/api.py and compute/cells_api.py so that developers only need to make changes/additions in one place. The goal is for as much as possible to be hidden by the RPC layer, which will determine whether a call goes to a compute, conductor, or cell.
- For syncing data between cells, look at using objects to handle the logic of writing data to the cell/parent and then syncing the data to the other. A potential migration scenario is to consider a non-cells setup to be a child cell; converting to cells will then mean setting up a parent cell and linking them. There are periodic tasks in place to sync data up from a child already, but a manual kick-off mechanism will need to be added.

Future plans: Something that has been considered, but is out of scope for now, is that the parent/api cell doesn't need the same data model as the child cell. Since the majority of what it does is act as a cache for API requests, it does not need all the data that a cell needs, and what data it does need could be stored in a form that's optimized for reads.

Thoughts?
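[Editor's note] The idea of using objects to handle write-then-sync between cells can be sketched very simply: a save() that writes to the local cell's store and then pushes the same change upward so the parent's cached copy stays readable. This is purely illustrative and is not Nova's objects framework; the class, the dict-backed "databases", and the field names are all assumptions:

```python
# Minimal sketch of object-mediated sync between a child cell and its
# parent. Real Nova would use versioned objects and RPC; plain dicts
# stand in for the two databases here.

class CellAwareInstance:
    def __init__(self, uuid, local_db, parent_db):
        self.uuid = uuid
        self._local_db = local_db    # this cell's database
        self._parent_db = parent_db  # parent cell's cached copy
        self._changes = {}

    def update(self, **fields):
        # Accumulate changes; nothing is written until save().
        self._changes.update(fields)

    def save(self):
        # Write locally first...
        self._local_db.setdefault(self.uuid, {}).update(self._changes)
        # ...then sync the same change up, so API reads against the
        # parent's copy stay current. A failure here is where the
        # existing periodic sync tasks (or a manual kick-off) would
        # repair the drift.
        self._parent_db.setdefault(self.uuid, {}).update(self._changes)
        self._changes = {}
```

Putting the sync in the object's save() is what lets most calling code stay unaware of cells, which lines up with the "write code once" goal above.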
Re: [openstack-dev] [Nova] Cells conversation starter
On 2014-10-20 2:00 PM, Andrew Laski wrote:
> One of the big goals for the Kilo cycle by users and developers of the
> cells functionality within Nova is to get it to a point where it can be
> considered a first class citizen of Nova.

[...]

> Shortcomings:
> Flavor syncing: this needs to be addressed now.

> What cells does:
> Schedule an instance to a cell based on flavor slots available.

=)

> Thoughts?

I'm pleased to see concrete efforts at making Nova cells a first class citizen. I'm looking forward to it.

Thanks!

--
Mathieu
Re: [openstack-dev] [Nova] Cells conversation starter
Hi Andrew,

great that you have started the “cells” discussion. Looking forward to seeing cells as the default setup in Kilo. The feature gap is really painful for current cells users. We have been looking into these features for some time, and the main concern is really where these concepts should live.

cheers,
Belmiro

On Mon, Oct 20, 2014 at 8:14 PM, Mathieu Gagné mga...@iweb.com wrote:
> I'm pleased to see concrete efforts at making Nova cells a first class
> citizen. I'm looking forward to it.