On Thu, 25 Oct 2018 10:55:15 +1100, Sam Morrison wrote:

On 24 Oct 2018, at 4:01 pm, melanie witt <melwi...@gmail.com> wrote:

On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:
Hi nova devs,
I have been taking a good look at cells v2 and how we would migrate to it (we’re 
still on cells v1, and will keep running cells v1 for now after our upcoming upgrade to Queens).
One of the problems I have is that now all our nova cell database servers need 
to respond to API requests.
With cellsv1 our architecture was to have a big powerful DB cluster (3 physical 
servers) at the API level to handle the API cell and then a smallish non HA DB 
server (usually just a VM) for each of the compute cells.
This architecture won’t work with cells V2 and we’ll now need to have a lot of 
highly available and responsive DB servers for all the cells.
It will also mean that our nova-apis which reside in Melbourne, Australia will 
now need to talk to database servers in Auckland, New Zealand.
The biggest issue we have is when a cell is down. We sometimes have cells go 
down for an hour or so, planned or unplanned, and with cells v1 this does not 
affect other cells.
Looks like some good work going on here 
But what about quota? If a cell goes down then it would seem that a user all of 
a sudden would regain some quota from the instances that are in the down cell?
Just wondering if anyone has thought about this?

Yes, we've discussed it quite a bit. The current plan is to offer policy-driven 
behavior as part of the "down" cell handling, which will control whether nova will:

a) Reject a server create request if the user owns instances in "down" cells

b) Go ahead and count quota usage "as-is" if the user owns instances in "down" 
cells and allow quota limit to be potentially exceeded

We would like to know if you think this plan will work for you.
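
To make the two options concrete, here is a minimal sketch in Python. It is not nova code: the names `DownCellPolicy`, `CellUsage` and `may_create_server` are hypothetical, and it assumes we can tell how many instances a user owns in each cell without contacting the cell itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DownCellPolicy(Enum):
    """Hypothetical policy switch for the two behaviours discussed above."""
    REJECT = "reject"            # option a): refuse the create request
    COUNT_AS_IS = "count_as_is"  # option b): count only what is reachable


@dataclass
class CellUsage:
    name: str
    up: bool
    instances: int  # instances the user owns in this cell (assumed to be known)


def may_create_server(cells: List[CellUsage], limit: int,
                      policy: DownCellPolicy) -> bool:
    """Return True if one more server may be created under the given policy."""
    owns_in_down_cell = any(not c.up and c.instances > 0 for c in cells)
    if owns_in_down_cell and policy is DownCellPolicy.REJECT:
        return False
    # Usage in "down" cells cannot be counted, so it is left out here.
    # Under COUNT_AS_IS this is how the real limit can end up exceeded.
    counted = sum(c.instances for c in cells if c.up)
    return counted + 1 <= limit


cells = [CellUsage("cell1", up=True, instances=3),
         CellUsage("cell2", up=False, instances=2)]
print(may_create_server(cells, limit=5, policy=DownCellPolicy.REJECT))
print(may_create_server(cells, limit=5, policy=DownCellPolicy.COUNT_AS_IS))
```

With a limit of 5 and two instances sitting in the down cell, option a) refuses the request, while option b) allows it because only the three reachable instances are counted.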

Further down the road, if we're able to come to an agreement on a consumer 
type/owner or partitioning concept in placement (to be certain we are counting 
usage our instance of nova owns, as placement is a shared service), we could 
count quota usage from placement instead of querying cells.

OK great, it's always good to know other people are thinking for you :-). I don’t 
really like a) or b), but the idea of using placement sounds like a good one to 

Your honesty is appreciated. :) We do want to get to where we can use placement for quota usage. There is too much higher-priority placement-related work in flight right now (getting nested resource providers working end-to-end, for one) for it to receive adequate attention at the moment. We've been discussing it on the spec [1] for the past few days, if you're interested.

I guess our architecture is pretty unique in a way, but I wonder if other people 
are also a little scared about the whole "all DB servers need to be up to serve API 

You are not alone. CERN is experiencing the same challenges. They too have an architecture in which less powerful database servers are deployed in the cells, and they also have cell sites that are geographically far away. They have been driving the "handling of a down cell" work.

I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.

We’d still have syncing issues, but we already have those with placement, and 
they occur more often there than they do with nova-cells-v1 for us.

I have had similar thoughts, but keep ending up at the syncing/racing issues, like you said. I think it's something we'll need to discuss and explore more, to see if we can come up with a reasonable way to address the increased demand on cell databases as it's been a considerable pain point for deployments like yours and CERN's.


[1] https://review.openstack.org/509042

OpenStack Development Mailing List (not for usage questions)