Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-28 Thread Sam Morrison


> On 26 Oct 2018, at 1:42 am, Dan Smith wrote:
> 
>> I guess our architecture is pretty unique in a way but I wonder if
>> other people are also a little scared about the whole all DB servers
> >> need to be up to serve API requests?
> 
> When we started down this path, we acknowledged that this would create a
> different access pattern which would require ops to treat the cell
> databases differently. The input we were getting at the time was that
> the benefits outweighed the costs here, and that we'd work on caching to
> deal with performance issues if/when that became necessary.
> 
>> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still
>> have the top level api cell DB but the API would only ever read from
>> it. Nova-api would only write to the compute cell DBs.
>> Then keep the nova-cells processes just doing instance_update_at_top to keep 
>> the nova-cell-api db up to date.
> 
> I'm definitely not in favor of doing more replication in python to
> address this. What was there in cellsv1 was lossy, even for the subset
> of things it actually supported (which didn't cover all nova features at
> the time and hasn't kept pace with features added since, obviously).
> 
> About a year ago, I proposed that we add another "read only mirror"
> field to the cell mapping, which nova would use if and only if the
> primary cell database wasn't reachable, and only for read
> operations. The ops, if they wanted to use this, would configure plain
> old one-way mysql replication of the cell databases to a
> highly-available server (probably wherever the api_db is) and nova could
> use that as a read-only cache for things like listing instances and
> calculating quotas. The reaction was (very surprisingly to me) negative
> to this option. It seems very low-effort, high-gain, and proper re-use
> of existing technologies to me, without us having to replicate a
> replication engine (hah) in python. So, I'm curious: does that sound
> more palatable to you?

Yeah, I think that could work for us; so far I can’t think of anything better.

Thanks,
Sam


> 
> --Dan


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Trinh Nguyen
Hi Matt,

The Searchlight team decided to revive the required feature in Stein [1],
and we're working with Kevin to review the patch this weekend. If the Nova
team needs some help, just let me know.

[1] https://review.openstack.org/#/c/453352/

Best,

On Fri, Oct 26, 2018 at 12:58 AM Matt Riedemann wrote:

> On 10/24/2018 6:55 PM, Sam Morrison wrote:
> > I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have
> > the top level api cell DB but the API would only ever read from it.
> > Nova-api would only write to the compute cell DBs.
> > Then keep the nova-cells processes just doing instance_update_at_top to
> > keep the nova-cell-api db up to date.
>
> There was also the "read from searchlight" idea [1] but that died in
> Boston.
>
> [1] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html
>
> --
>
> Thanks,
>
> Matt
>


-- 
Trinh Nguyen
www.edlab.xyz


Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Matt Riedemann

On 10/24/2018 6:55 PM, Sam Morrison wrote:

> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the
> top level api cell DB but the API would only ever read from it. Nova-api would
> only write to the compute cell DBs.
> Then keep the nova-cells processes just doing instance_update_at_top to keep
> the nova-cell-api db up to date.


There was also the "read from searchlight" idea [1] but that died in Boston.

[1] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html


--

Thanks,

Matt



Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Dan Smith
> I guess our architecture is pretty unique in a way but I wonder if
> other people are also a little scared about the whole all DB servers
> need to be up to serve API requests?

When we started down this path, we acknowledged that this would create a
different access pattern which would require ops to treat the cell
databases differently. The input we were getting at the time was that
the benefits outweighed the costs here, and that we'd work on caching to
deal with performance issues if/when that became necessary.

> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still
> have the top level api cell DB but the API would only ever read from
> it. Nova-api would only write to the compute cell DBs.
> Then keep the nova-cells processes just doing instance_update_at_top to keep 
> the nova-cell-api db up to date.

I'm definitely not in favor of doing more replication in python to
address this. What was there in cellsv1 was lossy, even for the subset
of things it actually supported (which didn't cover all nova features at
the time and hasn't kept pace with features added since, obviously).

About a year ago, I proposed that we add another "read only mirror"
field to the cell mapping, which nova would use if and only if the
primary cell database wasn't reachable, and only for read
operations. The ops, if they wanted to use this, would configure plain
old one-way mysql replication of the cell databases to a
highly-available server (probably wherever the api_db is) and nova could
use that as a read-only cache for things like listing instances and
calculating quotas. The reaction was (very surprisingly to me) negative
to this option. It seems very low-effort, high-gain, and proper re-use
of existing technologies to me, without us having to replicate a
replication engine (hah) in python. So, I'm curious: does that sound
more palatable to you?

--Dan
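
For readers following the thread, the "read only mirror" proposal above can be sketched roughly as below. This is only an illustration under stated assumptions, not nova code: the `read_only_mirror` field is the proposed (not existing) addition to the cell mapping, and `CellUnreachable`, `run_in_cell`, `make_connect` and the connection URLs are hypothetical names made up for the example. The mirror is consulted if and only if the primary cell database is unreachable, and only for read operations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class CellUnreachable(Exception):
    """Raised by the (hypothetical) connect() when a database is down."""


@dataclass
class CellMapping:
    name: str
    database_connection: str
    # Hypothetical field from the proposal: a one-way MySQL replica of the
    # cell database, hosted somewhere highly available.
    read_only_mirror: Optional[str] = None


def run_in_cell(mapping: CellMapping, query: Callable[[Any], Any],
                connect: Callable[[str], Any], read_only: bool) -> Any:
    """Run query against the primary DB, using the mirror only as a fallback."""
    try:
        conn = connect(mapping.database_connection)
    except CellUnreachable:
        # Writes never go to the mirror, and cells without one still fail.
        if not read_only or mapping.read_only_mirror is None:
            raise
        conn = connect(mapping.read_only_mirror)
    return query(conn)


def make_connect(down):
    """Build a fake connect() that fails for the URLs listed in `down`."""
    def connect(url):
        if url in down:
            raise CellUnreachable(url)
        return url  # stand-in for a real connection object
    return connect


cell = CellMapping("cell1", "mysql://cell1-db/nova", "mysql://api-db/nova_cell1")
list_instances = lambda conn: "instances via " + conn
cell_down = make_connect(down={"mysql://cell1-db/nova"})
print(run_in_cell(cell, list_instances, cell_down, read_only=True))
```

With the primary marked down, the read is served from the mirror, while the same call with `read_only=False` re-raises `CellUnreachable`. Replication itself stays outside python entirely, which is the point of the proposal.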



Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-24 Thread melanie witt

On Thu, 25 Oct 2018 10:55:15 +1100, Sam Morrison wrote:
>> On 24 Oct 2018, at 4:01 pm, melanie witt wrote:
>>
>> On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:
>>> Hi nova devs,
>>> Have been having a good look into cellsv2 and how we migrate to them (we’re
>>> still on cellsv1 and about to upgrade to queens and still run cells v1 for now).
>>> One of the problems I have is that now all our nova cell database servers need
>>> to respond to API requests.
>>> With cellsv1 our architecture was to have a big powerful DB cluster (3 physical
>>> servers) at the API level to handle the API cell and then a smallish non HA DB
>>> server (usually just a VM) for each of the compute cells.
>>> This architecture won’t work with cells V2 and we’ll now need to have a lot of
>>> highly available and responsive DB servers for all the cells.
>>> It will also mean that our nova-apis which reside in Melbourne, Australia will
>>> now need to talk to database servers in Auckland, New Zealand.
>>> The biggest issue we have is when a cell is down. We sometimes have cells go
>>> down for an hour or so planned or unplanned and with cellsv1 this does not
>>> affect other cells.
>>> Looks like some good work going on here
>>> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
>>> But what about quota? If a cell goes down then it would seem that a user all of
>>> a sudden would regain some quota from the instances that are in the down cell?
>>> Just wondering if anyone has thought about this?
>>
>> Yes, we've discussed it quite a bit. The current plan is to offer a policy-driven
>> behavior as part of the "down" cell handling which will control whether nova
>> will:
>>
>> a) Reject a server create request if the user owns instances in "down" cells
>>
>> b) Go ahead and count quota usage "as-is" if the user owns instances in "down"
>> cells and allow quota limit to be potentially exceeded
>>
>> We would like to know if you think this plan will work for you.
>>
>> Further down the road, if we're able to come to an agreement on a consumer
>> type/owner or partitioning concept in placement (to be certain we are counting
>> usage our instance of nova owns, as placement is a shared service), we could
>> count quota usage from placement instead of querying cells.
>
> OK great, always good to know other people are thinking for you :-) , I don’t
> really like a or b but the idea about using placement sounds like a good one to
> me.


Your honesty is appreciated. :) We do want to get to where we can use
placement for quota usage. There is too much higher-priority
placement-related work in flight right now (getting nested resource
providers working end-to-end, for one) for it to receive adequate
attention at this moment. We've been discussing it on the spec [1] the
past few days, if you're interested.



> I guess our architecture is pretty unique in a way but I wonder if other people
> are also a little scared about the whole all DB servers need to be up to serve API
> requests?


You are not alone. At CERN, they are experiencing the same challenges. 
They too have an architecture where they had deployed less powerful 
database servers in cells and also have cell sites that are located 
geographically far away. They have been driving the "handling of a down 
cell" work.



> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the
> top level api cell DB but the API would only ever read from it. Nova-api would
> only write to the compute cell DBs.
> Then keep the nova-cells processes just doing instance_update_at_top to keep
> the nova-cell-api db up to date.
>
> We’d still have syncing issues but we have that with placement now and that is
> more frequent than nova-cells-v1 is for us.


I have had similar thoughts, but keep ending up at the syncing/racing 
issues, like you said. I think it's something we'll need to discuss and 
explore more, to see if we can come up with a reasonable way to address 
the increased demand on cell databases as it's been a considerable pain 
point for deployments like yours and CERN's.


Cheers,
-melanie

[1] https://review.openstack.org/509042




Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-24 Thread Sam Morrison


> On 24 Oct 2018, at 4:01 pm, melanie witt wrote:
> 
> On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:
>> Hi nova devs,
>> Have been having a good look into cellsv2 and how we migrate to them (we’re 
>> still on cellsv1 and about to upgrade to queens and still run cells v1 for 
>> now).
>> One of the problems I have is that now all our nova cell database servers 
>> need to respond to API requests.
>> With cellsv1 our architecture was to have a big powerful DB cluster (3 
>> physical servers) at the API level to handle the API cell and then a 
>> smallish non HA DB server (usually just a VM) for each of the compute cells.
>> This architecture won’t work with cells V2 and we’ll now need to have a lot 
>> of highly available and responsive DB servers for all the cells.
>> It will also mean that our nova-apis which reside in Melbourne, Australia 
>> will now need to talk to database servers in Auckland, New Zealand.
>> The biggest issue we have is when a cell is down. We sometimes have cells go 
>> down for an hour or so planned or unplanned and with cellsv1 this does not 
>> affect other cells.
>> Looks like some good work going on here 
>> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
>> But what about quota? If a cell goes down then it would seem that a user all 
>> of a sudden would regain some quota from the instances that are in the down 
>> cell?
>> Just wondering if anyone has thought about this?
> 
> Yes, we've discussed it quite a bit. The current plan is to offer a 
> policy-driven behavior as part of the "down" cell handling which will control 
> whether nova will:
> 
> a) Reject a server create request if the user owns instances in "down" cells
> 
> b) Go ahead and count quota usage "as-is" if the user owns instances in 
> "down" cells and allow quota limit to be potentially exceeded
> 
> We would like to know if you think this plan will work for you.
> 
> Further down the road, if we're able to come to an agreement on a consumer 
> type/owner or partitioning concept in placement (to be certain we are 
> counting usage our instance of nova owns, as placement is a shared service), 
> we could count quota usage from placement instead of querying cells.

OK great, always good to know other people are thinking for you :-) , I don’t 
really like a or b but the idea about using placement sounds like a good one to 
me.

I guess our architecture is pretty unique in a way but I wonder if other people 
are also a little scared about the whole all DB servers need to be up to serve API
requests?

I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still have the 
top level api cell DB but the API would only ever read from it. Nova-api would 
only write to the compute cell DBs.
Then keep the nova-cells processes just doing instance_update_at_top to keep 
the nova-cell-api db up to date.

We’d still have syncing issues but we have that with placement now and that is 
more frequent than nova-cells-v1 is for us.

Cheers,
Sam



> 
> Cheers,
> -melanie
> 
> 


Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-23 Thread melanie witt

On Wed, 24 Oct 2018 10:54:31 +1100, Sam Morrison wrote:

> Hi nova devs,
>
> Have been having a good look into cellsv2 and how we migrate to them
> (we’re still on cellsv1 and about to upgrade to queens and still run
> cells v1 for now).
>
> One of the problems I have is that now all our nova cell database
> servers need to respond to API requests.
> With cellsv1 our architecture was to have a big powerful DB cluster (3
> physical servers) at the API level to handle the API cell and then a
> smallish non HA DB server (usually just a VM) for each of the compute
> cells.
>
> This architecture won’t work with cells V2 and we’ll now need to have a
> lot of highly available and responsive DB servers for all the cells.
>
> It will also mean that our nova-apis which reside in Melbourne,
> Australia will now need to talk to database servers in Auckland, New
> Zealand.
>
> The biggest issue we have is when a cell is down. We sometimes have
> cells go down for an hour or so planned or unplanned and with cellsv1
> this does not affect other cells.
> Looks like some good work going on here
> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/handling-down-cell
>
> But what about quota? If a cell goes down then it would seem that a user
> all of a sudden would regain some quota from the instances that are in
> the down cell?
>
> Just wondering if anyone has thought about this?


Yes, we've discussed it quite a bit. The current plan is to offer a 
policy-driven behavior as part of the "down" cell handling which will 
control whether nova will:


a) Reject a server create request if the user owns instances in "down" cells

b) Go ahead and count quota usage "as-is" if the user owns instances in 
"down" cells and allow quota limit to be potentially exceeded


We would like to know if you think this plan will work for you.

Further down the road, if we're able to come to an agreement on a 
consumer type/owner or partitioning concept in placement (to be certain 
we are counting usage our instance of nova owns, as placement is a 
shared service), we could count quota usage from placement instead of 
querying cells.


Cheers,
-melanie
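
As an illustration of the two policy options (a) and (b) described above, here is a rough sketch under stated assumptions. It is not nova's implementation: `DownCellPolicy`, `QuotaCheckFailed` and `check_instance_quota` are hypothetical names, and the flag saying whether the user owns instances in "down" cells is assumed to be supplied by the caller.

```python
from enum import Enum
from typing import Dict


class DownCellPolicy(Enum):
    REJECT = "reject"            # option (a)
    COUNT_AS_IS = "count_as_is"  # option (b)


class QuotaCheckFailed(Exception):
    """Raised when a server create request must be refused."""


def check_instance_quota(limit: int, requested: int,
                         usage_by_reachable_cell: Dict[str, int],
                         owns_instances_in_down_cells: bool,
                         policy: DownCellPolicy) -> int:
    """Return the counted usage, or raise if the request is not allowed.

    Only reachable cells can be counted, so usage in down cells is invisible.
    """
    if owns_instances_in_down_cells and policy is DownCellPolicy.REJECT:
        raise QuotaCheckFailed("usage in down cells cannot be verified")
    counted = sum(usage_by_reachable_cell.values())
    if counted + requested > limit:
        raise QuotaCheckFailed("quota exceeded")
    return counted


# A user with a limit of 10 has 6 instances in a reachable cell and 4 more
# in a cell that is currently down, then asks for 2 new instances.
reachable = {"cell1": 6}
counted = check_instance_quota(10, 2, reachable, True,
                               DownCellPolicy.COUNT_AS_IS)
print(counted)
```

Under option (b) the request passes because only 6 instances are visible, even though the real total would be 12 against a limit of 10; this is the "regained quota" effect raised in the original question. Under option (a) the same request is rejected outright.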

