Re: [openstack-dev] [nova] [placement] Ocata upgrade procedure and problems when it's optional in Newton

2017-01-10 Thread Sylvain Bauza


On 10/01/2017 14:49, Sylvain Bauza wrote:
> [...]
> That said, I thought about something else: wait, the Newton compute
> nodes work with the Placement API. Cool, but what if the Placement
> API is optional in Newton? Then the Newton computes stop calling the
> Placement API because of a nice decorator [2] (which is fine with me).
> [...]
> [1] https://review.openstack.org/#/c/417961/
> [2] https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67

FWIW, another possible solution was discussed upstream in the
#openstack-nova channel and proposed by Dan Smith: we could remove the
try-once behaviour implemented in the decorator, backport that to
Newton and cut a point release, which would allow the compute nodes to
reconcile with the Placement API in a self-healing manner.
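
To make that concrete, here is roughly what removing the try-once
behaviour could look like (a sketch only, with the name borrowed from
the safe_connect decorator in [2], not the actual patch):

import functools
import logging

from keystoneauth1 import exceptions as ks_exc

LOG = logging.getLogger(__name__)

def safe_connect(f):
    """Skip this call when placement is unreachable, but retry later."""
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        try:
            return f(self, *args, **kwargs)
        except ks_exc.EndpointNotFound:
            # No latched "disabled" flag here: the next periodic
            # resource tracker run calls placement again, so computes
            # reconcile by themselves once the service is deployed.
            LOG.warning('Placement API endpoint not found, will retry')
            return None
    return wrapper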

That would mean that deployers would have to upgrade to the latest
Newton point release before upgrading to Ocata, which is, I think, the
best supported model.

I'll propose a patch for that in my series as a bottom change for [1].

-Sylvain





[openstack-dev] [nova] [placement] Ocata upgrade procedure and problems when it's optional in Newton

2017-01-10 Thread Sylvain Bauza
Aloha folks,

Recently, I was discussing with TripleO folks. Disclaimer: I don't
think it's only a TripleO-related discussion, but rather a larger one
for all our deployers.

So, the question I was asked was how to upgrade from Newton to Ocata
for the Placement API when the deployer is not yet running the
Placement API in Newton (because it was optional in Newton).

The quick answer was to say "easy, just upgrade the service and run
the Placement API *before* the scheduler upgrade". That's because
we're working on a change that makes the scheduler call the Placement
API instead of fetching all the compute nodes [1].

That said, I thought about something else: wait, the Newton compute
nodes work with the Placement API. Cool, but what if the Placement API
is optional in Newton? Then the Newton computes stop calling the
Placement API because of a nice decorator [2] (which is fine with me).
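
To be clear, the try-once behaviour boils down to something like this
(a minimal sketch with made-up names, not the actual code; see [2] for
that): once the endpoint lookup fails, the client latches itself off
and every later call is a no-op until the process restarts.

import functools
import logging

from keystoneauth1 import exceptions as ks_exc

LOG = logging.getLogger(__name__)

def try_once(f):
    """Give up on placement for good after the first failed lookup."""
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        if getattr(self, '_disabled', False):
            return None  # placement was unreachable earlier; stay quiet
        try:
            return f(self, *args, **kwargs)
        except ks_exc.EndpointNotFound:
            LOG.warning('Placement API endpoint not found, giving up')
            self._disabled = True  # latched until restart or SIGHUP
            return None
    return wrapper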

Then imagine the problem for the upgrade: given deployers who are not
running the Placement API in Newton, they would need to *first* deploy
the (Newton or Ocata) Placement service, then SIGHUP all the Newton
compute nodes so that they report their resources (and create the
inventories), then wait a few minutes until all the inventories are
reported, and then upgrade all the services (except the compute nodes,
of course) to Ocata, including the scheduler service.
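
The "wait a few minutes" step is the fuzzy one; operationally it
amounts to something like the loop below (a rough sketch assuming a
valid Keystone token and a known placement endpoint; all the values
are made up):

import time

import requests

PLACEMENT = 'http://controller:8778'  # assumed placement endpoint
TOKEN = 'a-valid-keystone-token'      # fetch it however you usually do
EXPECTED_COMPUTES = 42                # how many Newton computes you run

def reported_providers():
    # Every compute that has reported creates one resource provider
    resp = requests.get(PLACEMENT + '/resource_providers',
                        headers={'X-Auth-Token': TOKEN})
    resp.raise_for_status()
    return set(rp['name'] for rp in resp.json()['resource_providers'])

while len(reported_providers()) < EXPECTED_COMPUTES:
    time.sleep(10)  # keep waiting until every compute shows up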

The above looks like a different upgrade policy, right?
 - Either we say you need to run the Newton Placement service *before*
upgrading - and in that case, the Placement service is not optional
for Newton, right?
 - Or we say you need to run the Ocata Placement service and then
restart the compute nodes *before* upgrading the services - and that's
a very different situation from the current upgrade procedure.

For example, I know it's not strictly a Nova matter, but most of our
deployers distinguish "controller" from "compute" services, i.e. all
the Nova services except the computes run on a single machine (or a
few machines). In that case, the "controller" upgrade is monolithic:
all those services are upgraded and restarted at the same stage. It
looks difficult to ask those deployers to follow a very different
procedure.

Anyway, I think we need to consider that carefully, and probably find
some solutions. For example, we could imagine (disclaimer #2: these
are probably silly solutions, but they're the ones I can think of
right now):
 - a DB migration that creates the inventories and allocations before
upgrading (i.e. not asking the computes to register themselves with
the Placement API). That would be terrible because it's a data
migration, I know...
 - having the scheduler behave backwards-compatibly in [1], i.e.
trying to call the Placement API to get the list of resource
providers, and falling back to calling all the ComputeNodes if that's
not possible (see the sketch after this list). But that would mean
that the Placement API is still optional in Ocata :/
 - merging the scheduler change that calls the Placement API [1] in a
point release after we deliver Ocata (while still making the Placement
API mandatory for Ocata), so that we would be sure that all computes
are reporting their status to Placement by the time we restart the
scheduler on the point release.
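
For the second option, the fallback would look something like this (a
hand-wavy sketch with invented names, not the code in [1]):

class HostQuery(object):
    """Backwards-compatible candidate lookup for the scheduler."""

    def __init__(self, placement_client, compute_node_list):
        self.placement_client = placement_client    # may be down/absent
        self.compute_node_list = compute_node_list  # legacy DB accessor

    def get_candidates(self, context):
        try:
            rps = self.placement_client.get_resource_providers(context)
            if rps:  # placement answered: schedule from that set
                return rps
        except Exception:
            # Placement missing or unreachable, so it stays optional
            pass
        # Newton behaviour: fetch every ComputeNode and filter locally
        return self.compute_node_list.get_all(context)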


Thoughts?
-Sylvain


[1] https://review.openstack.org/#/c/417961/

[2]
https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev