Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Matt Riedemann

On 1/20/2017 9:00 AM, Matt Riedemann wrote:

On 1/20/2017 4:53 AM, Eoghan Glynn wrote:


Do we also need to be concerned about the placement API "warm-up" time?

i.e. if a placement-less newton deployment is upgraded to placement-ful
ocata, then would there surely be a short period during which placement
is able to respond to the incoming queries from the scheduler, but only
with incomplete information since all the computes haven't yet triggered
their first reporting cycle?

In that case, it wouldn't necessarily lead to a NoValidHost failure on an
instance boot request, but rather a potentially faulty placement decision,
being based on incomplete information. I mean "faulty" there in the sense
of not strictly following the configured scheduling strategy.

Is that a concern, or an acceptable short degradation of service?

Cheers,
Eoghan



That's discussed a bit in this older thread [1]. If placement is up and
running but there are no computes checking in yet, then it's going to be
a NoValidHost from the filter scheduler because it's not going to
fall back to the compute_nodes table.

The nova-status command was written as an upgrade readiness check for
this situation [2]. If there are compute nodes in the database (from
Newton) but no resource providers in the placement service, then that
upgrade check is going to fail.

If it's a fresh install and there are 0 resource providers in the
placement service and 0 compute nodes, then the upgrade check passes but
provides a reminder about needing to make sure you get the computes
configured and registered for placement as you bring them online.

[1]
http://lists.openstack.org/pipermail/openstack-dev/2016-December/109060.html

[2]
https://github.com/openstack/nova/blob/ae753d96281709397dcfe5dd4ff7d6db57f3683e/nova/cmd/status.py#L301




Also, before anyone mentions this:

"Well what if I've upgraded my controller services to Ocata, including 
placement and nova-scheduler, but my computes are all still Newton?!"


The answer is Newton computes can be configured to register with the 
placement service already, and we backported a fix from Dan Smith [1] to 
make sure the Newton computes keep trying to connect to placement.


So this means you can still configure your Newton computes to talk to 
the Ocata placement service, even before rolling out placement, and once 
it's up the computes will check in and you shouldn't have NoValidHost 
issues, or at least the window should be small.
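
The "keep trying" behavior described above might be pictured with a small Python sketch. This is not the backported fix [1] itself, and every name in it (report_with_retries, report_once, max_cycles, wait) is hypothetical. It only assumes that a failed attempt surfaces as a ConnectionError and that the compute tries again on its next reporting cycle rather than giving up.

```python
from typing import Callable, Optional


def report_with_retries(
    report_once: Callable[[], None],
    max_cycles: int,
    wait: Callable[[], None] = lambda: None,
) -> Optional[int]:
    """Try to report once per cycle; return the cycle number that succeeded."""
    for cycle in range(1, max_cycles + 1):
        try:
            report_once()
            return cycle
        except ConnectionError:
            # Placement is not reachable yet: wait for the next cycle
            # instead of disabling reporting for good.
            wait()
    return None  # placement was never reached within max_cycles


# Usage: placement only becomes reachable on the third reporting cycle.
attempts = {"count": 0}


def flaky_report() -> None:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("placement not reachable yet")


first_success = report_with_retries(flaky_report, max_cycles=5)
```

In this toy run the compute checks in on its third cycle, which corresponds to the small window mentioned above.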


[1] https://review.openstack.org/#/c/419217/

--

Thanks,

Matt Riedemann

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Matt Riedemann

On 1/20/2017 4:53 AM, Eoghan Glynn wrote:


Do we also need to be concerned about the placement API "warm-up" time?

i.e. if a placement-less newton deployment is upgraded to placement-ful
ocata, then would there surely be a short period during which placement
is able to respond to the incoming queries from the scheduler, but only
with incomplete information since all the computes haven't yet triggered
their first reporting cycle?

In that case, it wouldn't necessarily lead to a NoValidHost failure on an
instance boot request, but rather a potentially faulty placement decision,
being based on incomplete information. I mean "faulty" there in the sense
of not strictly following the configured scheduling strategy.

Is that a concern, or an acceptable short degradation of service?

Cheers,
Eoghan



That's discussed a bit in this older thread [1]. If placement is up and 
running but there are no computes checking in yet, then it's going to be 
a NoValidHost from the filter scheduler because it's not going to
fall back to the compute_nodes table.


The nova-status command was written as an upgrade readiness check for 
this situation [2]. If there are compute nodes in the database (from 
Newton) but no resource providers in the placement service, then that 
upgrade check is going to fail.


If it's a fresh install and there are 0 resource providers in the 
placement service and 0 compute nodes, then the upgrade check passes but 
provides a reminder about needing to make sure you get the computes 
configured and registered for placement as you bring them online.
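
The two cases above can be sketched in Python as follows. This is only an illustration of the decision described in this mail, not the code in nova/cmd/status.py [2]; the function name, its parameters and the messages are hypothetical, and combinations that are not described here are simply treated as a pass.

```python
from typing import Tuple


def check_resource_providers(
    compute_nodes: int, resource_providers: int
) -> Tuple[bool, str]:
    """Return (passed, note) for the readiness check sketched in this mail."""
    if compute_nodes > 0 and resource_providers == 0:
        # Computes exist (e.g. carried over from Newton) but none has
        # registered with placement yet: the check fails.
        return False, "compute nodes exist but placement has no resource providers"
    if compute_nodes == 0 and resource_providers == 0:
        # Fresh install: passes, with a reminder for later.
        return True, "remember to register computes with placement as they come online"
    # Assumption: anything else is treated as a plain pass in this sketch.
    return True, ""


upgraded_from_newton = check_resource_providers(compute_nodes=5, resource_providers=0)
fresh_install = check_resource_providers(compute_nodes=0, resource_providers=0)
```

Here upgraded_from_newton carries a failing result, while fresh_install passes and carries the reminder text.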


[1] 
http://lists.openstack.org/pipermail/openstack-dev/2016-December/109060.html
[2] 
https://github.com/openstack/nova/blob/ae753d96281709397dcfe5dd4ff7d6db57f3683e/nova/cmd/status.py#L301


--

Thanks,

Matt Riedemann



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Eoghan Glynn


> >> What are these issues? My original message was to highlight one
> >> particular deployment type which is completely independent of
> >> how things get packaged in the traditional sense of the word
> >> (rpms/deb/tar.gz).  Perhaps it's getting lost in terminology,
> >> but packaging the software in one way and how it's run can be two
> >> separate issues.  So what I'd like to know is how is that
> >> impacted by whatever ordering is necessary, and if there's any
> >> way not to explicitly have special cases that need to be handled
> >> by the end user when applying updates.  It seems like we all want
> >> similar things. I would like not to have to do anything different
> >> from the install for upgrade. Why can't I apply configs, restart
> >> all services?  Or can I?  I seem to be getting mixed messages...
> >> 
> >> 
> > 
> > Sorry for being unclear on the issue. As Jay pointed out, if
> > nova-scheduler is upgraded before the placement service, the
> > nova-scheduler service will continue to start and take requests.
> > The problem is if the filter scheduler code is requesting a
> > microversion in the placement API which isn't available yet, in
> > particular this 1.4 microversion, then scheduling requests will
> > fail which to the end user means NoValidHost (the same as if we
> > don't have any compute nodes yet, or available).
> > 
> > So as Jay also pointed out, if placement and n-sch are upgraded
> > and restarted at the same time, the window for hitting this is
> > minimal. If deployment tooling is written to make sure to restart
> > the placement service *before* nova-scheduler, then there should be
> > no window for issues.
> > 
> 
> 
> Thanks all for providing insights. I'm trying to see a consensus here,
> and while I understand the concerns from Alex about the upgrade, I
> think it's okay for a deployer having a "controller" node (disclaimer:
> Nova doesn't have this concept, rather a list of components that are
> not compute nodes) to have a very brief downtime (I mean getting
> NoValidHosts if a user asks for an instance while the "controller" is
> being upgraded).

Do we also need to be concerned about the placement API "warm-up" time?

i.e. if a placement-less newton deployment is upgraded to placement-ful
ocata, then would there surely be a short period during which placement
is able to respond to the incoming queries from the scheduler, but only
with incomplete information since all the computes haven't yet triggered
their first reporting cycle?

In that case, it wouldn't necessarily lead to a NoValidHost failure on an
instance boot request, but rather a potentially faulty placement decision,
being based on incomplete information. I mean "faulty" there in the sense
of not strictly following the configured scheduling strategy.

Is that a concern, or an acceptable short degradation of service?

Cheers,
Eoghan

> To be honest, Nova does not yet support rolling upgrades
> for services that are not computes. If you look at the upgrade devref,
> we ask for a maintenance window [1]. During that maintenance window,
> we say it's safer to upgrade "nova-conductor first and nova-api last"
> for coherence reasons but since that's during the maintenance window,
> we're not supposed to have user requests coming in.
> 
> So, to circle back to the original problem, I think having the
> nova-scheduler upgraded *before* placement is not a problem. If
> deployers don't want to implement an upgrade scenario where placement
> is upgraded before scheduler, that's fine. No extra work is needed from
> deployers. It's just that *if* you implement that path, the
> scheduler could still get requests.
> 
> -Sylvain
> 
> [1]
> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
> 
> 
> > --
> > 
> > Thanks,
> > 
> > Matt
> > 


Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Alex Xu
2017-01-19 23:43 GMT+08:00 Sylvain Bauza :

>
>
> On 19/01/2017 16:27, Matt Riedemann wrote:
> > Sylvain and I were talking about how he's going to work placement
> > microversion requests into his filter scheduler patch [1]. He needs to
> > make requests to the placement API with microversion 1.4 [2] or later
> > for resource provider filtering on specific resource classes like VCPU
> > and MEMORY_MB.
> >
> > The question was what happens if microversion 1.4 isn't available in the
> > placement API, i.e. the nova-scheduler is running Ocata code now but the
> > placement service is running Newton still.
> >
> > Our rolling upgrades doc [3] says:
> >
> > "It is safest to start nova-conductor first and nova-api last."
> >
> > But since placement is bundled with n-api that would cause issues since
> > n-sch now depends on the n-api code.
> >
> > If you package the placement service separately from the nova-api
> > service then this is probably not an issue. You can still roll out n-api
> > last and restart it last (for control services), and just make sure that
> > placement is upgraded before nova-scheduler (we need to be clear about
> > that in [3]).
> >
> > But do we have any other issues if they are not packaged separately? Is
> > it possible to install the new code, but still only restart the
> > placement service before nova-api? I believe it is, but want to ask this
> > out loud.
> >
> > I think we're probably OK here but I wanted to ask this out loud and
> > make sure everyone is aware and can think about this as we're a week
> > from feature freeze. We also need to look into devstack/grenade because
> > I'm fairly certain that we upgrade n-sch *before* placement in a grenade
> > run which will make any issues here very obvious in [1].
> >
> > [1] https://review.openstack.org/#/c/417961/
> > [2]
> > http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
> >
> > [3]
> > http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
> >
> >
>
> I thought out loud in the nova channel about the following possibility:
> since we always ask to upgrade n-cpus *AFTER* upgrading our other
> services, we could allow nova-scheduler to gracefully accept a Newton
> placement service *UNLESS* you have Ocata computes.
>
> In more technical terms, the scheduler getting a response from the
> placement service is a hard requirement for Ocata. That said, if the
> response code is a 400 with a message saying that the schema is
> incorrect, the scheduler would check the max version of all the computes
> and then:
>  - either the max version is Newton, and it falls back to
> ComputeNodeList.get_all() to get the list of nodes
>  - or the max version is Ocata (at least one node is upgraded), and
> it throws a NoValidHosts
>

Emm... when you request a microversion that is not supported by the service,
you will get a 406 response. Then you know the placement service is old, so
you don't need to check the version of the computes?
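
As a rough Python sketch of that idea: the caller only looks at the HTTP status code to decide whether placement is too old for the requested microversion. The function name and the returned labels are hypothetical, and this is not the scheduler's actual code.

```python
from http import HTTPStatus


def classify_placement_response(status_code: int) -> str:
    """Map a placement response status to a coarse outcome label."""
    if status_code == HTTPStatus.NOT_ACCEPTABLE:  # 406
        # The requested microversion is not supported by this service,
        # so placement is older than the scheduler expects.
        return "placement-too-old"
    if 200 <= status_code < 300:
        return "ok"
    return "other-error"


outcome = classify_placement_response(406)
```

With this shape, the 406 case alone identifies an old placement service, without consulting the compute service versions.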


>
> That way, the upgrade path would be :
>  1/ upgrade your conductor
>  2/ upgrade all your other services but n-cpus (we could upgrade and
> restart n-sch before n-api, that would still work, or the contrary would
> be fine too)
>  3/ rolling upgrade your n-cpus
>
> I think we would then keep the existing upgrade path, and the placement
> service would still be mandatory for Ocata.
>
> Thoughts ?
> -Sylvain
>


Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-20 Thread Sylvain Bauza


On 19/01/2017 21:39, Matt Riedemann wrote:
> On Thu, Jan 19, 2017 at 2:29 PM, Alex Schultz 
> wrote:
>> 
>> What are these issues? My original message was to highlight one 
>> particular deployment type which is completely independent of
>> how things get packaged in the traditional sense of the word 
>> (rpms/deb/tar.gz).  Perhaps it's getting lost in terminology,
>> but packaging the software in one way and how it's run can be two
>> separate issues.  So what I'd like to know is how is that
>> impacted by whatever ordering is necessary, and if there's any
>> way not to explicitly have special cases that need to be handled
>> by the end user when applying updates.  It seems like we all want
>> similar things. I would like not to have to do anything different
>> from the install for upgrade. Why can't I apply configs, restart
>> all services?  Or can I?  I seem to be getting mixed messages...
>> 
>> 
> 
> Sorry for being unclear on the issue. As Jay pointed out, if 
> nova-scheduler is upgraded before the placement service, the 
> nova-scheduler service will continue to start and take requests.
> The problem is if the filter scheduler code is requesting a
> microversion in the placement API which isn't available yet, in
> particular this 1.4 microversion, then scheduling requests will
> fail which to the end user means NoValidHost (the same as if we
> don't have any compute nodes yet, or available).
> 
> So as Jay also pointed out, if placement and n-sch are upgraded
> and restarted at the same time, the window for hitting this is
> minimal. If deployment tooling is written to make sure to restart
> the placement service *before* nova-scheduler, then there should be
> no window for issues.
> 


Thanks all for providing insights. I'm trying to see a consensus here,
and while I understand the concerns from Alex about the upgrade, I
think it's okay for a deployer having a "controller" node (disclaimer:
Nova doesn't have this concept, rather a list of components that are
not compute nodes) to have a very brief downtime (I mean getting
NoValidHosts if a user asks for an instance while the "controller" is
being upgraded).

To be honest, Nova does not yet support rolling upgrades
for services that are not computes. If you look at the upgrade devref,
we ask for a maintenance window [1]. During that maintenance window,
we say it's safer to upgrade "nova-conductor first and nova-api last"
for coherence reasons but since that's during the maintenance window,
we're not supposed to have user requests coming in.

So, to circle back to the original problem, I think having the
nova-scheduler upgraded *before* placement is not a problem. If
deployers don't want to implement an upgrade scenario where placement
is upgraded before scheduler, that's fine. No extra work is needed from
deployers. It's just that *if* you implement that path, the
scheduler could still get requests.

-Sylvain

[1]
http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process


> --
> 
> Thanks,
> 
> Matt
> 


Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Matt Riedemann
On Thu, Jan 19, 2017 at 2:29 PM, Alex Schultz  wrote:
>
> What are these issues? My original message was to highlight one
> particular deployment type which is completely independent of how
> things get packaged in the traditional sense of the word
> (rpms/deb/tar.gz).  Perhaps it's getting lost in terminology, but
> packaging the software in one way and how it's run can be two separate
> issues.  So what I'd like to know is how is that impacted by whatever
> ordering is necessary, and if there's any way not to explicitly
> have special cases that need to be handled by the end user when
> applying updates.  It seems like we all want similar things. I would
> like not to have to do anything different from the install for
> upgrade. Why can't I apply configs, restart all services?  Or can I?  I
> seem to be getting mixed messages...
>
>

Sorry for being unclear on the issue. As Jay pointed out, if
nova-scheduler is upgraded before the placement service, the
nova-scheduler service will continue to start and take requests. The
problem is if the filter scheduler code is requesting a microversion
in the placement API which isn't available yet, in particular this 1.4
microversion, then scheduling requests will fail which to the end user
means NoValidHost (the same as if we don't have any compute nodes yet,
or available).

So as Jay also pointed out, if placement and n-sch are upgraded and
restarted at the same time, the window for hitting this is minimal. If
deployment tooling is written to make sure to restart the placement
service *before* nova-scheduler, then there should be no window for
issues.
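
For deployment tooling, the ordering constraint could be sketched in Python roughly like this. The service names and the helper are hypothetical and not taken from any real tool; the only rule encoded is "restart placement before nova-scheduler", and everything else keeps its original order.

```python
from typing import List, Sequence

PLACEMENT = "placement"       # hypothetical service names
SCHEDULER = "nova-scheduler"


def enforce_placement_first(order: Sequence[str]) -> List[str]:
    """Return a restart order in which placement precedes the scheduler."""
    result = list(order)
    if PLACEMENT in result and SCHEDULER in result:
        p, s = result.index(PLACEMENT), result.index(SCHEDULER)
        if p > s:
            # Move placement to sit directly before the scheduler.
            result.pop(p)
            result.insert(s, PLACEMENT)
    return result


planned = ["nova-conductor", "nova-scheduler", "nova-api", "placement"]
fixed = enforce_placement_first(planned)
```

In this example fixed becomes nova-conductor, placement, nova-scheduler, nova-api, so the scheduler never starts against an older placement service.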

--

Thanks,

Matt



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Alex Schultz
On Thu, Jan 19, 2017 at 11:45 AM, Jay Pipes  wrote:
> On 01/19/2017 01:18 PM, Alex Schultz wrote:
>>
>> On Thu, Jan 19, 2017 at 10:34 AM, Jay Pipes  wrote:
>>>
>>> On 01/19/2017 11:25 AM, Alex Schultz wrote:


>>>> On Thu, Jan 19, 2017 at 8:27 AM, Matt Riedemann wrote:
>>>>>
>>>>> Sylvain and I were talking about how he's going to work placement
>>>>> microversion requests into his filter scheduler patch [1]. He needs to
>>>>> make requests to the placement API with microversion 1.4 [2] or later
>>>>> for resource provider filtering on specific resource classes like VCPU
>>>>> and MEMORY_MB.
>>>>>
>>>>> The question was what happens if microversion 1.4 isn't available in the
>>>>> placement API, i.e. the nova-scheduler is running Ocata code now but the
>>>>> placement service is running Newton still.
>>>>>
>>>>> Our rolling upgrades doc [3] says:
>>>>>
>>>>> "It is safest to start nova-conductor first and nova-api last."
>>>>>
>>>>> But since placement is bundled with n-api that would cause issues since
>>>>> n-sch now depends on the n-api code.
>>>>>
>>>>> If you package the placement service separately from the nova-api
>>>>> service then this is probably not an issue. You can still roll out n-api
>>>>> last and restart it last (for control services), and just make sure that
>>>>> placement is upgraded before nova-scheduler (we need to be clear about
>>>>> that in [3]).
>>>>>
>>>>> But do we have any other issues if they are not packaged separately? Is
>>>>> it possible to install the new code, but still only restart the
>>>>> placement service before nova-api? I believe it is, but want to ask this
>>>>> out loud.
>>>>>
>>>>
>>>> Forgive me as I haven't looked really in depth, but if the api and
>>>> placement api are both collocated in the same apache instance this is
>>>> not necessarily the simplest thing to achieve.  While, yes it could be
>>>> achieved it will require more manual intervention of custom upgrade
>>>> scripts. To me this is not a good idea. My personal preference (now
>>>> having dealt with multiple N->O nova related acrobatics) is that these
>>>> types of requirements not be made.  We've already run into these
>>>> assumptions for new installs as well specifically in this newer code.
>>>> Why can't we turn all the services on and they properly enter a wait
>>>> state until such conditions are satisfied?
>>>
>>>
>>>
>>> Simply put, because it adds a bunch of conditional, temporary code to the
>>> Nova codebase as a replacement for well-documented upgrade steps.
>>>
>>> Can we do it? Yes. Is it kind of a pain in the ass? Yeah, mostly because
>>> of
>>> the testing requirements.
>>>
>>
>> 
>> You mean understanding how people actually consume your software and
>> handling those cases?  To me this is the fundamental problem if you
>> want software adoption, understand your user.
>
>
> The fact that we have these conversations should indicate that we are
> concerned about users. Nova developers, more than those of any other
> OpenStack project, have gone out of their way to make smooth upgrade
> processes the project's highest priority.
>

I understand that may seem like that's the case but based on my
interactions this cycle, the smooth upgrade process hasn't always been
apparent lately.

> However, deployment/packaging concerns aren't necessarily cloud *user*
> concerns. And I don't mean to sound like I'm brushing off the concerns of
> deployers, but deployers don't necessarily *use* the software we produce
> either. They install/package it/deploy it. It's application developer teams
> that *use* the software.
>

I disagree. When you develop something you have different types of
users.  In the case of OpenStack, you are correct that 'cloud users'
are one of your users. 'Deployers' and 'Operators' are additional
categories of 'users'. It seems like many times the priorities are
shifted to the 'cloud users' but for things like Nova some of the
functionality is also around how can an operator/deployer expose a
resource to the end user and what does it mean to do that.  IMHO these
considerations for each user category for Nova need to be weighted
differently than for something like Horizon, where it's probably more on the cloud
user category.  'Cloud users' don't use nova-manage. That's a piece of
software written specifically for deployers/operators.  So yes, Nova
writes software for both sets of users and when you get feedback from
one of those sets of users it needs to be taken into consideration.
What I'm attempting to expose is this thought process because
sometimes it gets lost as people want to expose new awesome features
to the 'cloud user'.  But if no one can deploy the update, how can the
'cloud user' use it?

> What we're really talking about here is catering to a request that simply
> doesn't have much real-world impact -- to cloud users *or* to deployers,
> even those using 

Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Jay Pipes

On 01/19/2017 12:59 PM, Eoghan Glynn wrote:

I think Alex is suggesting something different than falling back to the
legacy behaviour. The ocata scheduler would still roll forward to basing
its node selection decisions on data provided by the placement API, but
would be tolerant of the 3 different transient cases that are problematic:

 1. placement API momentarily not running yet

 2. placement API already running, but still on the newton micro-version

 3. placement API already running ocata code, but not yet warmed up

IIUC Alex is suggesting that the nova services themselves are tolerant
of those transient conditions during the upgrade, rather than requiring
multiple upgrade toolings to independently force the new ordering
constraint.

On my superficial understanding, case #3 would require a freshly
deployed ocata placement (i.e. when upgraded from a placement-less
newton deployment) to detect that it's being run for the first time
(i.e. no providers reported yet) and return say 503s to the scheduler
queries until enough time has passed for all computes to have reported
in their inventories & allocations.


As mentioned to Alex, I'm totally cool with the scheduler returning 
failures to the end user for some amount of time while the placement API 
service is upgraded (if the deployment tooling upgraded the schedulers 
before the placement API).


What nobody wants to see is the scheduler *die* due to placement API 
version issues or placement API connectivity. The scheduler should 
remain operational/up, but be logging errors continually in this case.
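
A minimal Python sketch of that "stay up and keep logging" behavior, under stated assumptions: PlacementUnavailable, select_hosts and query_placement are hypothetical names, not Nova code. A placement failure is logged and turned into an empty host list, which the caller can report as a scheduling failure for that one request while the process keeps running.

```python
import logging
from typing import Callable, List

LOG = logging.getLogger("scheduler_sketch")


class PlacementUnavailable(Exception):
    """Hypothetical error for connectivity or version problems."""


def select_hosts(query_placement: Callable[[], List[str]]) -> List[str]:
    """Return candidate hosts, or an empty list if placement cannot be used."""
    try:
        return query_placement()
    except PlacementUnavailable as exc:
        # Log and carry on: only this request fails, the service survives.
        LOG.error("placement request failed: %s", exc)
        return []


def broken_placement() -> List[str]:
    raise PlacementUnavailable("microversion not supported yet")


hosts_during_upgrade = select_hosts(broken_placement)
hosts_after_upgrade = select_hosts(lambda: ["compute-1", "compute-2"])
```

During the upgrade window the first call yields no hosts and an error log line; once placement answers, the same code path returns hosts again without a restart.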


Best,
-jay



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Jay Pipes

On 01/19/2017 01:18 PM, Alex Schultz wrote:

On Thu, Jan 19, 2017 at 10:34 AM, Jay Pipes  wrote:

On 01/19/2017 11:25 AM, Alex Schultz wrote:


On Thu, Jan 19, 2017 at 8:27 AM, Matt Riedemann
 wrote:


Sylvain and I were talking about how he's going to work placement
microversion requests into his filter scheduler patch [1]. He needs to
make
requests to the placement API with microversion 1.4 [2] or later for
resource provider filtering on specific resource classes like VCPU and
MEMORY_MB.

The question was what happens if microversion 1.4 isn't available in the
placement API, i.e. the nova-scheduler is running Ocata code now but the
placement service is running Newton still.

Our rolling upgrades doc [3] says:

"It is safest to start nova-conductor first and nova-api last."

But since placement is bundled with n-api that would cause issues since
n-sch now depends on the n-api code.

If you package the placement service separately from the nova-api service
then this is probably not an issue. You can still roll out n-api last and
restart it last (for control services), and just make sure that placement
is
upgraded before nova-scheduler (we need to be clear about that in [3]).

But do we have any other issues if they are not packaged separately? Is
it
possible to install the new code, but still only restart the placement
service before nova-api? I believe it is, but want to ask this out loud.



Forgive me as I haven't looked really in depth, but if the api and
placement api are both collocated in the same apache instance this is
not necessarily the simplest thing to achieve.  While, yes it could be
achieved it will require more manual intervention of custom upgrade
scripts. To me this is not a good idea. My personal preference (now
having dealt with multiple N->O nova related acrobatics) is that these
types of requirements not be made.  We've already run into these
assumptions for new installs as well specifically in this newer code.
Why can't we turn all the services on and they properly enter a wait
state until such conditions are satisfied?



Simply put, because it adds a bunch of conditional, temporary code to the
Nova codebase as a replacement for well-documented upgrade steps.

Can we do it? Yes. Is it kind of a pain in the ass? Yeah, mostly because of
the testing requirements.




You mean understanding how people actually consume your software and
handling those cases?  To me this is the fundamental problem if you
want software adoption, understand your user.


The fact that we have these conversations should indicate that we are
concerned about users. Nova developers, more than those of any other
OpenStack project, have gone out of their way to make smooth upgrade
processes the project's highest priority.


However, deployment/packaging concerns aren't necessarily cloud *user* 
concerns. And I don't mean to sound like I'm brushing off the concerns 
of deployers, but deployers don't necessarily *use* the software we 
produce either. They install/package it/deploy it. It's application 
developer teams that *use* the software.


What we're really talking about here is catering to a request that 
simply doesn't have much real-world impact -- to cloud users *or* to 
deployers, even those using continuous delivery mechanisms.


If there are a few seconds of log lines outputting error messages and 
some 400 requests returned from the scheduler while a placement API 
service is upgraded and restarted (again, ONLY if the placement API 
service is upgraded after the scheduler) I'm cool with that. It's really 
not a huge deal to me.


What *would* be a big deal is if any of the following occur:

a) The scheduler dies a horrible death and goes offline
b) Any of the compute nodes failed and went offline
c) Anything regarding the tenant data plane was disrupted

Those are the real concerns for us, and if we have introduced code that 
results in any of the above, we absolutely will prioritize bug fixes ASAP.


But, as far as I know, we have *not* introduced code that would result in 
any of the above.


> Know what you're doing and the impact on them.


Yeah, sorry, but we absolutely *are* concerned about users. What we're 
not as concerned about is a few seconds of temporary disruption to the 
control plane.


> I was just raising awareness around how some
> people are deploying this stuff because it feels that sometimes folks
> just don't know or don't care.


We *do* care, thus this email and the ongoing conversations on IRC.

> So IMHO adding service startup/restart
> ordering requirements is not ideal for the person who has to run your
> software because it makes the entire process hard and more complex.


Unless I'm mistaken, this is not *required ordering*. It's recommended 
ordering of service upgrade/restarts in order to minimize/eliminate 
downtime of the control plane, but the scheduler service shouldn't die 
due to these issues. The scheduler should just keep 

Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Alex Schultz
On Thu, Jan 19, 2017 at 10:34 AM, Jay Pipes  wrote:
> On 01/19/2017 11:25 AM, Alex Schultz wrote:
>>
>> On Thu, Jan 19, 2017 at 8:27 AM, Matt Riedemann
>>  wrote:
>>>
>>> Sylvain and I were talking about how he's going to work placement
>>> microversion requests into his filter scheduler patch [1]. He needs to
>>> make
>>> requests to the placement API with microversion 1.4 [2] or later for
>>> resource provider filtering on specific resource classes like VCPU and
>>> MEMORY_MB.
>>>
>>> The question was what happens if microversion 1.4 isn't available in the
>>> placement API, i.e. the nova-scheduler is running Ocata code now but the
>>> placement service is running Newton still.
>>>
>>> Our rolling upgrades doc [3] says:
>>>
>>> "It is safest to start nova-conductor first and nova-api last."
>>>
>>> But since placement is bundled with n-api that would cause issues since
>>> n-sch now depends on the n-api code.
>>>
>>> If you package the placement service separately from the nova-api service
>>> then this is probably not an issue. You can still roll out n-api last and
>>> restart it last (for control services), and just make sure that placement
>>> is
>>> upgraded before nova-scheduler (we need to be clear about that in [3]).
>>>
>>> But do we have any other issues if they are not packaged separately? Is
>>> it
>>> possible to install the new code, but still only restart the placement
>>> service before nova-api? I believe it is, but want to ask this out loud.
>>>
>>
>> Forgive me as I haven't looked really in depth, but if the api and
>> placement api are both collocated in the same apache instance this is
>> not necessarily the simplest thing to achieve.  While, yes it could be
>> achieved it will require more manual intervention of custom upgrade
>> scripts. To me this is not a good idea. My personal preference (now
>> having dealt with multiple N->O nova related acrobatics) is that these
>> types of requirements not be made.  We've already run into these
>> assumptions for new installs as well specifically in this newer code.
>> Why can't we turn all the services on and they properly enter a wait
>> state until such conditions are satisfied?
>
>
> Simply put, because it adds a bunch of conditional, temporary code to the
> Nova codebase as a replacement for well-documented upgrade steps.
>
> Can we do it? Yes. Is it kind of a pain in the ass? Yeah, mostly because of
> the testing requirements.
>


You mean understanding how people actually consume your software and
handling those cases?  To me this is the fundamental problem if you
want software adoption, understand your user. Know what you're doing
and the impact on them.  I was just raising awareness around how some
people are deploying this stuff because it feels that sometimes folks
just don't know or don't care.  So IMHO adding service startup/restart
ordering requirements is not ideal for the person who has to run your
software because it makes the entire process hard and more complex.
Why use this when I can just buy a product that does this for me and
handles these types of cases?  We're not all containers yet which
might alleviate some of this but as there was a push for the placement
service specifically to be in a shared vhost, this recommended
deployment method introduces these kind of complexities. It's not
something that just affects me.  Squeaky wheel gets the hose, I mean
grease.


> But meh, I can whip up an amendment to Sylvain's patch that would add the
> self-healing/fallback to legacy behaviour if this is what the operator
> community insists on.
>
> I think Matt generally has been in the "push forward" camp because we're
> tired of delaying improvements to Nova because of some terror that we may
> cause some deployer somewhere to restart their controller services in a
> particular order in order to minimize any downtime of the control plane.
>
> For the distributed compute nodes, I totally understand the need to tolerate
> long rolling upgrade windows. For controller nodes/services, what we're
> talking about here is adding code into Nova scheduler to deal with what in
> 99% of cases will be something that isn't even noticed because the upgrade
> tooling will be restarting all these nodes at almost the same time and the
> momentary failures that might be logged on the scheduler (400s returned from
> the placement API due to using an unknown parameter in a GET request) will
> only exist for a second or two as the upgrade completes.

So in our case they will get (re)started at the same time. If that's
not a problem, great.  I've seen services in the past where it's been
a problem when a service actually won't start because the dependent
service is not up yet. That's what I wanted to make sure is not the
case here.  So if we have documented assurance that restarting both at
the same time won't cause any problems or the interaction is that the
api service won't be 'up' until the 

Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Eoghan Glynn

> >> Sylvain and I were talking about how he's going to work placement
> >> microversion requests into his filter scheduler patch [1]. He needs to
> >> make
> >> requests to the placement API with microversion 1.4 [2] or later for
> >> resource provider filtering on specific resource classes like VCPU and
> >> MEMORY_MB.
> >>
> >> The question was what happens if microversion 1.4 isn't available in the
> >> placement API, i.e. the nova-scheduler is running Ocata code now but the
> >> placement service is running Newton still.
> >>
> >> Our rolling upgrades doc [3] says:
> >>
> >> "It is safest to start nova-conductor first and nova-api last."
> >>
> >> But since placement is bundled with n-api that would cause issues since
> >> n-sch now depends on the n-api code.
> >>
> >> If you package the placement service separately from the nova-api service
> >> then this is probably not an issue. You can still roll out n-api last and
> >> restart it last (for control services), and just make sure that placement
> >> is
> >> upgraded before nova-scheduler (we need to be clear about that in [3]).
> >>
> >> But do we have any other issues if they are not packaged separately? Is it
> >> possible to install the new code, but still only restart the placement
> >> service before nova-api? I believe it is, but want to ask this out loud.
> >>
> >
> > Forgive me as I haven't looked really in depth, but if the api and
> > placement api are both collocated in the same apache instance this is
> > not necessarily the simplest thing to achieve.  While, yes it could be
> > achieved it will require more manual intervention of custom upgrade
> > scripts. To me this is not a good idea. My personal preference (now
> > having dealt with multiple N->O nova related acrobatics) is that these
> > types of requirements not be made.  We've already run into these
> > assumptions for new installs as well specifically in this newer code.
> > Why can't we turn all the services on and they properly enter a wait
> > state until such conditions are satisfied?
> 
> Simply put, because it adds a bunch of conditional, temporary code to
> the Nova codebase as a replacement for well-documented upgrade steps.
> 
> Can we do it? Yes. Is it kind of a pain in the ass? Yeah, mostly because
> of the testing requirements.
> 
> But meh, I can whip up an amendment to Sylvain's patch that would add
> the self-healing/fallback to legacy behaviour if this is what the
> operator community insists on.

I think Alex is suggesting something different than falling back to the
legacy behaviour. The ocata scheduler would still roll forward to basing
its node selection decisions on data provided by the placement API, but
would be tolerant of the 3 different transient cases that are problematic:

 1. placement API momentarily not running yet

 2. placement API already running, but still on the newton micro-version

 3. placement API already running ocata code, but not yet warmed up

IIUC Alex is suggesting that the nova services themselves are tolerant
of those transient conditions during the upgrade, rather than requiring
multiple upgrade toolings to independently force the new ordering
constraint.

On my superficial understanding, case #3 would require a freshly
deployed ocata placement (i.e. when upgraded from a placement-less
newton deployment) to detect that it's being run for the first time
(i.e. no providers reported yet) and return say 503s to the scheduler
queries until enough time has passed for all computes to have reported
in their inventories & allocations.
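To make the three transient cases concrete, here is a minimal sketch of how a caller could tell them apart and simply wait. It is not Nova code. The mapping is an assumption for illustration: a connection failure is represented as None, the 400 is the response Jay describes for an unknown GET parameter, and the 503 is only the warm-up response suggested above. The names PlacementState, classify and wait_until_ready are hypothetical.

```python
import enum
import time


class PlacementState(enum.Enum):
    NOT_RUNNING = "not running"   # case 1: nothing answering yet
    TOO_OLD = "too old"           # case 2: still on the Newton microversion
    WARMING_UP = "warming up"     # case 3: Ocata code, computes not reported
    READY = "ready"


def classify(status):
    """Map a probe result to a state (None means the connection failed)."""
    if status is None:
        return PlacementState.NOT_RUNNING
    if status == 400:
        return PlacementState.TOO_OLD
    if status == 503:
        return PlacementState.WARMING_UP
    if status == 200:
        return PlacementState.READY
    raise ValueError(f"unexpected placement status: {status}")


def wait_until_ready(probe, attempts, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return True once it is READY."""
    for attempt in range(attempts):
        if classify(probe()) is PlacementState.READY:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # transient condition: wait and try again
    return False
```

Here probe would be whatever function issues the real placement query and returns its HTTP status. The sleep argument is injectable only so the loop can be exercised without real delays.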

Cheers,
Eoghan 

 
> I think Matt generally has been in the "push forward" camp because we're
> tired of delaying improvements to Nova because of some terror that we
> may cause some deployer somewhere to restart their controller services
> in a particular order in order to minimize any downtime of the control
> plane.
> 
> For the distributed compute nodes, I totally understand the need to
> tolerate long rolling upgrade windows. For controller nodes/services,
> what we're talking about here is adding code into Nova scheduler to deal
> with what in 99% of cases will be something that isn't even noticed
> because the upgrade tooling will be restarting all these nodes at almost
> the same time and the momentary failures that might be logged on the
> scheduler (400s returned from the placement API due to using an unknown
> parameter in a GET request) will only exist for a second or two as the
> upgrade completes.
> 
> So, yeah, a lot of work and testing for very little real-world benefit,
> which is why a number of us just want to move forward...
> 
> Best,
> -jay
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 


Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Jay Pipes

On 01/19/2017 11:25 AM, Alex Schultz wrote:

On Thu, Jan 19, 2017 at 8:27 AM, Matt Riedemann
 wrote:

Sylvain and I were talking about how he's going to work placement
microversion requests into his filter scheduler patch [1]. He needs to make
requests to the placement API with microversion 1.4 [2] or later for
resource provider filtering on specific resource classes like VCPU and
MEMORY_MB.

The question was what happens if microversion 1.4 isn't available in the
placement API, i.e. the nova-scheduler is running Ocata code now but the
placement service is running Newton still.

Our rolling upgrades doc [3] says:

"It is safest to start nova-conductor first and nova-api last."

But since placement is bundled with n-api that would cause issues since
n-sch now depends on the n-api code.

If you package the placement service separately from the nova-api service
then this is probably not an issue. You can still roll out n-api last and
restart it last (for control services), and just make sure that placement is
upgraded before nova-scheduler (we need to be clear about that in [3]).

But do we have any other issues if they are not packaged separately? Is it
possible to install the new code, but still only restart the placement
service before nova-api? I believe it is, but want to ask this out loud.



Forgive me as I haven't looked really in depth, but if the api and
placement api are both collocated in the same apache instance this is
not necessarily the simplest thing to achieve.  While, yes it could be
achieved it will require more manual intervention of custom upgrade
scripts. To me this is not a good idea. My personal preference (now
having dealt with multiple N->O nova related acrobatics) is that these
types of requirements not be made.  We've already run into these
assumptions for new installs as well specifically in this newer code.
Why can't we turn all the services on and they properly enter a wait
state until such conditions are satisfied?


Simply put, because it adds a bunch of conditional, temporary code to 
the Nova codebase as a replacement for well-documented upgrade steps.


Can we do it? Yes. Is it kind of a pain in the ass? Yeah, mostly because 
of the testing requirements.


But meh, I can whip up an amendment to Sylvain's patch that would add 
the self-healing/fallback to legacy behaviour if this is what the 
operator community insists on.


I think Matt generally has been in the "push forward" camp because we're 
tired of delaying improvements to Nova because of some terror that we 
may cause some deployer somewhere to restart their controller services 
in a particular order in order to minimize any downtime of the control 
plane.


For the distributed compute nodes, I totally understand the need to 
tolerate long rolling upgrade windows. For controller nodes/services, 
what we're talking about here is adding code into Nova scheduler to deal 
with what in 99% of cases will be something that isn't even noticed 
because the upgrade tooling will be restarting all these nodes at almost 
the same time and the momentary failures that might be logged on the 
scheduler (400s returned from the placement API due to using an unknown 
parameter in a GET request) will only exist for a second or two as the 
upgrade completes.


So, yeah, a lot of work and testing for very little real-world benefit, 
which is why a number of us just want to move forward...


Best,
-jay



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Sylvain Bauza


Le 19/01/2017 17:00, Matt Riedemann a écrit :
> On 1/19/2017 9:43 AM, Sylvain Bauza wrote:
>>
>>
>> Le 19/01/2017 16:27, Matt Riedemann a écrit :
>>> Sylvain and I were talking about how he's going to work placement
>>> microversion requests into his filter scheduler patch [1]. He needs to
>>> make requests to the placement API with microversion 1.4 [2] or later
>>> for resource provider filtering on specific resource classes like VCPU
>>> and MEMORY_MB.
>>>
>>> The question was what happens if microversion 1.4 isn't available in the
>>> placement API, i.e. the nova-scheduler is running Ocata code now but the
>>> placement service is running Newton still.
>>>
>>> Our rolling upgrades doc [3] says:
>>>
>>> "It is safest to start nova-conductor first and nova-api last."
>>>
>>> But since placement is bundled with n-api that would cause issues since
>>> n-sch now depends on the n-api code.
>>>
>>> If you package the placement service separately from the nova-api
>>> service then this is probably not an issue. You can still roll out n-api
>>> last and restart it last (for control services), and just make sure that
>>> placement is upgraded before nova-scheduler (we need to be clear about
>>> that in [3]).
>>>
>>> But do we have any other issues if they are not packaged separately? Is
>>> it possible to install the new code, but still only restart the
>>> placement service before nova-api? I believe it is, but want to ask this
>>> out loud.
>>>
>>> I think we're probably OK here but I wanted to ask this out loud and
>>> make sure everyone is aware and can think about this as we're a week
>>> from feature freeze. We also need to look into devstack/grenade because
>>> I'm fairly certain that we upgrade n-sch *before* placement in a grenade
>>> run which will make any issues here very obvious in [1].
>>>
>>> [1] https://review.openstack.org/#/c/417961/
>>> [2]
>>> http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
>>>
>>>
>>> [3]
>>> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
>>>
>>>
>>>
>>
>> I floated the following possibility in the nova channel:
>> since we always ask operators to upgrade n-cpus *AFTER* upgrading the
>> other services, we could let the nova-scheduler tolerate a Newton
>> placement service *UNLESS* you have Ocata computes.
>>
>> In more technical terms, the scheduler getting a response from the
>> placement service is a hard requirement for Ocata. That said, if the
>> response code is a 400 with a message saying that the schema is
>> incorrect, the scheduler would check the max version of all the computes
>> and then:
>>  - either the max version is Newton, and it falls back to
>> ComputeNodeList.get_all() to get the list of nodes
>>  - or the max version is Ocata (at least one node is upgraded), and
>> it raises NoValidHost
>>
>> That way, the upgrade path would be:
>>  1/ upgrade your conductor
>>  2/ upgrade all your other services except n-cpus (we could upgrade and
>> restart n-sch before n-api and that would still work; the reverse order
>> would be fine too)
>>  3/ rolling-upgrade your n-cpus
>>
>> I think we would then keep the existing upgrade path, and the placement
>> service would still be mandatory for Ocata.
>>
>> Thoughts ?
>> -Sylvain
>>
>>
> 
> I don't like basing the n-sch decision on the service version of the
> computes, because the computes will keep trying to connect to the
> placement service until it's available, but not fail. That doesn't
> really mean that placement is new enough for the scheduler to use the
> 1.4 microversion.
> 
> So IMO we either charge forward as planned and make it clear in the docs
> that for Ocata, the placement service must be upgraded *before*
> nova-scheduler, or we punt and provide a fallback to just pulling all
> compute nodes from the database if we can't make the 1.4 request to
> placement. Given my original post here, I'd prefer to charge forward
> unless it becomes clear that is not going to work, or is at least going
> to be very painful.
> 

Given how close the deadline for cycle-trailing projects [1] is (R+2 [2]),
charging forward and asking them to modify their deployments would have
to happen within the next 3 weeks (even less, given that we haven't yet
agreed and haven't yet provided the documentation). That would be a very
short time for them, and a fire drill.

I'd rather see us accept the placement service still being on Newton.
If you don't agree with verifying the compute node versions, why not
just accept falling back to calling the database

Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Alex Schultz
On Thu, Jan 19, 2017 at 8:27 AM, Matt Riedemann
 wrote:
> Sylvain and I were talking about how he's going to work placement
> microversion requests into his filter scheduler patch [1]. He needs to make
> requests to the placement API with microversion 1.4 [2] or later for
> resource provider filtering on specific resource classes like VCPU and
> MEMORY_MB.
>
> The question was what happens if microversion 1.4 isn't available in the
> placement API, i.e. the nova-scheduler is running Ocata code now but the
> placement service is running Newton still.
>
> Our rolling upgrades doc [3] says:
>
> "It is safest to start nova-conductor first and nova-api last."
>
> But since placement is bundled with n-api that would cause issues since
> n-sch now depends on the n-api code.
>
> If you package the placement service separately from the nova-api service
> then this is probably not an issue. You can still roll out n-api last and
> restart it last (for control services), and just make sure that placement is
> upgraded before nova-scheduler (we need to be clear about that in [3]).
>
> But do we have any other issues if they are not packaged separately? Is it
> possible to install the new code, but still only restart the placement
> service before nova-api? I believe it is, but want to ask this out loud.
>

Forgive me as I haven't looked really in depth, but if the api and
placement api are both collocated in the same apache instance this is
not necessarily the simplest thing to achieve.  While, yes it could be
achieved it will require more manual intervention of custom upgrade
scripts. To me this is not a good idea. My personal preference (now
having dealt with multiple N->O nova related acrobatics) is that these
types of requirements not be made.  We've already run into these
assumptions for new installs as well specifically in this newer code.
Why can't we turn all the services on and they properly enter a wait
state until such conditions are satisfied?

Thanks,
-Alex

> I think we're probably OK here but I wanted to ask this out loud and make
> sure everyone is aware and can think about this as we're a week from feature
> freeze. We also need to look into devstack/grenade because I'm fairly
> certain that we upgrade n-sch *before* placement in a grenade run which will
> make any issues here very obvious in [1].
>
> [1] https://review.openstack.org/#/c/417961/
> [2]
> http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
> [3]
> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>



Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Matt Riedemann

On 1/19/2017 9:43 AM, Sylvain Bauza wrote:



Le 19/01/2017 16:27, Matt Riedemann a écrit :

Sylvain and I were talking about how he's going to work placement
microversion requests into his filter scheduler patch [1]. He needs to
make requests to the placement API with microversion 1.4 [2] or later
for resource provider filtering on specific resource classes like VCPU
and MEMORY_MB.

The question was what happens if microversion 1.4 isn't available in the
placement API, i.e. the nova-scheduler is running Ocata code now but the
placement service is running Newton still.

Our rolling upgrades doc [3] says:

"It is safest to start nova-conductor first and nova-api last."

But since placement is bundled with n-api that would cause issues since
n-sch now depends on the n-api code.

If you package the placement service separately from the nova-api
service then this is probably not an issue. You can still roll out n-api
last and restart it last (for control services), and just make sure that
placement is upgraded before nova-scheduler (we need to be clear about
that in [3]).

But do we have any other issues if they are not packaged separately? Is
it possible to install the new code, but still only restart the
placement service before nova-api? I believe it is, but want to ask this
out loud.

I think we're probably OK here but I wanted to ask this out loud and
make sure everyone is aware and can think about this as we're a week
from feature freeze. We also need to look into devstack/grenade because
I'm fairly certain that we upgrade n-sch *before* placement in a grenade
run which will make any issues here very obvious in [1].

[1] https://review.openstack.org/#/c/417961/
[2]
http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity

[3]
http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process




I floated the following possibility in the nova channel:
since we always ask operators to upgrade n-cpus *AFTER* upgrading the
other services, we could let the nova-scheduler tolerate a Newton
placement service *UNLESS* you have Ocata computes.

In more technical terms, the scheduler getting a response from the
placement service is a hard requirement for Ocata. That said, if the
response code is a 400 with a message saying that the schema is
incorrect, the scheduler would check the max version of all the computes
and then:
 - either the max version is Newton, and it falls back to
ComputeNodeList.get_all() to get the list of nodes
 - or the max version is Ocata (at least one node is upgraded), and
it raises NoValidHost

That way, the upgrade path would be:
 1/ upgrade your conductor
 2/ upgrade all your other services except n-cpus (we could upgrade and
restart n-sch before n-api and that would still work; the reverse order
would be fine too)
 3/ rolling-upgrade your n-cpus

I think we would then keep the existing upgrade path, and the placement
service would still be mandatory for Ocata.

Thoughts ?
-Sylvain




I don't like basing the n-sch decision on the service version of the 
computes, because the computes will keep trying to connect to the 
placement service until it's available, but not fail. That doesn't 
really mean that placement is new enough for the scheduler to use the 
1.4 microversion.


So IMO we either charge forward as planned and make it clear in the docs 
that for Ocata, the placement service must be upgraded *before* 
nova-scheduler, or we punt and provide a fallback to just pulling all 
compute nodes from the database if we can't make the 1.4 request to 
placement. Given my original post here, I'd prefer to charge forward 
unless it becomes clear that is not going to work, or is at least going 
to be very painful.
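For what it's worth, "is placement new enough" can be checked directly rather than inferred from compute versions. The sketch below is illustrative only. It assumes a version document shaped like {"versions": [{"min_version": ..., "max_version": ...}]}, which is my assumption about what the placement root returns, and the helper names are hypothetical.

```python
def parse_microversion(text):
    """Turn 'major.minor' into a tuple so that 1.10 compares above 1.4."""
    major, minor = text.split(".")
    return int(major), int(minor)


def placement_supports(version_doc, required="1.4"):
    """Return True if any advertised version range includes `required`."""
    needed = parse_microversion(required)
    for version in version_doc.get("versions", []):
        low = parse_microversion(version["min_version"])
        high = parse_microversion(version["max_version"])
        if low <= needed <= high:
            return True
    return False
```

Comparing tuples rather than strings or floats matters here, since a microversion such as 1.10 is newer than 1.4 even though it sorts lower as text.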


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Sylvain Bauza


Le 19/01/2017 16:27, Matt Riedemann a écrit :
> Sylvain and I were talking about how he's going to work placement
> microversion requests into his filter scheduler patch [1]. He needs to
> make requests to the placement API with microversion 1.4 [2] or later
> for resource provider filtering on specific resource classes like VCPU
> and MEMORY_MB.
> 
> The question was what happens if microversion 1.4 isn't available in the
> placement API, i.e. the nova-scheduler is running Ocata code now but the
> placement service is running Newton still.
> 
> Our rolling upgrades doc [3] says:
> 
> "It is safest to start nova-conductor first and nova-api last."
> 
> But since placement is bundled with n-api that would cause issues since
> n-sch now depends on the n-api code.
> 
> If you package the placement service separately from the nova-api
> service then this is probably not an issue. You can still roll out n-api
> last and restart it last (for control services), and just make sure that
> placement is upgraded before nova-scheduler (we need to be clear about
> that in [3]).
> 
> But do we have any other issues if they are not packaged separately? Is
> it possible to install the new code, but still only restart the
> placement service before nova-api? I believe it is, but want to ask this
> out loud.
> 
> I think we're probably OK here but I wanted to ask this out loud and
> make sure everyone is aware and can think about this as we're a week
> from feature freeze. We also need to look into devstack/grenade because
> I'm fairly certain that we upgrade n-sch *before* placement in a grenade
> run which will make any issues here very obvious in [1].
> 
> [1] https://review.openstack.org/#/c/417961/
> [2]
> http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
> 
> [3]
> http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
> 
> 

I floated the following possibility in the nova channel:
since we always ask operators to upgrade n-cpus *AFTER* upgrading the
other services, we could let the nova-scheduler tolerate a Newton
placement service *UNLESS* you have Ocata computes.

In more technical terms, the scheduler getting a response from the
placement service is a hard requirement for Ocata. That said, if the
response code is a 400 with a message saying that the schema is
incorrect, the scheduler would check the max version of all the computes
and then:
 - either the max version is Newton, and it falls back to
ComputeNodeList.get_all() to get the list of nodes
 - or the max version is Ocata (at least one node is upgraded), and
it raises NoValidHost
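The decision above could be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the callables stand in for the placement query, the compute version lookup and ComputeNodeList.get_all(), and the numeric OCATA constant and the function name are hypothetical.

```python
OCATA = 15  # hypothetical numeric stand-in for an Ocata service version


class NoValidHost(Exception):
    """Stand-in for the scheduler's 'no host found' error."""


def get_candidate_nodes(query_placement, max_compute_version, get_all_nodes):
    """Pick the node source for one scheduling request.

    query_placement: callable returning (http_status, providers)
    max_compute_version: highest version among all nova-compute services
    get_all_nodes: callable standing in for ComputeNodeList.get_all()
    """
    status, providers = query_placement()
    if status == 200:
        return providers
    if status == 400 and max_compute_version < OCATA:
        # Placement is still Newton and no compute is upgraded yet:
        # use the legacy list of compute nodes.
        return get_all_nodes()
    # At least one Ocata compute, or placement failed some other way.
    raise NoValidHost(f"placement returned {status}")
```

Any non-400 failure raises here because a response from placement is treated as a hard requirement, and the fallback only covers the "schema is incorrect" case.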

That way, the upgrade path would be:
 1/ upgrade your conductor
 2/ upgrade all your other services except n-cpus (we could upgrade and
restart n-sch before n-api and that would still work; the reverse order
would be fine too)
 3/ rolling-upgrade your n-cpus

I think we would then keep the existing upgrade path, and the placement
service would still be mandatory for Ocata.

Thoughts ?
-Sylvain



[openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata

2017-01-19 Thread Matt Riedemann
Sylvain and I were talking about how he's going to work placement 
microversion requests into his filter scheduler patch [1]. He needs to 
make requests to the placement API with microversion 1.4 [2] or later 
for resource provider filtering on specific resource classes like VCPU 
and MEMORY_MB.


The question was what happens if microversion 1.4 isn't available in the 
placement API, i.e. the nova-scheduler is running Ocata code now but the 
placement service is running Newton still.
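To illustrate what such a request looks like, here is a small sketch that only builds the path and headers, without making any network call. The helper name and the example amounts are hypothetical. The "resources" query parameter and the "OpenStack-API-Version: placement 1.4" header reflect my understanding of the 1.4 filtering described in [2], so treat them as assumptions rather than as what the patch in [1] does.

```python
from urllib.parse import urlencode

# Microversion needed for filtering providers by requested resources.
PLACEMENT_MICROVERSION = "1.4"


def build_provider_query(resources, microversion=PLACEMENT_MICROVERSION):
    """Return (path, headers) for a GET on /resource_providers.

    `resources` maps resource class names to requested amounts,
    e.g. {"VCPU": 1, "MEMORY_MB": 512}.
    """
    # Sort for a stable query string; keep ':' and ',' unescaped.
    amounts = ",".join(
        f"{name}:{amount}" for name, amount in sorted(resources.items())
    )
    query = urlencode({"resources": amounts}, safe=":,")
    headers = {
        "OpenStack-API-Version": f"placement {microversion}",
        "Accept": "application/json",
    }
    return f"/resource_providers?{query}", headers


path, headers = build_provider_query({"VCPU": 1, "MEMORY_MB": 512})
print(path)
print(headers["OpenStack-API-Version"])
```

A Newton placement service does not know the filtering parameter, which is where the 400 responses discussed later in the thread come from.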


Our rolling upgrades doc [3] says:

"It is safest to start nova-conductor first and nova-api last."

But since placement is bundled with n-api that would cause issues since 
n-sch now depends on the n-api code.


If you package the placement service separately from the nova-api 
service then this is probably not an issue. You can still roll out n-api 
last and restart it last (for control services), and just make sure that 
placement is upgraded before nova-scheduler (we need to be clear about 
that in [3]).


But do we have any other issues if they are not packaged separately? Is 
it possible to install the new code, but still only restart the 
placement service before nova-api? I believe it is, but want to ask this 
out loud.


I think we're probably OK here but I wanted to ask this out loud and 
make sure everyone is aware and can think about this as we're a week 
from feature freeze. We also need to look into devstack/grenade because 
I'm fairly certain that we upgrade n-sch *before* placement in a grenade 
run which will make any issues here very obvious in [1].


[1] https://review.openstack.org/#/c/417961/
[2] 
http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
[3] 
http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process


--

Thanks,

Matt Riedemann

