Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-10-21 Thread Tim Bell
 -Original Message-
 From: Christopher Aedo [mailto:d...@aedo.net]
 Sent: 21 October 2014 04:45
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [All] Maintenance mode in OpenStack during
 patching/upgrades
 
...
 
 Also, I would like to see maintenance mode for Nova be limited just to
 stopping any further VMs being sent there, and the node reporting that it's in
 maintenance mode.  I think proactive workload migration should be handled
 independently, as I can imagine scenarios where maintenance mode might be
 desired without coupling migration to it.
 

A typical scenario we have is a non-fatal hardware repair. If a node is
reporting ECC memory errors, you want to schedule a repair, which will be
disruptive for any VMs running on that host. Users get annoyed when you give
them a new VM and then immediately tell them the hardware is going to be
repaired.

For me, putting a host into maintenance should mean no new work. I assume that
stopping the service has a negative impact on other functions like Telemetry.

Tim


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-10-20 Thread Christopher Aedo
I'm glad to see there's more than one interested person here too :)

Regarding the Xen-specific host maintenance mode, if it gets dropped I
would not complain since it's useful only to those running Xen at the
moment.  The issues around when it works and doesn't work are my
bigger concern - as similar limitations exist in the migrate code
today.  They're not xen-specific, but do seem to consider few
deployment scenarios (and don't seem to work if you're using
ceph-backed storage for instance).

As Joe pointed out, there's definitely a need for maintenance mode.
Having a reliable method to pull a compute node out of a cluster would
be incredibly valuable.  This will certainly be a required component
of any full-environment upgrade path.

The scenario Joe outlined is the only working approach I'm aware of
right now, but I'm not a fan of disabling the compute service.  For
one thing, hopefully it will raise an alarm with your monitoring
system.  It also has the potential of interfering with other
operations that are ongoing (and with nova compute disabled, will you
still/always be able to reliably migrate a VM off the host?)

Also, I would like to see maintenance mode for Nova be limited just
to stopping any further VMs being sent there, and the node reporting
that it's in maintenance mode.  I think proactive workload migration
should be handled independently, as I can imagine scenarios where
maintenance mode might be desired without coupling migration to it.

I would love to keep discussing this further - a small session in
Paris would be great.  But it seems like there's never enough time at
the summits, so I don't have high hopes for making much progress on
this specific topic there.  Just the same, if anything gets pulled
together, I'll be keeping an eye out for it.

-Christopher


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-10-17 Thread John Garbutt
On 17 October 2014 02:28, Matt Riedemann mrie...@linux.vnet.ibm.com wrote:


 host-maintenance-mode is definitely a thing in nova compute via the os-hosts
 API extension and the --maintenance parameter; the compute manager code is
 here [1].  The thing is, the only in-tree virt driver that implements it is
 xenapi. I believe that when you put the host in maintenance mode it's
 supposed to automatically evacuate the instances to some other host, but you
 can't target the other host or tell the driver, from the API, which
 instances you want to evacuate, e.g. all, none, running only, etc.

 [1]
 http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2014.2#n3990

We should certainly make that more generic. It doesn't update the VM
state, so it's really only admin focused in its current form.

The XenAPI logic only works when using XenServer pools with shared NFS
storage, if my memory serves me correctly. Honestly, it's a bit of code
I have planned on removing, along with the rest of the pool support.

In terms of requiring DB downtime in Nova, the current efforts are
focusing on avoiding downtime altogether, via expand/contract style
migrations, with a little help from objects to avoid data migrations.
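
As a rough illustration of what an expand/contract style migration looks like, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (instances, host, host_name) are hypothetical and are not Nova's actual schema; the point is only the ordering of the three phases.

```python
import sqlite3


def expand(conn):
    # Expand: add the new column next to the old one. Code that only knows
    # about the old column keeps working, so no downtime is needed here.
    conn.execute("ALTER TABLE instances ADD COLUMN host_name TEXT")


def backfill(conn):
    # Copy data across while old and new code may both be running.
    conn.execute(
        "UPDATE instances SET host_name = host WHERE host_name IS NULL"
    )


def contract(conn):
    # Contract: once nothing reads the old column any more, rebuild the
    # table without it (done as a copy so it works on any SQLite version).
    conn.executescript(
        """
        CREATE TABLE instances_new (id INTEGER PRIMARY KEY, host_name TEXT);
        INSERT INTO instances_new (id, host_name)
            SELECT id, host_name FROM instances;
        DROP TABLE instances;
        ALTER TABLE instances_new RENAME TO instances;
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT)")
    conn.execute("INSERT INTO instances (host) VALUES ('compute-1')")
    expand(conn)
    backfill(conn)
    contract(conn)
    return list(conn.execute("SELECT id, host_name FROM instances"))
```

Only the contract step removes anything, and it can be deferred until every service has been upgraded, which is what lets the other two steps run without stopping the services.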

That doesn't mean maintenance mode is not useful for other things,
like emergency patching of the hypervisor.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-10-17 Thread Joe Cropper
I’m glad to see this topic getting some focus once again.  :-)

From several of the administrators I talk with, when they think of putting a 
host into maintenance mode, the common requests I hear are:

1. Don’t schedule more VMs to the host
2. Provide an optional way to automatically migrate all (usually active) VMs 
off the host so that users’ workloads remain “unaffected” by the maintenance 
operation

#1 can easily be achieved, as has been mentioned several times, by simply 
disabling the compute service.  However, #2 involves a little more work, 
although certainly possible using all the operations provided by nova today 
(e.g., live migration, etc.).  I believe these types of discussions have come 
up several times over the past several OpenStack releases—certainly since 
Grizzly (i.e., when I started watching this space).

It seems that the general direction is to have the type of workflow needed for 
#2 outside of nova (which is certainly a valid stance).  To that end, it would 
be fairly straightforward to build some code that logically sits on top of 
nova, that when entering maintenance:

1. Prevents VMs from being scheduled to the host;
2. Maintains state about the maintenance operation (e.g., not in maintenance, 
migrations in progress, in maintenance, or error);
3. Provides mechanisms to, upon entering maintenance, dictate which VMs 
(active, all, none) to migrate, and provides some throttling capabilities to 
prevent hundreds of parallel migrations on densely packed hosts (all done via a 
REST API).
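
The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name HostMaintenance and the three callables (disable_scheduling, list_vms, migrate_vm) are hypothetical stand-ins for whatever would actually talk to nova, and a thread pool is just one possible way to throttle migrations.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class MaintenanceState(Enum):
    NOT_IN_MAINTENANCE = "not in maintenance"
    MIGRATING = "migrations in progress"
    IN_MAINTENANCE = "in maintenance"
    ERROR = "error"


class HostMaintenance:
    """Tracks the maintenance state of one host (hypothetical helper)."""

    def __init__(self, host, disable_scheduling, list_vms, migrate_vm,
                 max_parallel=2):
        self.host = host
        self.state = MaintenanceState.NOT_IN_MAINTENANCE
        self._disable_scheduling = disable_scheduling  # callable(host)
        self._list_vms = list_vms      # callable(host) -> [(vm_id, is_active)]
        self._migrate_vm = migrate_vm  # callable(vm_id)
        self._max_parallel = max_parallel

    def enter(self, policy="active"):
        if policy not in ("active", "all", "none"):
            raise ValueError("policy must be 'active', 'all' or 'none'")
        # 1. Stop new VMs from being scheduled to the host.
        self._disable_scheduling(self.host)
        # 3. Choose which VMs to move, according to the policy.
        vms = self._list_vms(self.host)
        if policy == "none":
            selected = []
        elif policy == "active":
            selected = [vm_id for vm_id, active in vms if active]
        else:
            selected = [vm_id for vm_id, _ in vms]
        # 2. Record state so callers can see what is going on.
        self.state = MaintenanceState.MIGRATING
        try:
            # The pool size caps how many migrations run in parallel.
            with ThreadPoolExecutor(max_workers=self._max_parallel) as pool:
                list(pool.map(self._migrate_vm, selected))
        except Exception:
            self.state = MaintenanceState.ERROR
            return self.state
        self.state = MaintenanceState.IN_MAINTENANCE
        return self.state
```

A REST layer would then only need to expose enter() and the current state; the policy argument corresponds to the "active, all, none" choice mentioned above.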

If anyone has additional questions, comments, or would like to discuss some 
options, please let me know.  If interested, upon request, I could even share a 
video of how such cases might work.  :-)  My colleagues and I have given these 
use cases a lot of thought and consideration and I’d love to talk more about 
them (perhaps a small session in Paris would be possible).

- Joe


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-10-16 Thread Christopher Aedo
On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov
mscherba...@mirantis.com wrote:
 On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:
 The idea is not simply to deny or hang requests from clients, but to tell
 them "we are in maintenance mode, retry in X seconds"

 You probably would want 'nova host-servers-migrate <host>'
 yeah for migrations - but as far as I understand, it doesn't help with
 disabling this host in the scheduler - there is still a chance that some
 workloads will be scheduled to the host.

Regarding putting a compute host in maintenance mode using nova
host-update --maintenance enable, it looks like the blueprint and
associated commits were abandoned a year and a half ago:
https://blueprints.launchpad.net/nova/+spec/host-maintenance

It seems that 'nova service-disable <host> nova-compute' effectively
prevents the scheduler from trying to send new work there.  Is this
the best approach to use right now if you want to pull a compute host
out of an environment before migrating VMs off?
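
To make the effect of disabling the service concrete, here is a deliberately simplified sketch of how a scheduler could skip disabled hosts. It is not Nova's actual filter code; the function name and the dictionary layout are assumptions, loosely modelled on the Binary/Host/Status/State columns that 'nova service-list' prints.

```python
def schedulable_hosts(services):
    """Return hosts whose nova-compute service is both enabled and up.

    `services` is a list of dicts with the keys "binary", "host",
    "status" ("enabled"/"disabled") and "state" ("up"/"down").
    """
    return [
        svc["host"]
        for svc in services
        if svc["binary"] == "nova-compute"
        and svc["status"] == "enabled"
        and svc["state"] == "up"
    ]
```

Note that a disabled host is only removed from the candidate list for new work; nothing in this sketch touches the VMs already running there, which matches the "no new work" meaning of maintenance discussed in this thread.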

I agree with Tim and Mike that having something respond "down for
maintenance" rather than ignore or hang would be really valuable.  But
it also looks like that hasn't gotten much traction in the past -
anyone feel like they'd be in support of reviving the notion of
maintenance mode?

-Christopher


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-09-10 Thread Tim Bell
It would be great if each OpenStack component could provide a maintenance mode 
like this… there was some work being considered on Cells 
https://blueprints.launchpad.net/nova/+spec/disable-child-cell-support which 
would have allowed parts of Nova to indicate they were in maintenance.

Something generic would be very useful. Some operators have asked for 
‘read-only’ modes also where query is OK but update is not permitted.

Tim


[openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-09-09 Thread Mike Scherbakov
Hi all,
please see the original email from Dmitry below. I've modified the
subject to bring a larger audience to the issue.

I'd like to split the issue into two parts:

   1. Maintenance mode for OpenStack controllers in HA mode (HA-ed
   Keystone, Glance, etc.)
   2. Maintenance mode for OpenStack computes/storage nodes (no HA)

For the first category, we might not need maintenance mode at all. For
example, if we patch/upgrade a 3-node HA cluster one node at a time, the
other 2 nodes will serve requests normally. Is that possible for our HA
solutions in Fuel, TripleO, and other frameworks?
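
A minimal sketch of that one-node-at-a-time idea, assuming hypothetical upgrade() and is_healthy() callables supplied by the deployment tool (nothing here is taken from Fuel or TripleO):

```python
def rolling_upgrade(nodes, upgrade, is_healthy, min_serving=2):
    """Upgrade nodes one at a time, keeping `min_serving` others healthy.

    Raises RuntimeError and stops the rollout if taking the next node out
    would leave too few healthy nodes, or if a node is unhealthy after
    its upgrade.
    """
    upgraded = []
    for node in nodes:
        others = [n for n in nodes if n != node]
        serving = sum(1 for n in others if is_healthy(n))
        if serving < min_serving:
            raise RuntimeError(
                f"only {serving} healthy nodes besides {node}; stopping"
            )
        upgrade(node)
        if not is_healthy(node):
            raise RuntimeError(f"{node} unhealthy after upgrade; stopping")
        upgraded.append(node)
    return upgraded
```

With three controllers and min_serving=2, the loop proceeds only while the two nodes not being touched are healthy, which is the condition the example above relies on.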

For the second category, can't we simply run nova-manage service disable...,
so the scheduler will stop scheduling new workloads on the particular host
that we want to do maintenance on?


On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote:

 All,

 I'm not sure if it deserves to be mentioned in our documentation, as this
 seems to be a common practice. If an administrator wants to patch his
 environment, he should be prepared for a temporary downtime of OpenStack
 services. And he should plan to perform patching in advance: choose a time
 with minimal load and warn users about possible interruptions of service
 availability.

 Our current implementation of patching does not protect from downtime
 during the patching procedure. HA deployments seem to be more or less
 stable. But it looks like it is possible to schedule an action on a compute
 node and get an error because of a service restart. Deployments with one
 controller... well, you won’t be able to use your cluster until the
 patching is finished. There is no way to get rid of downtime here.

 As I understand it, we can get rid of possible issues with computes in HA.
 But it will require migration of instances and stopping of the nova-compute
 service before patching. And it will make the overall patching procedure
 much longer. Do we want to investigate this process?





-- 
Mike Scherbakov
#mihgen


Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-09-09 Thread Sergii Golovatiuk
Hi Fuelers,

1. Sometimes Fuel has non-reversible changes. Here are a couple of examples:
   - A new version needs to change/adjust Pacemaker primitives. Such changes
     affect all controllers in the cluster.
   - An old API can be deprecated or a new API can be introduced. Until all
     components are configured to use the new API, it's almost impossible to
     keep half of the cluster on the old API and half on the new API.

2. For computes, even if we stop services, VM instances should keep working.
I think it's possible to upgrade without downtime of VM instances, though I am
not sure if that's possible for Ceph nodes.




--
Best regards,
Sergii Golovatiuk,
Skype #golserge
IRC #holser



Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

2014-09-09 Thread Mike Scherbakov
Sergii, Clint,
to rephrase what you are saying - there might be situations when our
OpenStack API will not be responding, simply because services would be down
for an upgrade.
Do we want to support it somehow? For example, if we know that Nova is
going to be down, can we respond with HTTP 503 with an appropriate Retry-After
time in the header?

The idea is not simply to deny or hang requests from clients, but to tell them
"we are in maintenance mode, retry in X seconds".
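
One way this could look, sketched as a small WSGI middleware. This is an assumption about how it might be wired in, not an existing OpenStack component; the class name and the in_maintenance callable are hypothetical.

```python
class MaintenanceMiddleware:
    """Answer every request with 503 + Retry-After while in maintenance."""

    def __init__(self, app, in_maintenance, retry_after=120):
        self.app = app
        self.in_maintenance = in_maintenance  # callable() -> bool
        self.retry_after = retry_after        # seconds, sent to the client

    def __call__(self, environ, start_response):
        if self.in_maintenance():
            body = b"Service is in maintenance mode; please retry later.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", str(self.retry_after)),
                ],
            )
            return [body]
        # Not in maintenance: pass the request through untouched.
        return self.app(environ, start_response)
```

Because the check is a callable, the flag could come from a file, a config value, or a database row; a client that understands Retry-After can then back off for the advertised number of seconds instead of hanging.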

 Turbo Hipster was added to the gate
great idea, I think we should use it in Fuel too

 You probably would want 'nova host-servers-migrate <host>'
yeah for migrations - but as far as I understand, it doesn't help with
disabling this host in the scheduler - there is still a chance that some
workloads will be scheduled to the host.


On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Mike Scherbakov's message of 2014-09-09 00:35:09 -0700:
  Hi all,
  please see the original email from Dmitry below. I've modified the
  subject to bring a larger audience to the issue.
 
  I'd like to split the issue into two parts:
 
 1. Maintenance mode for OpenStack controllers in HA mode (HA-ed
 Keystone, Glance, etc.)
 2. Maintenance mode for OpenStack computes/storage nodes (no HA)
 
  For the first category, we might not need maintenance mode at all. For
  example, if we patch/upgrade a 3-node HA cluster one node at a time, the
  other 2 nodes will serve requests normally. Is that possible for our HA
  solutions in Fuel, TripleO, and other frameworks?

 You may have a broken cloud if you are pushing out an update that
 requires a new schema. Some services are better than others about
 handling old schemas, and can be upgraded before doing schema upgrades.
 But most of the time you have to do at least a brief downtime:

  * turn off DB accessing services
  * update code
  * run db migration
  * turn on DB accessing services
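
 The four steps above can be sketched as a small wrapper. The callables (stop, start, update_code, run_db_migration) are hypothetical placeholders for whatever the deployment tool provides; the sketch only shows the ordering.

```python
from contextlib import contextmanager


@contextmanager
def services_stopped(services, stop, start):
    """Stop the given services, then start them again in reverse order."""
    stopped = []
    try:
        for name in services:
            stop(name)
            stopped.append(name)
        yield
    finally:
        # Restart only what was actually stopped, last stopped first.
        for name in reversed(stopped):
            start(name)


def upgrade_with_downtime(services, stop, start, update_code,
                          run_db_migration):
    # Downtime lasts exactly as long as the body of the "with" block.
    with services_stopped(services, stop, start):
        update_code()
        run_db_migration()
```

 One design choice to be aware of: this version restarts the services even if the migration fails. A real procedure might prefer to leave them down in that case, so that new code never runs against a half-migrated schema.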

 It is for this very reason, I believe, that Turbo Hipster was added to
 the gate, so that deployers running against the upstream master branches
 can have a chance at performing these upgrades in a reasonable amount of
 time.

 
  For the second category, can't we simply run nova-manage service disable...,
  so the scheduler will stop scheduling new workloads on the particular host
  that we want to do maintenance on?
 

 You probably would want 'nova host-servers-migrate <host>' at that
 point, assuming you have migration set up.

 http://docs.openstack.org/user-guide/content/novaclient_commands.html

-- 
Mike Scherbakov
#mihgen