Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
-----Original Message-----
From: Christopher Aedo [mailto:d...@aedo.net]
Sent: 21 October 2014 04:45
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

...

Also, I would like to see maintenance mode for Nova be limited just to stopping any further VMs being sent there, and the node reporting that it's in maintenance mode. I think proactive workload migration should be handled independently, as I can imagine scenarios where maintenance mode might be desired without coupling migration to it.

A typical scenario we have is a non-fatal hardware repair. If a node is reporting ECC memory errors, you want to schedule a repair which will be disruptive for any VMs running on that host. The users get annoyed when you give them their new VM and then immediately tell them the hardware is going to be repaired. Setting a host into maintenance should, for me, mean no new work.

I assume that stopping the service has a negative impact on other functions like Telemetry.

Tim

I would love to keep discussing this further - a small session in Paris would be great. But it seems like there's never enough time at the summits, so I don't have high hopes for making much progress on this specific topic there. Just the same, if anything gets pulled together, I'll be keeping an eye out for it.

-Christopher

On Fri, Oct 17, 2014 at 9:21 PM, Joe Cropper cropper@gmail.com wrote:

I’m glad to see this topic getting some focus once again. :-)

From several of the administrators I talk with, when they think of putting a host into maintenance mode, the common requests I hear are:

1. Don’t schedule more VMs to the host
2. Provide an optional way to automatically migrate all (usually active) VMs off the host so that users’ workloads remain “unaffected” by the maintenance operation

#1 can easily be achieved, as has been mentioned several times, by simply disabling the compute service.
However, #2 involves a little more work, although certainly possible using all the operations provided by nova today (e.g., live migration, etc.). I believe these types of discussions have come up several times over the past several OpenStack releases—certainly since Grizzly (i.e., when I started watching this space). It seems that the general direction is to have the type of workflow needed for #2 outside of nova (which is certainly a valid stance). To that end, it would be fairly straightforward to build some code that logically sits on top of nova, that when entering maintenance: 1. Prevents VMs from being scheduled to the host; 2. Maintains state about the maintenance operation (e.g., not in maintenance, migrations in progress, in maintenance, or error); 3. Provides mechanisms to, upon entering maintenance, dictates which VMs (active, all, none) to migrate and provides some throttling capabilities to prevent hundreds of parallel migrations on densely packed hosts (all done via a REST API). If anyone has additional questions, comments, or would like to discuss some options, please let me know. If interested, upon request, I could even share a video of how such cases might work. :-) My colleagues and I have given these use cases a lot of thought and consideration and I’d love to talk more about them (perhaps a small session in Paris would be possible). 
- Joe On Oct 17, 2014, at 4:18 AM, John Garbutt j...@johngarbutt.com wrote: On 17 October 2014 02:28, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 10/16/2014 7:26 PM, Christopher Aedo wrote: On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov mscherba...@mirantis.com wrote: On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote: The idea is not simply deny or hang requests from clients, but provide them we are in maintenance mode, retry in X seconds You probably would want 'nova host-servers-migrate host' yeah for migrations - but as far as I understand, it doesn't help with disabling this host in scheduler - there is can be a chance that some workloads will be scheduled to the host. Regarding putting a compute host in maintenance mode using nova host-update --maintenance enable, it looks like the blueprint and associated commits were abandoned a year and a half ago: https://blueprints.launchpad.net/nova/+spec/host-maintenance It seems that nova service-disable host nova-compute effectively prevents the scheduler from trying to send new work there. Is this the best approach to use right now if you want to pull a compute host out of an environment before migrating VMs off? I agree with Tim and Mike that having something respond down for maintenance rather than ignore or hang would be really valuable. But it also looks like that hasn't gotten much traction in the past - anyone feel like they'd be in support
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
I'm glad to see there's more than one interested person here too :)

Regarding the Xen-specific host maintenance mode, if it gets dropped I would not complain since it's useful only to those running Xen at the moment. The issues around when it works and doesn't work are my bigger concern, as similar limitations exist in the migrate code today. They're not Xen-specific, but do seem to consider few deployment scenarios (and don't seem to work if you're using Ceph-backed storage, for instance).

As Joe pointed out, there's definitely a need for maintenance mode. Having a reliable method to pull a compute node out of a cluster would be incredibly valuable. This will certainly be a required component of any full-environment upgrade path.

The scenario Joe outlined is the only working approach I'm aware of right now, but I'm not a fan of disabling the compute service. For one thing, hopefully it will raise an alarm with your monitoring system. It also has the potential of interfering with other operations that are ongoing (and with nova compute disabled, will you still/always be able to reliably migrate a VM off the host?).

Also, I would like to see maintenance mode for Nova be limited just to stopping any further VMs being sent there, and the node reporting that it's in maintenance mode. I think proactive workload migration should be handled independently, as I can imagine scenarios where maintenance mode might be desired without coupling migration to it.

I would love to keep discussing this further - a small session in Paris would be great. But it seems like there's never enough time at the summits, so I don't have high hopes for making much progress on this specific topic there. Just the same, if anything gets pulled together, I'll be keeping an eye out for it.

-Christopher

On Fri, Oct 17, 2014 at 9:21 PM, Joe Cropper cropper@gmail.com wrote:

I’m glad to see this topic getting some focus once again.
:-) From several of the administrators I talk with, when they think of putting a host into maintenance mode, the common requests I hear are: 1. Don’t schedule more VMs to the host 2. Provide an optional way to automatically migrate all (usually active) VMs off the host so that users’ workloads remain “unaffected” by the maintenance operation #1 can easily be achieved, as has been mentioned several times, by simply disabling the compute service. However, #2 involves a little more work, although certainly possible using all the operations provided by nova today (e.g., live migration, etc.). I believe these types of discussions have come up several times over the past several OpenStack releases—certainly since Grizzly (i.e., when I started watching this space). It seems that the general direction is to have the type of workflow needed for #2 outside of nova (which is certainly a valid stance). To that end, it would be fairly straightforward to build some code that logically sits on top of nova, that when entering maintenance: 1. Prevents VMs from being scheduled to the host; 2. Maintains state about the maintenance operation (e.g., not in maintenance, migrations in progress, in maintenance, or error); 3. Provides mechanisms to, upon entering maintenance, dictates which VMs (active, all, none) to migrate and provides some throttling capabilities to prevent hundreds of parallel migrations on densely packed hosts (all done via a REST API). If anyone has additional questions, comments, or would like to discuss some options, please let me know. If interested, upon request, I could even share a video of how such cases might work. :-) My colleagues and I have given these use cases a lot of thought and consideration and I’d love to talk more about them (perhaps a small session in Paris would be possible). 
- Joe On Oct 17, 2014, at 4:18 AM, John Garbutt j...@johngarbutt.com wrote: On 17 October 2014 02:28, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 10/16/2014 7:26 PM, Christopher Aedo wrote: On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov mscherba...@mirantis.com wrote: On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote: The idea is not simply deny or hang requests from clients, but provide them we are in maintenance mode, retry in X seconds You probably would want 'nova host-servers-migrate host' yeah for migrations - but as far as I understand, it doesn't help with disabling this host in scheduler - there is can be a chance that some workloads will be scheduled to the host. Regarding putting a compute host in maintenance mode using nova host-update --maintenance enable, it looks like the blueprint and associated commits were abandoned a year and a half ago: https://blueprints.launchpad.net/nova/+spec/host-maintenance It seems that nova service-disable host nova-compute effectively prevents the scheduler from trying to send new work there. Is this the best approach to use right now if you
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
On 17 October 2014 02:28, Matt Riedemann mrie...@linux.vnet.ibm.com wrote:
On 10/16/2014 7:26 PM, Christopher Aedo wrote:
On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov mscherba...@mirantis.com wrote:
On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:

The idea is not simply to deny or hang requests from clients, but to tell them: we are in maintenance mode, retry in X seconds.

You probably would want 'nova host-servers-migrate host'

Yeah, for migrations - but as far as I understand, it doesn't help with disabling this host in the scheduler - there can be a chance that some workloads will be scheduled to the host.

Regarding putting a compute host in maintenance mode using nova host-update --maintenance enable, it looks like the blueprint and associated commits were abandoned a year and a half ago: https://blueprints.launchpad.net/nova/+spec/host-maintenance

It seems that nova service-disable host nova-compute effectively prevents the scheduler from trying to send new work there. Is this the best approach to use right now if you want to pull a compute host out of an environment before migrating VMs off?

I agree with Tim and Mike that having something respond "down for maintenance" rather than ignore or hang would be really valuable. But it also looks like that hasn't gotten much traction in the past - does anyone feel like they'd be in support of reviving the notion of maintenance mode?

-Christopher

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

host-maintenance-mode is definitely a thing in nova compute via the os-hosts API extension and the --maintenance parameter; the compute manager code is here [1].
The thing is, the only in-tree virt driver that implements it is xenapi, and I believe when you put the host in maintenance mode it's supposed to automatically evacuate the instances to some other host, but you can't target the other host or tell the driver, from the API, which instances you want to evacuate, e.g. all, none, running only, etc.

[1] http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2014.2#n3990

We should certainly make that more generic. It doesn't update the VM state, so it's really only admin-focused in its current form.

The XenAPI logic only works when using XenServer pools with shared NFS storage, if my memory serves me correctly. Honestly, it's a bit of code I have planned on removing, along with the rest of the pool support.

In terms of requiring DB downtime in Nova, the current efforts are focusing on avoiding downtime altogether, via expand/contract style migrations, with a little help from objects to avoid data migrations. That doesn't mean maintenance mode is not useful for other things, like emergency patching of the hypervisor.

John
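For readers unfamiliar with the expand/contract style John mentions, the following is a minimal sketch of the general idea using Python's built-in sqlite3 module. It is not Nova's migration code: the table `instances` and the columns `name` and `display_name` are hypothetical, chosen only to show the three phases (add the new column, backfill while old code still runs, drop the old column once nothing reads it).

```python
import sqlite3


def expand(conn):
    # Expand: add the new column next to the old one.
    # Old code that only knows about "name" keeps working.
    conn.execute("ALTER TABLE instances ADD COLUMN display_name TEXT")


def backfill(conn):
    # Online data migration: copy old values into the new column.
    conn.execute(
        "UPDATE instances SET display_name = name WHERE display_name IS NULL"
    )


def contract(conn):
    # Contract: once no running code reads "name", rebuild the table
    # without it (a table rebuild works on every SQLite version).
    conn.executescript(
        """
        CREATE TABLE instances_new (id INTEGER PRIMARY KEY, display_name TEXT);
        INSERT INTO instances_new (id, display_name)
            SELECT id, display_name FROM instances;
        DROP TABLE instances;
        ALTER TABLE instances_new RENAME TO instances;
        """
    )


def column_names(conn):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute("PRAGMA table_info(instances)")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO instances (name) VALUES ('vm-1')")
    for phase in (expand, backfill, contract):
        phase(conn)
        print(phase.__name__, column_names(conn))
```

The point of splitting the change this way is that services can be restarted one at a time between the expand and contract phases, so no single step requires every DB-accessing service to be down at once.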
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
I’m glad to see this topic getting some focus once again. :-)

From several of the administrators I talk with, when they think of putting a host into maintenance mode, the common requests I hear are:

1. Don’t schedule more VMs to the host
2. Provide an optional way to automatically migrate all (usually active) VMs off the host so that users’ workloads remain “unaffected” by the maintenance operation

#1 can easily be achieved, as has been mentioned several times, by simply disabling the compute service. However, #2 involves a little more work, although it is certainly possible using the operations provided by nova today (e.g., live migration, etc.). I believe these types of discussions have come up several times over the past several OpenStack releases, certainly since Grizzly (i.e., when I started watching this space). It seems that the general direction is to have the type of workflow needed for #2 outside of nova (which is certainly a valid stance). To that end, it would be fairly straightforward to build some code that logically sits on top of nova and that, when entering maintenance:

1. Prevents VMs from being scheduled to the host;
2. Maintains state about the maintenance operation (e.g., not in maintenance, migrations in progress, in maintenance, or error);
3. Provides mechanisms to, upon entering maintenance, dictate which VMs (active, all, none) to migrate, and provides some throttling capabilities to prevent hundreds of parallel migrations on densely packed hosts (all done via a REST API).

If anyone has additional questions, comments, or would like to discuss some options, please let me know. If interested, upon request, I could even share a video of how such cases might work. :-) My colleagues and I have given these use cases a lot of thought and consideration and I’d love to talk more about them (perhaps a small session in Paris would be possible).
- Joe On Oct 17, 2014, at 4:18 AM, John Garbutt j...@johngarbutt.com wrote: On 17 October 2014 02:28, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 10/16/2014 7:26 PM, Christopher Aedo wrote: On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov mscherba...@mirantis.com wrote: On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote: The idea is not simply deny or hang requests from clients, but provide them we are in maintenance mode, retry in X seconds You probably would want 'nova host-servers-migrate host' yeah for migrations - but as far as I understand, it doesn't help with disabling this host in scheduler - there is can be a chance that some workloads will be scheduled to the host. Regarding putting a compute host in maintenance mode using nova host-update --maintenance enable, it looks like the blueprint and associated commits were abandoned a year and a half ago: https://blueprints.launchpad.net/nova/+spec/host-maintenance It seems that nova service-disable host nova-compute effectively prevents the scheduler from trying to send new work there. Is this the best approach to use right now if you want to pull a compute host out of an environment before migrating VMs off? I agree with Tim and Mike that having something respond down for maintenance rather than ignore or hang would be really valuable. But it also looks like that hasn't gotten much traction in the past - anyone feel like they'd be in support of reviving the notion of maintenance mode? -Christopher ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev host-maintenance-mode is definitely a thing in nova compute via the os-hosts API extension and the --maintenance parameter, the compute manager code is here [1]. 
The thing is, the only in-tree virt driver that implements it is xenapi, and I believe when you put the host in maintenance mode it's supposed to automatically evacuate the instances to some other host, but you can't target the other host or tell the driver, from the API, which instances you want to evacuate, e.g. all, none, running only, etc.

[1] http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2014.2#n3990

We should certainly make that more generic. It doesn't update the VM state, so it's really only admin-focused in its current form.

The XenAPI logic only works when using XenServer pools with shared NFS storage, if my memory serves me correctly. Honestly, it's a bit of code I have planned on removing, along with the rest of the pool support.

In terms of requiring DB downtime in Nova, the current efforts are focusing on avoiding downtime altogether, via expand/contract style migrations, with a little help from objects to avoid data migrations. That doesn't mean maintenance mode is not useful for other things, like emergency patching of the hypervisor.

John
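The layer Joe describes at the top of this message (block scheduling, track maintenance state, choose which VMs to migrate, throttle the migrations) could look roughly like the sketch below. Everything here is an assumption made for illustration: the class `HostMaintenance`, the policy strings, and the `disable_host` and `migrate_vm` callables are hypothetical stand-ins for whatever would actually call nova; no real nova API is used, and the REST front end is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class MaintenanceState(Enum):
    NOT_IN_MAINTENANCE = "not in maintenance"
    MIGRATING = "migrations in progress"
    IN_MAINTENANCE = "in maintenance"
    ERROR = "error"


def select_vms(vms, policy):
    """Pick VM names to migrate. vms is a list of (name, is_active) pairs."""
    if policy == "all":
        return [name for name, _ in vms]
    if policy == "active":
        return [name for name, active in vms if active]
    if policy == "none":
        return []
    raise ValueError("unknown migration policy: %r" % policy)


class HostMaintenance:
    def __init__(self, host, disable_host, migrate_vm, max_parallel=2):
        self.host = host
        self.disable_host = disable_host  # hypothetical: stops new scheduling
        self.migrate_vm = migrate_vm      # hypothetical: migrates one VM
        self.max_parallel = max_parallel  # throttle on concurrent migrations
        self.state = MaintenanceState.NOT_IN_MAINTENANCE

    def enter(self, vms, policy="active"):
        # Step 1: keep new VMs off the host before touching existing ones.
        self.disable_host(self.host)
        targets = select_vms(vms, policy)
        self.state = MaintenanceState.MIGRATING
        # Step 3: the pool size caps how many migrations run at once.
        with ThreadPoolExecutor(max_workers=self.max_parallel) as pool:
            futures = [pool.submit(self.migrate_vm, name) for name in targets]
        # Leaving the "with" block waits for every migration to finish.
        failed = [f for f in futures if f.exception() is not None]
        # Step 2: record the outcome so callers can query it later.
        self.state = (
            MaintenanceState.ERROR if failed else MaintenanceState.IN_MAINTENANCE
        )
        return self.state
```

A bounded worker pool is only one way to throttle; a real implementation would probably also need to poll migration status, since a migration request returning is not the same as the migration completing.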
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov mscherba...@mirantis.com wrote:
On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:

The idea is not simply to deny or hang requests from clients, but to tell them: we are in maintenance mode, retry in X seconds.

You probably would want 'nova host-servers-migrate host'

Yeah, for migrations - but as far as I understand, it doesn't help with disabling this host in the scheduler - there can be a chance that some workloads will be scheduled to the host.

Regarding putting a compute host in maintenance mode using nova host-update --maintenance enable, it looks like the blueprint and associated commits were abandoned a year and a half ago: https://blueprints.launchpad.net/nova/+spec/host-maintenance

It seems that nova service-disable host nova-compute effectively prevents the scheduler from trying to send new work there. Is this the best approach to use right now if you want to pull a compute host out of an environment before migrating VMs off?

I agree with Tim and Mike that having something respond "down for maintenance" rather than ignore or hang would be really valuable. But it also looks like that hasn't gotten much traction in the past - does anyone feel like they'd be in support of reviving the notion of maintenance mode?

-Christopher
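The effect described above, where a disabled nova-compute service stops receiving new work, can be modelled with a toy host filter. This is only an illustration of the idea, not Nova's scheduler: the `ComputeService` record, its fields, and `schedulable_hosts` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    host: str
    free_ram_mb: int
    disabled: bool = False
    disabled_reason: str = ""


def schedulable_hosts(services, ram_mb):
    """Return hosts that may receive a new VM needing ram_mb of memory."""
    return [
        svc.host
        for svc in services
        # A disabled service is skipped no matter how much capacity it has.
        if not svc.disabled and svc.free_ram_mb >= ram_mb
    ]


if __name__ == "__main__":
    services = [
        ComputeService("compute-1", free_ram_mb=8192),
        ComputeService("compute-2", free_ram_mb=16384,
                       disabled=True, disabled_reason="ECC memory repair"),
        ComputeService("compute-3", free_ram_mb=1024),
    ]
    print(schedulable_hosts(services, ram_mb=2048))
```

Note that the filter says nothing about VMs already on the disabled host; that is exactly the gap the thread discusses, since moving them off is a separate step.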
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
It would be great if each OpenStack component could provide a maintenance mode like this. There was some work being considered on Cells (https://blueprints.launchpad.net/nova/+spec/disable-child-cell-support) which would have allowed parts of Nova to indicate they were in maintenance. Something generic would be very useful.

Some operators have also asked for ‘read-only’ modes, where queries are OK but updates are not permitted.

Tim

From: Mike Scherbakov [mailto:mscherba...@mirantis.com]
Sent: 09 September 2014 23:20
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

Sergii, Clint, to rephrase what you are saying - there might be situations when our OpenStack API will not be responding, simply because services would be down for upgrade. Do we want to support it somehow? For example, if we know that Nova is going to be down, can we respond with HTTP 503 with an appropriate Retry-After time in the header? The idea is not simply to deny or hang requests from clients, but to tell them: we are in maintenance mode, retry in X seconds.

Turbo Hipster was added to the gate

Great idea, I think we should use it in Fuel too.

You probably would want 'nova host-servers-migrate host'

Yeah, for migrations - but as far as I understand, it doesn't help with disabling this host in the scheduler - there can be a chance that some workloads will be scheduled to the host.

On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Mike Scherbakov's message of 2014-09-09 00:35:09 -0700:

Hi all, please see the original email below from Dmitry. I've modified the subject to bring a larger audience to the issue. I'd like to split the issue into two parts:

1. Maintenance mode for OpenStack controllers in HA mode (HA-ed Keystone, Glance, etc.)
2. Maintenance mode for OpenStack computes/storage nodes (no HA)

For the first category, we might not need to have maintenance mode at all. For example, if we apply patching/upgrade node by node to a 3-node HA cluster, 2 nodes will serve requests normally. Is that possible for our HA solutions in Fuel, TripleO, other frameworks?

You may have a broken cloud if you are pushing out an update that requires a new schema. Some services are better than others about handling old schemas, and can be upgraded before doing schema upgrades. But most of the time you have to do at least a brief downtime:

* turn off DB accessing services
* update code
* run db migration
* turn on DB accessing services

It is for this very reason, I believe, that Turbo Hipster was added to the gate, so that deployers running against the upstream master branches can have a chance at performing these upgrades in a reasonable amount of time.

For the second category, can we not simply do nova-manage service disable..., so the scheduler will simply stop scheduling new workloads on the particular host which we want to do maintenance on?

You probably would want 'nova host-servers-migrate host' at that point, assuming you have migration set up.

http://docs.openstack.org/user-guide/content/novaclient_commands.html

On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote:

All,

I'm not sure if it deserves to be mentioned in our documentation; this seems to be a common practice. If an administrator wants to patch his environment, he should be prepared for a temporary downtime of OpenStack services. And he should plan to perform patching in advance: choose a time with minimal load and warn users about possible interruptions of service availability.

Our current implementation of patching does not protect from downtime during the patching procedure. HA deployments seem to be more or less stable. But it looks like it is possible to schedule an action on a compute node and get an error because of a service restart. Deployments with one controller... well, you won’t be able to use your cluster until the patching is finished. There is no way to get rid of downtime here.

As I understand, we can get rid of possible issues with computes in HA. But it will require migration of instances and stopping of the nova-compute service before patching. And it will make the overall patching procedure much longer. Do we want to investigate this process?

--
Mike Scherbakov
#mihgen
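The four-step downtime sequence quoted from Clint in this message (turn off DB-accessing services, update code, run the DB migration, turn services back on) is order-sensitive, which a small runner can make explicit. This is a hedged sketch only: `run_upgrade` and the step callables are hypothetical placeholders, and the choice to stop at the first failure and leave services down is an assumption of this example, not something the thread prescribes.

```python
def run_upgrade(steps):
    """Run (name, action) steps in order and stop at the first failure.

    Returns (completed_step_names, error_message_or_None). If a step fails,
    later steps are skipped, so services are not restarted against a schema
    that may be only partly migrated.
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            return completed, "%s failed: %s" % (name, exc)
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Placeholder actions; a real deployment would call its own tooling here.
    steps = [
        ("turn off DB accessing services", lambda: None),
        ("update code", lambda: None),
        ("run db migration", lambda: None),
        ("turn on DB accessing services", lambda: None),
    ]
    done, error = run_upgrade(steps)
    print(done, error)
```

Returning the list of completed steps gives the operator a clear picture of where a failed upgrade stopped, which matters when deciding whether to roll back or retry.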
[openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
Hi all,

please see the original email below from Dmitry. I've modified the subject to bring a larger audience to the issue.

I'd like to split the issue into two parts:

1. Maintenance mode for OpenStack controllers in HA mode (HA-ed Keystone, Glance, etc.)
2. Maintenance mode for OpenStack computes/storage nodes (no HA)

For the first category, we might not need to have maintenance mode at all. For example, if we apply patching/upgrade node by node to a 3-node HA cluster, 2 nodes will serve requests normally. Is that possible for our HA solutions in Fuel, TripleO, other frameworks?

For the second category, can we not simply do nova-manage service disable..., so the scheduler will simply stop scheduling new workloads on the particular host which we want to do maintenance on?

On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote:

All,

I'm not sure if it deserves to be mentioned in our documentation; this seems to be a common practice. If an administrator wants to patch his environment, he should be prepared for a temporary downtime of OpenStack services. And he should plan to perform patching in advance: choose a time with minimal load and warn users about possible interruptions of service availability.

Our current implementation of patching does not protect from downtime during the patching procedure. HA deployments seem to be more or less stable. But it looks like it is possible to schedule an action on a compute node and get an error because of a service restart. Deployments with one controller... well, you won’t be able to use your cluster until the patching is finished. There is no way to get rid of downtime here.

As I understand, we can get rid of possible issues with computes in HA. But it will require migration of instances and stopping of the nova-compute service before patching. And it will make the overall patching procedure much longer. Do we want to investigate this process?

--
Mike Scherbakov
#mihgen
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
Hi Fuelers,

1. Sometimes Fuel has non-reversible changes. Here are a couple of examples:
   * A new version needs to change/adjust Pacemaker primitives. Such changes affect all controllers in the cluster.
   * An old API can be deprecated or a new API can be introduced. Until all components are configured to use the new API, it's almost impossible to keep half of the cluster on the old API and half on the new API.
2. For computes, even if we stop services, VM instances should keep working. I think it's possible to upgrade without downtime of VM instances. Though I am not sure if it's possible for Ceph nodes.

--
Best regards,
Sergii Golovatiuk,
Skype #golserge
IRC #holser

On Tue, Sep 9, 2014 at 9:35 AM, Mike Scherbakov mscherba...@mirantis.com wrote:

Hi all, please see the original email below from Dmitry. I've modified the subject to bring a larger audience to the issue. I'd like to split the issue into two parts:

1. Maintenance mode for OpenStack controllers in HA mode (HA-ed Keystone, Glance, etc.)
2. Maintenance mode for OpenStack computes/storage nodes (no HA)

For the first category, we might not need to have maintenance mode at all. For example, if we apply patching/upgrade node by node to a 3-node HA cluster, 2 nodes will serve requests normally. Is that possible for our HA solutions in Fuel, TripleO, other frameworks?

For the second category, can we not simply do nova-manage service disable..., so the scheduler will simply stop scheduling new workloads on the particular host which we want to do maintenance on?

On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote:

All,

I'm not sure if it deserves to be mentioned in our documentation; this seems to be a common practice. If an administrator wants to patch his environment, he should be prepared for a temporary downtime of OpenStack services. And he should plan to perform patching in advance: choose a time with minimal load and warn users about possible interruptions of service availability.

Our current implementation of patching does not protect from downtime during the patching procedure. HA deployments seem to be more or less stable. But it looks like it is possible to schedule an action on a compute node and get an error because of a service restart. Deployments with one controller... well, you won’t be able to use your cluster until the patching is finished. There is no way to get rid of downtime here.

As I understand, we can get rid of possible issues with computes in HA. But it will require migration of instances and stopping of the nova-compute service before patching. And it will make the overall patching procedure much longer. Do we want to investigate this process?

--
Mike Scherbakov
#mihgen
Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
Sergii, Clint,

to rephrase what you are saying - there might be situations when our OpenStack API will not be responding, simply because services would be down for upgrade. Do we want to support it somehow? For example, if we know that Nova is going to be down, can we respond with HTTP 503 with an appropriate Retry-After time in the header? The idea is not simply to deny or hang requests from clients, but to tell them: we are in maintenance mode, retry in X seconds.

Turbo Hipster was added to the gate

Great idea, I think we should use it in Fuel too.

You probably would want 'nova host-servers-migrate host'

Yeah, for migrations - but as far as I understand, it doesn't help with disabling this host in the scheduler - there can be a chance that some workloads will be scheduled to the host.

On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Mike Scherbakov's message of 2014-09-09 00:35:09 -0700:

Hi all, please see the original email below from Dmitry. I've modified the subject to bring a larger audience to the issue. I'd like to split the issue into two parts:

1. Maintenance mode for OpenStack controllers in HA mode (HA-ed Keystone, Glance, etc.)
2. Maintenance mode for OpenStack computes/storage nodes (no HA)

For the first category, we might not need to have maintenance mode at all. For example, if we apply patching/upgrade node by node to a 3-node HA cluster, 2 nodes will serve requests normally. Is that possible for our HA solutions in Fuel, TripleO, other frameworks?

You may have a broken cloud if you are pushing out an update that requires a new schema. Some services are better than others about handling old schemas, and can be upgraded before doing schema upgrades. But most of the time you have to do at least a brief downtime:

* turn off DB accessing services
* update code
* run db migration
* turn on DB accessing services

It is for this very reason, I believe, that Turbo Hipster was added to the gate, so that deployers running against the upstream master branches can have a chance at performing these upgrades in a reasonable amount of time.

For the second category, can we not simply do nova-manage service disable..., so the scheduler will simply stop scheduling new workloads on the particular host which we want to do maintenance on?

You probably would want 'nova host-servers-migrate host' at that point, assuming you have migration set up.

http://docs.openstack.org/user-guide/content/novaclient_commands.html

On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote:

All,

I'm not sure if it deserves to be mentioned in our documentation; this seems to be a common practice. If an administrator wants to patch his environment, he should be prepared for a temporary downtime of OpenStack services. And he should plan to perform patching in advance: choose a time with minimal load and warn users about possible interruptions of service availability.

Our current implementation of patching does not protect from downtime during the patching procedure. HA deployments seem to be more or less stable. But it looks like it is possible to schedule an action on a compute node and get an error because of a service restart. Deployments with one controller... well, you won’t be able to use your cluster until the patching is finished. There is no way to get rid of downtime here.

As I understand, we can get rid of possible issues with computes in HA. But it will require migration of instances and stopping of the nova-compute service before patching. And it will make the overall patching procedure much longer. Do we want to investigate this process?

--
Mike Scherbakov
#mihgen
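The HTTP 503 plus Retry-After proposal in this message can be illustrated with a small piece of WSGI middleware. This is a sketch under assumptions, not code from any OpenStack service: `MaintenanceMiddleware`, the `in_maintenance` callable, and the 120-second default are hypothetical. Only the status code and the Retry-After header (a delay in seconds) come from the HTTP specification.

```python
class MaintenanceMiddleware:
    """Wrap a WSGI app and answer 503 + Retry-After while in maintenance."""

    def __init__(self, app, in_maintenance, retry_after_seconds=120):
        self.app = app
        self.in_maintenance = in_maintenance  # callable returning True/False
        self.retry_after_seconds = retry_after_seconds

    def __call__(self, environ, start_response):
        if self.in_maintenance():
            body = b"Service is in maintenance mode; please retry later.\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                # Tells clients how many seconds to wait before retrying.
                ("Retry-After", str(self.retry_after_seconds)),
            ]
            start_response("503 Service Unavailable", headers)
            return [body]
        # Not in maintenance: pass the request through untouched.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    # Stand-in for the real API application.
    body = b"ok\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because the check is a callable, the flag could be backed by a file, a config value, or a database row; which of those suits a given deployment is left open here. Note that this only helps while something is still up to answer requests, so it would have to sit in a proxy or a process that outlives the service being upgraded.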