On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
> On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers <[email protected]> wrote:
>> Ken Gaillot <[email protected]> wrote:
>>> Hello all,
>>>
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>
>> [snipped]
>>
>> I don't want to comment directly on any of the excellent points which
>> have been raised in this thread, but it seems like a good time to make
>> a plea for easier reload/restart of individual instances of cloned
>> services, one node at a time. Currently, if nodes are all managed by
>> a configuration management system (such as Chef in our case),
>
> Puppet creates the same kinds of issues.
> Both seem designed for a magical world full of unrelated servers that
> require no co-ordination to update -- particularly when the timing of
> an update to some central store (cib, database, whatever) needs to be
> carefully ordered.
>
> When you say "restart" though, is that a traditional stop/start cycle
> in Pacemaker that also results in all the dependencies being stopped
> too?
> I'm guessing you really want the "atomic reload" kind where nothing
> else is affected, because we already have the other style covered by
> crm_resource --restart.
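For reference, the style that is already covered looks roughly like
this (the resource name here is illustrative):

    # Stop the resource, wait, then start it again. For a clone this
    # cycles every instance cluster-wide, and resources that depend on
    # it are stopped and started along with it:
    crm_resource --restart --resource my-clone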
crm_resource --restart isn't sufficient for his use case because it
affects all clone instances cluster-wide, whereas he needs to reload
or restart (depending on the service) the local instance only.

> I propose that we introduce a --force-restart option for crm_resource
> which:
>
> 1. disables any recurring monitor operations

None of the other --force-* options disable monitors, so for
consistency, I think we should leave this to the user (or add it for
the other --force-* options).

> 2. calls a native restart action directly on the resource if it
> exists, otherwise calls the native stop+start actions

What do you mean by a native restart action? Systemd restart?

> 3. re-enables the recurring monitor operations regardless of whether
> the reload succeeds, fails, or times out, etc
>
> No maintenance mode required, and whatever state the resource ends up
> in is re-detected by the cluster in step 3.

If you're lucky :-)

The cluster may still mess with the resource even without monitors,
e.g. if a dependency fails or a preferred node comes online.
Maintenance mode/unmanaging would still be safer (though no --force-*
option is completely safe, besides --force-check).

>> when the
>> system wants to perform a configuration run on that node (e.g. when
>> updating a service's configuration file from a template), it is
>> necessary to place the entire node in maintenance mode before
>> reloading or restarting that service on that node. It works OK, but
>> can result in ugly effects such as the node getting stuck in
>> maintenance mode if the chef-client run failed, without any easy way
>> to track down the original cause.
>>
>> I went through several design iterations before settling on this
>> approach, and they are detailed in a lengthy comment here, which may
>> help you better understand the challenges we encountered:
>>
>> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61
>>
>> Similar challenges are posed during upgrades of Pacemaker-managed
>> OpenStack infrastructure.
>>
>> Cheers,
>> Adam
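To make the maintenance-mode alternative concrete, here is a rough,
untested sketch of restarting a single instance on one node only,
using only commands that exist today. The resource name is
illustrative, and for a clone you may need to name a specific child
instance rather than the clone itself:

    #!/bin/sh
    # Illustrative names; adjust for your cluster.
    RSC="my-service"            # resource whose local instance to restart
    NODE="$(crm_node --name)"   # name of the local node

    # Always clear maintenance mode on exit, even if a step failed,
    # so the node cannot get stuck in maintenance mode:
    cleanup() {
        crm_attribute --type nodes --node "$NODE" \
                      --name maintenance --delete
    }
    trap cleanup EXIT

    # Tell Pacemaker to leave every resource on this node alone:
    crm_attribute --type nodes --node "$NODE" \
                  --name maintenance --update true

    # Run the agent's stop and start directly on the local node;
    # clone instances on other nodes are untouched:
    crm_resource --resource "$RSC" --force-stop
    crm_resource --resource "$RSC" --force-start

The EXIT trap is the important design point: it is what keeps a failed
chef-client run from leaving the node in maintenance mode indefinitely,
which is the failure mode Adam describes above. Per Ken's caveat, this
is safer than --force-* alone but still not completely safe.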
