Re: [ClusterLabs] Doing reload right
On Sat, Jul 23, 2016 at 7:10 AM, Ken Gaillot wrote:

> On 07/21/2016 07:46 PM, Andrew Beekhof wrote:
>>>> What do you mean by native restart action? Systemd restart?
>>>
>>> Whatever the agent supports.
>
> Are you suggesting that pacemaker start checking whether the agent
> metadata advertises a "restart" action? Or just assume that certain
> resource classes support restart (e.g. systemd) and others don't
> (e.g. ocf)?

No, I'm suggesting the crm_resource cli start checking... not the same
thing.

>>>>> 3. re-enables the recurring monitor operations regardless of whether
>>>>> the reload succeeds, fails, or times out, etc
>>>>>
>>>>> No maintenance mode required, and whatever state the resource ends up
>>>>> in is re-detected by the cluster in step 3.
>>>>
>>>> If you're lucky :-)
>>>>
>>>> The cluster may still mess with the resource even without monitors,
>>>> e.g. a dependency fails or a preferred node comes online.
>>>
>>> Can you explain how neither of those results in a restart of the
>>> service?
>
> Unless the resource is unmanaged, the cluster could do something like
> move it to a different node, disrupting the local force-restart.

But the next time it starts there, it will come up with the new
configuration, achieving the desired effect.

This is no different to using maintenance-mode and the cluster moving or
stopping it immediately after it is disabled again. Either way, the
resource is no longer running with the old configuration at the end of
the call.

> Ideally, we'd be able to disable monitors and unmanage the resource
> for the duration of the force-restart, but only on the local node.
>
> Maintenance mode/unmanaging would still be safer (though no --force-*
> option is completely safe, besides check).

>>> I'm happy with whatever you gurus come up with ;-) I'm just hoping
>>> that it can be made possible to pinpoint an individual resource on an
>>> individual node, rather than having to toggle maintenance flag(s)
>>> across a whole set of clones, or a whole node.

Yep.
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
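The three-step --force-restart sequence debated in this thread can be sketched as a wrapper around today's crm_resource --force-* commands. This is a dry-run illustration under stated assumptions, not the eventual implementation: the run() wrapper only prints the commands, the resource name is made up, and the monitor-toggle step is shown schematically (in practice it means setting enabled=false on the resource's monitor operations in the CIB).

```shell
#!/bin/sh
# Dry-run sketch of the proposed "crm_resource --force-restart" sequence.
# Nothing here touches a cluster: run() just prints the command it was given.

run() { printf '+ %s\n' "$*"; }

force_restart() {
    rsc=$1
    # Step 1: disable recurring monitors (schematic; the op id is invented)
    run cibadmin --modify --xml-text \
        "<op id=\"${rsc}-monitor\" enabled=\"false\"/>"
    # Step 2: native restart if the agent has one, else stop+start
    run crm_resource --resource "$rsc" --force-stop
    run crm_resource --resource "$rsc" --force-start
    # Step 3: re-enable monitors regardless of how step 2 ended
    run cibadmin --modify --xml-text \
        "<op id=\"${rsc}-monitor\" enabled=\"true\"/>"
}

force_restart my-clone
```

On a real cluster the echo wrapper would go away, and step 3 would have to run even when step 2 fails or times out, which is the whole point of the proposal.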
Re: [ClusterLabs] Doing reload right
Ken Gaillot wrote:

> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
>> On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers wrote:
>>> Ken Gaillot wrote:
>>>> Hello all,
>>>>
>>>> I've been meaning to address the implementation of "reload" in
>>>> Pacemaker for a while now, and I think the next release will be a
>>>> good time, as it seems to be coming up more frequently.
>>>
>>> [snipped]
>>>
>>> I don't want to comment directly on any of the excellent points which
>>> have been raised in this thread, but it seems like a good time to make
>>> a plea for easier reload / restart of individual instances of cloned
>>> services, one node at a time. Currently, if nodes are all managed by
>>> a configuration management system (such as Chef in our case),
>>
>> Puppet creates the same kinds of issues. Both seem designed for a
>> magical world full of unrelated servers that require no co-ordination
>> to update. Particularly when the timing of an update to some central
>> store (cib, database, whatever) needs to be carefully ordered.
>>
>> When you say "restart" though, is that a traditional stop/start cycle
>> in Pacemaker that also results in all the dependencies being stopped
>> too?

No, just the service reload or restart without causing any cascading
effects in Pacemaker.

>> I'm guessing you really want the "atomic reload" kind where nothing
>> else is affected because we already have the other style covered by
>> crm_resource --restart.
>
> crm_resource --restart isn't sufficient for his use case because it
> affects all clone instances cluster-wide, whereas he needs to reload or
> restart (depending on the service) the local instance only.

Exactly.

>> I propose that we introduce a --force-restart option for crm_resource
>> which:
>>
>> 1. disables any recurring monitor operations
>
> None of the other --force-* options disable monitors, so for
> consistency, I think we should leave this to the user (or add it for
> other --force-*).
>
>> 2. calls a native restart action directly on the resource if it
>> exists, otherwise calls the native stop+start actions
>
> What do you mean by native restart action? Systemd restart?
>
>> 3. re-enables the recurring monitor operations regardless of whether
>> the reload succeeds, fails, or times out, etc
>>
>> No maintenance mode required, and whatever state the resource ends up
>> in is re-detected by the cluster in step 3.
>
> If you're lucky :-)
>
> The cluster may still mess with the resource even without monitors,
> e.g. a dependency fails or a preferred node comes online. Maintenance
> mode/unmanaging would still be safer (though no --force-* option is
> completely safe, besides check).

I'm happy with whatever you gurus come up with ;-) I'm just hoping that
it can be made possible to pinpoint an individual resource on an
individual node, rather than having to toggle maintenance flag(s)
across a whole set of clones, or a whole node.
Re: [ClusterLabs] Doing reload right
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in
>> Pacemaker for a while now, and I think the next release will be a good
>> time, as it seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time. Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node. It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node. The problem (for your use case) is that
is-managed is cluster-wide for the given resource. I suspect coming up
with a per-node interface/implementation for is-managed would be
difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.
FYI, maintenance mode is supported for Pacemaker Remote nodes as of
1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
>
> Cheers,
> Adam
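The manual workaround described above (unmanage the resource, act on one node with --force-*, then remanage) might look roughly like the following. It is a dry-run sketch with a made-up resource name; run() only echoes the commands rather than executing them against a cluster.

```shell
#!/bin/sh
# Dry-run sketch of today's manual per-node restart workaround.
run() { printf '+ %s\n' "$*"; }

restart_local_instance() {
    rsc=$1
    # is-managed is cluster-wide for the resource -- the very limitation
    # discussed above -- so every clone instance is unmanaged, not just ours
    run crm_resource --resource "$rsc" --meta \
        --set-parameter is-managed --parameter-value false
    # The --force-* commands act only on the node they are run on
    run crm_resource --resource "$rsc" --force-stop
    run crm_resource --resource "$rsc" --force-start
    # Hand control back to the cluster
    run crm_resource --resource "$rsc" --meta \
        --set-parameter is-managed --parameter-value true
}

restart_local_instance my-clone
```

Note that anything the cluster decides to do between the unmanage and the remanage (a dependency failing, a preferred node coming online) is exactly the race discussed in this thread.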
Re: [ClusterLabs] Doing reload right
On 07/04/2016 07:13 AM, Ferenc Wágner wrote:
> Ken Gaillot writes:
>
>> Does anyone know of an RA that uses reload correctly?
>
> My resource agents advertise a no-op reload action for handling their
> "private" meta attributes. Meta in the sense that they are used by the
> resource agent when performing certain operations, not by the managed
> resource itself. Which means they are trivially changeable online,
> without any resource operation whatsoever.
>
>> Does anyone object to the (backward-incompatible) solution proposed
>> here?
>
> I'm all for cleanups, but please keep an online migration path around.

Not sure what you mean by online ... the behavior would change when
Pacemaker was upgraded, so the node would already be out of the cluster
at that point. You would unmanage resources if desired, stop pacemaker
on the node, upgrade pacemaker, upgrade the RA, then start/manage again.

If you mean that you would like to use the same RA before and after the
upgrade, that would be doable. We could bump the crm feature set, which
gets passed to the RA as an environment variable. You could modify the
RA to handle both reload and reload-params, and if it's asked to reload,
check the feature set to decide which type of reload to do. You could
upgrade the RA anytime before the pacemaker upgrade.

In pseudo-code, the recommended way of supporting reload would become:

  reload_params() { ... }
  reload_service() { ... }

  if action is "reload-params" then
      reload_params()
  else if action is "reload"
      if crm_feature_set < X.Y.Z then
          reload_params()
      else
          reload_service()

Handling both "unique" and "reloadable" would be more complicated, but
that's inherent in the mishmash of meaning unique has right now. I see
three approaches:

1. Use "unique" in its GUI sense and "reloadable" to indicate reloadable
parameters. This would be cleanest, but would not be useful with
pre-"reloadable" pacemaker.

2. Use both unique=0 and reloadable=1 to indicate reloadable parameters.
This sacrifices proper GUI hinting to keep compatibility with pre- and
post-"reloadable" pacemaker (the same sacrifice that has to be made now
to use reload correctly).

3. Dynamically modify the metadata according to the crm feature set,
using approach 1 with post-"reloadable" pacemaker and approach 2 with
pre-"reloadable" pacemaker. This is the most flexible but makes the code
more complicated. In pseudocode, it might look something like:

  if crm_feature_set < X.Y.Z then
      UNIQUE_TRUE=""
      UNIQUE_FALSE=""
      RELOADABLE_TRUE="unique=0"
      RELOADABLE_FALSE="unique=1"
  else
      UNIQUE_TRUE="unique=1"
      UNIQUE_FALSE="unique=0"
      RELOADABLE_TRUE="reloadable=1"
      RELOADABLE_FALSE="reloadable=0"

  meta_data() { ... ... }
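The pseudo-code above can be made concrete as a small shell fragment. This is a sketch under stated assumptions: the cutoff version 3.0.0 is a placeholder for whichever release would adopt the change (the "X.Y.Z" above), the reload bodies are stubs, and the environment variable name follows the usual OCF_RESKEY_ convention for values Pacemaker exports to agents.

```shell
#!/bin/sh
# Sketch of an RA that supports both the proposed "reload-params" action
# and the legacy "reload", choosing behaviour by the crm feature set.

FEATURE_CUTOFF="3.0.0"   # placeholder for the release that adopts the change

# True (exit 0) if dotted version $1 sorts strictly before version $2
version_lt() {
    [ "$1" = "$2" ] && return 1
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Stub actions standing in for the real reload logic
reload_params()  { echo "reloading pacemaker-configured parameters"; }
reload_service() { echo "reloading the service's own config file"; }

dispatch() {
    case "$1" in
        reload-params)
            reload_params ;;
        reload)
            if version_lt "${OCF_RESKEY_crm_feature_set:-0.0.0}" \
                          "$FEATURE_CUTOFF"; then
                reload_params      # old pacemaker: "reload" meant params
            else
                reload_service     # new pacemaker: "reload" is native reload
            fi ;;
    esac
}

dispatch "${1:-reload}"
```

An RA written this way could be shipped before the Pacemaker upgrade and would keep behaving correctly on both sides of it, which is the migration path being discussed.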
Re: [ClusterLabs] Doing reload right
Ken Gaillot writes:

> Does anyone know of an RA that uses reload correctly?

My resource agents advertise a no-op reload action for handling their
"private" meta attributes. Meta in the sense that they are used by the
resource agent when performing certain operations, not by the managed
resource itself. Which means they are trivially changeable online,
without any resource operation whatsoever.

> Does anyone object to the (backward-incompatible) solution proposed
> here?

I'm all for cleanups, but please keep an online migration path around.
--
Feri
[ClusterLabs] Doing reload right
Hello all,

I've been meaning to address the implementation of "reload" in Pacemaker
for a while now, and I think the next release will be a good time, as it
seems to be coming up more frequently.

In the current implementation, Pacemaker considers a resource parameter
"reloadable" if the resource agent supports the "reload" action, and the
agent's metadata marks the parameter with "unique=0". If (only) such
parameters get changed in the resource's pacemaker configuration,
pacemaker will call the agent's reload action rather than the
stop-then-start it usually does for parameter changes.

This is completely broken for two reasons:

1. It relies on "unique=0" to determine reloadability. "unique" was
originally intended (and is widely used by existing resource agents) as
a hint to UIs to indicate which parameters uniquely determine a resource
instance. That is, two resource instances should never have the same
value of a "unique" parameter. For this purpose, it makes perfect sense
that (for example) the path to a binary command would have unique=0 --
multiple resource instances could (and likely would) use the same
binary. However, such a parameter could never be reloadable.

2. Every known resource agent that implements a reload action does so
incorrectly. Pacemaker uses reload for changes in the resource's
*pacemaker* configuration, while all known RAs use reload for a
service's native reload capability of its own configuration file. As an
example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
action, which will have zero effect on any pacemaker-configured
parameters -- and on top of that, the RA uses "unique=0" in its correct
UI sense, and none of those parameters are actually reloadable.

My proposed solution is:

* Add a new "reloadable" attribute for resource agent metadata, to
indicate reloadable parameters. Pacemaker would use this instead of
"unique".

* Add a new "reload-options" RA action for the ability to reload
Pacemaker-configured options. Pacemaker would call this instead of
"reload".

* Formalize that "reload" means reload the service's own configuration,
legitimizing the most common existing RA implementations. (Pacemaker
itself will not use this, but tools such as crm_resource might.)

* Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
use unique, reloadable, reload, and reload-options properly.

The downside is that this breaks backward compatibility. Any RA that
actually implements unique and reload so that reload works will lose
reload capability until it is updated to the new style.

While we usually go to great lengths to preserve backward compatibility,
I think it is OK to break it in this case, because most RAs that
implement reload do so wrongly: some implement it as a service reload, a
few advertise reload but don't actually implement it, and others map
reload to start, which might theoretically work in some cases (I'm not
familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
typically won't, as the previous service options are not reverted (for
example, I think Route would incorrectly leave the old route in the old
table).

So, I think breaking backward compatibility is actually a good thing
here, since the most reload can do with existing RAs is trigger bad
behavior. The opposing view would be that we shouldn't punish any RA
writer who implemented this correctly. However, there's no solution that
preserves backward compatibility with both UI usage of unique and reload
usage of unique. Plus, the worst that would happen is that the RA would
stop being reloadable -- not as bad as the current possibilities from
mis-implemented reload.

My questions are:

Does anyone know of an RA that uses reload correctly? Dummy doesn't
count ;-)

Does anyone object to the (backward-incompatible) solution proposed
here?
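Under this proposal, an agent's metadata might look like the sketch below. The "reloadable" attribute and "reload-options" action are the proposed additions, not a published standard, and the parameter names are invented purely for illustration.

```shell
#!/bin/sh
# Hypothetical meta-data output for an agent using the proposed scheme:
# "unique" keeps its UI meaning, "reloadable" marks parameters whose
# change triggers the new "reload-options" action instead of a restart.

meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="example">
  <parameters>
    <!-- unique: UI hint only; two instances must not share this value -->
    <parameter name="config" unique="1" reloadable="0">
      <content type="string"/>
    </parameter>
    <!-- reloadable: changing this calls reload-options, not stop+start -->
    <parameter name="loglevel" unique="0" reloadable="1">
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="reload" timeout="20s"/>
    <action name="reload-options" timeout="20s"/>
  </actions>
</resource-agent>
EOF
}

meta_data
```

Note how the binary-path example from point 1 above falls out naturally: a shared path would be unique=0 yet reloadable=0, which today's unique-based scheme cannot express.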
--
Ken Gaillot