Re: [ClusterLabs] Doing reload right

2016-07-25 Thread Andrew Beekhof
On Sat, Jul 23, 2016 at 7:10 AM, Ken Gaillot  wrote:
> On 07/21/2016 07:46 PM, Andrew Beekhof wrote:
>>>> What do you mean by native restart action? Systemd restart?
>>
>> Whatever the agent supports.
>
> Are you suggesting that pacemaker start checking whether the agent
> metadata advertises a "restart" action? Or just assume that certain
> resource classes support restart (e.g. systemd) and others don't (e.g. ocf)?

No, I'm suggesting the crm_resource CLI start checking... not the same thing

>

>>>>> 3. re-enables the recurring monitor operations regardless of whether
>>>>> the reload succeeds, fails, or times out, etc
>>>>>
>>>>> No maintenance mode required, and whatever state the resource ends up
>>>>> in is re-detected by the cluster in step 3.

>>>> If you're lucky :-)

>>>> The cluster may still mess with the resource even without monitors, e.g.
>>>> a dependency fails or a preferred node comes online.
>>
>> Can you explain how neither of those results in a restart of the service?
>
> Unless the resource is unmanaged, the cluster could do something like
> move it to a different node, disrupting the local force-restart.

But the next time it starts there, it will come up with the new
configuration, achieving the desired effect.

This is no different from using maintenance-mode and the cluster moving
or stopping it immediately after it is disabled again.
Either way, the resource is no longer running with the old
configuration at the end of the call.

>
> Ideally, we'd be able to disable monitors and unmanage the resource for
> the duration of the force-restart, but only on the local node.
>
>>>> Maintenance
>>>> mode/unmanaging would still be safer (though no --force-* option is
>>>> completely safe, besides check).
>>>
>>> I'm happy with whatever you gurus come up with ;-)  I'm just hoping
>>> that it can be made possible to pinpoint an individual resource on an
>>> individual node, rather than having to toggle maintenance flag(s)
>>> across a whole set of clones, or a whole node.
>>
>> Yep.



Re: [ClusterLabs] Doing reload right

2016-07-21 Thread Adam Spiers
Ken Gaillot  wrote:
> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
> > On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers  wrote:
> >> Ken Gaillot  wrote:
> >>> Hello all,
> >>>
> >>> I've been meaning to address the implementation of "reload" in Pacemaker
> >>> for a while now, and I think the next release will be a good time, as it
> >>> seems to be coming up more frequently.
> >>
> >> [snipped]
> >>
> >> I don't want to comment directly on any of the excellent points which
> >> have been raised in this thread, but it seems like a good time to make
> >> a plea for easier reload / restart of individual instances of cloned
> >> services, one node at a time.  Currently, if nodes are all managed by
> >> a configuration management system (such as Chef in our case),
> > 
> > Puppet creates the same kinds of issues.
> > Both seem designed for a magical world full of unrelated servers that
> > require no co-ordination to update.
> > Particularly when the timing of an update to some central store (cib,
> > database, whatever) needs to be carefully ordered.
> > 
> > When you say "restart" though, is that a traditional stop/start cycle
> > in Pacemaker that also results in all the dependencies being stopped
> > too?

No, just the service reload or restart without causing any cascading
effects in Pacemaker.

> > I'm guessing you really want the "atomic reload" kind where nothing
> > else is affected because we already have the other style covered by
> > crm_resource --restart.
> 
> crm_resource --restart isn't sufficient for his use case because it
> affects all clone instances cluster-wide, whereas he needs to reload or
> restart (depending on the service) the local instance only.

Exactly.

> > I propose that we introduce a --force-restart option for crm_resource which:
> > 
> > 1. disables any recurring monitor operations
> 
> None of the other --force-* options disable monitors, so for
> consistency, I think we should leave this to the user (or add it for
> other --force-*).
>
> > 2. calls a native restart action directly on the resource if it
> > exists, otherwise calls the native stop+start actions
> 
> What do you mean by native restart action? Systemd restart?
> 
> > 3. re-enables the recurring monitor operations regardless of whether
> > the reload succeeds, fails, or times out, etc
> > 
> > No maintenance mode required, and whatever state the resource ends up
> > in is re-detected by the cluster in step 3.
> 
> If you're lucky :-)
> 
> The cluster may still mess with the resource even without monitors, e.g.
> a dependency fails or a preferred node comes online. Maintenance
> mode/unmanaging would still be safer (though no --force-* option is
> completely safe, besides check).

I'm happy with whatever you gurus come up with ;-)  I'm just hoping
that it can be made possible to pinpoint an individual resource on an
individual node, rather than having to toggle maintenance flag(s)
across a whole set of clones, or a whole node.
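
Spelled out as a shell sketch, the proposed sequence would be roughly the
following (a purely hypothetical wrapper -- crm_resource has no
--force-restart today; the resource name is invented, and steps 1 and 3 are
left as comments because the monitor-op commands depend on which CLI is in
use):

   # Hypothetical wrapper approximating the proposed "crm_resource
   # --force-restart".  Run it on the node whose instance should restart.
   force_restart_local() {
      rsc="$1"   # e.g. "my-clone" -- invented name

      # 1. Disable any recurring monitor operations on $rsc
      #    (typically by setting enabled=false on its monitor op; tool-specific).

      # 2. Restart the service itself, bypassing the cluster.  A native
      #    restart action would be preferred if the agent offered one;
      #    stop+start is the generic fallback.
      crm_resource --resource "$rsc" --force-stop &&
      crm_resource --resource "$rsc" --force-start

      # 3. Re-enable the recurring monitors regardless of how step 2 went,
      #    so the next monitor re-detects whatever state the resource is in.
   }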



Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Ken Gaillot
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
> 
> [snipped]
> 
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
> 
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
> 
>   
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node.
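
As a rough sketch (the resource name is invented, and the monitor-op step is
tool-specific, so it is only described in a comment), that sequence on the
node in question looks something like:

   RSC=my-clone   # invented resource name

   # 1. Unmanage the resource so the cluster leaves it alone.
   crm_resource --resource "$RSC" --meta --set-parameter is-managed --parameter-value false

   # 2. Disable its recurring monitors (set enabled=false on the monitor op,
   #    e.g. via crmsh or pcs; the exact command depends on the tool).

   # 3. On the desired node, act on the instance outside cluster control.
   crm_resource --resource "$RSC" --force-stop
   crm_resource --resource "$RSC" --force-start

   # 4. Re-enable the monitors and hand the resource back to the cluster.
   crm_resource --resource "$RSC" --meta --delete-parameter is-managed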

The problem (for your use case) is that is-managed is cluster-wide for
the given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
> 
> Cheers,
> Adam
> 




Re: [ClusterLabs] Doing reload right

2016-07-08 Thread Ken Gaillot
On 07/04/2016 07:13 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> Does anyone know of an RA that uses reload correctly?
> 
> My resource agents advertise a no-op reload action for handling their
> "private" meta attributes.  Meta in the sense that they are used by the
> resource agent when performing certain operations, not by the managed
> resource itself.  Which means they are trivially changeable online,
> without any resource operation whatsoever.
> 
>> Does anyone object to the (backward-incompatible) solution proposed
>> here?
> 
> I'm all for cleanups, but please keep an online migration path around.

Not sure what you mean by online ... the behavior would change when
Pacemaker was upgraded, so the node would already be out of the cluster
at that point. You would unmanage resources if desired, stop pacemaker
on the node, upgrade pacemaker, upgrade the RA, then start/manage again.

If you mean that you would like to use the same RA before and after the
upgrade, that would be doable. We could bump the crm feature set, which
gets passed to the RA as an environment variable. You could modify the
RA to handle both reload and reload-params, and if it's asked to reload,
check the feature set to decide which type of reload to do. You could
upgrade the RA anytime before the pacemaker upgrade.

In pseudo-code, the recommended way of supporting reload would become:

  reload_params() { ... }
  reload_service() { ... }

  if action is "reload-params" then
     reload_params()
  else if action is "reload"
     if crm_feature_set < X.Y.Z then
        reload_params()
     else
        reload_service()
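
As actual agent shell code, that dispatch might look roughly like the sketch
below. X.Y.Z again stands for whichever feature set first ships
reload-params; the feature set is assumed to reach the agent as
OCF_RESKEY_crm_feature_set, and version_lt is just a stand-in comparison
(ocf_version_cmp from ocf-shellfuncs could be used instead):

   version_lt() {
      # true if $1 sorts strictly before $2 (GNU "sort -V")
      [ "$1" != "$2" ] &&
      [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
   }

   action="$1"   # the requested action is the agent's first argument

   case "$action" in
      reload-params)
         reload_params
         ;;
      reload)
         if version_lt "${OCF_RESKEY_crm_feature_set:-0}" "X.Y.Z"; then
            reload_params    # older Pacemaker: "reload" still means parameter reload
         else
            reload_service   # newer Pacemaker: "reload" means the service's own config
         fi
         ;;
   esac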


Handling both "unique" and "reloadable" would be more complicated, but
that's inherent in the mishmash of meanings unique has right now. I see
three approaches:

1. Use "unique" in its GUI sense and "reloadable" to indicate reloadable
parameters. This would be cleanest, but would not be useful with
pre-"reloadable" pacemaker.

2. Use both unique=0 and reloadable=1 to indicate reloadable parameters.
This sacrifices proper GUI hinting to keep compatibility with pre- and
post-"reloadable" pacemaker (the same sacrifice that has to be made now
to use reload correctly).

3. Dynamically modify the metadata according to the crm feature set,
using approach 1 with post-"reloadable" pacemaker and approach 2 with
pre-"reloadable" pacemaker. This is the most flexible but makes the code
more complicated. In pseudocode, it might look something like:

   if crm_feature_set < X.Y.Z then
      UNIQUE_TRUE=""
      UNIQUE_FALSE=""
      RELOADABLE_TRUE="unique=0"
      RELOADABLE_FALSE="unique=1"
   else
      UNIQUE_TRUE="unique=1"
      UNIQUE_FALSE="unique=0"
      RELOADABLE_TRUE="reloadable=1"
      RELOADABLE_FALSE="reloadable=0"

   meta_data() {
      ...
      <parameter name="..." $UNIQUE_FALSE $RELOADABLE_TRUE>
      ...
      <parameter name="..." $UNIQUE_TRUE $RELOADABLE_FALSE>
   }
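
To make the effect concrete: a template line written with
$UNIQUE_FALSE $RELOADABLE_TRUE (parameter name invented) would render as

   <parameter name="config_file" unique="0">

under pre-"reloadable" pacemaker, and as

   <parameter name="config_file" unique="0" reloadable="1">

under post-"reloadable" pacemaker, with proper XML quoting added.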



Re: [ClusterLabs] Doing reload right

2016-07-04 Thread Ferenc Wágner
Ken Gaillot  writes:

> Does anyone know of an RA that uses reload correctly?

My resource agents advertise a no-op reload action for handling their
"private" meta attributes.  Meta in the sense that they are used by the
resource agent when performing certain operations, not by the managed
resource itself.  Which means they are trivially changeable online,
without any resource operation whatsoever.
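
As a stripped-down sketch (not the actual agents), that pattern amounts to
advertising a reload action in the metadata, marking those private parameters
with unique="0", and making the action itself a no-op:

   ra_reload() {
      # Nothing to do: these parameters only change how the agent itself
      # behaves on later start/stop/monitor calls, not the running service.
      return $OCF_SUCCESS
   }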

> Does anyone object to the (backward-incompatible) solution proposed
> here?

I'm all for cleanups, but please keep an online migration path around.
-- 
Feri



[ClusterLabs] Doing reload right

2016-06-30 Thread Ken Gaillot
Hello all,

I've been meaning to address the implementation of "reload" in Pacemaker
for a while now, and I think the next release will be a good time, as it
seems to be coming up more frequently.

In the current implementation, Pacemaker considers a resource parameter
"reloadable" if the resource agent supports the "reload" action, and the
agent's metadata marks the parameter with "unique=0". If (only) such
parameters get changed in the resource's pacemaker configuration,
pacemaker will call the agent's reload action rather than the
stop-then-start it usually does for parameter changes.
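
In metadata terms, today's convention amounts to something like this
(parameter name and timeout invented for illustration):

   <parameter name="some_option" unique="0">
      ...
   </parameter>
   ...
   <actions>
      <action name="reload" timeout="20s"/>
      ...
   </actions>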

This is completely broken for two reasons:

1. It relies on "unique=0" to determine reloadability. "unique" was
originally intended (and is widely used by existing resource agents) as
a hint to UIs to indicate which parameters uniquely determine a resource
instance. That is, two resource instances should never have the same
value of a "unique" parameter. For this purpose, it makes perfect sense
that (for example) the path to a binary command would have unique=0 --
multiple resource instances could (and likely would) use the same
binary. However, such a parameter could never be reloadable.

2. Every known resource agent that implements a reload action does so
incorrectly. Pacemaker uses reload for changes in the resource's
*pacemaker* configuration, while all known RAs use reload for a
service's native reload capability of its own configuration file. As an
example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
action, which will have zero effect on any pacemaker-configured
parameters -- and on top of that, the RA uses "unique=0" in its correct
UI sense, and none of those parameters are actually reloadable.

My proposed solution is:

* Add a new "reloadable" attribute for resource agent metadata, to
indicate reloadable parameters. Pacemaker would use this instead of
"unique".

* Add a new "reload-options" RA action for the ability to reload
Pacemaker-configured options. Pacemaker would call this instead of "reload".

* Formalize that "reload" means reload the service's own configuration,
legitimizing the most common existing RA implementations. (Pacemaker
itself will not use this, but tools such as crm_resource might.)

* Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
use unique, reloadable, reload, and reload-options properly.
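
Under this proposal, the same agent would instead advertise something along
these lines (again with an invented parameter; "unique" keeps its GUI
meaning, "reloadable" marks the reloadable parameter, and the two reload
actions are distinct):

   <parameter name="some_option" unique="0" reloadable="1">
      ...
   </parameter>
   ...
   <actions>
      <action name="reload" timeout="20s"/>           <!-- the service's own config -->
      <action name="reload-options" timeout="20s"/>   <!-- Pacemaker-configured options -->
   </actions>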

The downside is that this breaks backward compatibility. Any RA that
actually implements unique and reload so that reload works will lose
reload capability until it is updated to the new style.

While we usually go to great lengths to preserve backward compatibility,
I think it is OK to break it in this case, because most RAs that
implement reload do so wrongly: some implement it as a service reload, a
few advertise reload but don't actually implement it, and others map
reload to start, which might theoretically work in some cases (I'm not
familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
typically won't, as the previous service options are not reverted (for
example, I think Route would incorrectly leave the old route in the old
table).

So, I think breaking backward compatibility is actually a good thing
here, since the most reload can do with existing RAs is trigger bad
behavior.

The opposing view would be that we shouldn't punish any RA writer who
implemented this correctly. However, there's no solution that preserves
backward compatibility with both UI usage of unique and reload usage of
unique. Plus, the worst that would happen is that the RA would stop
being reloadable -- not as bad as the current possibilities from
mis-implemented reload.

My questions are:

Does anyone know of an RA that uses reload correctly? Dummy doesn't
count ;-)

Does anyone object to the (backward-incompatible) solution proposed here?
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org