Re: [ClusterLabs] About the pacemaker

2019-01-10 Thread Jan Pokorný
On 10/01/19 14:53 +0100, Jan Pokorný wrote:
> On 08/01/19 10:14 -0600, Ken Gaillot wrote:
>> On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
>>> I have a question: does Pacemaker have an event-notification
>>> interface implemented as push and pull? I want another process to
>>> do something extra when resources are started or deleted, so I
>>> need a way to monitor resource events.  ClusterMon and alerts both
>>> use external scripts for extra actions, but in my situation the
>>> relevant process might not have started yet.  I would like
>>> Pacemaker itself to store past events until that process starts
>>> and subscribes to Pacemaker, at which point it could pull all the
>>> old events; Pacemaker could then push new events to it as they
>>> arrive.
>> 
>> I would use alerts with alert_file.sh (with custom modifications if
>> desired) to record them to a file, then have your process look at that.
>> (Tip: if you only care about events since the last boot, put the file
>> in /run so you don't have to worry about cleaning it up.)
> 
> Based on what's been described, it sounds like a request for
> extended functionality that might be served by an external
> "store-and-forward" daemon.  Such a daemon would also reduce
> processing complexity when there are many alert subscribers used for
> subsequent forwarding and/or data extraction to assist decisions,
> since it would fully detach such postprocessing from the main
> executive flow (e.g. no sharing of the same security boundaries,
> cgroup, etc.; access control would be the sole responsibility of
> this daemon),

perhaps even rate limiting, possibly priority-based,

> and would allow events to be stored durably with the desired
> parameters.
> 
> Such a daemon could then gradually take over the responsibility for
> keeping event-stream subscribers updated, itself using a more
> suitable hook directly into Pacemaker.
> 
> That's how the future could evolve.  Contributions welcome.

-- 
Cheers,
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] About the pacemaker

2019-01-10 Thread Jan Pokorný
On 08/01/19 10:14 -0600, Ken Gaillot wrote:
> On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
>> I have a question: does Pacemaker have an event-notification
>> interface implemented as push and pull? I want another process to
>> do something extra when resources are started or deleted, so I need
>> a way to monitor resource events.  ClusterMon and alerts both use
>> external scripts for extra actions, but in my situation the
>> relevant process might not have started yet.  I would like
>> Pacemaker itself to store past events until that process starts and
>> subscribes to Pacemaker, at which point it could pull all the old
>> events; Pacemaker could then push new events to it as they arrive.
> 
> I would use alerts with alert_file.sh (with custom modifications if
> desired) to record them to a file, then have your process look at that.
> (Tip: if you only care about events since the last boot, put the file
> in /run so you don't have to worry about cleaning it up.)

Based on what's been described, it sounds like a request for extended
functionality that might be served by an external "store-and-forward"
daemon.  Such a daemon would also reduce processing complexity when
there are many alert subscribers used for subsequent forwarding and/or
data extraction to assist decisions, since it would fully detach such
postprocessing from the main executive flow (e.g. no sharing of the
same security boundaries, cgroup, etc.; access control would be the
sole responsibility of this daemon),
and would allow events to be stored durably with the desired
parameters.

Such a daemon could then gradually take over the responsibility for
keeping event-stream subscribers updated, itself using a more
suitable hook directly into Pacemaker.

That's how the future could evolve.  Contributions welcome.
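A minimal sketch of the spool half of such a scheme, using the
standard alert environment variables (CRM_alert_kind, CRM_alert_rsc,
CRM_alert_task); the file paths, record format, and the daemon itself
are hypothetical:

```shell
#!/bin/sh
# Hypothetical alert agent: append each event to a durable spool file.
# Installed as a Pacemaker alert agent; paths and format are
# illustrative only.
printf '%s|%s|%s|%s\n' "$(date +%s)" \
    "${CRM_alert_kind}" "${CRM_alert_rsc}" "${CRM_alert_task}" \
    >> /run/pacemaker-events.spool

# A late-starting subscriber can replay the whole backlog and then
# keep following new events as they are appended:
#   tail -n +1 -F /run/pacemaker-events.spool
```

A real store-and-forward daemon would own this spool, enforce access
control, and fan events out to its subscribers.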

>> That is about all I have thought through; it may not be accurate,
>> and I could use some advice.  By the way, there is no deletion
>> notification in ClusterMon or alerts, right?
> 
> Correct, configuration changes are not alerted. The only way I know of
> to get configuration changes is to use the C API for update/replace
> callbacks. It would also be possible to poll the configuration at
> intervals and use crm_diff to compare them, but that's probably not any
> easier.

The hypothetical daemon could keep up with whatever events get
exposed internally.

-- 
Jan (Poki)




Re: [ClusterLabs] About the pacemaker

2019-01-08 Thread Ken Gaillot
On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
> Hey guys.  I have a question: does Pacemaker have an
> event-notification interface implemented as push and pull? I want
> another process to do something extra when resources are started or
> deleted, so I need a way to monitor resource events.  ClusterMon and
> alerts both use external scripts for extra actions, but in my
> situation the relevant process might not have started yet.  I would
> like Pacemaker itself to store past events until that process starts
> and subscribes to Pacemaker, at which point it could pull all the
> old events; Pacemaker could then push new events to it as they
> arrive.

I would use alerts with alert_file.sh (with custom modifications if
desired) to record them to a file, then have your process look at that.
(Tip: if you only care about events since the last boot, put the file
in /run so you don't have to worry about cleaning it up.)
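For example, a minimal setup with pcs might look like this (the alert
id and file path are arbitrary; the sample-agent location follows
common packaging, so adjust it to your distribution):

```shell
# Install the shipped sample agent where the cluster can run it.
cp /usr/share/pacemaker/alerts/alert_file.sh.sample \
   /etc/pacemaker/alert_file.sh
chmod 755 /etc/pacemaker/alert_file.sh

# Register the alert; the recipient value becomes the log file.
# Putting it under /run means it vanishes on reboot, so only events
# since the last boot are kept.
pcs alert create path=/etc/pacemaker/alert_file.sh id=event_log
pcs alert recipient add event_log value=/run/pacemaker-events.log
```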

> That is about all I have thought through; it may not be accurate,
> and I could use some advice.
> By the way, there is no deletion notification in ClusterMon or
> alerts, right?

Correct, configuration changes are not alerted. The only way I know of
to get configuration changes is to use the C API for update/replace
callbacks. It would also be possible to poll the configuration at
intervals and use crm_diff to compare them, but that's probably not any
easier.
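A rough sketch of that polling approach, for comparison (the snapshot
locations are arbitrary):

```shell
# Snapshot the current CIB and diff it against the previous snapshot.
cibadmin --query > /run/cib-now.xml
if [ -f /run/cib-prev.xml ]; then
    # Prints an XML patchset describing what changed, if anything.
    crm_diff --original /run/cib-prev.xml --new /run/cib-now.xml
fi
mv /run/cib-now.xml /run/cib-prev.xml
```

Run from cron or a timer; detecting specific changes (e.g. resource
deletions) still means parsing the patchset yourself.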
-- 
Ken Gaillot 



Re: [ClusterLabs] About the Pacemaker

2018-10-23 Thread Ken Gaillot
On Tue, 2018-10-23 at 21:20 +0800, T. Ladd Omar wrote:
> For question one, I don't think start-failure-is-fatal is a good
> fit for me.  It provides essentially no interval between retries
> and can easily flood the log output in a short time.
> 
> T. Ladd Omar wrote on Tue, Oct 23, 2018 at 9:06 PM:
> > Hi all, I am sending this message to get answers to some
> > questions about Pacemaker.
> > 1. To clean up start-failed resources automatically, I add a
> > failure-timeout attribute to resources; however, recovery is
> > commonly triggered by the cluster recheck, whose interval is
> > 15 minutes by default.  I wonder how low a value I can set for
> > cluster-recheck-interval.  I need the failed resources to recover
> > reasonably quickly while keeping the impact of the more frequent
> > rechecks small.
> > Or, is there another way to clean up start-failed resources
> > automatically?

failure-timeout with a lower cluster-recheck-interval is fine. I don't
think there's ever been solid testing on what a lower bound for the
interval is. I've seen users set it as low as 1 minute, but that seems
low to me. My gut feeling is 5 minutes is a good trade-off. The simpler
your cluster is (# nodes / # resources / features used), the lower the
number could be.
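As a sketch, with pcs and illustrative values (the resource name is
made up):

```shell
# Recheck every 5 minutes instead of the default 15.
pcs property set cluster-recheck-interval=5min

# Expire this resource's failures after 10 minutes so the cluster
# will retry starting it at the next recheck.
pcs resource meta my_resource failure-timeout=10min
```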

> > 2. Is Pacemaker suitable for master/slave-model HA?  I have had
> > some production problems using Pacemaker.  If only one resource
> > stops on one node, should I fail over the whole node for the
> > cluster?  If not, transactions through that node's ports may fail
> > because of that one failure.  If yes, it seems like a big action
> > for a single resource failure.

Definitely, master/slave operation is one of the most commonly used
Pacemaker features. You have the flexibility of failing over any
combination of resources you want. Look into clone resources,
master/slave clones, colocation constraints, and the on-fail property
of operations.
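A sketch of that combination with pcs (the resource names, agents, and
address are made up; syntax follows pcs 0.9-era conventions):

```shell
# A master/slave (promotable) clone, with monitor failures handled by
# a restart rather than node-level recovery.
pcs resource create my_db ocf:heartbeat:pgsql \
    op monitor interval=15s role=Master on-fail=restart \
    op monitor interval=16s role=Slave on-fail=restart \
    --master

# A floating IP colocated with the master role only, so a failover
# moves just this pair, not every resource on the node.
pcs resource create my_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 \
    op monitor interval=10s
pcs constraint colocation add my_vip with master my_db-master INFINITY
```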
-- 
Ken Gaillot 


Re: [ClusterLabs] About the Pacemaker

2018-10-23 Thread T. Ladd Omar
For question one, I don't think start-failure-is-fatal is a good fit
for me.  It provides essentially no interval between retries and can
easily flood the log output in a short time.

T. Ladd Omar wrote on Tue, Oct 23, 2018 at 9:06 PM:

> Hi all, I am sending this message to get answers to some questions
> about Pacemaker.
> 1. To clean up start-failed resources automatically, I add a
> failure-timeout attribute to resources; however, recovery is
> commonly triggered by the cluster recheck, whose interval is 15
> minutes by default.  I wonder how low a value I can set for
> cluster-recheck-interval.  I need the failed resources to recover
> reasonably quickly while keeping the impact of the more frequent
> rechecks small.
> Or, is there another way to clean up start-failed resources
> automatically?
> 2. Is Pacemaker suitable for master/slave-model HA?  I have had some
> production problems using Pacemaker.  If only one resource stops on
> one node, should I fail over the whole node for the cluster?  If
> not, transactions through that node's ports may fail because of that
> one failure.  If yes, it seems like a big action for a single
> resource failure.
>