Re: [ClusterLabs] About the pacemaker
On 10/01/19 14:53 +0100, Jan Pokorný wrote:
> On 08/01/19 10:14 -0600, Ken Gaillot wrote:
>> On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
>>> I have a question: does Pacemaker have an event-notify interface
>>> realized by push & pull? Recently I have wanted to do something
>>> extra in another process when resources are started or deleted, so
>>> I need a way to monitor resource events. ClusterMon and alerts
>>> both use external scripts for extra actions, but in my situation
>>> the specific process might not have been started yet. I would like
>>> Pacemaker itself to store the old events and keep them updated
>>> until the specific process starts and subscribes to Pacemaker,
>>> then pulls all the old events. Pacemaker could also push to it
>>> when new events come.
>>
>> I would use alerts with alert_file.sh (with custom modifications if
>> desired) to record them to a file, then have your process look at
>> that. (Tip: if you only care about events since the last boot, put
>> the file in /run so you don't have to worry about cleaning it up.)
>
> Based on what's been described, it sounds like asking for extended
> functionality that might be served by an external "store-and-forward"
> daemon. Such a daemon would also alleviate the processing complexity
> in case of plentiful alert subscribers when they are used for
> subsequent forwarding and/or extraction to assist decisions, since it
> would conceptually detach such postprocessing from the main executive
> flow fully (e.g. no sharing of the same security boundaries, cgroup,
> etc.; access control would be the sole responsibility of this daemon,
> perhaps also rate-limiting, possibly even using priorities), and
> allow for durability of the events with desired parameters.
>
> Such a daemon could then gradually take over the responsibility of
> keeping event stream subscribers updated, itself making use of a more
> suitable hook directly into Pacemaker.
> That's how the future could evolve. Contributions welcome.

--
Nazdar,
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] About the pacemaker
On 08/01/19 10:14 -0600, Ken Gaillot wrote:
> On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
>> I have a question: does Pacemaker have an event-notify interface
>> realized by push & pull? Recently I have wanted to do something
>> extra in another process when resources are started or deleted, so
>> I need a way to monitor resource events. ClusterMon and alerts both
>> use external scripts for extra actions, but in my situation the
>> specific process might not have been started yet. I would like
>> Pacemaker itself to store the old events and keep them updated
>> until the specific process starts and subscribes to Pacemaker, then
>> pulls all the old events. Pacemaker could also push to it when new
>> events come.
>
> I would use alerts with alert_file.sh (with custom modifications if
> desired) to record them to a file, then have your process look at
> that. (Tip: if you only care about events since the last boot, put
> the file in /run so you don't have to worry about cleaning it up.)

Based on what's been described, it sounds like asking for extended
functionality that might be served by an external "store-and-forward"
daemon. Such a daemon would also alleviate the processing complexity in
case of plentiful alert subscribers when they are used for subsequent
forwarding and/or extraction to assist decisions, since it would
conceptually detach such postprocessing from the main executive flow
fully (e.g. no sharing of the same security boundaries, cgroup, etc.;
access control would be the sole responsibility of this daemon), and
allow for durability of the events with desired parameters.

Such a daemon could then gradually take over the responsibility of
keeping event stream subscribers updated, itself making use of a more
suitable hook directly into Pacemaker.

That's how the future could evolve. Contributions welcome.

>> Above is all I thought; maybe it is not accurate. Anyway, I need
>> some advice.
>> By the way, there is no deletion notify in ClusterMon and alerts,
>> right?
>
> Correct, configuration changes are not alerted. The only way I know
> of to get configuration changes is to use the C API for
> update/replace callbacks. It would also be possible to poll the
> configuration at intervals and use crm_diff to compare them, but
> that's probably not any easier.

The hypothetical daemon could keep up with whatever events get exposed
internally.

--
Jan (Poki)
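[Editorial note] The polling approach Ken mentions could look roughly
like this; the file paths and the 60-second interval are arbitrary
choices, and this is an ops sketch requiring a live cluster, not a
tested script:

```shell
#!/bin/sh
# Poll the CIB and report configuration changes via crm_diff.
OLD=/run/cib-prev.xml
NEW=/run/cib-cur.xml

cibadmin --query > "$OLD"
while sleep 60; do
    cibadmin --query > "$NEW"
    # crm_diff exits non-zero when the two CIBs differ
    if ! crm_diff --original "$OLD" --new "$NEW" > /dev/null; then
        echo "CIB changed at $(date)"
        crm_diff --original "$OLD" --new "$NEW"
    fi
    mv "$NEW" "$OLD"
done
```

As Ken notes, this only tells you *that* something changed; extracting
*which* resource was deleted still means parsing the diff output.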
Re: [ClusterLabs] About the pacemaker
On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
> Hey guys. I have a question: does Pacemaker have an event-notify
> interface realized by push & pull? Recently I have wanted to do
> something extra in another process when resources are started or
> deleted, so I need a way to monitor resource events. ClusterMon and
> alerts both use external scripts for extra actions, but in my
> situation the specific process might not have been started yet. I
> would like Pacemaker itself to store the old events and keep them
> updated until the specific process starts and subscribes to
> Pacemaker, then pulls all the old events. Pacemaker could also push
> to it when new events come.

I would use alerts with alert_file.sh (with custom modifications if
desired) to record them to a file, then have your process look at that.
(Tip: if you only care about events since the last boot, put the file
in /run so you don't have to worry about cleaning it up.)

> Above is all I thought; maybe it is not accurate. Anyway, I need some
> advice.
> By the way, there is no deletion notify in ClusterMon and alerts,
> right?

Correct, configuration changes are not alerted. The only way I know of
to get configuration changes is to use the C API for update/replace
callbacks. It would also be possible to poll the configuration at
intervals and use crm_diff to compare them, but that's probably not any
easier.

--
Ken Gaillot
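[Editorial note] Wiring up Ken's suggestion might look roughly like
this with pcs (exact pcs syntax varies between versions; the installed
script path, alert id, and log file name are illustrative):

```shell
# Copy the sample alert script shipped with Pacemaker somewhere stable
# (the sample's location varies by distribution) and make it executable.
install -m 0755 /usr/share/pacemaker/alerts/alert_file.sh.sample \
        /usr/local/bin/alert_file.sh

# Register the alert; the recipient "value" is the file events are
# appended to. Putting it in /run keeps it per-boot, per Ken's tip.
pcs alert create id=log_events path=/usr/local/bin/alert_file.sh
pcs alert recipient add log_events value=/run/pacemaker-events.log
```

The late-starting process then only has to read that file to see every
event since boot.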
Re: [ClusterLabs] About the Pacemaker
On Tue, 2018-10-23 at 21:20 +0800, T. Ladd Omar wrote:
> For question one, I don't think start-failure-is-fatal is a good way
> for me. It has barely any interval for retrying and easily leads to
> flooding log output in a short time.
>
> T. Ladd Omar wrote on Tuesday, 2018-10-23 at 9:06 PM:
>> Hi all, I send this message to get some answers for my questions
>> about Pacemaker.
>> 1. In order to clean up start-failed resources automatically, I add
>> the failure-timeout attribute for resources; however, the common
>> way to trigger the recovery is the cluster recheck, whose interval
>> is 15 min by default. I wonder how low a value I could set for
>> cluster-recheck-interval. I need the failed resources to recover
>> somewhat quickly while the more frequent cluster recheck has little
>> impact.
>> Or, is there another way to automatically clean up start-failed
>> resources?

failure-timeout with a lower cluster-recheck-interval is fine. I don't
think there's ever been solid testing on what a lower bound for the
interval is. I've seen users set it as low as 1 minute, but that seems
low to me. My gut feeling is 5 minutes is a good trade-off. The simpler
your cluster is (# nodes / # resources / features used), the lower the
number could be.

>> 2. Is Pacemaker suitable for master/slave-model HA? I had some
>> production problems when I used Pacemaker. If only one resource
>> stopped on one node, should I fail over this whole node for the
>> whole cluster? If not, the transactions from the ports on this node
>> may fail because of this failure. If yes, it seems to be a big
>> action for just one resource failure.

Definitely, master/slave operation is one of the most commonly used
Pacemaker features. You have the flexibility of failing over any
combination of resources you want. Look into clone resources,
master/slave clones, colocation constraints, and the on-fail property
of operations.
--
Ken Gaillot
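[Editorial note] The knobs from this exchange might be set along these
lines with pcs (resource, agent, and constraint names are placeholders,
and pcs syntax varies somewhat between versions):

```shell
# Expire recorded start failures after 2 minutes so the resource is
# retried automatically, and recheck cluster state every 5 minutes
# (Ken's suggested trade-off).
pcs resource update my-resource meta failure-timeout=120s
pcs property set cluster-recheck-interval=5min

# Master/slave: wrap a stateful resource in a master/slave clone and
# tie a dependent resource (e.g. a VIP) to wherever the master runs,
# so only the affected resources move on failure, not the whole node.
pcs resource master my-db-master my-db
pcs constraint colocation add my-vip with master my-db-master INFINITY
```

Combined with on-fail settings on the operations, this gives per-resource
failover instead of whole-node failover.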
Re: [ClusterLabs] About the Pacemaker
For question one, I don't think start-failure-is-fatal is a good way
for me. It has barely any interval for retrying and easily leads to
flooding log output in a short time.

T. Ladd Omar wrote on Tuesday, 2018-10-23 at 9:06 PM:
> Hi all, I send this message to get some answers for my questions
> about Pacemaker.
> 1. In order to clean up start-failed resources automatically, I add
> the failure-timeout attribute for resources; however, the common way
> to trigger the recovery is the cluster recheck, whose interval is
> 15 min by default. I wonder how low a value I could set for
> cluster-recheck-interval. I need the failed resources to recover
> somewhat quickly while the more frequent cluster recheck has little
> impact.
> Or, is there another way to automatically clean up start-failed
> resources?
> 2. Is Pacemaker suitable for master/slave-model HA? I had some
> production problems when I used Pacemaker. If only one resource
> stopped on one node, should I fail over this whole node for the
> whole cluster? If not, the transactions from the ports on this node
> may fail because of this failure. If yes, it seems to be a big
> action for just one resource failure.