[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 1
ClusterLabs is happy to announce the first release candidate for Pacemaker version 1.1.15. Source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc1

The most significant enhancements in this release are:

* A new "alerts" section of the CIB allows you to configure scripts that will be called after significant cluster events. (For details, see the recent "Coming in 1.1.15: Event-driven alerts" thread on this mailing list.)

* A new pcmk_action_limit option for fence devices allows multiple fence actions to be executed concurrently. It defaults to 1 to preserve existing behavior (i.e. serial execution of fence actions).

* Pacemaker Remote support has been improved. Most noticeably, if pacemaker_remote is stopped without disabling the remote resource first, any resources will be moved off the node (previously, the node would get fenced). This allows easier software updates on remote nodes, since updates often involve restarting the daemon.

* You may notice some files have moved from the pacemaker package to pacemaker-cli, including most ocf:pacemaker resource agents, the logrotate configuration, the XML schemas and the SNMP MIB. This allows Pacemaker Remote nodes to work better when the full pacemaker package is not installed.

* Have you ever wondered why a resource is not starting when you think it should? crm_mon will now show why a resource is stopped, for example, because it is unmanaged, or disabled in the configuration.

* Three significant regressions have been fixed. Compressed CIBs larger than 1MB are again supported (a regression since 1.1.14), fenced unseen nodes are no longer wrongly marked as unclean (also a regression since 1.1.14), and failures of multiple-level monitor checks should again cause the resource to fail (a regression since 1.1.10).

As usual, the release includes many bugfixes and minor enhancements. For a more detailed list of changes, see the change log:

https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog

Everyone is encouraged to download, compile and test the new release. We do many regression tests and simulations, but we can't cover all possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release, including Andrew Beekhof, Bin Liu, David Shane Holden, Ferenc Wágner, Gao Yan, Hideo Yamauchi, Jan Pokorný, Ken Gaillot, Klaus Wenninger, Kristoffer Grönlund, Lars Ellenberg, Michal Koutný, Nakahira Kazutomo, Ruben Kerkhof, and Yusuke Iida. Apologies if I have overlooked anyone.

--
Ken Gaillot

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes
On 04/22/2016 08:57 AM, Klaus Wenninger wrote:
> On 04/22/2016 03:29 PM, John Gogu wrote:
>> Hello community,
>> I am facing the following situation with a two-node Pacemaker DB cluster (3 resources configured in the cluster: 1 MySQL DB resource, 1 Apache resource, 1 IP resource):
>> - Every 61 seconds a MySQL monitoring action is started, with a 1200 sec timeout.
> You can increase the timeout for monitoring.
>>
>> In some situations, due to high load on the machines, the monitoring action runs into a timeout, and the cluster performs a failover even though the DB is up and running. Do you have a hint on how monitoring actions can be prioritized automatically?
>>
> Consider that monitoring, at least as part of the action, should check whether what your service is actually providing is working according to some functional and nonfunctional constraints, so as to simulate the experience of the consumer of your services. So you probably don't want that to happen prioritized.
> So if you relaxed the timing requirements of your monitoring to something that would be acceptable in terms of the definition of the service you are providing, and you are still running into trouble, the service quality you are providing wouldn't be that spiffing either...

Also, you can provide multiple levels of monitoring:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multiple_monitor_operations

For example, you could provide a very simple check that just makes sure MySQL is responding on its port, and run that frequently with a low timeout. And your existing thorough monitor could be run less frequently with a high timeout.

FYI there was a bug related to multiple monitors apparently introduced in 1.1.10, such that a higher-level monitor failure might not trigger a resource failure. It was recently fixed in the upstream master branch (which will be in the soon-to-be-released 1.1.15-rc1).

>> Thank you and best regards,
>> John
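The "very simple check" suggested above could look roughly like the following. This is only a sketch of the idea, not a resource agent: the function name port_is_open is made up for illustration, and port 3306 is assumed because it is MySQL's default. It only verifies that something accepts TCP connections within a short timeout, which is exactly why it is cheap enough to run often.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 3306 is MySQL's default port (an assumption; adjust for your setup).
    print("mysql reachable:", port_is_open("127.0.0.1", 3306))
```

A check like this says nothing about whether queries succeed, so the existing thorough monitor is still needed at a lower frequency.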
Re: [ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts
On 04/22/2016 02:43 AM, Klaus Wenninger wrote:
> On 04/22/2016 08:16 AM, Ulrich Windl wrote:
>> Ken Gaillot wrote on 21.04.2016 at 19:50 in message <571912f3.2060...@redhat.com>:
>>
>> [...]
>>> The alerts section can have any number of alerts, which look like:
>>>
>>> [alert XML example; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>>>
>> Are there any parameters supplied for the script? For the XML: I think "path" for the script to execute is somewhat generic: Why not call it "exec" or something like that? Likewise for "value": Isn't "logfile" a better name?
> exec has a certain appeal... but recipient can actually be anything like email-address, logfile, ... so keeping it general like value makes sense in my mind
>>
>>> As always, id is simply a unique label for the entry. The path is an arbitrary file path to an alert script. Existing external scripts used with ClusterMon resources will work as alert scripts, because the interface is compatible.
>>>
>>> We intend to provide sample scripts in the extra/alerts source directory. The existing pcmk_notify_sample.sh script has been moved there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>>>
>>> Each alert may have any number of recipients configured. These values
>> What I did not understand is how an "alert" is related to some cluster "event": By ID, or by some explicit configuration?
> There are "node", "fencing" and "resource" alerts (CRM_alert_kind tells you which, if you want to know inside a script). The term "alerts" was chosen because it is in sync with other frameworks like nagios, ... but you can treat it as a synonym for event, meaning it is not necessarily anything bad or good, just something you might be interested in.
>
> A bunch of environment variables are set when your executable is called; you can use them to get more info and add intelligence if you like:
>
> CRM_alert_node, CRM_alert_nodeid, CRM_alert_rsc, CRM_alert_task, CRM_alert_interval, CRM_alert_desc, CRM_alert_status, CRM_alert_target_rc, CRM_alert_rc, CRM_alert_kind, CRM_alert_version, CRM_alert_node_sequence, CRM_alert_timestamp
>
> Referencing is done via node names and resource ids, as throughout the pacemaker config in the cib.
>
>>
>>> will simply be passed to the script as arguments. The first recipient will also be passed as the CRM_alert_recipient environment variable, for compatibility with existing scripts that only support one recipient. (All CRM_alert_* variables will also be passed as CRM_notify_* for compatibility with existing ClusterMon scripts.)
>>>
>>> An alert may also have instance attributes and meta-attributes, for example:
>>>
>>> [alert XML example with attributes; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>>>
>>> The meta-attributes are optional properties used by the cluster. Currently, they include "timeout" (which defaults to 30s) and "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a microsecond-resolution timestamp provided to the alert script as the CRM_alert_timestamp environment variable).
>>>
>>> The instance attributes are arbitrary values that will be passed as environment variables to the alert script. This provides you a convenient way to configure your scripts in the cluster, so you can easily reuse them.
>> At the moment this still sounds quite abstract.
> meta-attributes and instance-attributes are used as with resources. Meta-attributes reflect config parameters you pass to pacemaker itself, in this case the timeout observed when the script is executed, and the format string that tells pacemaker in which style you would like CRM_alert_timestamp to be filled.
> By the way, this timestamp is created immediately before all alerts are fired off in parallel, so it is usable for analysing what happened in which order in the cluster. That is much better than using date inside a script, which runs as a separate process and may have been delayed.
>
> instance-attributes you can use to tell your script whatever you like, and since they reside in the cib they are visible and synchronized throughout the cluster.

It is abstract, because instance attributes are interpreted by the alert script you provide; pacemaker merely passes them along. It's comparable to instance attributes for a resource -- pacemaker just passes them to the resource agent.

A concrete example might be a script that emails somebody. It might take the email address as the recipient, and the subject line as an instance attribute. Maybe it could also take a time limit as an instance attribute, and not send emails more often than that, to avoid filling up someone's inbox when things go haywire.
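To make the email example above a little more concrete, here is a rough sketch of the logic such an alert script might contain. The CRM_alert_* names come from the list quoted above; everything else is an assumption for illustration: "email_subject" and "min_interval" are hypothetical instance attributes (not Pacemaker names), and the sketch only prints the message rather than sending mail. A real script would also have to persist the time of the last send somewhere, which is left out here.

```python
import os
import time


def build_message(env):
    """Compose (recipient, subject, body) from alert environment variables."""
    recipient = env.get("CRM_alert_recipient", "")
    # "email_subject" is a hypothetical instance attribute, not a Pacemaker name.
    subject = env.get("email_subject", "Cluster alert")
    body = "{kind} alert on node {node}: {desc}".format(
        kind=env.get("CRM_alert_kind", "unknown"),
        node=env.get("CRM_alert_node", "unknown"),
        desc=env.get("CRM_alert_desc", ""),
    )
    return recipient, subject, body


def may_send(now, last_sent, min_interval):
    """Rate limit: allow a send only if min_interval seconds have passed."""
    return last_sent is None or now - last_sent >= min_interval


if __name__ == "__main__":
    # "min_interval" is likewise a hypothetical instance attribute (seconds).
    limit = float(os.environ.get("min_interval", "300"))
    # last_sent is None here; a real script would load it from a state file.
    if may_send(time.time(), None, limit):
        print(build_message(os.environ))
```

Because the subject and the limit arrive as environment variables, the same script can be reused by several alerts with different settings, which is the point of instance attributes.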
[ClusterLabs] operation parallelism
Hi,

Are recurring monitor operations constrained by the batch-limit cluster option? I ask because I'd like to limit the number of parallel start and stop operations (because they are resource hungry and can take a long time) without starving other operations, especially monitors.

--
Thanks,
Feri
Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes
On 04/22/2016 03:29 PM, John Gogu wrote:
> Hello community,
> I am facing the following situation with a two-node Pacemaker DB cluster (3 resources configured in the cluster: 1 MySQL DB resource, 1 Apache resource, 1 IP resource):
> - Every 61 seconds a MySQL monitoring action is started, with a 1200 sec timeout.

You can increase the timeout for monitoring.

> In some situations, due to high load on the machines, the monitoring action runs into a timeout, and the cluster performs a failover even though the DB is up and running. Do you have a hint on how monitoring actions can be prioritized automatically?

Consider that monitoring, at least as part of the action, should check whether what your service is actually providing is working according to some functional and nonfunctional constraints, so as to simulate the experience of the consumer of your services. So you probably don't want that to happen prioritized.

So if you relaxed the timing requirements of your monitoring to something that would be acceptable in terms of the definition of the service you are providing, and you are still running into trouble, the service quality you are providing wouldn't be that spiffing either...

> Thank you and best regards,
> John
Re: [ClusterLabs] pacemaker apache and umask on CentOS 7
> Sent: Wednesday, 20 April 2016 at 19:35
> From: "Ken Gaillot"
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] pacemaker apache and umask on CentOS 7
>
> On 04/20/2016 12:20 PM, Klaus Wenninger wrote:
>> On 04/20/2016 05:35 PM, fatcha...@gmx.de wrote:
>>>> Sent: Wednesday, 20 April 2016 at 16:31
>>>> From: "Klaus Wenninger"
>>>> To: users@clusterlabs.org
>>>> Subject: Re: [ClusterLabs] pacemaker apache and umask on CentOS 7
>>>>
>>>> On 04/20/2016 04:11 PM, fatcha...@gmx.de wrote:
>>>>> Hi,
>>>>>
>>>>> I'm running a 2-node apache webcluster on a fully patched CentOS 7 (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64).
>>>>> Some files generated by apache are created with a umask of 137, but I need these files created with a umask of 117.
>>>>> To change this, I first tried adding a umask of 117 to /etc/sysconfig/httpd and rebooted the system. This had no effect.
>>>>> After some research I found out that this does not work under CentOS 7 and that it has to be changed via systemd.
>>>>> So I created a directory "/etc/systemd/system/httpd.service.d" and put a "umask.conf" file there with this content:
>>>>> [Service]
>>>>> UMask=0117
>>>>>
>>>>> Again I rebooted the system, but it had no effect.
>>>>> Is pacemaker really starting apache via systemd? And how can I solve the problem?
>>>> Didn't check with CentOS7, but on RHEL7 there is a /usr/lib/ocf/resource.d/heartbeat/apache.
>>>> So whether systemd is used or the ocf-ra does the job depends on how you defined the resource starting apache.
>>> My configuration is:
>>> Resource: apache (class=ocf provider=heartbeat type=apache)
>>>   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1:8089/server-status
>>>   Operations: start interval=0s timeout=40s (apache-start-timeout-40s)
>>>               stop interval=0s timeout=60s (apache-stop-timeout-60s)
>>>               monitor interval=1min (apache-monitor-interval-1min)
>>>
>>> So I guess it is ocf. But what would be the right way to do it? I lack a bit of understanding about this /usr/lib/ocf/resource.d/heartbeat/apache file.
>>>
>> There are the ocf resource agents (if there is none, you can always create one for your service), which usually give you a little more control of the service from the cib. (You can set a couple of variables, like the pointer to the config file in this example.)
>> And of course you can always create resources referring to the native services of your distro (systemd units in this case).
>>>
>>>>> Any suggestions are welcome
>
> If you add envfiles="/etc/sysconfig/httpd" to your apache resource, it should work.

Worked like a charm. Thanks to everybody for your support.

Kind regards

fatcharly

>>>>> Kind regards
>>>>>
>>>>> fatcharly
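For readers unsure what the two umask values in this thread actually change, the arithmetic can be sketched as follows. This is only an illustration and assumes the files are created with a requested mode of 0666, which is typical for regular files but is not stated in the thread; the helper name mode_after_umask is made up.

```python
def mode_after_umask(requested: int, umask: int) -> int:
    """Return the permission bits left after the umask clears its bits."""
    return requested & ~umask & 0o777


# Assumed requested mode 0666 (read/write for everyone before masking).
for mask in (0o137, 0o117):
    print(f"umask {mask:04o} -> file mode {mode_after_umask(0o666, mask):04o}")
```

Under that assumption, umask 0137 leaves 0640 (group read-only) while umask 0117 leaves 0660 (group read/write), which is the difference the original poster was after.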
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
On 04/22/2016 10:55 AM, Ferenc Wágner wrote:
> Ken Gaillot writes:
>
>> Each alert may have any number of recipients configured. These values will simply be passed to the script as arguments. The first recipient will also be passed as the CRM_alert_recipient environment variable, for compatibility with existing scripts that only support one recipient.
>> [...]
>> In the current implementation, meta-attributes and instance attributes may also be specified within the recipient block, in which case they override any values specified in the alert block when sent to that recipient.
> Sorry, I don't get this. The first paragraph above tells me that for a given cluster event each alert is run once, with all recipients passed as command line arguments to the alert executable. But a single invocation can only have a single set of environment variables, so how can you override instance attributes for individual recipients?

The paragraph above is indeed confusing, or at least it can be understood in a way that doesn't reflect how it is implemented. If the script were called just once, the parameter itself could already be a list.

Anyway, as it is implemented at the moment, the script is called for each of the recipients in parallel. This has a couple of advantages: it simplifies the script implementation (if you have problems with concurrency, use just one recipient and make it a list), the timeout can be observed on a per-recipient basis, and if delivering to one recipient fails it doesn't affect the others.

And each of these calls inherits a global set of environment variables, each of which can be overridden on a per-recipient basis.

>
>> Whether this stays in the final 1.1.15 release or not depends on whether people find this to be useful, or confusing.
> Now guess..:)

Like this it finally even might lead to detection and avoidance of confusion ;-)
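The per-recipient behaviour described above can be modelled in a few lines. This is a sketch of the semantics as explained in this message, not Pacemaker's code: the function name plan_invocations and the "verbosity" attribute are invented for illustration, and the example addresses are placeholders.

```python
def plan_invocations(alert_env, recipients):
    """Return one environment dict per recipient.

    alert_env:  variables configured at the alert level.
    recipients: list of (value, overrides) tuples; overrides replace
                same-named alert-level variables for that recipient only.
    """
    plans = []
    for value, overrides in recipients:
        env = dict(alert_env)   # copy, so one call cannot affect another
        env.update(overrides)   # per-recipient settings win
        env["CRM_alert_recipient"] = value
        plans.append(env)
    return plans


if __name__ == "__main__":
    # "verbosity" is a made-up instance attribute used only for illustration.
    for env in plan_invocations(
        {"verbosity": "low"},
        [("admin@example.com", {}),
         ("/var/log/cluster-alerts.log", {"verbosity": "high"})],
    ):
        print(env)
```

Since every recipient gets its own call with its own environment, there is no conflict between a single invocation and per-recipient overrides.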
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
Ken Gaillot writes:

> Each alert may have any number of recipients configured. These values will simply be passed to the script as arguments. The first recipient will also be passed as the CRM_alert_recipient environment variable, for compatibility with existing scripts that only support one recipient.
> [...]
> In the current implementation, meta-attributes and instance attributes may also be specified within the recipient block, in which case they override any values specified in the alert block when sent to that recipient.

Sorry, I don't get this. The first paragraph above tells me that for a given cluster event each alert is run once, with all recipients passed as command line arguments to the alert executable. But a single invocation can only have a single set of environment variables, so how can you override instance attributes for individual recipients?

> Whether this stays in the final 1.1.15 release or not depends on whether people find this to be useful, or confusing.

Now guess..:)

--
Feri
Re: [ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts
On 04/22/2016 08:16 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 21.04.2016 at 19:50 in message <571912f3.2060...@redhat.com>:
>
> [...]
>> The alerts section can have any number of alerts, which look like:
>>
>> [alert XML example; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>>
> Are there any parameters supplied for the script? For the XML: I think "path" for the script to execute is somewhat generic: Why not call it "exec" or something like that? Likewise for "value": Isn't "logfile" a better name?

exec has a certain appeal... but recipient can actually be anything like email-address, logfile, ... so keeping it general like value makes sense in my mind

>
>> As always, id is simply a unique label for the entry. The path is an arbitrary file path to an alert script. Existing external scripts used with ClusterMon resources will work as alert scripts, because the interface is compatible.
>>
>> We intend to provide sample scripts in the extra/alerts source directory. The existing pcmk_notify_sample.sh script has been moved there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>>
>> Each alert may have any number of recipients configured. These values
> What I did not understand is how an "alert" is related to some cluster "event": By ID, or by some explicit configuration?

There are "node", "fencing" and "resource" alerts (CRM_alert_kind tells you which, if you want to know inside a script). The term "alerts" was chosen because it is in sync with other frameworks like nagios, ... but you can treat it as a synonym for event, meaning it is not necessarily anything bad or good, just something you might be interested in.

A bunch of environment variables are set when your executable is called; you can use them to get more info and add intelligence if you like:

CRM_alert_node, CRM_alert_nodeid, CRM_alert_rsc, CRM_alert_task, CRM_alert_interval, CRM_alert_desc, CRM_alert_status, CRM_alert_target_rc, CRM_alert_rc, CRM_alert_kind, CRM_alert_version, CRM_alert_node_sequence, CRM_alert_timestamp

Referencing is done via node names and resource ids, as throughout the pacemaker config in the cib.

>
>> will simply be passed to the script as arguments. The first recipient will also be passed as the CRM_alert_recipient environment variable, for compatibility with existing scripts that only support one recipient. (All CRM_alert_* variables will also be passed as CRM_notify_* for compatibility with existing ClusterMon scripts.)
>>
>> An alert may also have instance attributes and meta-attributes, for example:
>>
>> [alert XML example with attributes; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>>
>> The meta-attributes are optional properties used by the cluster. Currently, they include "timeout" (which defaults to 30s) and "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a microsecond-resolution timestamp provided to the alert script as the CRM_alert_timestamp environment variable).
>>
>> The instance attributes are arbitrary values that will be passed as environment variables to the alert script. This provides you a convenient way to configure your scripts in the cluster, so you can easily reuse them.
> At the moment this still sounds quite abstract.

meta-attributes and instance-attributes are used as with resources. Meta-attributes reflect config parameters you pass to pacemaker itself, in this case the timeout observed when the script is executed, and the format string that tells pacemaker in which style you would like CRM_alert_timestamp to be filled.

By the way, this timestamp is created immediately before all alerts are fired off in parallel, so it is usable for analysing what happened in which order in the cluster. That is much better than using date inside a script, which runs as a separate process and may have been delayed.

instance-attributes you can use to tell your script whatever you like, and since they reside in the cib they are visible and synchronized throughout the cluster.

>> In the current implementation, meta-attributes and instance attributes may also be specified within the recipient block, in which case they override any values specified in the alert block when sent to that recipient. Whether this stays in the final 1.1.15 release or not depends on whether people find this to be useful, or confusing.
> Could you give one complete example (configuration and script), even if it's just as a sample for discussion?
>
> And will the DTD version number be incremented this time? ;-)

pcmk_alert_sample.sh is not a bad example for the use of the environment variables set per default - although at the moment it is still using the deprecated CRM_notify_... naming (instead of CRM_alert_...) which is still in for
[ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts
>>> Ken Gaillot wrote on 21.04.2016 at 19:50 in message <571912f3.2060...@redhat.com>:

[...]
> The alerts section can have any number of alerts, which look like:
>
> [alert XML example; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>

Are there any parameters supplied for the script? For the XML: I think "path" for the script to execute is somewhat generic: Why not call it "exec" or something like that? Likewise for "value": Isn't "logfile" a better name?

>
> As always, id is simply a unique label for the entry. The path is an arbitrary file path to an alert script. Existing external scripts used with ClusterMon resources will work as alert scripts, because the interface is compatible.
>
> We intend to provide sample scripts in the extra/alerts source directory. The existing pcmk_notify_sample.sh script has been moved there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>
> Each alert may have any number of recipients configured. These values

What I did not understand is how an "alert" is related to some cluster "event": By ID, or by some explicit configuration?

> will simply be passed to the script as arguments. The first recipient will also be passed as the CRM_alert_recipient environment variable, for compatibility with existing scripts that only support one recipient. (All CRM_alert_* variables will also be passed as CRM_notify_* for compatibility with existing ClusterMon scripts.)
>
> An alert may also have instance attributes and meta-attributes, for example:
>
> [alert XML example with attributes; markup lost in this archive copy, surviving attributes: path="/srv/pacemaker/pcmk_alert_sample.sh", value="/var/log/cluster-alerts.log"]
>
> The meta-attributes are optional properties used by the cluster. Currently, they include "timeout" (which defaults to 30s) and "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a microsecond-resolution timestamp provided to the alert script as the CRM_alert_timestamp environment variable).
>
> The instance attributes are arbitrary values that will be passed as environment variables to the alert script. This provides you a convenient way to configure your scripts in the cluster, so you can easily reuse them.

At the moment this still sounds quite abstract.

>
> In the current implementation, meta-attributes and instance attributes may also be specified within the recipient block, in which case they override any values specified in the alert block when sent to that recipient. Whether this stays in the final 1.1.15 release or not depends on whether people find this to be useful, or confusing.

Could you give one complete example (configuration and script), even if it's just as a sample for discussion?

And will the DTD version number be incremented this time? ;-)

>
> Sometime during the 1.1.15 release cycle, the previous experimental interface (the notification-agent and notification-recipient cluster properties) will be disabled by default at compile-time. If you are compiling the master branch from source and require that interface, you can define RHEL7_COMPAT when building, to enable support.
>
> This feature is already in the upstream master branch, and will be in the forthcoming 1.1.15-rc1 release candidate. Everyone is encouraged to try it out and give feedback.

Regards,
Ulrich
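As a starting point for the discussion requested above, the following sketch parses an alerts section and lists what each alert would use. The XML in it is an assumption reconstructed from the prose description (an alerts section containing alert entries with id and path, recipients with a value, and a "timeout" meta-attribute); the element layout and all id values are hypothetical and should be checked against the actual 1.1.15 schema. Only the path, the recipient value and the 30s default come from the announcement.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; ids and nesting are illustrative assumptions.
ALERTS_XML = """
<alerts>
  <alert id="alert-1" path="/srv/pacemaker/pcmk_alert_sample.sh">
    <meta_attributes id="alert-1-meta">
      <nvpair id="alert-1-timeout" name="timeout" value="30s"/>
    </meta_attributes>
    <recipient id="alert-1-recipient-1" value="/var/log/cluster-alerts.log"/>
  </alert>
</alerts>
"""


def summarize(xml_text):
    """Return a list of dicts describing each alert in the given XML."""
    root = ET.fromstring(xml_text)
    summary = []
    for alert in root.findall("alert"):
        # Collect name/value pairs found under meta_attributes.
        meta = {nv.get("name"): nv.get("value")
                for nv in alert.findall("meta_attributes/nvpair")}
        recipients = [r.get("value") for r in alert.findall("recipient")]
        summary.append({"id": alert.get("id"), "path": alert.get("path"),
                        "meta": meta, "recipients": recipients})
    return summary


if __name__ == "__main__":
    for entry in summarize(ALERTS_XML):
        print(entry)
```

A matching script would then read the recipient and the CRM_alert_* variables from its environment when called.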