Re: [ClusterLabs] Wait until resource is really ready before moving clusterip
On 01/12/2016 07:57 AM, Kristoffer Grönlund wrote:
> Joakim Hansson writes:
>
>> Hi!
>> I have a cluster running Tomcat, which in turn runs Solr.
>> I use three nodes with load balancing via IPaddr2.
>> The thing is, when Tomcat is started on a node, it takes about 2 minutes
>> before Solr is functioning correctly.
>>
>> Is there a way to make the IPaddr2 clone wait 2 minutes after Tomcat is
>> started before it moves the IP to the node?
>>
>> Much appreciated!
>
> Hi,
>
> There is the ocf:heartbeat:Delay resource agent, which on one hand is
> documented as a test resource, but on the other hand should do what you
> need:
>
>     primitive solr ...
>     primitive two-minute-delay ocf:heartbeat:Delay \
>         params startdelay=120 meta target-role=Started \
>         op start timeout=180
>     group solr-then-wait solr two-minute-delay
>
> Now the group acts basically like the solr resource, except for the
> two-minute delay after starting solr before the group itself is
> considered started.
>
> Cheers,
> Kristoffer
>
>> / Jocke

Another way would be to customize the tomcat resource agent so that
start doesn't return success until it's fully ready to accept requests
(which would probably be specific to whatever app you're running via
tomcat). Of course you'd need a long start timeout.
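A minimal sketch of what such a readiness wait could look like inside
the start action of a locally copied agent; the ping URL, the
ready_timeout parameter name, and the timings are illustrative
assumptions, not part of the shipped tomcat agent:

    # Hypothetical helper for a customized start action. Assumes the
    # agent sources the usual OCF shell functions; names are examples.
    wait_for_solr() {
        local timeout="${OCF_RESKEY_ready_timeout:-180}"
        local waited=0
        while [ "$waited" -lt "$timeout" ]; do
            # Solr's ping handler only answers once the core is loaded
            if curl -sf http://localhost:8080/solr/admin/ping >/dev/null 2>&1; then
                return 0    # ready: start can return $OCF_SUCCESS
            fi
            sleep 5
            waited=$((waited + 5))
        done
        return 1            # never became ready: return $OCF_ERR_GENERIC
    }

The start timeout configured in the CIB would then need to be longer
than this polling window, as noted above.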
Re: [ClusterLabs] Wait until resource is really ready before moving clusterip
Joakim Hansson writes:
> Hi!
> I have a cluster running Tomcat, which in turn runs Solr.
> I use three nodes with load balancing via IPaddr2.
> The thing is, when Tomcat is started on a node, it takes about 2 minutes
> before Solr is functioning correctly.
>
> Is there a way to make the IPaddr2 clone wait 2 minutes after Tomcat is
> started before it moves the IP to the node?
>
> Much appreciated!

Hi,

There is the ocf:heartbeat:Delay resource agent, which on one hand is
documented as a test resource, but on the other hand should do what you
need:

    primitive solr ...
    primitive two-minute-delay ocf:heartbeat:Delay \
        params startdelay=120 meta target-role=Started \
        op start timeout=180
    group solr-then-wait solr two-minute-delay

Now the group acts basically like the solr resource, except for the
two-minute delay after starting solr before the group itself is
considered started.

Cheers,
Kristoffer

> / Jocke

--
// Kristoffer Grönlund
// kgronl...@suse.com
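To make the cluster IP actually wait for the delayed group, constraints
against the IP clone would still be needed. A sketch only, assuming the
group is itself cloned across the three nodes and the IPaddr2 clone is
named ip-clone (both names are placeholders, not from the thread):

    clone solr-clone solr-then-wait
    order solr-before-ip Mandatory: solr-clone ip-clone
    colocation ip-with-solr inf: ip-clone solr-clone

With this, each node's IP instance is only placed once that node's
solr-then-wait instance (including the two-minute delay) is started.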
[ClusterLabs] Wait until resource is really ready before moving clusterip
Hi!
I have a cluster running Tomcat, which in turn runs Solr.
I use three nodes with load balancing via IPaddr2.
The thing is, when Tomcat is started on a node, it takes about 2 minutes
before Solr is functioning correctly.

Is there a way to make the IPaddr2 clone wait 2 minutes after Tomcat is
started before it moves the IP to the node?

Much appreciated!

/ Jocke
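For reference, a load-balanced IPaddr2 setup like the one described is
typically built as a globally unique clone; a sketch with a placeholder
address and resource names, not taken from the thread:

    primitive cluster-ip ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.100 cidr_netmask=24 clusterip_hash=sourceip \
        op monitor interval=30s
    clone ip-clone cluster-ip \
        meta clone-max=3 clone-node-max=3 globally-unique=true

The globally-unique=true meta attribute is what lets the CLUSTERIP
iptables mechanism share one address across all three nodes.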
Re: [ClusterLabs] Automatic Recovery for stonith:external/libvirt
Thanks for the reply. After further unsuccessful testing of the
automatic recovery, I read this article:

http://clusterlabs.org/doc/crm_fencing.html

There is a recommendation to monitor the fencing device only once every
few hours. I am happy with that, so I configured the monitoring
interval at 9600 secs (2 hours 40 minutes).

Cheers

Michael

On 08.01.2016 16:30, Ken Gaillot wrote:
> On 01/08/2016 08:56 AM, m...@inwx.de wrote:
>> Hello List,
>>
>> I have here a test environment for checking pacemaker. Sometimes our
>> kvm-hosts with libvirt have trouble responding to the stonith/libvirt
>> resource, so I would like to configure the service to be treated as
>> failed only after three failed monitoring attempts.
>>
>> I was searching for a configuration here:
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
>> But I failed after hours.
>>
>> That's the configuration line for stonith/libvirt:
>>
>>     crm configure primitive p_fence_ha3 stonith:external/libvirt \
>>         params hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" \
>>         op monitor interval="60"
>>
>> Every 60 seconds pacemaker makes something like this:
>>
>>     stonith -t external/libvirt hostlist="ha3" \
>>         hypervisor_uri="qemu+tls://debian1/system" -S
>>     ok
>>
>> To simulate the unavailability of the kvm host, I remove the
>> certificate in /etc/libvirt/libvirtd.conf and restart libvirtd. After
>> 60 seconds or less I can see the error with "crm status". On the kvm
>> host I add the certificate again to /etc/libvirt/libvirtd.conf and
>> restart libvirt again. Although libvirt is available again, the
>> stonith resource did not start again.
>>
>> I altered the configuration line for stonith/libvirt with the
>> following parts:
>>
>>     op monitor interval="60" pcmk_status_retries="3"
>>     op monitor interval="60" pcmk_monitor_retries="3"
>>     op monitor interval="60" start-delay=180
>>     meta migration-threshold="200" failure-timeout="120"
>>
>> But always, with the first failed monitor check after 60 seconds or
>> less, pacemaker did not resume stonith-libvirt after libvirt became
>> available again.
>
> Is there enough time left in the timeout for the cluster to retry?
> (The interval is not the same as the timeout.) Check your
> pacemaker.log for messages like "Attempted to execute agent ... the
> maximum number of times (...) allowed". That will tell you whether it
> is retrying.
>
> You definitely don't want start-delay, and migration-threshold doesn't
> really mean much for fence devices.
>
> Of course, you also want to fix the underlying problem of libvirt not
> being responsive. That doesn't sound like something that should
> routinely happen.
>
> BTW, I haven't used stonith/external agents (which rely on the
> cluster-glue package) myself. I use the fence_virtd daemon on the host
> with fence_xvm as the configured fence agent.
>
>> Here is the "crm status" output on debian 8 (Jessie):
>>
>>     root@ha4:~# crm status
>>     Last updated: Tue Jan  5 10:04:18 2016
>>     Last change: Mon Jan  4 18:18:12 2016
>>     Stack: corosync
>>     Current DC: ha3 (167772400) - partition with quorum
>>     Version: 1.1.12-561c4cf
>>     2 Nodes configured
>>     2 Resources configured
>>
>>     Online: [ ha3 ha4 ]
>>
>>     Service-IP   (ocf::heartbeat:IPaddr2):   Started ha3
>>     haproxy      (lsb:haproxy):              Started ha3
>>     p_fence_ha3  (stonith:external/libvirt): Started ha4
>>
>> Kind regards
>>
>> Michael R.
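Put together, the configuration Michael settled on might look roughly
like this in crmsh; the operation timeout and failure-timeout values
are illustrative assumptions, not quoted from the thread:

    # Sketch: long monitor interval per the fencing doc's advice, plus
    # failure-timeout so an expired failure lets the device restart
    # automatically once libvirt is reachable again.
    crm configure primitive p_fence_ha3 stonith:external/libvirt \
        params hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" \
        op monitor interval="9600s" timeout="120s" \
        meta failure-timeout="600s"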
[ClusterLabs] Announcing the 2.1.5 release of crmsh
Hello everyone,

Today we are proud to announce the release of crmsh version 2.1.5!

This release mainly consists of bug fixes, as well as compatibility
with Pacemaker 1.1.14. For a complete list of changes since the
previous version, please refer to the changelog:

* https://github.com/ClusterLabs/crmsh/blob/2.1.5/ChangeLog

Packages for several popular Linux distributions can be downloaded
from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/ClusterLabs/crmsh/archive/2.1.5.tar.gz
* https://github.com/ClusterLabs/crmsh/archive/2.1.5.zip

Changes since the previous release:

- medium: report: Try to load source as session if possible (bsc#927407)
- medium: crm_gv: Wrap non-identifier names in quotes (bsc#931837)
- medium: crm_gv: Improved quoting of non-identifier node names (bsc#931837)
- medium: crm_pkg: Fix cluster init bug on RH-based systems
- medium: hb_report: Collect logs from pacemaker.log
- medium: constants: Add 'provides' meta attribute (bsc#936587)
- high: parse: Add attributes to terminator set (bsc#940920)
- medium: cibconfig: Skip sanity check for properties other than cib-bootstrap-options
- medium: config: Add report_tool_options (bsc#917638)
- low: main: Bash completion didn't handle sudo correctly
- high: report: New detection to fix missing transitions (bnc#917131)
- medium: report: Add pacemaker.log to find_node_log list (bsc#941734)
- high: hb_report: Prefer pacemaker.log if it exists (bsc#941681)
- high: report: Output format from pacemaker has changed (bsc#941681)
- high: report: Update transition edge regexes (bsc#942906)
- medium: report: Reintroduce empty transition pruning (bsc#943291)
- medium: log_patterns: Remove reference to function name in log patterns (bsc#942906)
- low: hb_report: Collect libqb version (bsc#943327)
- high: parse: Fix crash when referencing score types by name (bsc#940194)
- low: constants: Add meta attributes for remote nodes
- low: ui_history: Swap from and to times if to < from
- high: cibconfig: Do not fail on unknown pacemaker schemas (bsc#946893)
- high: log_patterns_118: Update the correct set of log patterns (bsc#942906)
- high: xmlutil: Order is significant in resource_set (bsc#955434)
- high: cibconfig: Fix XML import bug for cloned groups (bsc#959895)

Thank you,

--
// Kristoffer Grönlund
// kgronl...@suse.com