Re: [Pacemaker] power failure handling
On May 27, 2010, at 7:21 AM, Andrew Beekhof wrote:

> On Wed, May 26, 2010 at 9:07 PM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> Hi,
>>
>> What would be the proper way to shut down the members of a two-node
>> cluster in case of a power outage? I assume that as soon as I issue
>> 'crm node standby node-1 reboot', resources will start to fail over to
>> the second node. First of all, there is no reason for that, and second,
>> a consecutive 'crm node standby node-2 reboot' might run into a race
>> condition.
>
> Why?

Just a gut feeling, and I would prefer to have it in one transaction; call me a purist :)

>> I would use 'crm load update standby.cfg', but I can't figure out how
>> to set the lifetime=reboot attribute properly. crm is definitely using
>> a hack on this one, because when I issue this command the node goes
>> standby, but 'crm configure' and 'crm node show' indicate that the
>> standby attribute is off. Weird. In pseudo-property terms:
>>
>>   crm configure property stop-all-resources-even-if-target-role-is-started-until-reboot=true
>
>   crm configure property stop-all-resources=true
>
> followed by:
>
>   cibadmin --delete-all --xpath '//nvp...@name=target-role]'
>
> should work

It would also alter resources that were stopped for a reason, and that can certainly be tweaked, but it won't take care of the until-reboot part.

Thanks,
Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
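One hedged way to get the "until-reboot" behaviour the poster is after: node attributes written with a reboot lifetime live in the CIB status section and are cleared when the node restarts. The sketch below is an assumption, not a confirmed recipe from this thread — the node names are placeholders, and you should verify that your version's crm_attribute supports --lifetime before relying on it:

```shell
# Sketch only: put both nodes into standby with reboot lifetime, so the
# attribute disappears when each node comes back up after the power event.
# "node-1"/"node-2" are placeholders; check crm_attribute(8) on your version.
crm_attribute --node node-1 --name standby --update on --lifetime reboot
crm_attribute --node node-2 --name standby --update on --lifetime reboot
```

This still issues two commands rather than one transaction, but because both writes are transient status-section updates, nothing needs to be undone by hand after the reboot.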
Re: [Pacemaker] pengine self-maintenance
On May 17, 2010, at 11:38 AM, Dejan Muhamedagic wrote:

> You don't want to set it that low. PE input files are part of your
> cluster history. Set it to a few thousand.

What could be the drawbacks of having it too low? How are these files being used? And shouldn't some reasonable default be in place? I just happened to notice 90% inode utilization on my /var; some could be not so lucky.

# ls /var/lib/pengine/ | wc -l
123500

>> /var/lib/heartbeat/crm/ seems to be growing unattended as well.
>
> Unless there is a bug somewhere, it should be storing only the last 100
> configurations.

You are right, they are being reused. I found another bug/feature :) When it's time to reuse the cib-/pe-xxx files, the process starts with 1, but the initial start creates files with a 0 suffix. So you have your pe-warn-0.bz2 frozen in time, for example :)

>> Does pacemaker do any self-maintenance, or will it eventually crash the
>> system by using up all inodes? Also, why is cluster-recheck-interval
>> not in the pengine metadata output? Is it deprecated?
>
> It's controlled by the crmd, so it's in the crmd metadata output.

Ah, then the crm cli has a bug? When you press TAB, the crmd metadata is not shown:

crm(live)configure# property
batch-limit=                  no-quorum-policy=      pe-input-series-max=     stonith-enabled=
cluster-delay=                node-health-green=     pe-warn-series-max=      stonith-timeout=
default-action-timeout=       node-health-red=       remove-after-stop=       stop-all-resources=
default-resource-stickiness=  node-health-strategy=  start-failure-is-fatal=  stop-orphan-actions=
is-managed-default=           node-health-yellow=    startup-fencing=         stop-orphan-resources=
maintenance-mode=             pe-error-series-max=   stonith-action=          symmetric-cluster=

> Yes, you can file a bugzilla for that. Note that the property will
> still be set if you type it.

Done, Bug 2419

Thanks,
Vadym
Re: [Pacemaker] pengine self-maintenance
On Wed, May 19, 2010 at 1:26 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:

>> And shouldn't some reasonable default be in place? I just happened to
>> notice 90% inode utilization on my /var; some could be not so lucky.
>
> Yes, that could be a problem. Perhaps that default could be changed to
> say 1 which would be close enough to unlimited for clusters in normal
> use :)

Even if your cluster is absolutely solid and none of the applications ever goes up or down, this will be reached in 104 days :)

I found another bug/feature :) When it's time to reuse the cib-/pe-xxx files, the process starts with 1, but the initial start creates files with a 0 suffix. So you have your pe-warn-0.bz2 frozen in time, for example :)

Vadym
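The 104-day figure checks out arithmetically if one assumes the default 15-minute cluster-recheck-interval (one PE input per recheck) and a cap of 10,000 files — both assumptions here, since the exact cap in the message appears truncated. A quick sketch of that back-of-the-envelope calculation:

```shell
# Back-of-the-envelope check of the "104 days" estimate.
# Assumptions: one PE input written per 15-minute recheck, 10000-file cap.
recheck_minutes=15
files_per_day=$(( 24 * 60 / recheck_minutes ))  # 96 PE inputs per day
series_cap=10000
echo $(( series_cap / files_per_day ))          # prints 104 (days to hit the cap)
```

A busier cluster writes PE inputs on every transition as well, so in practice the cap is reached sooner than this idle-cluster estimate.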
Re: [Pacemaker] IP address does not failover on a new test cluster
On Tue, May 18, 2010 at 2:22 PM, Ruiyuan Jiang <ruiyuan_ji...@liz.com> wrote:

> Hi, Vadym
>
> I modified the configuration per your suggestion. Here is the current
> configuration of the cluster:
>
> [r...@usnbrl52 ~]# crm configure show
> node usnbrl52
> node usnbrl53
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip=156.146.22.48 cidr_netmask=32 \
>     op monitor interval=30s
> property $id=cib-bootstrap-options \
>     dc-version=1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf \
>     cluster-infrastructure=openais \
>     expected-quorum-votes=2 \
>     stonith-enabled=false
> rsc_defaults $id=rsc-options \
>     resource-stickiness=100
> [r...@usnbrl52 ~]#
>
> After the change, the IP address still does not fail over to the other
> node usnbrl53 after I shut down openais on node usnbrl52. The cluster
> IP has no problem binding on usnbrl52 when openais gets stopped and
> started on the node.

That's because no-quorum-policy=ignore is still not there; it is not listed in the 'crm configure show' output. Run the command again:

crm configure property no-quorum-policy=ignore

and make sure 'crm configure show' has changed accordingly.

Vadym
Re: [Pacemaker] IP address does not failover on a new test cluster
On Tue, May 18, 2010 at 3:58 PM, Ruiyuan Jiang <ruiyuan_ji...@liz.com> wrote:

> Thanks, Vadym
>
> This time it failed over to another node. For a two-node cluster, does
> the cluster have to be set to "no-quorum-policy=ignore" to fail over or
> work correctly?

I can't say it better myself: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

    A two-node cluster only has quorum when both nodes are running, which
    is no longer the case for our cluster. This would normally make the
    creation of a two-node cluster pointless, however it is possible to
    control how Pacemaker behaves when quorum is lost. In particular, we
    can tell the cluster to simply ignore quorum altogether.

    crm configure property no-quorum-policy=ignore
Re: [Pacemaker] pengine self-maintenance
On May 17, 2010, at 2:52 AM, Andrew Beekhof wrote:

> On Sun, May 16, 2010 at 1:09 AM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> Hi
>>
>> I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files
>> in /var/lib/pengine, especially when cluster-recheck-interval is set
>> to enable failure-timeout checks.
>
> pengine metadata | grep series-max

Great, thanks. After I set it, I take it I need to clean the excess files manually?

# crm configure show | grep series-max
        pe-error-series-max=10 \
        pe-warn-series-max=10 \
        pe-input-series-max=10
# ls /var/lib/pengine/ | wc -l
123500

>> /var/lib/heartbeat/crm/ seems to be growing unattended as well.
>
> Unless there is a bug somewhere, it should be storing only the last 100
> configurations.

You are right, they are being reused.

>> Does pacemaker do any self-maintenance, or will it eventually crash
>> the system by using up all inodes? Also, why is
>> cluster-recheck-interval not in the pengine metadata output? Is it
>> deprecated?
>
> It's controlled by the crmd, so it's in the crmd metadata output.

Ah, then the crm cli has a bug? When you press TAB, the crmd metadata is not shown:

crm(live)configure# property
batch-limit=                  no-quorum-policy=      pe-input-series-max=     stonith-enabled=
cluster-delay=                node-health-green=     pe-warn-series-max=      stonith-timeout=
default-action-timeout=       node-health-red=       remove-after-stop=       stop-all-resources=
default-resource-stickiness=  node-health-strategy=  start-failure-is-fatal=  stop-orphan-actions=
is-managed-default=           node-health-yellow=    startup-fencing=         stop-orphan-resources=
maintenance-mode=             pe-error-series-max=   stonith-action=          symmetric-cluster=

Thanks,
Vadym
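Lowering pe-*-series-max only affects files the policy engine writes from then on; the 123,500 already on disk stay put, so a one-off cleanup is needed. A minimal sketch — the directory, glob, and retention count are assumptions, and since PE inputs are your cluster history, dry-run it before deleting anything:

```shell
# Keep only the newest $KEEP planner archives in $PE_DIR; delete the rest.
# Both variables are illustrative defaults, not Pacemaker settings.
PE_DIR=${PE_DIR:-/var/lib/pengine}
KEEP=${KEEP:-1000}
# ls -1t sorts newest-first; tail skips the first $KEEP names; xargs -r
# does nothing when there is nothing to remove.
ls -1t "$PE_DIR"/pe-*.bz2 2>/dev/null | tail -n +$(( KEEP + 1 )) | xargs -r rm -f
```

To preview instead of delete, replace `rm -f` with `echo`. Note this sketch assumes filenames without whitespace, which holds for pengine's pe-input/pe-warn/pe-error naming.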
Re: [Pacemaker] Detecting a lost network connection
On May 17, 2010, at 11:56 AM, Simon Lavigne-Giroux wrote:

> Hi,
>
> I have 2 servers running Pacemaker. When the router fails, both nodes
> become primary. Is it possible for Pacemaker on the secondary server to
> detect that the network connection is not available and not become
> primary?

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
Re: [Pacemaker] IP address does not failover on a new test cluster
On May 17, 2010, at 5:40 PM, Ruiyuan Jiang wrote:

> Hi, Gianluca
>
> I modified my configuration and deleted "crm configure property
> no-quorum-policy=ignore" as you suggested, but I have the same problem
> that the IP address does not fail over. Thanks.
>
> [r...@usnbrl52 log]# crm configure show
> node usnbrl52
> node usnbrl53
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip=156.146.22.48 cidr_netmask=32 \
>     op monitor interval=30s
> property $id=cib-bootstrap-options \
>     dc-version=1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf \
>     cluster-infrastructure=openais \
>     expected-quorum-votes=2 \
>     stonith-enabled=false
> rsc_defaults $id=rsc-options \
>     resource-stickiness=default

Did you run 'crm configure show' after you set the property? Because the option is not shown in your output.

Also, resource-stickiness=default seems suspicious. What default? I thought it should be a numeric value.
[Pacemaker] pengine self-maintenance
Hi

I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in /var/lib/pengine, especially when cluster-recheck-interval is set to enable failure-timeout checks. /var/lib/heartbeat/crm/ seems to be growing unattended as well.

Does pacemaker do any self-maintenance, or will it eventually crash the system by using up all inodes? Also, why is cluster-recheck-interval not in the pengine metadata output? Is it deprecated?

Thanks,
Vadym
Re: [Pacemaker] clone ip definition and location stops my resources...
You forgot to turn on the monitor operation for ping (the actual job).

On May 11, 2010, at 5:15 AM, Gianluca Cecchi wrote:

> On Mon, May 10, 2010 at 4:39 PM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> # crm ra meta ping
>> name (string, [undef]): Attribute name
>>     The name of the attributes to set. This is the name to be used in
>>     the constraints.
>>
>> By default it is pingd, but you are checking against pinggw. I suggest
>> you do not change the name though, but adjust your location constraint
>> to use pingd instead. crm_mon only notices pingd at the moment when
>> you pass the -f argument: it's hardcoded.
>
>>> On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi
>>> <gianluca.cec...@gmail.com> wrote:
>>> Hello, using pacemaker 1.0.8 on RHEL 5 I have some problems
>>> understanding the way the ping clone works to set up monitoring of
>>> the gw... even after reading the docs... As soon as I run:
>>>
>>> crm configure location nfs-group-with-pinggw nfs-group rule -inf: not_defined pinggw or pinggw lte 0
>>>
>>> the resources go stopped and don't re-start
> [snip]
>
> Hem... I changed the location line so that now I have:
>
> primitive pinggw ocf:pacemaker:ping \
>     params host_list=192.168.101.1 multiplier=100 \
>     op start interval=0 timeout=90 \
>     op stop interval=0 timeout=100
> clone cl-pinggw pinggw \
>     meta globally-unique=false
> location nfs-group-with-pinggw nfs-group \
>     rule $id=nfs-group-with-pinggw-rule -inf: not_defined pingd or pingd lte 0
>
> But now nothing happens if I run, for example,
>
> iptables -A OUTPUT -p icmp -d 192.168.101.1 -j REJECT (or DROP)
>
> on the node where nfs-group is running. Do I have to name the primitive
> itself pingd? It seems that the binary /bin/ping is not accessed at all
> (checked with ls -lu ...).
> Or do I have to change the general property I previously defined to
> avoid failback:
>
> rsc_defaults $id=rsc-options \
>     resource-stickiness=100
>
> crm_mon -f -r gives:
>
> Online: [ ha1 ha2 ]
>
> Full list of resources:
>
> SitoWeb (ocf::heartbeat:apache): Started ha1
> Master/Slave Set: NfsData
>     Masters: [ ha1 ]
>     Slaves: [ ha2 ]
> Resource Group: nfs-group
>     ClusterIP (ocf::heartbeat:IPaddr2): Started ha1
>     lv_drbd0 (ocf::heartbeat:LVM): Started ha1
>     NfsFS (ocf::heartbeat:Filesystem): Started ha1
>     nfssrv (ocf::heartbeat:nfsserver): Started ha1
> nfsclient (ocf::heartbeat:Filesystem): Started ha2
> Clone Set: cl-pinggw
>     Started: [ ha2 ha1 ]
>
> Migration summary:
> * Node ha1: pingd=100
> * Node ha2: pingd=100
>
> Probably I didn't understand correctly what is described at the link:
> http://www.clusterlabs.org/wiki/Pingd_with_resources_on_different_networks
> or it is outdated now... and instead of defining two clones it is
> better (aka works) to populate the host_list parameter as described
> here in case of more networks connected:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
>
> Probably I'm missing something very simple but I don't get a clue to it...
>
> Gianluca
Re: [Pacemaker] clone ip definition and location stops my resources...
First of all, none of the monitor operations is on by default in pacemaker; this is something that you have to turn on. For the ping RA the start and stop op parameters don't do much, so you can safely drop them. Here are my settings; they work for me:

primitive ping ocf:pacemaker:ping \
    params name=pingd host_list=10.10.10.250 multiplier=200 timeout=3 \
    op monitor interval=10
clone connected ping \
    meta globally-unique=false
location rg0-connected rg0 \
    rule -inf: not_defined pingd or pingd lte 0

On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

> On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> You forgot to turn on the monitor operation for ping (the actual job).
>
> I saw from the
>
> [r...@ha1 ~]# crm ra meta ping
>
> command:
>
> Operations' defaults (advisory minimum):
>     start     timeout=60
>     stop      timeout=20
>     reload    timeout=100
>     monitor_0 interval=10 timeout=60
>
> So I presumed it was in place by default for the ping resource. Do you
> mean that I should define the resource this way:
>
> crm configure primitive pinggw ocf:pacemaker:ping \
>     params host_list=192.168.101.1 multiplier=100 \
>     op start interval=0 timeout=90 \
>     op stop interval=0 timeout=100 \
>     op monitor interval=10 timeout=60
>
> Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
>
> Migration summary:
> * Node ha1: pingd=0
> * Node ha2: pingd=100
>
> And if I remove the iptables rule I get:
>
> Migration summary:
> * Node ha1: pingd=100
> * Node ha2: pingd=100
>
> Now I will check the "all resources stopped" problem...
Re: [Pacemaker] clone ip definition and location stops my resources...
By the way, there is another issue with your config. Since you set the multiplier to 100, it will negate your resource-stickiness, which is also set to 100. Either reduce the multiplier or increase the default resource-stickiness (I have mine at 1000).

Vadym

On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

> [...]
>
> Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
>
> Now I will check the "all resources stopped" problem...
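The claim above can be put in numbers: the connectivity attribute a ping clone sets works out to multiplier × number-of-reachable-hosts, so with one ping target and multiplier=100 it equals a stickiness of 100. The sketch below is only an illustration of that arithmetic (one ping host assumed), not Pacemaker's actual scoring code:

```shell
# Illustrative comparison of the pingd attribute vs. stickiness,
# using the values from this thread (one reachable ping host assumed).
multiplier=100
reachable_hosts=1
pingd_score=$(( multiplier * reachable_hosts ))
stickiness=100
if [ "$pingd_score" -ge "$stickiness" ]; then
  echo "connectivity score ($pingd_score) can cancel stickiness ($stickiness)"
else
  echo "stickiness ($stickiness) outweighs connectivity ($pingd_score)"
fi
```

This is why the advice is to keep the two values clearly apart: with stickiness at 1000 and a multiplier of 100 or 200, a connectivity change alone cannot outweigh the preference to stay put.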
Re: [Pacemaker] clone ip definition and location stops my resources...
pingd is a daemon which is running all the time and does its job. You still need to define the monitor operation though; what if the daemon dies? 'op monitor' just has a different meaning for ping and pingd:

with pingd - monitor the daemon
with ping  - monitor connectivity

As for the warnings:

crm configure property default-action-timeout=120s

On Tue, May 11, 2010 at 11:00 AM, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:

> On Tue, May 11, 2010 at 1:13 PM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> First of all, none of the monitor operations is on by default in
>> pacemaker; this is something that you have to turn on. For the ping RA
>> the start and stop op parameters don't do much, so you can safely drop
>> them.
>
> Yes, but for the pacemaker:pingd RA I didn't need to pass the op
> monitor parameter to have it working. Also, in general I added the
> start/stop op parameters, because without them I get, for example with
> the command you suggested:
>
> [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
>     params host_list=192.168.101.1 multiplier=200 timeout=3 \
>     op monitor interval=10
> WARNING: pinggw: default-action-timeout 20s for start is smaller than the advised 60
> WARNING: pinggw: default-action-timeout 20s for monitor_0 is smaller than the advised 60
>
> Do I have to ignore the warnings? Or do I have to adapt the resource
> creation with:
>
> [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
>     params host_list=192.168.101.1 multiplier=200 timeout=3 \
>     op start timeout=60
>
> That gives no warnings (even if I would have expected the warning about
> the monitor_0 timeout as I didn't set it...???)
Re: [Pacemaker] clone ip definition and location stops my resources...
There is no default unless it's set; that's why crm complains.

On Tue, May 11, 2010 at 12:41 PM, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:

> On Tue, May 11, 2010 at 5:47 PM, Vadym Chepkov <vchep...@gmail.com> wrote:
>> pingd is a daemon which is running all the time and does its job. You
>> still need to define the monitor operation though; what if the daemon
>> dies? 'op monitor' just has a different meaning for ping and pingd:
>> with pingd - monitor the daemon; with ping - monitor connectivity.
>> As for the warnings:
>>
>> crm configure property default-action-timeout=120s
>
> Thanks again! Now it is more clear. Only doubt: why doesn't pacemaker
> set 120s directly as the default timeout? Any drawbacks in setting it
> to 120? Also, with crm configure show I can see:
>
> property $id=cib-bootstrap-options \
>     dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
>     cluster-infrastructure=openais \
>     expected-quorum-votes=2 \
>     stonith-enabled=false \
>     no-quorum-policy=ignore \
>     last-lrm-refresh=1273484758
> rsc_defaults $id=rsc-options \
>     resource-stickiness=1000
>
> Any way to see what the default value is for the default-action-timeout
> parameter that I'm going to change (I presume it is 20s from the
> warnings I received), and for other ones, for example, that are not
> shown with the show command?
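To answer the "any way to see the default value" question without guessing from warnings: each daemon advertises its properties, with their defaults, as XML metadata (the same output mentioned earlier in these threads). A sketch, assuming the pacemaker 1.0 binaries are on the PATH; the grep context widths are arbitrary and binary locations vary by package:

```shell
# Show the advertised default for default-action-timeout (a pengine
# property) and cluster-recheck-interval (a crmd property).
pengine metadata | grep -B1 -A3 'name="default-action-timeout"'
crmd metadata | grep -B1 -A3 'name="cluster-recheck-interval"'
```

Anything not listed in 'crm configure show' is simply unset in the CIB, and the cluster falls back to the metadata default.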
Re: [Pacemaker] Pacemaker installation on CentOs 5.3
You didn't have to do 'yum makecache'. Some time ago Andrew accidentally replaced some rpms without bumping the revision number, which made yum complain; 'yum clean all' should have cured all that.

On Tue, May 11, 2010 at 2:09 PM, Simon Lavigne-Giroux <simon...@gmail.com> wrote:

> I found the solution to my problem: I had to do a 'yum clean all' and
> 'yum makecache' before doing the 'yum update'. I'm just getting used to
> yum.
>
> Simon
>
> On Mon, May 10, 2010 at 12:55 PM, Simon Lavigne-Giroux
> <simon...@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to install pacemaker from your epel-5 repository, following
>> your guide for a CentOS installation, and it doesn't work. There is a
>> checksum failure when using 'yum update':
>>
>> http://www.clusterlabs.org/rpm/epel-5/repodata/filelists.xml.gz:
>> [Errno -1] Metadata file does not match checksum
>> Trying other mirror.
>> Error: failure: repodata/filelists.xml.gz from clusterlabs:
>> [Errno 256] No more mirrors to try.
>>
>> When I call 'yum install pacemaker', I have missing dependency errors
>> for these elements:
>>
>> libnetsnmpagent.so.15
>> libcrypto.so.8
>> libtinfo.so.5
>> libxml2.so.2
>> ... and more.
>>
>> Can you repair the checksum problem? Is there an alternative way to
>> get pacemaker from a repository on CentOS 5.3?
>>
>> Thanks
>> Simon
Re: [Pacemaker] clone ip definition and location stops my resources...
# crm ra meta ping
name (string, [undef]): Attribute name
    The name of the attributes to set. This is the name to be used in
    the constraints.

By default it is pingd, but you are checking against pinggw. I suggest you do not change the name though, but adjust your location constraint to use pingd instead. crm_mon only notices pingd at the moment when you pass the -f argument: it's hardcoded.

On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:

> Hello, using pacemaker 1.0.8 on RHEL 5 I have some problems
> understanding the way the ping clone works to set up monitoring of the
> gw... even after reading the docs... As soon as I run:
>
> crm configure location nfs-group-with-pinggw nfs-group rule -inf: not_defined pinggw or pinggw lte 0
>
> the resources go stopped and don't re-start. Then, as soon as I run
>
> crm configure delete nfs-group-with-pinggw
>
> the resources of the group start again... The config (part of it,
> actually) I try to apply is this:
>
> group nfs-group ClusterIP lv_drbd0 NfsFS nfssrv \
>     meta target-role=Started
> ms NfsData nfsdrbd \
>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> primitive pinggw ocf:pacemaker:ping \
>     params host_list=192.168.101.1 multiplier=100 \
>     op start interval=0 timeout=90 \
>     op stop interval=0 timeout=100
> clone cl-pinggw pinggw \
>     meta globally-unique=false
> location nfs-group-with-pinggw nfs-group \
>     rule $id=nfs-group-with-pinggw-rule -inf: not_defined pinggw or pinggw lte 0
>
> Is the location constraint to be done with the ping resource or with
> its clone?
> Is it a cause of the problem that I have also defined an nfs client on
> the other node with:
>
> primitive nfsclient ocf:heartbeat:Filesystem \
>     params device=nfsha:/nfsdata/web directory=/nfsdata/web fstype=nfs \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60
> colocation nfsclient_not_on_nfs-group -inf: nfs-group nfsclient
> order nfsclient_after_nfs-group inf: nfs-group nfsclient
>
> Thanks in advance,
> Gianluca
>
> From the messages of the server running nfs-group at that moment:
>
> May 10 15:18:27 ha1 cibadmin: [29478]: info: Invoked: cibadmin -Ql
> May 10 15:18:27 ha1 cibadmin: [29479]: info: Invoked: cibadmin -Ql
> May 10 15:18:28 ha1 crm_shadow: [29536]: info: Invoked: crm_shadow -c __crmshell.29455
> May 10 15:18:28 ha1 cibadmin: [29537]: info: Invoked: cibadmin -p -U
> May 10 15:18:28 ha1 crm_shadow: [29539]: info: Invoked: crm_shadow -C __crmshell.29455 --force
> May 10 15:18:28 ha1 cib: [8470]: info: cib_replace_notify: Replaced: 0.267.14 - 0.269.1 from null
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: - cib epoch=267 num_updates=14 admin_epoch=0 /
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + cib epoch=269 num_updates=1 admin_epoch=0
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + configuration
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + constraints
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + rsc_location id=nfs-group-with-pinggw rsc=nfs-group __crm_diff_marker__=added:top
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + rule boolean-op=or id=nfs-group-with-pinggw-rule score=-INFINITY
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + expression attribute=pinggw id=nfs-group-with-pinggw-expression operation=not_defined /
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + expression attribute=pinggw id=nfs-group-with-pinggw-expression-0 operation=lte value=0 /
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /rule
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /rsc_location
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /constraints
> May 10 15:18:28 ha1 crmd: [8474]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> May 10 15:18:28 ha1 attrd: [8472]: info: do_cib_replaced: Sending full refresh
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /configuration
> May 10 15:18:28 ha1 crmd: [8474]: info: need_abort: Aborting on change to epoch
> May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending flush op to all hosts for: master-nfsdrbd:0 (1)
> May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /cib
> May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation complete: op cib_replace for section 'all' (origin=local/crm_shadow/2, version=0.269.1): ok (rc=0)
> May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation complete: op cib_modify for section nodes
Re: [Pacemaker] pacemaker and gnbd
On Tue, May 4, 2010 at 3:41 AM, Andrew Beekhof <and...@beekhof.net> wrote:

> Hmmm... I wonder if the RHEL5.5 kernel is new enough to run the dlm. I
> suspect not. Why not try the RHEL6 beta? It comes with compatible
> versions of everything (including pacemaker).

http://ftp.redhat.com/redhat/rhel/beta/6/x86_64/os/Packages/

I don't see gnbd there. And EPEL is not supporting RHEL6 yet.

Vadym
Re: [Pacemaker] pacemaker and gnbd
On May 3, 2010, at 2:23 AM, Andrew Beekhof wrote:

> I doubt openais conflicts with corosync, unless you have a very old
> version of cman. The repos include openais 1.0.x which is built against
> corosync.

Unless I am doing something terribly wrong, this is not the case. Red Hat 5.5 (the latest at the moment) comes with cman-2.0.115-34.el5.x86_64.rpm:

# rpm -q --requires -p cman-2.0.115-34.el5.x86_64.rpm
warning: cman-2.0.115-34.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
kernel >= 2.6.18-36.el5
/sbin/chkconfig
/sbin/chkconfig
openais
pexpect
/bin/sh
/bin/sh
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
/bin/bash
/usr/bin/perl
/usr/bin/python
libcpg.so.2()(64bit)
libcpg.so.2(OPENAIS_CPG_1.0)(64bit)
libc.so.6()(64bit)
libc.so.6(GLIBC_2.2.5)(64bit)
libc.so.6(GLIBC_2.3.2)(64bit)
libc.so.6(GLIBC_2.3.3)(64bit)
libc.so.6(GLIBC_2.3)(64bit)
libdlm.so.2()(64bit)
libdl.so.2()(64bit)
libm.so.6()(64bit)
libnss3.so()(64bit)
libnss3.so(NSS_3.2)(64bit)
libnss3.so(NSS_3.4)(64bit)
libpthread.so.0()(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
libpthread.so.0(GLIBC_2.3.2)(64bit)
librt.so.1()(64bit)
librt.so.1(GLIBC_2.2.5)(64bit)
libSaCkpt.so.2()(64bit)
libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit)
libxml2.so.2()(64bit)
libz.so.1()(64bit)
perl(Getopt::Std)
perl(IPC::Open3)
perl(Net::Telnet)
perl(POSIX)
perl(strict)
perl(warnings)
perl(XML::LibXML)

So it depends on openais 0.8 (libcpg.so.2). And here is the yum output:

# yum install gnbd
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package gnbd.x86_64 0:1.1.7-1.el5 set to be updated
--> Processing Dependency: libcman.so.2()(64bit) for package: gnbd
--> Running transaction check
---> Package cman.x86_64 0:2.0.115-34.el5 set to be updated
--> Processing Dependency: libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit) for package: cman
--> Processing Dependency: perl(Net::Telnet) for package: cman
--> Processing Dependency: perl(XML::LibXML) for package: cman
--> Processing Dependency: pexpect for package: cman
--> Processing Dependency: openais for package: cman
--> Processing Dependency: libcpg.so.2(OPENAIS_CPG_1.0)(64bit) for package: cman
--> Processing Dependency: libSaCkpt.so.2()(64bit) for package: cman
--> Processing Dependency: libcpg.so.2()(64bit) for package: cman
--> Running transaction check
---> Package openais.x86_64 0:0.80.6-16.el5 set to be updated
---> Package perl-Net-Telnet.noarch 0:3.03-5 set to be updated
---> Package perl-XML-LibXML.x86_64 0:1.58-6 set to be updated
--> Processing Dependency: perl-XML-NamespaceSupport for package: perl-XML-LibXML
--> Processing Dependency: perl-XML-LibXML-Common for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Exception) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::LibXML::Common) for package: perl-XML-LibXML
--> Processing Dependency: perl-XML-SAX for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::DocumentLocator) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Base) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::NamespaceSupport) for package: perl-XML-LibXML
---> Package pexpect.noarch 0:2.3-3.el5 set to be updated
--> Running transaction check
---> Package perl-XML-LibXML-Common.x86_64 0:0.13-8.2.2 set to be updated
---> Package perl-XML-NamespaceSupport.noarch 0:1.09-1.2.1 set to be updated
---> Package perl-XML-SAX.noarch 0:0.14-8 set to be updated
--> Processing Conflict: corosync conflicts openais <= 0.89
--> Finished Dependency Resolution
corosync-1.2.1-1.el5.x86_64 from installed has depsolving problems
  --> corosync conflicts with openais
Error: corosync conflicts with openais

Vadym
Re: [Pacemaker] pacemaker and gnbd
On May 3, 2010, at 10:27 AM, Andrew Beekhof wrote: It is the case; the conflict is slightly different than you think. Corosync doesn't conflict with all versions of openais, just the one cman wants to use. You need to rebuild cman to use the newer version of openais.

Hmm, this is what I asked at the very beginning:

On Sat, May 1, 2010 at 3:30 PM, Vadym Chepkov vchep...@gmail.com wrote: Hi, I found out I can't use gnbd if I use the pacemaker rpm from the clusterlabs repository, because gnbd depends on cman, which requires openais, which conflicts with the corosync that pacemaker depends on. Is it just a matter of recompiling the cman rpm using corosync libraries instead of openais? Or does something else need to be done?

Unfortunately, cman doesn't compile right away:

DEBUG: make[1]: Entering directory `/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: gcc -Wall -fPIC -I//builddir/build/BUILD/cman-2.0.115/ccs/lib -I//usr/include -I../config -DCMAN_RELEASE_NAME=\"2.0.115\" -DOPENAIS_EXTERNAL_SERVICE -O2 -c -o daemon.o daemon.c
DEBUG: daemon.c:32:35: error: openais/totem/aispoll.h: No such file or directory
DEBUG: daemon.c:33:35: error: openais/totem/totemip.h: No such file or directory
DEBUG: In file included from daemon.c:37:
DEBUG: cnxman-private.h:17:33: error: openais/totem/totem.h: No such file or directory
DEBUG: In file included from daemon.c:42:
DEBUG: ais.h:25: error: array type has incomplete element type
DEBUG: ais.h:26: error: array type has incomplete element type
DEBUG: daemon.c:59: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'ais_poll_handle'
DEBUG: daemon.c:62: error: expected ')' before 'handle'
DEBUG: daemon.c:63: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_reply_message':
DEBUG: daemon.c:89: warning: implicit declaration of function 'remove_client'
DEBUG: daemon.c:89: error: 'ais_poll_handle' undeclared (first use in this function)
DEBUG: daemon.c:89: error: (Each undeclared identifier is reported only once
DEBUG: daemon.c:89: error: for each function it appears in.)
DEBUG: daemon.c:108: warning: implicit declaration of function 'poll_dispatch_modify'
DEBUG: daemon.c:108: error: 'process_client' undeclared (first use in this function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:113: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_queued_reply':
DEBUG: daemon.c:168: error: 'ais_poll_handle' undeclared (first use in this function)
DEBUG: daemon.c:168: error: 'process_client' undeclared (first use in this function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:173: error: expected ')' before 'handle'
DEBUG: daemon.c:323: error: expected ')' before 'handle'
DEBUG: daemon.c:354: error: expected declaration specifiers or '...' before 'poll_handle'
DEBUG: daemon.c: In function 'open_local_sock':
DEBUG: daemon.c:402: warning: implicit declaration of function 'poll_dispatch_add'
DEBUG: daemon.c:402: error: 'handle' undeclared (first use in this function)
DEBUG: daemon.c:402: error: 'process_rendezvous' undeclared (first use in this function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:500: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'aisexec_poll_handle'
DEBUG: daemon.c: In function 'cman_init':
DEBUG: daemon.c:506: error: 'ais_poll_handle' undeclared (first use in this function)
DEBUG: daemon.c:506: error: 'aisexec_poll_handle' undeclared (first use in this function)
DEBUG: daemon.c:512: error: too many arguments to function 'open_local_sock'
DEBUG: daemon.c:516: error: too many arguments to function 'open_local_sock'
DEBUG: make[1]: Leaving directory `/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: RPM build errors:
DEBUG: make[1]: *** [daemon.o] Error 1
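[Editor's note] A rough sketch of the rebuild Andrew suggests, with the caveats spelled out: the package name is from the thread, but the exact srpm filename is illustrative, and, as the errors above show, cman 2.x's daemon code still includes the openais 0.8 totem headers (openais/totem/aispoll.h and friends), so a plain rebuild against openais 1.x / corosync is unlikely to succeed without source-level patches:

```shell
# Fetch the cman source rpm and attempt a rebuild against the newer
# stack, capturing the build log (srpm filename is illustrative).
yumdownloader --source cman
rpmbuild --rebuild cman-2.0.115-34.el5.src.rpm 2>&1 | tee cman-rebuild.log

# Expect failures like the ones quoted above: the daemon code includes
# openais/totem/*.h, which the openais 1.x / corosync split relocated.
grep 'No such file or directory' cman-rebuild.log
```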
Re: [Pacemaker] pacemaker and gnbd
On May 3, 2010, at 6:03 PM, Vadym Chepkov wrote: On May 3, 2010, at 5:39 PM, Andrew Beekhof wrote: perhaps try the srpm from F-12

Would be nice, but the last one was in F-9, it seems: http://koji.fedoraproject.org/koji/packageinfo?packageID=182

Oh, I found out it's part of the cluster package now. But it also doesn't compile :(

DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 'create_lockspace_v5':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: 'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: (Each undeclared identifier is reported only once
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: for each function it appears in.)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1236: warning: left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: warning: unused variable 'reqbuf'
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 'create_lockspace_v6':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: error: 'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1260: warning: left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: warning: unused variable 'reqbuf'
DEBUG: make[2]: Leaving directory `/builddir/build/BUILD/cluster-3.0.7/dlm/libdlm'

Vadym
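[Editor's note] An "'DLM_LOCKSPACE_LEN' undeclared" error typically means the build host's kernel headers predate what cluster-3.0.7 targets; in mainline kernels the constant is provided by the DLM userspace headers (linux/dlmconstants.h, pulled in via linux/dlm.h), which EL5's 2.6.18-era headers may lack. A quick hedged check on the build host (the header path is an assumption about where kernel-headers installs them):

```shell
# Does the installed kernel-headers package define the constant?
grep -rl 'DLM_LOCKSPACE_LEN' /usr/include/linux/ 2>/dev/null \
  || echo "DLM_LOCKSPACE_LEN not found - kernel headers likely too old for cluster-3.0.7"
```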
[Pacemaker] pacemaker and gnbd
Hi, I found out I can't use gnbd if I use the pacemaker rpm from the clusterlabs repository, because gnbd depends on cman, which requires openais, which conflicts with the corosync that pacemaker depends on. Is it just a matter of recompiling the cman rpm using corosync libraries instead of openais? Or does something else need to be done? Thank you, Vadym Chepkov
Re: [Pacemaker] OpenAIS priorities
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/node-score-equal.html

On Apr 29, 2010, at 10:20 AM, Dan Frincu wrote: Greetings all, In the case of two servers in a cluster with OpenAIS, take the following example:

location Failover_Alert_1 Failover_Alert 100: abc.localdomain
location Failover_Alert_2 Failover_Alert 200: def.localdomain

This will set up a preference for the resource to run on def.localdomain, because it has the higher score assigned to it. But what happens when the scores match? Is there a tiebreaker, some sort of election process, to choose which node will handle the resource? Thank you in advance, Best regards. -- Dan FRINCU Internal Support Engineer
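[Editor's note] For reference, the tied case Dan asks about would look like this (hostnames and resource name taken from his example; only the scores changed). Per the "node-score-equal" section linked above, Pacemaker still makes a deterministic placement choice rather than failing; the outcome is influenced by factors such as resource-stickiness and which node is already running the resource, so the constraint scores alone do not tell the whole story:

```
# Equal scores - Pacemaker must break the tie itself:
location Failover_Alert_1 Failover_Alert 200: abc.localdomain
location Failover_Alert_2 Failover_Alert 200: def.localdomain
```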
[Pacemaker] duality and equality
Hi, I noticed there are quite a few configuration parameters in pacemaker that can be set in two different ways: via cluster properties or via rsc/op_defaults. For example: property default-resource-stickiness vs. rsc_defaults resource-stickiness; property is-managed-default vs. rsc_defaults is-managed; property stop-all-resources vs. rsc_defaults target-role; property default-action-timeout vs. op_defaults timeout. I assume this duality exists for historical reasons, and in the computing world it is not unusual to achieve the same result in different ways. But in this case curious minds want to know: which parameter takes precedence if both are set and they contradict each other?

I also noticed some differences in how these settings are assessed.

# crm configure show
node c20.chepkov.lan
node c21.chepkov.lan
primitive ip_rg0 ocf:heartbeat:IPaddr2 \
        params nic=eth0 ip=10.10.10.22 cidr_netmask=32
primitive ping ocf:pacemaker:ping \
        params name=ping dampen=5s multiplier=200 host_list=10.10.10.250
clone connected ping \
        meta globally-unique=false
property $id=cib-bootstrap-options \
        dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        stonith-enabled=false

# crm configure verify
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the advised 100

# crm configure op_defaults timeout=120
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the advised 100

But:

# crm configure property default-action-timeout=120

makes it happy. And this makes me wonder: are these parameters really the same, or do they have different meanings?

Thank you. Sincerely yours, Vadym Chepkov
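[Editor's note] For comparison, the two styles side by side as crm commands (a sketch of the pairs named in the message; the rsc_defaults/op_defaults form is the one the Pacemaker documentation favors, with the property-style names kept for backward compatibility):

```
# Newer per-resource / per-operation defaults:
crm configure rsc_defaults resource-stickiness=100
crm configure op_defaults timeout=120s

# Legacy cluster-property equivalents of the same settings:
crm configure property default-resource-stickiness=100
crm configure property default-action-timeout=120s
```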