Re: [Pacemaker] migration-threshold and failure-timeout
On 21 September 2010 15:28, Vadym Chepkov vchep...@gmail.com wrote:
> On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote:
>> Hi,
>>
>> This
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
>> explains it pretty well. Notice the INFINITY score and what sets it.
>> However, I don't know of any automatic method to clear the failcount.
>>
>> Regards,
>> Dan
>
> In pacemaker 1.0 nothing will clean the failcount automatically; this is a feature of pacemaker 1.1, imho.
>
> But
>
>   crm configure rsc_defaults failure-timeout=10min
>
> will make the cluster forget about a previous failure after 10 minutes. If you want to decrease this parameter further, you might also need to decrease
>
>   crm configure property cluster-recheck-interval=10min
>
> Cheers,
> Vadym

OK guys, thank you very much for the info.

Pavlos

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] help with configuration for Xen domU on two DRBD devices
To answer my own email, just in case it helps someone else: after a bit more research and trying different things, it appears that my issue was caused by a resource failcount. Once I manually cleared all resource failcounts, it started to work properly.
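For anyone landing here with the same symptom, inspecting and clearing a failcount by hand could look roughly like this. It is a minimal sketch, assuming the crm shell is available; `myrsc` and `node1` are placeholder names and do not come from this thread.

```shell
# Placeholder names: "myrsc" is a resource id, "node1" a cluster node.

# One-shot cluster status including per-resource failcounts.
crm_mon -1 -f

# Show the failcount of one resource on one node.
crm resource failcount myrsc show node1

# Clear the failcount and the failed-operation history for that resource.
crm resource cleanup myrsc
```

After the cleanup, the policy engine re-evaluates placement, so a resource that was banned from a node by its failcount may start there again.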
Re: [Pacemaker] monitor operation cancel question
On Tue, Sep 21, 2010 at 8:58 PM, Phil Armstrong p...@sgi.com wrote:
> Hi,
>
> This is my first post to this list so if I'm doing this wrong, please be patient. I am using pacemaker-1.1.2-0.2.1 on sles11sp1. Thanks in advance for any help anyone can give me.

Well, fixing this is a good start:

Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense

But for your other issue, Michael is right: there was a bug, which an upgrade will fix.
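The error above says that an operation asks for fencing on failure while fencing is disabled cluster-wide. Two ways to make the configuration consistent are sketched below; which one is appropriate depends on the cluster, and neither was spelled out in the thread.

```shell
# Option 1: enable fencing. This only makes sense once a working
# stonith resource is configured for the cluster.
crm configure property stonith-enabled=true

# Option 2: keep fencing disabled and change the operation's on-fail
# setting (for example to restart or block) in the resource definition.
# This opens the configuration in an editor.
crm configure edit
```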
Re: [Pacemaker] error: ocf:heartbeat:IPv6addr: could not parse meta-data
On Tue, Sep 21, 2010 at 4:10 PM, Angelo Höngens a.hong...@netmatch.nl wrote:
> On 25-8-2010 8:36, Andrew Beekhof wrote:
>>> I guess whoever packaged the rpm's can answer why the file is missing, but was that someone from the clusterlabs team or someone from the linux-ha team? :)
>>
>> Basically because I left out the libnet dependency. The status of libnet as a viable project has been uncertain lately.
>
> Andrew,
>
> Because we have some ipv6 nodes I want to try out pacemaker on again, I'd really like to build an rpm of the resource agents package with ipv6 support in. If you could tell a newbie like me how to do it, I'd be really grateful, and I think a lot of people will be happy as well.
>
> We run CentOS/RHEL5 everywhere. I've never built an RPM before, but it doesn't look that hard (until I saw the errors). I've installed all the dependencies
>
>   yum install autoconf automake gcc libnet-devel libtool libxml2-devel bzip2-devel glib2-devel libxslt-devel e2fsprogs-devel docbook-style-xsl rpm-build
>
> and I want to make sure I can compile an RPM first before changing anything in the code. But even when doing that, I get errors:
>
> [ang...@test1 redhat]$ sudo rpm -i http://clusterlabs.org/rpm/epel-5/src/resource-agents-1.0.3-2.el5.src.rpm
> [ang...@test1 redhat]$ cd /usr/src/redhat/
> [ang...@test1 redhat]$ rpmbuild -bb SPECS/resource-agents.spec
> [..cut..]
> Provides: config(ldirectord) = 1.0.3-2 heartbeat-ldirectord
> Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1 rpmlib(PayloadFilesHavePrefix) = 4.0-1
> Requires: /bin/sh /usr/bin/perl config(ldirectord) = 1.0.3-2 ipvsadm perl(Digest::MD5) perl(Getopt::Long) perl(IO::Select) perl(IO::Socket) perl(LWP::Debug) perl(LWP::UserAgent) perl(Mail::Send) perl(Net::Ping) perl(Net::SMTP) perl(POSIX) perl(Pod::Usage) perl(Socket) perl(Socket6) perl(Sys::Hostname) perl(Sys::Syslog) perl(strict) perl(vars) perl-MailTools perl-Net-SSLeay perl-libwww-perl
> Conflicts: heartbeat-ldirectord
> Obsoletes: heartbeat-ldirectord
> Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/resource-agents-1.0.3-build
>
> RPM build errors:
>     File not found: /var/tmp/resource-agents-1.0.3-build/usr/sbin/sfex_init
>     File not found: /var/tmp/resource-agents-1.0.3-build/usr/lib64/heartbeat/sfex_daemon
> [ang...@test1 redhat]$
>
> Can you please help me in my quest to the desired end result? (which is the knowledge to build an ipv6-enabled version of the resource-agents so I can install it on my nodes, and I can rebuild it after each version upgrade of the source package).

You're close :-)
Try installing the -devel package for cluster-glue and trying again.

> It would be great if it would be part of the basic packages as well, but alas this does not seem to be the case right now.
>
> I can provide ssh root access to a clean vm (centos 5.5, x64) if needed.
>
> --
> With kind regards,
>
> Angelo Höngens
> systems administrator
>
> MCSE on Windows 2003
> MCSE on Windows 2000
> MS Small Business Specialist
> --
> NetMatch
> tourism internet software solutions
>
> Ringbaan Oost 2b
> 5013 CA Tilburg
> +31 (0)13 5811088
> +31 (0)13 5821239
>
> a.hong...@netmatch.nl
> www.netmatch.nl
> --
Re: [Pacemaker] chkconfig values in MCP init script (again)
On Tue, Sep 21, 2010 at 2:24 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
> Hi Andrew, hi all.
>
> I decided to return to this issue again because of issues with libvirt/KVM virtual domains controlled by pacemaker.
>
> The libvirt package on Fedora 13 has two init scripts: libvirtd and libvirt-guests. They have the following chkconfig values:
>
>   libvirtd:       97 03
>   libvirt-guests: 98 02
>
> Currently pacemaker MCP has 90 10.
> ...
> So, the next solution would be to move pacemaker to run really last (99) and stop really first (01). This is what Vadim Chepkov suggested earlier and what I am inclined to do (at least for my RPM packages). Of course, there are services which have 99 01 too, but I'd shut my eyes to them.

I'm fine with that. I'll make the change now.
Possibly some stop-requires might be of use too.
Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error
2010/9/21 Szymon Hersztek s...@globtel.pl:
> Message written on 2010-09-21, at 09:08, by Andrew Beekhof:
>> 2010/9/21 Szymon Hersztek s...@globtel.pl:
>>> Message written on 2010-09-21, at 08:34, by Andrew Beekhof:
>>>> On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl wrote:
>>>>> Hi
>>>>> I'm trying to set up corosync to work as a drbd cluster, but after installing by following http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf I got an error like the one below:
>>>>
>>>> Unusual, but did pacemaker fork a replacement attrd process?
>>>> At what time did corosync start?
>>>
>>> corosync was started manually, or do you want to have the exact time of start?
>>
>> Well, you included at most 1 second's worth of logging, so it's kinda hard to know if something took too long or what recovery was attempted.
>
> OK, it is not a problem to send more. Do you need debug logging or standard?
> I have to install the server once again, so in half an hour I can reproduce the logs.

Here's your issue:

  corosynclib  i386    1.2.7-1.1.el5  clusterlabs  155 k
  corosynclib  x86_64  1.2.7-1.1.el5  clusterlabs  172 k

Why do you have both i386 and x86_64 versions installed on your machine??
Re: [Pacemaker] error: ocf:heartbeat:IPv6addr: could not parse meta-data
On 22-9-2010 9:58, Andrew Beekhof wrote:
>> Can you please help me in my quest to the desired end result? (which is the knowledge to build an ipv6-enabled version of the resource-agents so I can install it on my nodes, and I can rebuild it after each version upgrade of the source package).
>
> You're close :-)
> Try installing the -devel package for cluster-glue and trying again.

Thanks, that works like a charm, I didn't even have to change anything!

For people googling, here's my version of the resource-agents with the IPv6addr module:
http://files.hongens.nl/RPM/resource-agents-1.0.3-2.x86_64.rpm

--
With kind regards,

Angelo Höngens
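Pulling the thread together, the rebuild might be scripted roughly as follows on CentOS/RHEL 5. This is a sketch under assumptions: the thread only says "the -devel package for cluster-glue", so the package name `cluster-glue-libs-devel` used here is a guess that should be confirmed with `yum search cluster-glue`. The remaining commands are the ones quoted earlier in the thread.

```shell
# Build dependencies listed earlier in the thread.
sudo yum install autoconf automake gcc libnet-devel libtool libxml2-devel \
    bzip2-devel glib2-devel libxslt-devel e2fsprogs-devel docbook-style-xsl rpm-build

# The missing piece according to the reply above.
# ASSUMPTION: the package is called cluster-glue-libs-devel in your repository.
sudo yum install cluster-glue-libs-devel

# Unpack the source RPM and rebuild it, as in the original attempt.
sudo rpm -i http://clusterlabs.org/rpm/epel-5/src/resource-agents-1.0.3-2.el5.src.rpm
cd /usr/src/redhat/
rpmbuild -bb SPECS/resource-agents.spec
```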
Re: [Pacemaker] re: Pacemaker Digest, Vol 34, Issue 50
On Tue, Sep 21, 2010 at 10:33 AM, jiaju liu liujiaj...@yahoo.com.cn wrote:
> On Tuesday, 21 September 2010, pacemaker-requ...@oss.clusterlabs.org wrote:
>> Message: 5
>> Date: Tue, 21 Sep 2010 09:15:16 +0200
>> From: Andrew Beekhof and...@beekhof.net
>> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] pingd problem about clone
>> Message-ID: aanlkti=l0yw-k4rmzo6dmjp4fuyccnki7w-hbw32z...@mail.gmail.com
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> On Fri, Sep 17, 2010 at 2:38 AM, jiaju liu liujiaj...@yahoo.com.cn wrote:
>>> Clone Set: pingd_data_net
>>>     Started: [ oss3 oss2 oss1 ]
>>>
>>> I use the command:
>>>
>>>   crm_resource -g host_list -r pingd_data_net
>>>
>>> to check the param host_list
>>
>> what does the resource definition look like?
>
>   crm configure primitive pingd_data ocf:pacemaker:pingd meta target-role=stopped params name=pingd_data op start timeout=100s op stop timeout=100s op monitor interval=90s timeout=100s
>   crm_resource -p host_list -r pingd_data -v 10.53.11.101
>   crm configure clone pingd_data_net pingd_data meta globally-unique=false target-role=stopped
>   crm resource pind_data_net start
>
> and then I want to change host_list, so I use the command
>
>   crm_resource -p host_list -r pingd_data -v 10.53.11.101 10.53.11.100

Given:

  primitive FencingChild stonith:fence_xvm \
      params pcmk_arg_map=domain:uname \
      op monitor interval=120s timeout=300 \
      op start interval=0 timeout=180s \
      op stop interval=0 timeout=180s

and

  clone Fencing FencingChild \
      meta globally-unique=false migration-threshold=5

I get the following...

If I (incorrectly) use the clone name, I get the error you see:

  [r...@pcmk-1 ~]# crm_resource -g pcmk_arg_map -r Fencing
  Fencing is active on more than one node, returning the default value for null
  Error performing operation: The object/attribute does not exist

But if I (correctly) use the primitive name, it works:

  [r...@pcmk-1 ~]# crm_resource -g pcmk_arg_map -r FencingChild
  domain:uname

Can you check you're using the name of the primitive and not the clone please?

>>> the result is
>>>
>>>   pingd_data_net is active on more than one node, returning the default value for null
>>>   Error performing operation: The object/attribute does not exist
>>>
>>> So if I want to check the param, what should I do? Thank you :-)
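Applied to the pingd resource from this thread, the advice above could be wrapped as below. This is a sketch, not a tested recipe: the helper function names are made up for illustration, and it assumes `host_list` takes a single space-separated string. Note that the last command quoted in the thread passes two addresses unquoted after `-v`, so the shell hands them to `crm_resource` as two separate arguments; the `count_args` lines demonstrate that effect without needing a cluster.

```shell
# Read a parameter: address the primitive (pingd_data), not the clone (pingd_data_net).
get_host_list() {
    crm_resource -g host_list -r "$1"
}

# Set a parameter: pass all addresses as ONE quoted value after -v.
set_host_list() {
    rsc=$1; shift
    crm_resource -p host_list -r "$rsc" -v "$*"
}

# Why the quoting matters: unquoted, the shell splits the list into separate arguments.
count_args() { echo "$#"; }
count_args -v 10.53.11.101 10.53.11.100     # prints 3
count_args -v "10.53.11.101 10.53.11.100"   # prints 2

# Usage on a cluster node (not run here):
#   get_host_list pingd_data
#   set_host_list pingd_data 10.53.11.101 10.53.11.100
```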
Re: [Pacemaker] About behavior in Action Lost.
On Tue, Sep 21, 2010 at 8:59 AM, renayama19661...@ybb.ne.jp wrote:
> Hi,
>
> A node was under very high load, and we checked how Pacemaker's monitor operation behaved. After the monitor failed, an "Action Lost" occurred during the stop operation.
>
> Sep 8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, action lost: [Action 9]: In-flight (id: prmApPostgreSQLDB1_stop_0, loc: cgl49, priority: 0)
> Sep 8 20:02:22 cgl54 crmd: [3507]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
>
> We think the stop did not go well because of the load on the node. But can't stonith be executed in this case?

A long time ago in a galaxy far away, some messaging layers used to lose quite a few actions, including stops. About the same time, we decided that fencing because a stop action was lost wasn't a good idea.

The rationale was that if the operation eventually completed, it would end up in the CIB anyway. And even if it didn't, the PE would continue to try the operation again until the whole node fell over, at which point it would get shot anyway.

Now, having said that, things have improved since then and perhaps, in the interest of speeding up recovery in these situations, it is time to stop treating stop operations differently. Would you agree?
[Pacemaker] Release Matrix
hi,

regarding the Release Matrix [1] and the ABI change in cluster-glue/clplumbing [2], i wonder if pacemaker 1.0.9.1 really works with glue 1.0.3?

cheers,
raoul

[1] http://www.clusterlabs.org/wiki/ReleaseMatrix
[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/65443
--
DI (FH) Raoul Bhatia M.Sc.          email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.    http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.  off...@ipax.at
1190 Wien                           tel.    +43 1 3670030
FN 277995t HG Wien                  fax.    +43 1 3670030 15
Re: [Pacemaker] chkconfig values in MCP init script (again)
22.09.2010 11:17, Andrew Beekhof wrote:
> On Tue, Sep 21, 2010 at 2:24 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
>> Hi Andrew, hi all.
>>
>> I decided to return to this issue again because of issues with libvirt/KVM virtual domains controlled by pacemaker.
>>
>> The libvirt package on Fedora 13 has two init scripts: libvirtd and libvirt-guests. They have the following chkconfig values:
>>
>>   libvirtd:       97 03
>>   libvirt-guests: 98 02
>>
>> Currently pacemaker MCP has 90 10.
>> ...
>> So, the next solution would be to move pacemaker to run really last (99) and stop really first (01). This is what Vadim Chepkov suggested earlier and what I am inclined to do (at least for my RPM packages). Of course, there are services which have 99 01 too, but I'd shut my eyes to them.
>
> I'm fine with that. I'll make the change now.
> Possibly some stop-requires might be of use too.

I'd add corosync to Required-Stop.

Adding other services like libvirtd is not an option, I think. You cannot predict what resources are actually run by pacemaker, so you cannot enumerate them all.
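To make the proposal concrete, the header of such an init script might look like the excerpt embedded below. The header text is hypothetical (in particular the `Provides:` line and the `-` runlevel field are assumptions); only the 99/01 priorities and the corosync entry in Required-Stop come from the thread. The small helper, also illustrative, prints the start and stop priorities so the ordering can be compared against libvirtd (97 03) and libvirt-guests (98 02).

```shell
# Hypothetical excerpt of an init script header using the values proposed in the thread.
header='#!/bin/sh
# chkconfig: - 99 01
### BEGIN INIT INFO
# Provides: pacemaker
# Required-Stop: corosync
### END INIT INFO'

# Print "<start> <stop>" priorities from the first "# chkconfig:" line on stdin.
chkconfig_priorities() {
    awk '/^#[ \t]*chkconfig:/ { print $(NF-1), $NF; exit }'
}

printf '%s\n' "$header" | chkconfig_priorities   # prints: 99 01
```

With start priority 99 pacemaker starts after both libvirt scripts, and with stop priority 01 it stops before them.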
[Pacemaker] correct permissions for /var/lib/pengine
in my recent hb_report, i find:

WARN: problem with permissions/ownership at wc01:
wrong permissions or ownership for /var/lib/pengine:
drwxr-xr-x 2 hacluster haclient 5038080 Jul 23 08:58 /var/lib/pengine

WARN: problem with permissions/ownership at wc02:
wrong permissions or ownership for /var/lib/pengine:
drwxr-xr-x 2 hacluster haclient 4096 Sep 22 11:26 /var/lib/pengine

what should the correct permissions be?

thanks,
raoul
Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error
Message written on 2010-09-22, at 10:26, by Andrew Beekhof:
> 2010/9/21 Szymon Hersztek s...@globtel.pl:
>> Message written on 2010-09-21, at 09:08, by Andrew Beekhof:
>>> 2010/9/21 Szymon Hersztek s...@globtel.pl:
>>>> Message written on 2010-09-21, at 08:34, by Andrew Beekhof:
>>>>> On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl wrote:
>>>>>> Hi
>>>>>> I'm trying to set up corosync to work as a drbd cluster, but after installing by following http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf I got an error like the one below:
>>>>>
>>>>> Unusual, but did pacemaker fork a replacement attrd process?
>>>>> At what time did corosync start?
>>>>
>>>> corosync was started manually, or do you want to have the exact time of start?
>>>
>>> Well, you included at most 1 second's worth of logging, so it's kinda hard to know if something took too long or what recovery was attempted.
>>
>> OK, it is not a problem to send more. Do you need debug logging or standard?
>> I have to install the server once again, so in half an hour I can reproduce the logs.
>
> Here's your issue:
>
>   corosynclib  i386    1.2.7-1.1.el5  clusterlabs  155 k
>   corosynclib  x86_64  1.2.7-1.1.el5  clusterlabs  172 k
>
> Why do you have both i386 and x86_64 versions installed on your machine??

Because yum installed it this way, like many other packages.

The problem was that I did not have /dev/shm mounted as tmpfs.
But thanks for trying.
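A quick way to check for the condition that turned out to be the cause here is sketched below. The helper name `shm_fstype` is made up for this example, and the suggested fstab line is the conventional entry on Red Hat style systems rather than something stated in the thread.

```shell
# Print the filesystem type mounted at /dev/shm, reading a mounts table
# (defaults to /proc/mounts; the optional argument exists to ease testing).
shm_fstype() {
    awk '$2 == "/dev/shm" { print $3 }' "${1:-/proc/mounts}" | tail -n 1
}

if [ "$(shm_fstype)" = "tmpfs" ]; then
    echo "/dev/shm is tmpfs"
else
    echo "/dev/shm is not tmpfs; a typical /etc/fstab entry would be:"
    echo "tmpfs /dev/shm tmpfs defaults 0 0"
fi
```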
[Pacemaker] Problems Installing Pacemaker and Heartbeat
Hi,

I'm having problems installing pacemaker and heartbeat on debian lenny.

What I did:

- Downloaded the latest debian image for i386 (Kernel 2.6.26-2-686)
- After the install, configured sources.list:
  deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free
- Ran apt-get update
- Ran apt-get upgrade
- Ran aptitude install pacemaker heartbeat

aptitude install heartbeat pacemaker
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Writing extended state information... Done
Reading task descriptions... Done
The following packages are BROKEN:
  cluster-glue pacemaker
The following NEW packages will be installed:
  cluster-agents{a} heartbeat libcluster-glue{a} libcorosync4{a} libheartbeat2{a} libnet1{a} libopenipmi0{a} libtimedate-perl{a}
0 packages upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 2277kB/2928kB of archives. After unpacking 9523kB will be used.
The following packages have unmet dependencies:
  pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package.
  cluster-glue: Depends: libopenhpi2 which is a virtual package.
The following actions will resolve these dependencies:
Keep the following packages at their current version:
  cluster-agents [Not Installed]
  cluster-glue [Not Installed]
  heartbeat [Not Installed]
  pacemaker [Not Installed]
Score is -19794
Accept this solution? [Y/n/q/?] Y
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Do you want to continue? [Y/n/?] y
Writing extended state information... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Reading task descriptions... Done

I'm unable to go ahead because of the errors above. Any suggestions? Is there another change that has to be made in sources.list?

Thanks.
Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat
hi,

please refer to http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

cheers,
raoul
Re: [Pacemaker] migration-threshold and failure-timeout
On Wed, 22 Sep 2010, Andrew Beekhof wrote:
> On Tue, Sep 21, 2010 at 3:28 PM, Vadym Chepkov vchep...@gmail.com wrote:
>> On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote:
>>> However I don't know of any automatic method to clear the failcount.
>>
>> in pacemaker 1.0 nothing will clean failcount automatically, this is a feature of pacemaker 1.1, imho
>
> Correct, the just released 1.1.3 clears the failcount for you.

The failcount used to get cleared for me after the cluster-recheck-interval fired, if failure-timeout had expired. Does this mean pacemaker 1.1.3 doesn't need cluster-recheck-interval in order to clear the failcount?

Thanks!
Mike
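For readers skimming the thread, the two settings under discussion are set with the crm shell as sketched below. The 10min failure-timeout is the value used earlier in the thread; the 5min recheck interval is only an illustrative assumption. Whether pacemaker 1.1.3 still depends on the recheck interval is exactly the open question asked above, so this sketch does not answer it.

```shell
# Let recorded failures expire 10 minutes after the last failure
# (value taken from the earlier message in this thread).
crm configure rsc_defaults failure-timeout=10min

# According to the thread, an expired failure is only acted on when the
# cluster re-evaluates its state, so a shorter failure-timeout may also
# need a shorter recheck interval. 5min is an illustrative value.
crm configure property cluster-recheck-interval=5min
```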
[Pacemaker] Timeout after nodejoin
Hi all,

I have the following packages:

# rpm -qa | grep -iE '(openais|cluster|heartbeat|pacemaker|resource)'
openais-0.80.5-15.2
cluster-glue-1.0-12.2
pacemaker-1.0.5-4.2
cluster-glue-libs-1.0-12.2
resource-agents-1.0-31.5
pacemaker-libs-1.0.5-4.2
pacemaker-mgmt-1.99.2-7.2
libopenais2-0.80.5-15.2
heartbeat-3.0.0-33.3
pacemaker-mgmt-client-1.99.2-7.2

When I start openais, I get nodejoin immediately, as seen in the logs below. However, it takes some time before the nodes are visible in crm_mon output. Any idea how to minimize this delay?

Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: send_member_notification: Sending membership update 8 to 1 children
Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33
Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.35
Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending membership update 8 to crmd
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: update_expected_votes: Expected quorum votes 1024 -> 2
Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name)
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: Callback already present
Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting custom graph functions
Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition
Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode

Regards,
Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
[Pacemaker] Problems Installing Pacemaker and Heartbeat
Hi,

Thanks for your post, Raoul, but reading that tutorial is the first thing I did.

This is my entire sources.list:

#deb cdrom:[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50]/ lenny contrib main
deb cdrom:[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50]/ lenny contrib main
deb http://security.debian.org/ lenny/updates main contrib
deb-src http://security.debian.org/ lenny/updates main contrib
deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib
# For the cluster
deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free
deb http://www.backports.org/debian lenny-backports main contrib non-free
deb http://people.debian.org/~madkiss/ha lenny main

But as the HowTo says:

"So in order to use Pacemaker on Debian GNU/Linux 5.0 (Lenny), please add the Backports.org-repository to your APT-configuration according to the How-To on this site. This has to be done on all nodes in your cluster."

I've followed the tutorial:

aptitude -t lenny-backports install heartbeat pacemaker

But I still get the same error:

Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Reading task descriptions... Done
The following packages are BROKEN:
  cluster-glue pacemaker
The following NEW packages will be installed:
  cluster-agents{a} heartbeat libcluster-glue{a} libcorosync4{a} libheartbeat2{a} libnet1{a} libopenipmi0{a} libtimedate-perl{a}
0 packages upgraded, 10 newly installed, 0 to remove and 94 not upgraded.
Need to get 2277kB/2928kB of archives. After unpacking 9523kB will be used.
The following packages have unmet dependencies:
  pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package.
  cluster-glue: Depends: libopenhpi2 which is a virtual package.
The following actions will resolve these dependencies:
Keep the following packages at their current version:
  cluster-agents [Not Installed]
  cluster-glue [Not Installed]
  heartbeat [Not Installed]
  pacemaker [Not Installed]
Score is -19794

Is there anything else that I have missed?

Thank you.
Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat
hi,

On 09/22/2010 02:52 PM, Chen Stormstout wrote:
> Hi,
>
> Thanks for your post Raoul, but read this tutorial is what i made at first place.

ok.

> # For the cluster
> deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free
> deb http://www.backports.org/debian lenny-backports main contrib non-free
> deb http://people.debian.org/~madkiss/ha lenny main

madkiss' repository shouldn't be necessary anymore. please comment it out for now.

> But like the How to says:
>
> So in order to use Pacemaker on Debian GNU/Linux 5.0 (Lenny), please add the Backports.org-repository to your APT-configuration according to the How-To on this site. This has to be done on all nodes in your cluster.
>
> I've followed the tutorial:
>
> aptitude -t lenny-backports install heartbeat pacemaker

do you *need* to use heartbeat? otherwise, i would suggest corosync as, in my experience, it is much faster than heartbeat (e.g. at startup).

please retry after commenting out the above. if it is still not working, please provide the output of:

apt-cache policy cluster-glue pacemaker heartbeat
apt-cache policy cluster-agents libcluster-glue libcorosync4
apt-cache policy libheartbeat2 libnet1 libopenipmi0 libtimedate-perl

(if you switched to corosync, please let us know if there is any issue when you try to install these packages by using

aptitude install -t lenny-backports pacemaker corosync

)

thanks,
raoul
[Pacemaker] Re: Problems Installing Pacemaker and Heartbeat
Raoul, Thank you for the attention. These are the information that you asked for after remove madkiss. aptitude -t lenny-backports install heartbeat pacemaker, still with the same error message. apt-cache policy cluster-glue pacemaker heartbeat cluster-glue: Installed: (none) Candidate: 1.0.6-1~bpo50+1 Version table: 1.0.6-1~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages pacemaker: Installed: (none) Candidate: 1.0.9.1+hg15626-1~bpo50+1 Version table: 1.0.9.1+hg15626-1~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages heartbeat: Installed: (none) Candidate: 1:3.0.3-2~bpo50+1 Version table: 1:3.0.3-2~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages apt-cache policy cluster-agents libcluster-glue libcorosync4 cluster-agents: Installed: (none) Candidate: 1:1.0.3-3~bpo50+1 Version table: 1:1.0.3-3~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages libcluster-glue: Installed: (none) Candidate: 1.0.6-1~bpo50+1 Version table: 1.0.6-1~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages libcorosync4: Installed: (none) Candidate: 1.2.1-1~bpo50+1 Version table: 1.2.1-1~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages apt-cache policy libheartbeat2 libnet1 libopenipmi0 libtimedate-perl libheartbeat2: Installed: (none) Candidate: 1:3.0.3-2~bpo50+1 Version table: 1:3.0.3-2~bpo50+1 0 1 http://backports.debian.org lenny-backports/main Packages 1 http://www.backports.org lenny-backports/main Packages libnet1: Installed: (none) Candidate: 1.1.2.1-2 Version table: 1.1.2.1-2 0 500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official 
i386 DVD Binary-1 20100626-17:50] lenny/main Packages libopenipmi0: Installed: (none) Candidate: 2.0.14-1 Version table: 2.0.14-1 0 500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50] lenny/main Packages libtimedate-perl: Installed: (none) Candidate: 1.1600-9 Version table: 1.1600-9 0 500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50] lenny/main Packages And corosync has the same problem: aptitude install -t lenny-backports pacemaker corosync Reading package lists... Done Building dependency tree Reading state information... Done Reading extended state information Initializing package states... Done Reading task descriptions... Done The following packages are BROKEN: cluster-glue pacemaker The following NEW packages will be installed: cluster-agents{a} corosync libcluster-glue{a} libcorosync4{a} libheartbeat2{a} libnet1{a} libopenipmi0{a} libtimedate-perl{a} 0 packages upgraded, 10 newly installed, 0 to remove and 94 not upgraded. Need to get 2235kB/2886kB of archives. After unpacking 9351kB will be used. The following packages have unmet dependencies: pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package. cluster-glue: Depends: libopenhpi2 which is a virtual package. The following actions will resolve these dependencies: Keep the following packages at their current version: cluster-agents [Not Installed] cluster-glue [Not Installed] pacemaker [Not Installed] Score is -9863 Any idea about what is going on? Thank you.
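For anyone reading this thread later: the aptitude output above names the dependencies that cannot be satisfied, and pulling them out makes it easier to see what to look up next. The following is only a rough sketch in Python. The regular expression assumes the exact "Depends: ... which is a virtual package" wording seen in this message, and the function name is made up for illustration.

```python
import re

# Assumption: each unmet dependency appears as
# "<package>: Depends: <dependency> [(version)] which is a virtual package",
# which is the wording in the aptitude output quoted in this thread.
VIRTUAL_DEP = re.compile(
    r"(?P<pkg>[\w.+-]+): Depends: (?P<dep>[\w.+-]+)(?: \([^)]*\))? which is a virtual package"
)


def unmet_virtual_deps(aptitude_output: str) -> list[tuple[str, str]]:
    """Return (package, missing dependency) pairs found in the text."""
    return [(m["pkg"], m["dep"]) for m in VIRTUAL_DEP.finditer(aptitude_output)]


# Text copied from the message above.
sample = (
    "The following packages have unmet dependencies: "
    "pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package. "
    "cluster-glue: Depends: libopenhpi2 which is a virtual package."
)

for pkg, dep in unmet_virtual_deps(sample):
    print(f"{pkg} needs {dep}")
```

Running apt-cache policy on each reported name, in the same way as for the other packages in this thread, should then show whether any configured repository provides it.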
Re: [Pacemaker] Timeout after nodejoin
hi, On 09/22/2010 02:43 PM, Dan Frincu wrote: When I start openais, I get nodejoin immediately, as seen in the logs below. However, it takes some time before the nodes are visible in crm_mon output. Any idea how to minimize this delay? Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: send_member_notification: Sending membership update 8 to 1 children Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33 Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.35 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started. Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946 Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending membership update 8 to crmd Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: update_expected_votes: Expected quorum votes 1024 - 2 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name) Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING - S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION - S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: Callback already present Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: 
Setting custom graph functions Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode is the cluster up and running and you're only (re-)starting one node? or is this after you start openais on both nodes. thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email.off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax.+43 1 3670030 15
Re: [Pacemaker] Timeout after nodejoin
Hi, Raoul Bhatia [IPAX] wrote: hi, On 09/22/2010 02:43 PM, Dan Frincu wrote: When I start openais, I get nodejoin immediately, as seen in the logs below. However, it takes some time before the nodes are visible in crm_mon output. Any idea how to minimize this delay? Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: send_member_notification: Sending membership update 8 to 1 children Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33 Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.35 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started. Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946 Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending membership update 8 to crmd Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: update_expected_votes: Expected quorum votes 1024 - 2 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name) Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING - S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION - S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: Callback already present Sep 22 15:28:15 bench1 crmd: [12946]: 
info: set_graph_functions: Setting custom graph functions Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode is the cluster up and running and you're only (re-)starting one node? or is this after you start openais on both nodes. thanks, raoul Second case, just after openais start on both nodes. Regards, Dan -- Dan FRINCU Systems Engineer CCNA, RHCE Streamwide Romania
Re: [Pacemaker] correct permissions for /var/lib/pengine
Hi, On Wed, Sep 22, 2010 at 12:44:45PM +0200, Raoul Bhatia [IPAX] wrote: in my recent hb_report, i find: WARN: problem with permissions/ownership at wc01: wrong permissions or ownership for /var/lib/pengine: drwxr-xr-x 2 hacluster haclient 5038080 Jul 23 08:58 /var/lib/pengine WARN: problem with permissions/ownership at wc02: wrong permissions or ownership for /var/lib/pengine: drwxr-xr-x 2 hacluster haclient 4096 Sep 22 11:26 /var/lib/pengine what should the correct permission be? It should be 750. Thanks, Dejan thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email.off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax.+43 1 3670030 15
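The hb_report warning shows drwxr-xr-x (755) where Dejan says 750 is expected. A small check like the one below can confirm the mode on each node before and after changing it. This is a minimal sketch in Python, not part of hb_report; the function names are made up for illustration, and it only looks at permission bits, not at the hacluster/haclient ownership.

```python
import os
import stat


def has_mode(path: str, expected: int = 0o750) -> bool:
    """Return True if the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


def describe(path: str, expected: int = 0o750) -> str:
    """Build a one-line report of the actual versus expected mode."""
    actual = stat.S_IMODE(os.stat(path).st_mode)
    status = "ok" if actual == expected else "wrong"
    return f"{path}: mode {actual:o}, expected {expected:o} ({status})"


# Only report when the directory from the thread exists on this machine.
if os.path.isdir("/var/lib/pengine"):
    print(describe("/var/lib/pengine"))
```

If the report says "wrong", a plain chmod 750 on the directory is the usual way to bring it in line with the value given above.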
Re: [Pacemaker] Timeout after nodejoin
Hi, On Wed, Sep 22, 2010 at 04:48:42PM +0300, Dan Frincu wrote: Hi, Raoul Bhatia [IPAX] wrote: hi, On 09/22/2010 02:43 PM, Dan Frincu wrote: When I start openais, I get nodejoin immediately, as seen in the logs below. However, it takes some time before the nodes are visible in crm_mon output. Any idea how to minimize this delay? Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: send_member_notification: Sending membership update 8 to 1 children Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33 Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.35 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started. Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946 Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending membership update 8 to crmd Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: update_expected_votes: Expected quorum votes 1024 - 2 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name) Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING - S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION - S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: 
Callback already present Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting custom graph functions Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode is the cluster up and running and you're only (re-)starting one node? or is this after you start openais on both nodes. thanks, raoul Second case, just after openais start on both nodes. It's probably due to dc-deadtime (from crm ra info crmd): dc-deadtime (time, [60s]): How long to wait for a response from other nodes during startup. The correct value will depend on the speed/load of your network and the type of switches used. Thanks, Dejan Regards, Dan -- Dan FRINCU Systems Engineer CCNA, RHCE Streamwide Romania
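To put a number on the delay being discussed: in the quoted log, crmd reports quorum at 15:27:25 and the election only starts at 15:28:15. The sketch below measures that gap from the two syslog timestamps. It is an illustration only; the function name is made up, and the year is an assumption (syslog lines carry none, 2010 is taken from the message dates).

```python
from datetime import datetime


def syslog_gap_seconds(first: str, second: str, year: int = 2010) -> float:
    """Seconds between two syslog timestamps such as 'Sep 22 15:27:25'.

    The year is not present in syslog lines, so it is supplied separately
    (assumed to be 2010 here, matching the dates of this thread).
    """
    fmt = "%Y %b %d %H:%M:%S"
    t1 = datetime.strptime(f"{year} {first}", fmt)
    t2 = datetime.strptime(f"{year} {second}", fmt)
    return (t2 - t1).total_seconds()


# Timestamps copied from the log above: quorum notice, then election vote.
print(syslog_gap_seconds("Sep 22 15:27:25", "Sep 22 15:28:15"))  # 50.0
```

A gap of 50 seconds is in the same range as the 60s default quoted for dc-deadtime, which fits Dejan's guess but does not prove it. Measuring the gap again after changing the option would be one way to check.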
Re: [Pacemaker] migration-threshold and failure-timeout
On Wed, Sep 22, 2010 at 2:12 PM, Michael Smith msm...@cbnco.com wrote: On Wed, 22 Sep 2010, Andrew Beekhof wrote: On Tue, Sep 21, 2010 at 3:28 PM, Vadym Chepkov vchep...@gmail.com wrote: On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote: However I don't know of any automatic method to clear the failcount.
in pacemaker 1.0 nothing will clean failcount automatically, this is a feature of pacemaker 1.1, imho
Correct, the just released 1.1.3 clears the failcount for you.
failcount used to get cleared for me
not quite, it got ignored until the next failure. now it gets removed.
after the cluster-recheck-interval fired, if failure-timeout had expired. Does this mean pacemaker 1.1.3 doesn't need cluster-recheck-interval in order to clear the failcount?
no, you'll still need that part too
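The interplay described in this exchange (a failure only stops counting once failure-timeout has passed and the cluster has re-evaluated its state) can be pictured with a toy model. The Python sketch below is a deliberate simplification and not how Pacemaker is implemented: it assumes the recheck timer fires at exact multiples of cluster-recheck-interval counted from time zero, and that nothing else triggers a re-evaluation in between. The function name and the numbers are made up for illustration.

```python
import math


def first_recheck_after_expiry(
    fail_time: float, failure_timeout: float, recheck_interval: float
) -> float:
    """Time (in seconds) of the first recheck at or after the failure expires.

    Toy model: rechecks happen at 0, recheck_interval, 2*recheck_interval, ...
    and no other cluster event causes an earlier re-evaluation.
    """
    expiry = fail_time + failure_timeout
    return math.ceil(expiry / recheck_interval) * recheck_interval


# Hypothetical numbers: failure at t=100s, failure-timeout of 10 minutes.
print(first_recheck_after_expiry(100, 600, 900))  # recheck every 15 min -> 900
print(first_recheck_after_expiry(100, 600, 600))  # recheck every 10 min -> 1200
```

In this model the failure expires at t=700s in both cases, but nothing acts on it until the next recheck, which is why the recheck interval still matters in 1.1.3 as stated above.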
Re: [Pacemaker] Add a resource with commandline to an existing?group?
On Wed, Sep 22, 2010 at 4:34 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Tue, Aug 31, 2010 at 12:13:15PM +, Rainer wrote: Dan Frincu dfri...@... writes: You can update the config by typing: crm configure This puts you in the crm shell configure mode. Then you type in edit, that opens a vi session with the config, you edit the group entry by adding the necessary information and then you exit via esc, :wq, verify, commit. Regards, Dan So easy? I was all the time looking for a command like add to group Thx a lot! Yes, with edit you can modify any part of your configuration and commit the result. There's also a filter command in v1.1 which is to edit what sed(1) is to ed(1). This should also work: # echo group g1 g2 ... | crm configure load update - nice! i'll have to remember that one
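The one-liner quoted above can also be driven from a script. The following Python sketch builds the group definition and pipes it to the same crm configure load update - command. The helper names and the resource names (rsc_a, rsc_b, rsc_new) are hypothetical, and the load_update function is not called here because it needs a running cluster with the crm shell installed. As far as I understand, the definition should list every member of the group, existing and new, since the loaded definition replaces the old one.

```python
import subprocess


def group_line(group_id: str, members: list[str]) -> str:
    """Build a crm shell 'group' definition with the full member list."""
    if not members:
        raise ValueError("a group needs at least one member resource")
    return " ".join(["group", group_id, *members])


def load_update(definition: str) -> None:
    """Pipe a definition to 'crm configure load update -' (not called here).

    Requires the crm shell and a running cluster; raises if crm exits non-zero.
    """
    subprocess.run(
        ["crm", "configure", "load", "update", "-"],
        input=definition + "\n",
        text=True,
        check=True,
    )


# Hypothetical resource names: two existing members plus the one being added.
print(group_line("g1", ["rsc_a", "rsc_b", "rsc_new"]))
```

Printing the line first and only then passing it to load_update keeps a typo from reaching the live configuration.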
Re: [Pacemaker] Help to configure a cluster with pacemaker
Ok. Now I understand how this works. The lsb script did not work correctly (because the conf files were on the drbd disk and it is only mounted on one server). Thank you very much. It is working now. On Tue, 21-09-2010 at 08:55 +0200, Andrew Beekhof wrote: 2010/9/2 Josué Alcalde González josue.alca...@csa.es: First, as I am new to this list, I would like to say hello to everybody. I have configured a 4 node cluster with ubuntu 10.04 with a simple architecture. All nodes are the same machine, with a root partition and a data partition shared with drbd. node1: First apache webserver node node2: Second apache webserver node node1 and node2 have a drbd resource called webserverdisk in master/slave using ext4. node3: First mysql server node4: Second mysql server node3 and node4 have a drbd resource called dataserverdisk in master/slave using ext4. DRBD is using Master/Slave without OCFS or GFS (it uses EXT4) so there will be only one node with apache2 and one node with mysql. The problem is that I am having trouble with location and colocation. In fact, my apache primitive tries to start on ubuntu02, ubuntu03 and ubuntu04 instead of ubuntu01, where my drbd webserverdisk is mounted. The cluster isn't actually starting apache there. It's checking those nodes to make sure it isn't already running there, but either the RA isn't installed or something it needs to produce a sane status is missing. That's why you're getting failed _monitor_0 actions: mysqld_monitor_0 (node=ubuntu01, call=4, rc=5, status=complete): not installed mysqld_monitor_0 (node=ubuntu02, call=4, rc=5, status=complete): not installed apache_monitor_0 (node=ubuntu02, call=8, rc=254, status=complete): unknown apache_stop_0 (node=ubuntu02, call=10, rc=254, status=complete): unknown Fix that and your problems will go away. Of course, apache is not starting because it can only work if the webserverdisk is mounted on the machine.
My pacemaker configuration is this (I think some location directives are not needed):
node ubuntu01
node ubuntu02
node ubuntu03
node ubuntu04
primitive apache lsb:apache2 \
    meta target-role=Started
primitive drbd_disk_dataserver ocf:linbit:drbd \
    params drbd_resource=dataserverdisk \
    op monitor interval=15s
primitive drbd_disk_webserver ocf:linbit:drbd \
    params drbd_resource=webserverdisk \
    op monitor interval=15s
primitive fs_drbd_dataserver ocf:heartbeat:Filesystem \
    params device=/dev/drbd/by-res/dataserverdisk directory=/srv/data fstype=ext4
primitive fs_drbd_webserver ocf:heartbeat:Filesystem \
    params device=/dev/drbd/by-res/webserverdisk directory=/srv/data fstype=ext4
primitive ip_cluster_dataserver ocf:heartbeat:IPaddr2 \
    params ip=172.16.15.106 nic=eth0:0
primitive ip_cluster_webserver ocf:heartbeat:IPaddr2 \
    params ip=172.16.15.105 nic=eth0:0
primitive mysqld lsb:mysql \
    meta target-role=Started
group group_dataserver fs_drbd_dataserver ip_cluster_dataserver mysqld \
    meta target-role=Started
group group_webserver fs_drbd_webserver ip_cluster_webserver apache \
    meta target-role=Started
ms ms_drbd_dataserver drbd_disk_dataserver \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
ms ms_drbd_webserver drbd_disk_webserver \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
location location_dataserver_1 ms_drbd_dataserver 200: ubuntu03
location location_dataserver_1_fs_drbd fs_drbd_dataserver 200: ubuntu03
location location_dataserver_1_group group_dataserver 200: ubuntu03
location location_dataserver_1_ip_cluster ip_cluster_dataserver 200: ubuntu03
location location_dataserver_1_mysqld mysqld 200: ubuntu03
location location_dataserver_2 ms_drbd_dataserver 100: ubuntu04
location location_dataserver_2_fs_drbd fs_drbd_dataserver 100: ubuntu04
location location_dataserver_2_group group_dataserver 100: ubuntu04
location location_dataserver_2_ip_cluster ip_cluster_dataserver 100: ubuntu04
location location_dataserver_2_mysqld mysqld 100: ubuntu04
location location_dataserver_3 ms_drbd_dataserver -inf: ubuntu01
location location_dataserver_3_fs_drbd fs_drbd_dataserver -inf: ubuntu01
location location_dataserver_3_group group_dataserver -inf: ubuntu01
location location_dataserver_3_ip_cluster ip_cluster_dataserver -inf: ubuntu01
location location_dataserver_3_mysqld mysqld -inf: ubuntu01
location location_dataserver_4 ms_drbd_dataserver -inf: ubuntu02
location location_dataserver_4_fs_drbd fs_drbd_dataserver -inf: ubuntu02
location location_dataserver_4_group group_dataserver -inf: ubuntu02
location location_dataserver_4_ip_cluster ip_cluster_dataserver -inf: ubuntu02
location location_dataserver_4_mysqld mysqld -inf: ubuntu02
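The failed actions quoted earlier in this message carry numeric return codes (rc=5, rc=254), and it helps to translate them when reading crm_mon output. The sketch below is a small Python lookup using the standard OCF resource agent return codes 0 to 7; anything outside that table is reported as "unknown", which is also how rc=254 is labelled in the quoted output. The regular expression assumes the exact line layout shown in this thread, and the function names are made up for illustration.

```python
import re

# Standard OCF resource agent return codes (subset, 0 through 7).
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Assumption: failed actions look like
# "mysqld_monitor_0 (node=ubuntu01, call=4, rc=5, status=complete): not installed"
FAILED_ACTION = re.compile(
    r"(?P<op>\w+) \(node=(?P<node>[\w.-]+), call=\d+, rc=(?P<rc>\d+), status=\w+\)"
)


def explain_rc(rc: int) -> str:
    """Map a numeric return code to its OCF name, or 'unknown'."""
    return OCF_RETURN_CODES.get(rc, "unknown")


def summarize(line: str):
    """Return (operation, node, meaning) for a failed-action line, else None."""
    m = FAILED_ACTION.search(line)
    if m is None:
        return None
    return m["op"], m["node"], explain_rc(int(m["rc"]))


print(summarize("mysqld_monitor_0 (node=ubuntu01, call=4, rc=5, status=complete): not installed"))
print(summarize("apache_monitor_0 (node=ubuntu02, call=8, rc=254, status=complete): unknown"))
```

The _monitor_0 suffix marks the initial probe Andrew describes, so an OCF_ERR_INSTALLED result on a node that is never meant to run the service points at a script or file missing on that node rather than at a real start attempt.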
Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error
On 09/22/2010 04:02 AM, Szymon Hersztek wrote: Message written on 2010-09-22 at 10:26 by Andrew Beekhof: 2010/9/21 Szymon Hersztek s...@globtel.pl: Message written on 2010-09-21 at 09:08 by Andrew Beekhof: 2010/9/21 Szymon Hersztek s...@globtel.pl: Message written on 2010-09-21 at 08:34 by Andrew Beekhof: On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl wrote: Hi, I'm trying to set up corosync to work as a drbd cluster, but after installing by following http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf I got an error like the one below: Unusual, but did pacemaker fork a replacement attrd process? At what time did corosync start? corosync was started manually, or do you want the exact time of start? well you included at most 1 second's worth of logging. so it's kinda hard to know if something took too long or what recovery was attempted. Ok, it is not a problem to send more. Do you need debug logging or standard? I have to install the server once again, so in half an hour I can reproduce the logs. Here's your issue: corosynclib i386 1.2.7-1.1.el5 clusterlabs 155 k corosynclib x86_64 1.2.7-1.1.el5 clusterlabs 172 k Why do you have both i386 and x86_64 versions installed on your machine?? There should be no problems installing lib files for both i386 and x86_64. These rpms only contain the *.so files (and a LICENSE file). Regards -steve Because yum installed it this way .. like many other packages. The problem was that I did not use /dev/shm as tmpfs. But thanks for trying
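Since the root cause reported here was that /dev/shm was not a tmpfs mount, a quick check of the mount table may save someone else the same search. The Python sketch below parses text in the /proc/mounts layout (device, mount point, filesystem type, options). The function name is made up for illustration, and the thread does not say how the poster verified or fixed the mount, so this is only one possible way to look.

```python
import os


def shm_is_tmpfs(mounts_text: str, mount_point: str = "/dev/shm") -> bool:
    """Return True if the last mount listed on `mount_point` is of type tmpfs.

    Expects text in /proc/mounts layout: device, mount point, fstype, options.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later mount on the same point overrides
    return fstype == "tmpfs"


# Only read the real mount table where it exists (Linux).
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print("/dev/shm is tmpfs:", shm_is_tmpfs(f.read()))
```

A result of False on a node that shows the "Library error" from the subject line would match the situation described in this message.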
Re: [Pacemaker] Timeout after nodejoin
On 09/22/2010 05:43 AM, Dan Frincu wrote: Hi all, I have the following packages: # rpm -qa | grep -i (openais|cluster|heartbeat|pacemaker|resource) openais-0.80.5-15.2 cluster-glue-1.0-12.2 pacemaker-1.0.5-4.2 cluster-glue-libs-1.0-12.2 resource-agents-1.0-31.5 pacemaker-libs-1.0.5-4.2 pacemaker-mgmt-1.99.2-7.2 libopenais2-0.80.5-15.2 heartbeat-3.0.0-33.3 pacemaker-mgmt-client-1.99.2-7.2 When I start openais, I get nodejoin immediately, as seen in the logs below. However, it takes some time before the nodes are visible in crm_mon output. Any idea how to minimize this delay? Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: send_member_notification: Sending membership update 8 to 1 children Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.33 Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message 192.168.165.35 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started. Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2) Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946 Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending membership update 8 to crmd Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: update_expected_votes: Expected quorum votes 1024 - 2 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name) Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING - S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION - S_INTEGRATION [ 
input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ] Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: Callback already present Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting custom graph functions Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode Regards, Dan Where did you get that version of openais? openais 0.80.x is deprecated in the community (and hence, no support). We recommend using corosync instead which has improved testing with pacemaker. Regards -steve