Re: [Pacemaker] migration-threshold and failure-timeout

2010-09-22 Thread Pavlos Parissis
On 21 September 2010 15:28, Vadym Chepkov vchep...@gmail.com wrote:

 On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote:
  Hi,
 
  This page explains it pretty well:
  http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
  Notice the INFINITY score and what sets it.
 
  However I don't know of any automatic method to clear the failcount.
 
  Regards,
  Dan


 in pacemaker 1.0 nothing will clean failcount automatically, this is a
 feature of pacemaker 1.1, imho

 But,

 crm configure rsc_defaults failure-timeout=10min

 will make the cluster forget about a previous failure after 10 minutes.
 If you want to decrease this parameter further, you might also need to decrease

 crm configure property cluster-recheck-interval=10min

 Cheers,
 Vadym
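
The two settings interact: as reported elsewhere in this thread, on Pacemaker 1.0 an expired failure is only acted on when cluster-recheck-interval fires, so a failure-timeout shorter than the recheck interval gains little. A minimal sketch that prints (rather than runs) the matching crm commands; the function name and the 4min/2min values are illustrative assumptions only:

```shell
# Hypothetical helper: print the crm shell commands for a given
# failure-timeout and cluster-recheck-interval so they can be reviewed first.
failure_timeout_commands() {
    timeout=$1   # how long a failure is remembered, e.g. 4min
    recheck=$2   # how often the cluster re-evaluates its state, e.g. 2min
    printf 'crm configure rsc_defaults failure-timeout=%s\n' "$timeout"
    printf 'crm configure property cluster-recheck-interval=%s\n' "$recheck"
}

# Review the output, then run it on one cluster node, for example:
#   failure_timeout_commands 4min 2min | sh
```

Printing the commands first keeps the sketch safe to try on a machine that is not part of a cluster.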


Ok guys thank you very much for the info,
Pavlos
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] help with configuration for Xen domU on two DRBD devices

2010-09-22 Thread Jai
To answer my own email, just in case it helps someone else:
after a bit more research and trying different things, it appears that
my issue was caused by a resource failcount.
Once I manually cleared all resource failcounts, it started to work properly.
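
For anyone wanting to do the same, the crm shell has a "resource failcount <rsc> delete <node>" subcommand. The sketch below loops over several resources on one node; the helper name, the CRM override variable, and the node and resource names are assumptions made for illustration, not taken from the original setup:

```shell
# Hypothetical helper: delete the failcount of each named resource on one node.
# CRM can be overridden (for example CRM=echo) to preview the commands.
clear_failcounts() {
    node=$1
    shift
    for rsc in "$@"; do
        ${CRM:-crm} resource failcount "$rsc" delete "$node" || return 1
    done
}

# Example (node and resource names are placeholders):
#   clear_failcounts node1 xen_domU drbd_disk0
```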




Re: [Pacemaker] monitor operation cancel question

2010-09-22 Thread Andrew Beekhof
On Tue, Sep 21, 2010 at 8:58 PM, Phil Armstrong p...@sgi.com wrote:
 Hi,

 This is my first post to this list so if I'm doing this wrong, please be
 patient. I am using pacemaker-1.1.2-0.2.1 on sles11sp1. Thanks in advance
 for any help anyone can give me.

Well, fixing this is a good start:
   Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation:
Specifying on_fail=fence and stonith-enabled=false makes no sense

But for your other issue, Michael is right, there was a bug which an
upgrade will fix.



Re: [Pacemaker] error: ocf:heartbeat:IPv6addr: could not parse meta-data

2010-09-22 Thread Andrew Beekhof
On Tue, Sep 21, 2010 at 4:10 PM, Angelo Höngens a.hong...@netmatch.nl wrote:
 On 25-8-2010 8:36, Andrew Beekhof wrote:
 I guess whoever packaged the rpm's can answer why the file is missing,
 but was that someone from the clusterlabs team or someone from the
 linux-ha team? :)

 Basically because I left out the libnet dependency.
 The status of libnet as a viable project has been uncertain lately.


 Andrew,

 Because we have some ipv6 nodes I want to try out pacemaker on again,
 I'd really like to build an rpm of the resource agents package with ipv6
 support in. If you could tell a newbie like me how to do it, I'd be
 really grateful, and I think a lot of people will be happy as well. We
 run CentOS/RHEL5 everywhere.

 I've never built an RPM before, but it doesn't look that hard (until I
 saw the errors).

 I've installed all the dependencies (yum install autoconf automake gcc
 libnet-devel libtool libxml2-devel bzip2-devel glib2-devel libxslt-devel
 e2fsprogs-devel docbook-style-xsl rpm-build), and I want to make sure I
 can compile an RPM first before changing anything in the code.

 But even when doing that, I get errors:

 
 [ang...@test1 redhat]$ sudo rpm -i
 http://clusterlabs.org/rpm/epel-5/src/resource-agents-1.0.3-2.el5.src.rpm
 [ang...@test1 redhat]$ cd /usr/src/redhat/
 [ang...@test1 redhat]$ rpmbuild -bb SPECS/resource-agents.spec
 [..cut..]
 Provides: config(ldirectord) = 1.0.3-2 heartbeat-ldirectord
 Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1
 rpmlib(PayloadFilesHavePrefix) = 4.0-1
 Requires: /bin/sh /usr/bin/perl config(ldirectord) = 1.0.3-2 ipvsadm
 perl(Digest::MD5) perl(Getopt::Long) perl(IO::Select) perl(IO::Socket)
 perl(LWP::Debug) perl(LWP::UserAgent) perl(Mail::Send) perl(Net::Ping)
 perl(Net::SMTP) perl(POSIX) perl(Pod::Usage) perl(Socket) perl(Socket6)
 perl(Sys::Hostname) perl(Sys::Syslog) perl(strict) perl(vars)
 perl-MailTools perl-Net-SSLeay perl-libwww-perl
 Conflicts: heartbeat-ldirectord
 Obsoletes: heartbeat-ldirectord
 Checking for unpackaged file(s): /usr/lib/rpm/check-files
 /var/tmp/resource-agents-1.0.3-build

 RPM build errors:
    File not found: /var/tmp/resource-agents-1.0.3-build/usr/sbin/sfex_init
    File not found:
 /var/tmp/resource-agents-1.0.3-build/usr/lib64/heartbeat/sfex_daemon
 [ang...@test1 redhat]$
 

 Can you please help me in my quest to the desired end result? (which is
 the knowledge to build an ipv6-enabled version of the resource-agents so
 I can install it on my nodes, and I can rebuild it after each version
 upgrade of the source package).

You're close :-)
Try installing the -devel package for cluster-glue and trying again.
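
Put together, the retry might look like the sketch below. The package name cluster-glue-libs-devel is an assumption about how the -devel package is named in the repository in use (override GLUE_DEVEL_PKG if it differs); the spec path is the one from the session quoted above. Setting DRY_RUN=1 only prints the commands:

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() { ${DRY_RUN:+echo} "$@"; }

# Hypothetical helper: install the cluster-glue development package, then
# rebuild the resource-agents binary RPMs from the already installed SRPM.
rebuild_resource_agents() {
    run sudo yum install -y "${GLUE_DEVEL_PKG:-cluster-glue-libs-devel}" &&
    run rpmbuild -bb /usr/src/redhat/SPECS/resource-agents.spec
}

# Preview first, then run for real:
#   DRY_RUN=1 rebuild_resource_agents
#   rebuild_resource_agents
```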



 It would be great if it would be part of the basic packages as well, but
 alas this does not seem to be the case right now.

 I can provide ssh root access to a clean vm (centos 5.5, x64) if needed.


 --


 With kind regards,


 Angelo Höngens
 systems administrator

 MCSE on Windows 2003
 MCSE on Windows 2000
 MS Small Business Specialist
 --
 NetMatch
 tourism internet software solutions

 Ringbaan Oost 2b
 5013 CA Tilburg
 +31 (0)13 5811088
 +31 (0)13 5821239

 a.hong...@netmatch.nl
 www.netmatch.nl
 --







Re: [Pacemaker] chkconfig values in MCP init script (again)

2010-09-22 Thread Andrew Beekhof
On Tue, Sep 21, 2010 at 2:24 PM, Vladislav Bogdanov
bub...@hoster-ok.com wrote:
 Hi Andrew, hi all.

 I decided to return to this issue again because of issues with
 libvirt/KVM virtual domains controlled by pacemaker.

 libvirt package on Fedora 13 has two init scripts: libvirtd and
 libvirt-guests.
 They have following chkconfig values:
 libvirtd: 97 03
 libvirt-guests: 98 02

 Currently pacemaker MCP has 90 10.


...

 So, the next solution would be to move pacemaker to run really last (99)
 and stop really first (01). This is what Vadym Chepkov suggested earlier
 and what I am inclined to do (at least for my RPM packages). Of course,
 there are services which have 99 01 too, but I'd turn a blind eye to them.

I'm fine with that.  I'll make the change now.
Possibly some stop-requires might be of use too.



Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-22 Thread Andrew Beekhof
2010/9/21 Szymon Hersztek s...@globtel.pl:

 Message written on 2010-09-21 at 09:08 by Andrew Beekhof:

 2010/9/21 Szymon Hersztek s...@globtel.pl:

 Message written on 2010-09-21 at 08:34 by Andrew Beekhof:

 On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl wrote:

 Hi,
 I'm trying to set up corosync to work as a DRBD cluster, but after
 installing it by following
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 I got an error like the one below:

 Unusual, but did pacemaker fork a replacement attrd process?
 At what time did corosync start?


 corosync was started manually, or do you want the exact start time?

 Well, you included at most 1 second's worth of logging,
 so it's kinda hard to know if something took too long or what recovery
 was attempted.

 OK, it is not a problem to send more. Do you need debug logging or standard?
 I have to install the server once again, so in half an hour I can reproduce the logs.


Here's your issue:

 corosynclib   i386     1.2.7-1.1.el5   clusterlabs   155 k
 corosynclib   x86_64   1.2.7-1.1.el5   clusterlabs   172 k

Why do you have both i386 and x86_64 versions installed on your machine??
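
To see which packages are installed for more than one architecture, something like the following sketch can help. The helper name is made up for illustration; it reads "name arch" pairs on standard input, which rpm can produce with its --queryformat option:

```shell
# Hypothetical helper: read "name arch" lines on stdin and print every
# package name that appears with more than one architecture.
multiarch_packages() {
    sort -u |
    awk '{ count[$1]++ } END { for (n in count) if (count[n] > 1) print n }' |
    sort
}

# On an RPM-based system:
#   rpm -qa --queryformat '%{NAME} %{ARCH}\n' | multiarch_packages
```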



Re: [Pacemaker] error: ocf:heartbeat:IPv6addr: could not parse meta-data

2010-09-22 Thread Angelo Höngens
On 22-9-2010 9:58, Andrew Beekhof wrote:
 Can you please help me in my quest to the desired end result? (which is
 the knowledge to build an ipv6-enabled version of the resource-agents so
 I can install it on my nodes, and I can rebuild it after each version
 upgrade of the source package).
 
 You're close :-)
 Try installing the -devel package for cluster-glue and trying again.

Thanks, that works like a charm, I didn't even have to change anything!

For people googling, here's my version of the resource-agents with the
IPv6addr module:
http://files.hongens.nl/RPM/resource-agents-1.0.3-2.x86_64.rpm



-- 


With kind regards,


Angelo Höngens
systems administrator

MCSE on Windows 2003
MCSE on Windows 2000
MS Small Business Specialist
--
NetMatch
tourism internet software solutions

Ringbaan Oost 2b
5013 CA Tilburg
+31 (0)13 5811088
+31 (0)13 5821239

a.hong...@netmatch.nl
www.netmatch.nl
--





Re: [Pacemaker] re: Pacemaker Digest, Vol 34, Issue 50

2010-09-22 Thread Andrew Beekhof
On Tue, Sep 21, 2010 at 10:33 AM, jiaju liu liujiaj...@yahoo.com.cn wrote:



 --- On Tuesday, 21 September 2010, pacemaker-requ...@oss.clusterlabs.org wrote:

 Message: 5
 Date: Tue, 21 Sep 2010 09:15:16 +0200
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] pingd problem about clone
 Message-ID: aanlkti=l0yw-k4rmzo6dmjp4fuyccnki7w-hbw32z...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 On Fri, Sep 17, 2010 at 2:38 AM, jiaju liu liujiaj...@yahoo.com.cn wrote:

  Clone Set: pingd_data_net
   Started: [ oss3 oss2 oss1 ]
 
  I use the command :
 
  crm_resource -g host_list -r pingd_data_net
  to check the param host_list
 

 what does the resource definition look like?

 crm configure primitive pingd_data ocf:pacemaker:pingd meta
 target-role=stopped params name=pingd_data op start timeout=100s op stop
 timeout=100s op monitor interval=90s timeout=100s

 crm_resource -p host_list -r pingd_data -v 10.53.11.101

 crm configure clone pingd_data_net pingd_data meta
 globally-unique=false target-role=stopped

 crm resource start pingd_data_net

 and then I want to change host_list,
 so I use the command
 crm_resource -p host_list -r pingd_data -v 10.53.11.101 10.53.11.100


Given:

primitive FencingChild stonith:fence_xvm \
params pcmk_arg_map=domain:uname \
op monitor interval=120s timeout=300 \
op start interval=0 timeout=180s \
op stop interval=0 timeout=180s


and

clone Fencing FencingChild \
meta globally-unique=false migration-threshold=5

I get the following...

If I (incorrectly) use the clone name, I get the error you see:

[r...@pcmk-1 ~]# crm_resource -g pcmk_arg_map -r Fencing
Fencing is active on more than one node, returning the default value for
null
Error performing operation: The object/attribute does not exist

But if I (correctly) use the primitive name, it works:

[r...@pcmk-1 ~]# crm_resource -g pcmk_arg_map -r FencingChild
domain:uname


Can you check you're using the name of the primitive and not the clone
please?
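
A related detail that may matter when setting host_list to several addresses: the shell splits an unquoted value at the space, so only the first address reaches the -v option. The helper below is purely illustrative (it is not part of Pacemaker) and just reports which value a command would receive after -v:

```shell
# Hypothetical helper, for illustration only: print the single argument
# that follows -v, which is what a command would see as the option's value.
value_after_v() {
    while [ $# -gt 0 ]; do
        if [ "$1" = "-v" ]; then
            printf '%s\n' "$2"
            return 0
        fi
        shift
    done
    return 1
}

# Query the primitive (not the clone), and quote a multi-address value:
#   crm_resource -g host_list -r pingd_data
#   crm_resource -p host_list -r pingd_data -v "10.53.11.101 10.53.11.100"
```

Calling value_after_v with the unquoted form prints only 10.53.11.101, while the quoted form prints both addresses.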




 
  the result is
 
  pingd_data_net is active on more than one node, returning the default
  value for null
  Error performing operation: The object/attribute does not exist
 
  So if I want to check the param, what should I do?
  Thank you:-)
 
 





Re: [Pacemaker] About behavior in Action Lost.

2010-09-22 Thread Andrew Beekhof
On Tue, Sep 21, 2010 at 8:59 AM,  renayama19661...@ybb.ne.jp wrote:
 Hi,

 The node was under very high load while we were checking Pacemaker's
 monitor behavior.
 An "Action Lost" occurred in the stop operation that followed the monitor error.

 Sep  8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, 
 action lost: [Action 9]:
 In-flight (id: prmApPostgreSQLDB1_stop_0, loc: cgl49, priority: 0)
 Sep  8 20:02:22 cgl54 crmd: [3507]: info: abort_transition_graph: 
 action_timer_callback:486 -
 Triggered transition abort (complete=0) : Action lost


 We think the stop operation did not go well because of the load on the node.
 However, STONITH was not executed for the node.

A long time ago in a galaxy far away, some messaging layers used to
lose quite a few actions, including stops.
About the same time, we decided that fencing because a stop action was
lost wasn't a good idea.

The rationale was that if the operation eventually completed, it would
end up in the CIB anyway.
And even if it didn't, the PE would continue to try the operation
again until the whole node fell over at which point it would get shot
anyway.

Now, having said that, things have improved since then and perhaps,
in the interest of speeding up recovery in these situations, it is time
to stop treating stop operations differently.
Would you agree?



[Pacemaker] Release Matrix

2010-09-22 Thread Raoul Bhatia [IPAX]
hi,

regarding the Release Matrix [1] and the ABI-change in cluster-glue/
clplumbing [2], i wonder if pacemaker 1.0.9.1 really works with
glue 1.0.3?

cheers,
raoul

[1] http://www.clusterlabs.org/wiki/ReleaseMatrix
[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/65443
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] chkconfig values in MCP init script (again)

2010-09-22 Thread Vladislav Bogdanov
22.09.2010 11:17, Andrew Beekhof wrote:
 On Tue, Sep 21, 2010 at 2:24 PM, Vladislav Bogdanov
 bub...@hoster-ok.com wrote:
 Hi Andrew, hi all.

 I decided to return to this issue again because of issues with
 libvirt/KVM virtual domains controlled by pacemaker.

 libvirt package on Fedora 13 has two init scripts: libvirtd and
 libvirt-guests.
 They have following chkconfig values:
 libvirtd: 97 03
 libvirt-guests: 98 02

 Currently pacemaker MCP has 90 10.

 
 ...
 
 So, the next solution would be to move pacemaker to run really last (99)
 and stop really first (01). This is what Vadym Chepkov suggested earlier
 and what I am inclined to do (at least for my RPM packages). Of course,
 there are services which have 99 01 too, but I'd turn a blind eye to them.
 
 I'm fine with that.  I'll make the change now.
 Possibly some stop-requires might be of use too.

I'd add corosync to Required-Stop.

Adding other services like libvirtd is not an option I think. You cannot
predict what resources are actually run by pacemaker, so you cannot
enumerate them all.
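
To compare the ordering of installed init scripts, the priorities can be read straight from each script's chkconfig header. A small sketch follows; the helper name is made up, and the example header in the comments (including the "-" runlevel field and the LSB block) is an assumption about how the proposal would look, not a copy of the shipped pacemaker script:

```shell
# Print the start and stop priorities from an init script's
# "# chkconfig: <runlevels> <start> <stop>" header line.
chkconfig_priorities() {
    awk '/^#[[:space:]]*chkconfig:/ { print $(NF-1), $NF; exit }' "$1"
}

# Assumed header matching the proposal in this thread:
#   # chkconfig: - 99 01
#   ### BEGIN INIT INFO
#   # Required-Stop: corosync
#   ### END INIT INFO
#
# Usage:
#   chkconfig_priorities /etc/init.d/pacemaker
#   chkconfig_priorities /etc/init.d/libvirt-guests
```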



[Pacemaker] correct permissions for /var/lib/pengine

2010-09-22 Thread Raoul Bhatia [IPAX]
in my recent hb_report, i find:
 WARN: problem with permissions/ownership at wc01:
 wrong permissions or ownership for /var/lib/pengine:
 drwxr-xr-x 2 hacluster haclient 5038080 Jul 23 08:58 /var/lib/pengine
 WARN: problem with permissions/ownership at wc02:
 wrong permissions or ownership for /var/lib/pengine:
 drwxr-xr-x 2 hacluster haclient 4096 Sep 22 11:26 /var/lib/pengine


what should the correct permission be?

thanks,
raoul
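
For comparing nodes, the listings above show mode 755 (drwxr-xr-x) with owner hacluster and group haclient on both; which mode hb_report expects is not stated in this message. A small sketch for printing the numeric mode and ownership, assuming GNU stat (the helper name is made up):

```shell
# Hypothetical helper: print "<octal mode> <owner>:<group> <path>" per argument.
# Relies on GNU coreutils stat (-c format option).
show_perms() {
    stat -c '%a %U:%G %n' "$@"
}

# Usage:
#   show_perms /var/lib/pengine
```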
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-22 Thread Szymon Hersztek


Message written on 2010-09-22 at 10:26 by Andrew Beekhof:

 2010/9/21 Szymon Hersztek s...@globtel.pl:

 Message written on 2010-09-21 at 09:08 by Andrew Beekhof:

 2010/9/21 Szymon Hersztek s...@globtel.pl:

 Message written on 2010-09-21 at 08:34 by Andrew Beekhof:

 On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl wrote:

 Hi,
 I'm trying to set up corosync to work as a DRBD cluster, but after
 installing it by following
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 I got an error like the one below:

 Unusual, but did pacemaker fork a replacement attrd process?
 At what time did corosync start?

 corosync was started manually, or do you want the exact start time?

 Well, you included at most 1 second's worth of logging,
 so it's kinda hard to know if something took too long or what recovery
 was attempted.

 OK, it is not a problem to send more. Do you need debug logging or standard?
 I have to install the server once again, so in half an hour I can reproduce the logs.

 Here's your issue:

 corosynclib   i386     1.2.7-1.1.el5   clusterlabs   155 k
 corosynclib   x86_64   1.2.7-1.1.el5   clusterlabs   172 k

 Why do you have both i386 and x86_64 versions installed on your machine??



Because yum installed it that way, like many other packages.
The problem was that I was not using /dev/shm as tmpfs.
But thanks for trying.
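
A quick way to check for this on Linux is to look the mount point up in /proc/mounts. A minimal sketch follows; the helper name and the optional second argument (an alternative mounts file, handy for testing) are assumptions made for illustration:

```shell
# Hypothetical helper: succeed if the given mount point is mounted as tmpfs.
# The second argument defaults to /proc/mounts (fields: device, mount point, type).
is_tmpfs() {
    awk -v mp="$1" '$2 == mp && $3 == "tmpfs" { found = 1 }
                    END { exit found ? 0 : 1 }' "${2:-/proc/mounts}"
}

# Usage:
#   is_tmpfs /dev/shm && echo "/dev/shm is tmpfs" || echo "/dev/shm is NOT tmpfs"
```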








[Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Chen Stormstout
Hi,

I'm having problems installing pacemaker and heartbeat on Debian Lenny.

What I did:

- Downloaded the latest Debian image for i386 (kernel 2.6.26-2-686)
- After installation, configured sources.list:

deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free

- Ran apt-get update
- Ran apt-get upgrade
- Ran aptitude install heartbeat pacemaker:

aptitude install heartbeat pacemaker

Reading package lists... Done
Building dependency tree   
Reading state information... Done
Reading extended state information  
Initializing package states... Done
Writing extended state information... Done
Reading task descriptions... Done 

The following packages are BROKEN:
  cluster-glue pacemaker 

The following NEW packages will be installed:
  cluster-agents{a} heartbeat libcluster-glue{a} libcorosync4{a} 
libheartbeat2{a} libnet1{a} libopenipmi0{a} 
  libtimedate-perl{a} 

0 packages upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 2277kB/2928kB of archives. After unpacking 9523kB will be used.

The following packages have unmet dependencies:
  pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package.
  cluster-glue: Depends: libopenhpi2 which is a virtual package.
The following actions will resolve these dependencies:

Keep the following packages at their current version:
cluster-agents [Not Installed]
cluster-glue [Not Installed]
heartbeat [Not Installed]
pacemaker [Not Installed]

Score is -19794

Accept this solution? [Y/n/q/?] Y   
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Do you want to continue? [Y/n/?] y
Writing extended state information... Done
Reading package lists... Done 
Building dependency tree   
Reading state information... Done
Reading extended state information  
Initializing package states... Done
Reading task descriptions... Done  

and I am unable to proceed because of the errors above.

Any suggestions?
Is there another change that has to be made in sources.list?

thanks.



Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Raoul Bhatia [IPAX]
hi,

please refer to http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

cheers,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] migration-threshold and failure-timeout

2010-09-22 Thread Michael Smith
On Wed, 22 Sep 2010, Andrew Beekhof wrote:

 On Tue, Sep 21, 2010 at 3:28 PM, Vadym Chepkov vchep...@gmail.com wrote:
  On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote:

  However I don't know of any automatic method to clear the failcount.
  in pacemaker 1.0 nothing will clean failcount automatically, this is a 
  feature of pacemaker 1.1, imho
 Correct, the just released 1.1.3 clears the failcount for you.

failcount used to get cleared for me after the cluster-recheck-interval 
fired, if failure-timeout had expired. Does this mean pacemaker 1.1.3 
doesn't need cluster-recheck-interval in order to clear the failcount?

Thanks!
Mike



[Pacemaker] Timeout after nodejoin

2010-09-22 Thread Dan Frincu

Hi all,

I have the following packages:

# rpm -qa | grep -iE '(openais|cluster|heartbeat|pacemaker|resource)'
openais-0.80.5-15.2
cluster-glue-1.0-12.2
pacemaker-1.0.5-4.2
cluster-glue-libs-1.0-12.2
resource-agents-1.0-31.5
pacemaker-libs-1.0.5-4.2
pacemaker-mgmt-1.99.2-7.2
libopenais2-0.80.5-15.2
heartbeat-3.0.0-33.3
pacemaker-mgmt-client-1.99.2-7.2

When I start openais, I get nodejoin immediately, as seen in the logs 
below. However, it takes some time before the nodes are visible in 
crm_mon output. Any idea how to minimize this delay?


Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: send_member_notification: Sending membership update 8 to 1 children
Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message 192.168.165.33
Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message 192.168.165.35

Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded connection 0x174840d0 for crmd/12946
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending membership update 8 to crmd
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: update_expected_votes: Expected quorum votes 1024 -> 2
Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership 8: quorum aquired
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote: Election 2 (owner: bench2) pass: vote from bench2 (Host name)
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
Sep 22 15:28:15 bench1 crmd: [12946]: WARN: cib_client_add_notify_callback: Callback already present
Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting custom graph functions
Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over DC status for this partition
Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are now in R/W mode



Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania




[Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Chen Stormstout
Hi,

Thanks for your post, Raoul, but reading this tutorial is what I did in the
first place.

This is my entire sources.list:

#deb cdrom:[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50]/ lenny contrib main
deb cdrom:[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50]/ lenny contrib main

deb http://security.debian.org/ lenny/updates main contrib
deb-src http://security.debian.org/ lenny/updates main contrib

deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib

# For the cluster
deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free
deb http://www.backports.org/debian lenny-backports main contrib non-free
deb http://people.debian.org/~madkiss/ha lenny main


But like the How to says:

So in order to use Pacemaker on Debian GNU/Linux 5.0 (Lenny), please add the 
Backports.org-repository to your APT-configuration according to the How-To on 
this site. This has to be done on all nodes in your cluster.

I've followed the tutorial:

aptitude -t lenny-backports install heartbeat pacemaker

But still have the same error:

Reading package lists... Done
Building dependency tree   
Reading state information... Done
Reading extended state information  
Initializing package states... Done
Reading task descriptions... Done  
The following packages are BROKEN:
  cluster-glue pacemaker 
The following NEW packages will be installed:
  cluster-agents{a} heartbeat libcluster-glue{a} libcorosync4{a} 
  libheartbeat2{a} libnet1{a} libopenipmi0{a} libtimedate-perl{a} 
0 packages upgraded, 10 newly installed, 0 to remove and 94 not upgraded.
Need to get 2277kB/2928kB of archives. After unpacking 9523kB will be used.
The following packages have unmet dependencies:
  pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package.
  cluster-glue: Depends: libopenhpi2 which is a virtual package.
The following actions will resolve these dependencies:

Keep the following packages at their current version:
cluster-agents [Not Installed]
cluster-glue [Not Installed]
heartbeat [Not Installed]
pacemaker [Not Installed]

Score is -19794

Is there anything else that I have missed?

Thank you.






Re: [Pacemaker] Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Raoul Bhatia [IPAX]
hi,

On 09/22/2010 02:52 PM, Chen Stormstout wrote:
 Hi,
 
 Thanks for your post, Raoul, but reading this tutorial is what I did in the
 first place.

ok.

 # For the cluster
 deb http://backports.debian.org/debian-backports lenny-backports main contrib non-free
 deb http://www.backports.org/debian lenny-backports main contrib non-free
 deb http://people.debian.org/~madkiss/ha lenny main

madkiss' repository shouldn't be necessary anymore. please comment it
out for now

 But as the How-To says:
 
 So in order to use Pacemaker on Debian GNU/Linux 5.0 (Lenny), please add 
 the Backports.org-repository to your APT-configuration according to the 
 How-To on this site. This has to be done on all nodes in your cluster.
 
 I've followed the tutorial:
 
 aptitude -t lenny-backports install heartbeat pacemaker

do you *need* to use heartbeat? otherwise, i would suggest corosync
as - in my experience - it is much faster than heartbeat (e.g. startup).

please retry after commenting out the above.

if it is still not working, please provide the output of:

  apt-cache policy cluster-glue pacemaker heartbeat
  apt-cache policy cluster-agents libcluster-glue libcorosync4
  apt-cache policy libheartbeat2 libnet1 libopenipmi0 libtimedate-perl

if you switch to corosync, please let us know if there is any issue
when you install the packages with:

  aptitude install -t lenny-backports pacemaker corosync


thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




[Pacemaker] Re: Problems Installing Pacemaker and Heartbeat

2010-09-22 Thread Chen Stormstout
Raoul,

Thank you for the attention.

Here is the information you asked for, after removing the madkiss repository.

Running aptitude -t lenny-backports install heartbeat pacemaker still gives
the same error message.

apt-cache policy cluster-glue pacemaker heartbeat
cluster-glue:
  Installed: (none)
  Candidate: 1.0.6-1~bpo50+1
  Version table:
 1.0.6-1~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages
pacemaker:
  Installed: (none)
  Candidate: 1.0.9.1+hg15626-1~bpo50+1
  Version table:
 1.0.9.1+hg15626-1~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages
heartbeat:
  Installed: (none)
  Candidate: 1:3.0.3-2~bpo50+1
  Version table:
 1:3.0.3-2~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages


apt-cache policy cluster-agents libcluster-glue libcorosync4
cluster-agents:
  Installed: (none)
  Candidate: 1:1.0.3-3~bpo50+1
  Version table:
 1:1.0.3-3~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages
libcluster-glue:
  Installed: (none)
  Candidate: 1.0.6-1~bpo50+1
  Version table:
 1.0.6-1~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages
libcorosync4:
  Installed: (none)
  Candidate: 1.2.1-1~bpo50+1
  Version table:
 1.2.1-1~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages


apt-cache policy libheartbeat2 libnet1 libopenipmi0 libtimedate-perl
libheartbeat2:
  Installed: (none)
  Candidate: 1:3.0.3-2~bpo50+1
  Version table:
 1:3.0.3-2~bpo50+1 0
  1 http://backports.debian.org lenny-backports/main Packages
  1 http://www.backports.org lenny-backports/main Packages
libnet1:
  Installed: (none)
  Candidate: 1.1.2.1-2
  Version table:
 1.1.2.1-2 0
500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50] lenny/main Packages
libopenipmi0:
  Installed: (none)
  Candidate: 2.0.14-1
  Version table:
 2.0.14-1 0
500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50] lenny/main Packages
libtimedate-perl:
  Installed: (none)
  Candidate: 1.1600-9
  Version table:
 1.1600-9 0
500 cdrom://[Debian GNU/Linux 5.0.5 _Lenny_ - Official i386 DVD Binary-1 20100626-17:50] lenny/main Packages



And corosync has the same problem:

aptitude install -t lenny-backports pacemaker corosync
Reading package lists... Done
Building dependency tree   
Reading state information... Done
Reading extended state information  
Initializing package states... Done
Reading task descriptions... Done  
The following packages are BROKEN:
  cluster-glue pacemaker 
The following NEW packages will be installed:
  cluster-agents{a} corosync libcluster-glue{a} libcorosync4{a} 
  libheartbeat2{a} libnet1{a} libopenipmi0{a} libtimedate-perl{a} 
0 packages upgraded, 10 newly installed, 0 to remove and 94 not upgraded.
Need to get 2235kB/2886kB of archives. After unpacking 9351kB will be used.
The following packages have unmet dependencies:
  pacemaker: Depends: libesmtp5 (= 0.8.8) which is a virtual package.
  cluster-glue: Depends: libopenhpi2 which is a virtual package.
The following actions will resolve these dependencies:

Keep the following packages at their current version:
cluster-agents [Not Installed]
cluster-glue [Not Installed]
pacemaker [Not Installed]

Score is -9863


Any idea what is going on?
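
One way to narrow this down is to look at the two packages aptitude reports as
virtual, libesmtp5 and libopenhpi2. The sketch below is only an illustration:
policy_candidates is a hypothetical helper name, and all it does is condense
apt-cache policy output to one line per package. A candidate of (none) would
suggest that none of the configured sources carries that package.

```shell
# Hypothetical helper (not part of apt): reduce `apt-cache policy` output,
# read from stdin, to "package candidate-version" pairs.
policy_candidates() {
    awk '
        /^[^ ].*:$/  { pkg = $1; sub(/:$/, "", pkg) }   # unindented "name:" line starts a package
        /Candidate:/ { print pkg, $2 }                   # indented "Candidate: version" line
    '
}

# Intended use on the affected node (needs apt, so left commented out here):
#   apt-cache policy libesmtp5 libopenhpi2 | policy_candidates
```

Fed the cluster-glue stanza shown above, the helper would print the package
name followed by 1.0.6-1~bpo50+1.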

Thank you.








Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Raoul Bhatia [IPAX]
hi,

On 09/22/2010 02:43 PM, Dan Frincu wrote:
 When I start openais, I get nodejoin immediately, as seen in the logs
 below. However, it takes some time before the nodes are visible in
 crm_mon output. Any idea how to minimize this delay?
 
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
 send_member_notification: Sending membership update 8 to 1 children
 Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
 192.168.165.33
 Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
 192.168.165.35
 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
 Sending message to local.crmd failed: unknown (rc=-2)
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
 Sending message to local.crmd failed: unknown (rc=-2)
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded
 connection 0x174840d0 for crmd/12946
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending
 membership update 8 to crmd
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
 update_expected_votes: Expected quorum votes 1024 -> 2
 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
 8: quorum aquired
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
 Election 2 (owner: bench2) pass: vote from bench2 (Host name)
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
 transition S_PENDING -> S_ELECTION [ input=I_ELECTION
 cause=C_FSA_INTERNAL origin=do_election_count_vote ]
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
 transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
 cause=C_FSA_INTERNAL origin=do_election_check ]
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
 TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
 Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
 cib_client_add_notify_callback: Callback already present
 Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
 custom graph functions
 Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
 transition -1: 0 actions in 0 synapses
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
 DC status for this partition
 Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
 now in R/W mode

is the cluster up and running and you're only (re-)starting one node?
or is this after you start openais on both nodes.

thanks,
raoul
-- 

DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
Barawitzkagasse 10/2/2/11   email.off...@ipax.at
1190 Wien   tel.   +43 1 3670030
FN 277995t HG Wien  fax.+43 1 3670030 15




Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Dan Frincu

Hi,

Raoul Bhatia [IPAX] wrote:

hi,

On 09/22/2010 02:43 PM, Dan Frincu wrote:
  

When I start openais, I get nodejoin immediately, as seen in the logs
below. However, it takes some time before the nodes are visible in
crm_mon output. Any idea how to minimize this delay?

Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
send_member_notification: Sending membership update 8 to 1 children
Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
192.168.165.33
Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
192.168.165.35
Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded
connection 0x174840d0 for crmd/12946
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending
membership update 8 to crmd
Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
update_expected_votes: Expected quorum votes 1024 -> 2
Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
8: quorum aquired
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
Election 2 (owner: bench2) pass: vote from bench2 (Host name)
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
cib_client_add_notify_callback: Callback already present
Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
custom graph functions
Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
transition -1: 0 actions in 0 synapses
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
DC status for this partition
Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
now in R/W mode



is the cluster up and running and you're only (re-)starting one node?
or is this after you start openais on both nodes.

thanks,
raoul
  

Second case, just after openais start on both nodes.

Regards,
Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania



Re: [Pacemaker] correct permissions for /var/lib/pengine

2010-09-22 Thread Dejan Muhamedagic
Hi,

On Wed, Sep 22, 2010 at 12:44:45PM +0200, Raoul Bhatia [IPAX] wrote:
 in my recent hb_report, i find:
  WARN: problem with permissions/ownership at wc01:
  wrong permissions or ownership for /var/lib/pengine:
  drwxr-xr-x 2 hacluster haclient 5038080 Jul 23 08:58 /var/lib/pengine
  WARN: problem with permissions/ownership at wc02:
  wrong permissions or ownership for /var/lib/pengine:
  drwxr-xr-x 2 hacluster haclient 4096 Sep 22 11:26 /var/lib/pengine
 
 
 what should the correct permission be?

It should be 750.
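
As a sketch of how that could be applied: the function name below is
hypothetical, and it only changes the mode. It leaves the owner and group
(hacluster and haclient in the listing above) untouched and prints them so
they can be checked. stat -c is assumed to be the GNU coreutils version.

```shell
# Set mode 750 on the pengine directory and show the result.
# $1 defaults to the path from the hb_report warning; pass another
# directory to try the function on a scratch copy first.
fix_pengine_mode() {
    dir="${1:-/var/lib/pengine}"
    chmod 750 "$dir" || return 1
    # Prints: mode owner group path
    stat -c '%a %U %G %n' "$dir"
}

# On each node, as root (not run here):
#   fix_pengine_mode /var/lib/pengine
```

Running hb_report again afterwards should show whether the warning is gone
on both wc01 and wc02.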

Thanks,

Dejan

 thanks,
 raoul



Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Dejan Muhamedagic
Hi,

On Wed, Sep 22, 2010 at 04:48:42PM +0300, Dan Frincu wrote:
 Hi,
 
 Raoul Bhatia [IPAX] wrote:
 hi,
 
 On 09/22/2010 02:43 PM, Dan Frincu wrote:
 When I start openais, I get nodejoin immediately, as seen in the logs
 below. However, it takes some time before the nodes are visible in
 crm_mon output. Any idea how to minimize this delay?
 
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
 send_member_notification: Sending membership update 8 to 1 children
 Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
 192.168.165.33
 Sep 22 15:27:24 bench1 openais[12935]: [CLM  ] got nodejoin message
 192.168.165.35
 Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
 Sending message to local.crmd failed: unknown (rc=-2)
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] WARN: route_ais_message:
 Sending message to local.crmd failed: unknown (rc=-2)
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Recorded
 connection 0x174840d0 for crmd/12946
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info: pcmk_ipc: Sending
 membership update 8 to crmd
 Sep 22 15:27:24 bench1 openais[12935]: [crm  ] info:
 update_expected_votes: Expected quorum votes 1024 -> 2
 Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
 8: quorum aquired
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
 Election 2 (owner: bench2) pass: vote from bench2 (Host name)
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
 transition S_PENDING -> S_ELECTION [ input=I_ELECTION
 cause=C_FSA_INTERNAL origin=do_election_count_vote ]
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
 transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
 cause=C_FSA_INTERNAL origin=do_election_check ]
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
 TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
 Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
 cib_client_add_notify_callback: Callback already present
 Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
 custom graph functions
 Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
 transition -1: 0 actions in 0 synapses
 Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
 DC status for this partition
 Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
 now in R/W mode
 
 is the cluster up and running and you're only (re-)starting one node?
 or is this after you start openais on both nodes.
 
 thanks,
 raoul
 Second case, just after openais start on both nodes.

It's probably due to dc-deadtime (from crm ra info crmd):

dc-deadtime (time, [60s]): How long to wait for a response from other nodes 
during startup.
The correct value will depend on the speed/load of your
network and the type of switches used.
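
The quoted log shows quorum acquired at 15:27:25 and the first election at
15:28:15, a gap that may correspond to this wait. As a rough sketch, the gap
can be computed and dc-deadtime changed as below. The function names are
hypothetical and no particular value is being recommended; the property
syntax follows the usual crm configure property form.

```shell
# Convert HH:MM:SS to seconds since midnight. One leading zero is stripped
# from each field so that 08 and 09 are not rejected as invalid octal.
hms_to_seconds() {
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Seconds elapsed between two timestamps on the same day.
elapsed_seconds() {
    echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

# Illustrative wrapper: set dc-deadtime cluster-wide (value chosen by the caller).
set_dc_deadtime() {
    crm configure property dc-deadtime="$1"
}

# Gap between "quorum aquired" and the first election in the log above.
elapsed_seconds 15:27:25 15:28:15
```

A call such as set_dc_deadtime 20s would be run once on any node. Setting it
too low probably risks a node electing itself DC before its peer has had a
chance to respond, so it is worth testing on the actual network.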

Thanks,

Dejan

 Regards,
 Dan
 




Re: [Pacemaker] migration-threshold and failure-timeout

2010-09-22 Thread Andrew Beekhof
On Wed, Sep 22, 2010 at 2:12 PM, Michael Smith msm...@cbnco.com wrote:
 On Wed, 22 Sep 2010, Andrew Beekhof wrote:

 On Tue, Sep 21, 2010 at 3:28 PM, Vadym Chepkov vchep...@gmail.com wrote:
  On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote:

  However I don't know of any automatic method to clear the failcount.
  in pacemaker 1.0 nothing will clean failcount automatically, this is a
  feature of pacemaker 1.1, imho
 Correct, the just released 1.1.3 clears the failcount for you.

 failcount used to get cleared for me

not quite, it got ignored until the next failure.
now it gets removed.

 after the cluster-recheck-interval
 fired, if failure-timeout had expired. Does this mean pacemaker 1.1.3
 doesn't need cluster-recheck-interval in order to clear the failcount?

no, you'll still need that part too
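
Putting the two settings from this thread together, a minimal sketch might
look like the following. The run helper, the function name and the DRY_RUN
switch are made up for illustration; the two crm commands are the ones Vadym
posted earlier in the thread, and the 10min and 5min values are placeholders.

```shell
# Print the command instead of executing it when DRY_RUN=1 (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = failure-timeout, $2 = cluster-recheck-interval
configure_failure_expiry() {
    run crm configure rsc_defaults failure-timeout="$1"
    run crm configure property cluster-recheck-interval="$2"
}

# Show what would be executed without touching the cluster.
DRY_RUN=1 configure_failure_expiry 10min 5min
```

Since an expired failure is only acted on when the recheck timer fires,
a recheck interval longer than the failure-timeout would delay the moment
at which the expiry takes effect.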



Re: [Pacemaker] Add a resource with commandline to an existing group?

2010-09-22 Thread Andrew Beekhof
On Wed, Sep 22, 2010 at 4:34 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Tue, Aug 31, 2010 at 12:13:15PM +, Rainer wrote:
 Dan Frincu dfri...@... writes:

 
  You can update the config by typing: crm configure
  This puts you in the crm shell configure mode. Then you type in edit,
  that opens a vi session with the config, you edit the group entry by
  adding the necessary information and then you exit via esc, :wq, verify,
  commit.
 
  Regards,
 
  Dan

 So easy? I was all the time looking for a command like add to group
 Thx a lot!

 Yes, with edit you can modify any part of your configuration and
 commit the result. There's also a filter command in v1.1 which is
 to edit what sed(1) is to ed(1). This should also work:

 # echo group g1 g2 ... | crm configure load update -

nice!
i'll have to remember that one
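
Dejan's one-liner can be wrapped in a small helper. This is a sketch only:
group_line and the resource names g_web, fs_web, ip_web and apache_web are
hypothetical. As far as I know, load update replaces the existing group
definition with the one supplied, so the line should list every member, old
and new; comparing crm configure show before and after is a sensible check.

```shell
# Build a crm "group" definition line: the first argument is the group id,
# the remaining arguments are its members in start order.
group_line() {
    name=$1
    shift
    echo "group $name $*"
}

group_line g_web fs_web ip_web apache_web

# To apply on a live cluster (not run here):
#   group_line g_web fs_web ip_web apache_web | crm configure load update -
```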



Re: [Pacemaker] Help to configure a cluster with pacemaker

2010-09-22 Thread Josué Alcalde González
Ok.
Now I understand how this works.
The lsb script did not work correctly (because the conf files were on the
drbd disk, which is only mounted on one server).

Thank you very much. It is working now.


On Tue, 21-09-2010 at 08:55 +0200, Andrew Beekhof wrote:

 2010/9/2 Josué Alcalde González josue.alca...@csa.es:
  First, as I am new to this list, I would like to say hello to everybody.
 
  I have configured a 4 node cluster with ubuntu 10.04 with a simple
  architecture.
  All nodes are the same machine, with a root partition and a data
  partition shared with drbd.
 
  node1: First apache webserver node
  node2: Second apache webserver node
  node1 and node2 have a drbd resource called webserverdisk in
  master/slave using ext4.
 
  node3: First mysql server
  node4: Second mysql server
  node3 and node4 have a drbd resource called dataserverdisk in
  master/slave using ext4.
 
  DRBD is using Master/Slave without OCFS or GFS (it uses EXT4) so there
  will be only one node with apache2 and one node with mysql.
 
 
  The problem is that location and colocation are not behaving as I expect.
  In fact, my apache primitive tries to start on ubuntu02, ubuntu03 and
  ubuntu04 instead of ubuntu01, where my drbd webserverdisk is mounted.
 
 The cluster isn't actually starting apache there.
 It's checking those nodes to make sure it isn't already running there,
 but either the RA isn't installed or something it needs to produce a
 sane status is missing.
 
 That's why you're getting failed _monitor_0 actions:
mysqld_monitor_0 (node=ubuntu01, call=4, rc=5, status=complete):
 not installed
mysqld_monitor_0 (node=ubuntu02, call=4, rc=5, status=complete):
 not installed
apache_monitor_0 (node=ubuntu02, call=8, rc=254, status=complete): 
 unknown
apache_stop_0 (node=ubuntu02, call=10, rc=254, status=complete): unknown
 
 Fix that and your problems will go away.
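
For lsb: resources such as apache and mysqld here, one quick check is what
the init script's status action returns on a node where the service is
stopped. The LSB convention is exit code 0 for running and 3 for not running;
anything else will tend to look like a failure to the cluster's probe. The
helper below is a hypothetical sketch, not something shipped with Pacemaker.

```shell
# Run "<init-script> status" and classify the exit code from a probe's point of view.
describe_lsb_status() {
    "$1" status >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0) echo "rc=0: running" ;;
        3) echo "rc=3: stopped" ;;
        *) echo "rc=$rc: unexpected for a status call" ;;
    esac
}

# Example on a node where the service should be stopped (not run here):
#   describe_lsb_status /etc/init.d/apache2
```

If the script cannot even report a sane status because its configuration
lives on a disk that is not mounted on that node, the probe fails in the way
shown in the _monitor_0 lines above.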
 
  Of course, apache is not starting because it can only work if the
  webserverdisk is mounted on the machine.
 
  My pacemaker configuration is this (I think some location directives are
  not needed):
 
  node ubuntu01
  node ubuntu02
  node ubuntu03
  node ubuntu04
  primitive apache lsb:apache2 \
 meta target-role=Started
  primitive drbd_disk_dataserver ocf:linbit:drbd \
 params drbd_resource=dataserverdisk \
 op monitor interval=15s
  primitive drbd_disk_webserver ocf:linbit:drbd \
 params drbd_resource=webserverdisk \
 op monitor interval=15s
  primitive fs_drbd_dataserver ocf:heartbeat:Filesystem \
 params device=/dev/drbd/by-res/dataserverdisk
  directory=/srv/data fstype=ext4
  primitive fs_drbd_webserver ocf:heartbeat:Filesystem \
 params device=/dev/drbd/by-res/webserverdisk
  directory=/srv/data fstype=ext4
  primitive ip_cluster_dataserver ocf:heartbeat:IPaddr2 \
 params ip=172.16.15.106 nic=eth0:0
  primitive ip_cluster_webserver ocf:heartbeat:IPaddr2 \
 params ip=172.16.15.105 nic=eth0:0
  primitive mysqld lsb:mysql \
 meta target-role=Started
  group group_dataserver fs_drbd_dataserver ip_cluster_dataserver mysqld \
 meta target-role=Started
  group group_webserver fs_drbd_webserver ip_cluster_webserver apache \
 meta target-role=Started
  ms ms_drbd_dataserver drbd_disk_dataserver \
 meta master-max=1 master-node-max=1 clone-max=2
  clone-node-max=1 notify=true target-role=Started
  ms ms_drbd_webserver drbd_disk_webserver \
 meta master-max=1 master-node-max=1 clone-max=2
  clone-node-max=1 notify=true target-role=Started
  location location_dataserver_1 ms_drbd_dataserver 200: ubuntu03
  location location_dataserver_1_fs_drbd fs_drbd_dataserver 200: ubuntu03
  location location_dataserver_1_group group_dataserver 200: ubuntu03
  location location_dataserver_1_ip_cluster ip_cluster_dataserver 200:
  ubuntu03
  location location_dataserver_1_mysqld mysqld 200: ubuntu03
  location location_dataserver_2 ms_drbd_dataserver 100: ubuntu04
  location location_dataserver_2_fs_drbd fs_drbd_dataserver 100: ubuntu04
  location location_dataserver_2_group group_dataserver 100: ubuntu04
  location location_dataserver_2_ip_cluster ip_cluster_dataserver 100:
  ubuntu04
  location location_dataserver_2_mysqld mysqld 100: ubuntu04
  location location_dataserver_3 ms_drbd_dataserver -inf: ubuntu01
  location location_dataserver_3_fs_drbd fs_drbd_dataserver -inf: ubuntu01
  location location_dataserver_3_group group_dataserver -inf: ubuntu01
  location location_dataserver_3_ip_cluster ip_cluster_dataserver -inf:
  ubuntu01
  location location_dataserver_3_mysqld mysqld -inf: ubuntu01
  location location_dataserver_4 ms_drbd_dataserver -inf: ubuntu02
  location location_dataserver_4_fs_drbd fs_drbd_dataserver -inf: ubuntu02
  location location_dataserver_4_group group_dataserver -inf: ubuntu02
  location location_dataserver_4_ip_cluster ip_cluster_dataserver -inf:
  ubuntu02
  location location_dataserver_4_mysqld mysqld -inf: ubuntu02
  

Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-22 Thread Steven Dake

On 09/22/2010 04:02 AM, Szymon Hersztek wrote:


Message written on 2010-09-22, at 10:26, by Andrew Beekhof:


2010/9/21 Szymon Hersztek s...@globtel.pl:


Message written on 2010-09-21, at 09:08, by Andrew Beekhof:


2010/9/21 Szymon Hersztek s...@globtel.pl:


Message written on 2010-09-21, at 08:34, by Andrew Beekhof:


On Mon, Sep 20, 2010 at 3:34 PM, Szymon Hersztek s...@globtel.pl
wrote:


Hi
I'm trying to set up corosync to work as a DRBD cluster, but after
installing it by following
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
I got the error below:


Unusual, but did pacemaker fork a replacement attrd process?
At what time did corosync start?



corosync was started manually, or do you want the exact time of start?


well you included at most 1 second's worth of logging.
so its kinda hard to know if something took too long or what recovery
was attempted.


OK, it is not a problem to send more. Do you need debug logging or standard?
I have to install the server once again, so in half an hour I can
reproduce the logs.



Here's your issue:

corosynclib   i386     1.2.7-1.1.el5   clusterlabs   155 k
corosynclib   x86_64   1.2.7-1.1.el5   clusterlabs   172 k

Why do you have both i386 and x86_64 versions installed on your machine??





There should be no problems installing lib files for both i386 and 
x86_64.  These rpms only contain the *.so files (and a LICENSE file).


Regards
-steve


Because yum installed it this way, like many other packages.
The problem was that I did not have /dev/shm mounted as tmpfs.
But thanks for trying.
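
Since the root cause turned out to be /dev/shm not being a tmpfs mount, a
check along these lines may save time on a fresh install. The function name
is hypothetical; it reads mount-table lines in /proc/mounts format from stdin
and reports the filesystem type at the given mount point.

```shell
# Report the filesystem type mounted at $1 (default /dev/shm), reading
# /proc/mounts-style lines (device mountpoint fstype options ...) from stdin.
shm_fstype() {
    awk -v mp="${1:-/dev/shm}" '
        $2 == mp { fstype = $3 }    # keep the last (topmost) match
        END      { print (fstype == "" ? "not mounted" : fstype) }
    '
}

# On the node itself (not run here):
#   shm_fstype < /proc/mounts
```

Anything other than tmpfs in the output would point at the same problem. The
usual remedy is an fstab entry of the form "tmpfs /dev/shm tmpfs defaults 0 0",
though the exact options are a local choice.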















Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Steven Dake

On 09/22/2010 05:43 AM, Dan Frincu wrote:

Hi all,

I have the following packages:

# rpm -qa | grep -Ei '(openais|cluster|heartbeat|pacemaker|resource)'
openais-0.80.5-15.2
cluster-glue-1.0-12.2
pacemaker-1.0.5-4.2
cluster-glue-libs-1.0-12.2
resource-agents-1.0-31.5
pacemaker-libs-1.0.5-4.2
pacemaker-mgmt-1.99.2-7.2
libopenais2-0.80.5-15.2
heartbeat-3.0.0-33.3
pacemaker-mgmt-client-1.99.2-7.2

When I start openais, I get nodejoin immediately, as seen in the logs
below. However, it takes some time before the nodes are visible in
crm_mon output. Any idea how to minimize this delay?

Sep 22 15:27:24 bench1 openais[12935]: [crm ] info:
send_member_notification: Sending membership update 8 to 1 children
Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message
192.168.165.33
Sep 22 15:27:24 bench1 openais[12935]: [CLM ] got nodejoin message
192.168.165.35
Sep 22 15:27:24 bench1 mgmtd: [12947]: info: Started.
Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message:
Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm ] WARN: route_ais_message:
Sending message to local.crmd failed: unknown (rc=-2)
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Recorded
connection 0x174840d0 for crmd/12946
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info: pcmk_ipc: Sending
membership update 8 to crmd
Sep 22 15:27:24 bench1 openais[12935]: [crm ] info:
update_expected_votes: Expected quorum votes 1024 -> 2
Sep 22 15:27:25 bench1 crmd: [12946]: notice: ais_dispatch: Membership
8: quorum aquired
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_election_count_vote:
Election 2 (owner: bench2) pass: vote from bench2 (Host name)
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_te_control: Registering
TE UUID: 87c28ab8-ba93-4111-a26a-67e88dd927fb
Sep 22 15:28:15 bench1 crmd: [12946]: WARN:
cib_client_add_notify_callback: Callback already present
Sep 22 15:28:15 bench1 crmd: [12946]: info: set_graph_functions: Setting
custom graph functions
Sep 22 15:28:15 bench1 crmd: [12946]: info: unpack_graph: Unpacked
transition -1: 0 actions in 0 synapses
Sep 22 15:28:15 bench1 crmd: [12946]: info: do_dc_takeover: Taking over
DC status for this partition
Sep 22 15:28:15 bench1 cib: [12942]: info: cib_process_readwrite: We are
now in R/W mode

Regards,

Dan



Where did you get that version of openais?  openais 0.80.x is deprecated 
in the community (and hence, no support).  We recommend using corosync 
instead which has improved testing with pacemaker.


Regards
-steve
