[Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Tim Serong
Greetings, I've written up a brief document entitled STONITH Deathmatch Explained (and Some Hints for Resource Agent Authors and Systems Engineers): http://ourobengr.com/ha It's a description of causes of STONITH deathmatch in Heartbeat/Pacemaker HA clusters, where two nodes continually shoot

Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Dejan Muhamedagic
Hi, On Thu, May 14, 2009 at 06:32:00PM +1000, Tim Serong wrote: Greetings, I've written up a brief document entitled STONITH Deathmatch Explained (and Some Hints for Resource Agent Authors and Systems Engineers): http://ourobengr.com/ha It's a description of causes of STONITH

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi, Dejan, thanks a lot, I compiled your version, but crmd with the shipped pacemaker keeps segfaulting with it, and I'm unable to rebuild pacemaker with this heartbeat to get the -debug package. Compilation fails with: plugin.c: In function 'check_message_sanity': plugin.c:1190: warning: format '%d'

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Andrew Beekhof
On Thu, May 14, 2009 at 3:58 PM, Nikola Ciprich extmaill...@linuxbox.cz wrote: Hi, Dejan, thanks a lot, I compiled your version, but crmd with the shipped pacemaker keeps segfaulting with it, and I'm unable to rebuild pacemaker with this heartbeat to get the -debug package. Compilation fails with:

[Pacemaker] xml passes crm_verify but fails cibadmin --replace

2009-05-14 Thread Joe Armstrong
Any ideas on how I can get more information on the reason --replace fails (I'm already running it with --verbose)? Thanks. Joe ... The base xml file came from cibadmin --query > new.xml, then I manually added the resources section. [r...@vm2 ~]# cat new.xml <cib validate-with="pacemaker-1.0"

[Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Ivars Strazdiņš
Hi there, could anyone enlighten me why in a two node cluster one (and only one) node is spitting out these messages (below) regularly every 15 minutes? I did erase all configuration with 'cibadmin -E' and then recreated it manually - yet they are still coming. Is it something to do with default cluster

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Eliot Gable
Most likely because you have a cluster-recheck-interval=15m specified. Eliot Gable

Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Joe Armstrong
On Thu, May 14, 2009 at 06:32:00PM +1000, Tim Serong wrote: Greetings, I've written up a brief document entitled STONITH Deathmatch Explained (and Some Hints for Resource Agent Authors and Systems Engineers): http://ourobengr.com/ha It's a description of causes of STONITH deathmatch

Re: [Pacemaker] Can't get all my resources online at once ?

2009-05-14 Thread Karl Katzke
Joe: The document you're looking for is Configuration 1.0 explained at this link: http://www.clusterlabs.org/wiki/Documentation --- Karl Katzke Systems Analyst II TAMU - DRGS Joe Armstrong jarmstr...@postpath.com 5/14/2009 10:34 AM - Original Message - From: Dejan

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Andrew Beekhof
2009/5/14 Ivars Strazdiņš ivars.strazd...@gmail.com: Hi there, could anyone enlighten me why in a two node cluster one (and only one) node is spitting these messages (below) regularly every 15 minutes? It's to facilitate time-based rules and the expiration of resource failures. You can disable

Re: [Pacemaker] xml passes crm_verify but fails cibadmin --replace

2009-05-14 Thread Joe Armstrong
Answering my own question here... I had forgotten to add the <status/> tag. Still odd that crm_verify passed, though... Joe -Original Message- From: Joe Armstrong Sent: Thursday, May 14, 2009 7:35 AM To: pacema...@clusterlabs.org Subject: [Pacemaker] xml passes crm_verify but fails
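For anyone hitting the same thing, here is a rough pre-flight check, offered only as a sketch. It parses a CIB XML document and reports which top-level sections are absent before the file is handed to cibadmin --replace. The function name missing_cib_sections and the REQUIRED_SECTIONS tuple are made up for illustration. The assumption that configuration and status are the sections to look for comes from the usual CIB layout; this thread itself only confirms the missing status element.

```python
import xml.etree.ElementTree as ET

# Assumption: a complete CIB carries <configuration> and <status>
# directly under the <cib> root. Only <status/> is confirmed by the thread.
REQUIRED_SECTIONS = ("configuration", "status")


def missing_cib_sections(xml_text):
    """Return the required top-level sections that are absent from a CIB document."""
    root = ET.fromstring(xml_text)
    if root.tag != "cib":
        raise ValueError(f"expected <cib> root element, got <{root.tag}>")
    present = {child.tag for child in root}
    return [name for name in REQUIRED_SECTIONS if name not in present]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed CIB with no <status/> element.
    sample = '<cib validate-with="pacemaker-1.0"><configuration/></cib>'
    print(missing_cib_sections(sample))
```

This is not a substitute for crm_verify or for the schema validation the cluster does itself. It only catches the structural omission described above.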

[Pacemaker] Drbd disk don't run

2009-05-14 Thread Rafael Emerick
Hi, I'm trying to build a cluster with xen-ha using drbd and ocfs2... I want crm to manage all resources (xen machines, drbd disks and the ocfs2 filesystem). First, I created a clone lsb resource to init drbd using the GUI. Now, I'm following this manual

Re: [Pacemaker] Drbd disk don't run

2009-05-14 Thread Dejan Muhamedagic
Hi, On Thu, May 14, 2009 at 03:18:15PM -0300, Rafael Emerick wrote: Hi, I'm trying to build a cluster with xen-ha using drbd and ocfs2... I want crm to manage all resources (xen machines, drbd disks and the ocfs2 filesystem). First, I created a clone lsb resource to init drbd with gui

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi guys, sooo I've got valgrind grinding :) I had some trouble getting the latest stuff working, so I used heartbeat-2.99.2 with Dejan's (fixed) patch and --enable-valgrind --with-valgrind-log=--log-file=/tmp/crm-%p.valgrind and recompiled pacemaker-1.0.3 (without openais, as Andrew suggested).

Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Andreas Mock
-Original Message- From: Joe Armstrong jarmstr...@postpath.com Sent: 14.05.09 18:01:11 To: 'pacema...@clusterlabs.org' pacema...@clusterlabs.org Subject: Re: [Pacemaker] STONITH Deathmatch Explained I agree - a nice read. You might want to also add a possibility to avoid