Re: [Pacemaker] how do i test my cluster

2009-01-30 Thread Andrew Beekhof
On Jan 30, 2009, at 1:03 PM, Raoul Bhatia [IPAX] wrote: hi, in the FAQ under the section how do i test my cluster [1] i read that one should use the CTS (or cluster_test) for testing the cluster. is this true for *any* type of cluster, or does this only run a specific test-suite that

Re: [Pacemaker] STONITH

2009-02-02 Thread Andrew Beekhof
On Mon, Feb 2, 2009 at 09:16, Priyanka Ranjan priyanka3rd...@gmail.com wrote: hi, i want to configure STONITH in my two nodes openais (suse 11) cluster. i am looking for a document on this. i saw Configuration 1.0 Explained on clusterlab site but unfortunately the stonith section does not

Re: [Pacemaker] using pingd to modify all resources' score

2009-02-02 Thread Andrew Beekhof
On Feb 2, 2009, at 11:45 AM, Raoul Bhatia [IPAX] wrote: hi, is it possible to use one single rsc_location/... rule to modify the score of *all resources* ? no, but in 1.0 you can have multiple constraints share a rule. look for id-ref in the pdf e.g if i have a cluster of 3 nodes and

Re: [Pacemaker] New CTS mode: real-world

2009-02-03 Thread Andrew Beekhof
On Feb 3, 2009, at 11:33 AM, Lars Marowsky-Bree wrote: Hi, I've been thinking about the fact that there's a bunch of scenarios CTS isn't good at simulating. Essentially, it will reboot nodes, or cause processes to crash, way too often for some problems to show. From the point of

Re: [Pacemaker] Call cib_create failed (-47): Update does not conform to the configured schema/DTD

2009-02-06 Thread Andrew Beekhof
On Feb 6, 2009, at 9:07 AM, Stelio Plautz wrote: Hi, following the DRBD Howto (http://clusterlabs.org/wiki/DRBD_HowTo_1.0). I've configured my DRBD and Filesystem resources. But when I try to configure the constraints, as described in the howto, I get the following errors:

Re: [Pacemaker] check_action_definition: Parameters to stonith_rackpdu:0_start_0 on ... changed

2009-02-06 Thread Andrew Beekhof
On Feb 6, 2009, at 11:25 AM, Raoul Bhatia [IPAX] wrote: Andrew Beekhof wrote: On Fri, Dec 5, 2008 at 19:55, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Raoul Bhatia [IPAX] wrote: what does this tell me? crm_verify[10183]: 2008/12/05_19:40:28 WARN: check_action_definition: Parameters

Re: [Pacemaker] no-quorum-policy

2009-02-09 Thread Andrew Beekhof
On Feb 9, 2009, at 9:57 AM, Romi Verma wrote: On Mon, Feb 9, 2009 at 2:20 PM, Glory Smith xx2gl...@gmail.com wrote: On Sat, Feb 7, 2009 at 8:16 PM, Romi Verma romi3rd...@gmail.com wrote: Hi all, I have one confusion . pacemaker (configuration explained ) document says that if

Re: [Pacemaker] no-quorum-policy

2009-02-09 Thread Andrew Beekhof
On Feb 9, 2009, at 10:49 AM, Romi Verma wrote: when set to suicide, it will only shoot itself and any other nodes in its current partition - it wont try to shoot the nodes it can't see. when set to ignore (or when it has quorum), it will shoot anyone it can't see for any other value,
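The rule quoted in this reply can be modelled as a small decision function. This is only a sketch: `fencing_candidates` and its arguments are hypothetical names, not Pacemaker APIs, and because the quoted reply is cut off before it describes the remaining policy values, that branch is deliberately left unimplemented.

```python
def fencing_candidates(policy, has_quorum, partition, all_nodes):
    """Return the set of nodes this partition may fence, per the quoted rule.

    Hypothetical model for illustration only, not Pacemaker code.
    """
    partition = set(partition)
    unseen = set(all_nodes) - partition
    if has_quorum or policy == "ignore":
        # With quorum (or when ignoring its loss), any node we cannot see may be shot.
        return unseen
    if policy == "suicide":
        # Only ourselves and peers in our own partition, never the unseen nodes.
        return partition
    # The quoted text is truncated before it describes other values.
    raise NotImplementedError(f"behaviour for policy {policy!r} is not described")
```

For a two-node cluster split into {a} and {b}, node a without quorum gets {a} under suicide and {b} under ignore.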

Re: [Pacemaker] putting a DC into standby does not change the DC

2009-02-10 Thread Andrew Beekhof
On Feb 10, 2009, at 5:03 PM, Raoul Bhatia [IPAX] wrote: Andrew Beekhof wrote: is this an intended behavior? yes if so, can you please elaborate why? (not a priority of course). let me ask a question back, why would it need to be somewhere else? i would not expect a node in standby

Re: [Pacemaker] on_fail

2009-02-13 Thread Andrew Beekhof
On Feb 13, 2009, at 1:06 PM, Romi Verma wrote: It will immediately fence (turn off or reboot depending on your configuration) the node on which the operation failed. Thanks Dominik and Andrew, I have one more question. there is a property requires for any operation. it has three value

Re: [Pacemaker] very urgent

2009-02-16 Thread Andrew Beekhof
On Feb 16, 2009, at 11:24 AM, Glory Smith wrote: we kill the node with STONITH. very hard for a machine to write to shared media when its powered off. we can kill nodes when: - nodes become unresponsive - nodes are not part of the cluster that has quorum - resources fail to stop when

Re: [Pacemaker] very urgent

2009-02-16 Thread Andrew Beekhof
I believe it's called SBD, but I'm no expert on it. On Feb 16, 2009, at 4:15 PM, Glory Smith wrote: Hi Andrew, how do we configure persistent reservation fencing in suse 11. Thanks, On Mon, Feb 16, 2009 at 4:48 PM, Glory Smith xx2gl...@gmail.com wrote: I get the feeling that by resource

Re: [Pacemaker] Three questions...

2009-02-23 Thread Andrew Beekhof
On Wed, Feb 18, 2009 at 05:13, Romi Verma romi3rd...@gmail.com wrote: Hi All, do we have any way to encapsulate more than one resource into one . suppose i have a lvm resource, ip add resource so can i combine both?  i know that we can set constraint to make them dependent on each other but

Re: [Pacemaker] Three questions...

2009-02-23 Thread Andrew Beekhof
On Tue, Feb 24, 2009 at 06:38, Romi Verma romi3rd...@gmail.com wrote: Thanks a lot Andrew for your reply, i have another question on stonith. i have configured sbd stonith and riloe stonith.  can i set the order of execution. say i want riloe to get executed first and if it fails then sbd to

Re: [Pacemaker] handling snmp trap with crm_mon

2009-02-25 Thread Andrew Beekhof
On Thu, Feb 26, 2009 at 05:45, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, I try to handle SNMP trap with crm_mon. It seems that the latest crm_mon requires net-snmp 5.4 for enabling SNMP, so I built net-snmp 5.4 on Red Hat, and rebuild Pacemaker 1.0.2 --with-snmp. Usage is, just set -S

Re: [Pacemaker] removing the clone-max restriction messes up my clones

2009-02-27 Thread Andrew Beekhof
We do get automated mails when a new bug is created. There's no need to send it twice. On Fri, Feb 27, 2009 at 17:17, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hi, i had 2 nodes and the following configuration for my clone_webservice nvpair id=clone_webservice_max      name=clone-max      

Re: [Pacemaker] pingd.c needs glib header

2009-03-01 Thread Andrew Beekhof
weird - it compiles here. applied On Mon, Mar 2, 2009 at 07:28, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, Little patch for pingd.c Compiler might complain about g_timeout_add_seconds(). There is no #include <glib.h> Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

Re: [Pacemaker] handling snmp trap with crm_mon

2009-03-01 Thread Andrew Beekhof
I just tried again, and its working here. How are you checking for traps? I get the following: c001n05:~ # echo disableAuthorization yes /etc/snmp/snmptrapd.conf c001n05:~ # sudo snmptrapd -f -a -e -Le -d -Drun:shell,snmptrapd:auth registered debug token run:shell, 1 registered debug token

Re: [Pacemaker] pacemaker.spec for RHEL4

2009-03-02 Thread Andrew Beekhof
Oh, is that why those builds were failing! Thanks for figuring this one out :-) On Mon, Mar 2, 2009 at 09:34, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, It seems that Pacemaker 1.0.2 for RHEL4 on SUSE Build Service fails for the following error. configure: error: C++ preprocessor

Re: [Pacemaker] Fatal assert crm_peer_cache != NULL

2009-03-03 Thread Andrew Beekhof
On Tue, Mar 3, 2009 at 13:20, Andreas Vogler a...@geneon.de wrote: Hi all, I have set up a two node cluster using pacemaker-1.0.0. This worked except for some strange behaviour with respect to handling a master/slave resource. So I upgraded one of the nodes to the openSuSE 10.3 RPM packages

Re: [Pacemaker] pingd.c needs glib header

2009-03-03 Thread Andrew Beekhof
applied - thanks! On Tue, Mar 3, 2009 at 08:35, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, sorry, it's not a problem which header should be set. My build firm is RHEL, and it has only glib2-2.12. g_timeout_add_seconds () can be allowed since 2.14. See attached. Thanks, Junko weird
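The compatibility point in this message comes down to a version comparison: g_timeout_add_seconds() is available from glib 2.14, while the build host has 2.12. A minimal Python sketch of that check follows; the helper name is invented for illustration, and the actual fix would be a guard in the C source or the configure script rather than anything like this.

```python
def supports_timeout_add_seconds(glib_version):
    """Return True if this glib version string provides g_timeout_add_seconds().

    Illustrative helper only; the 2.14 threshold comes from the message above.
    """
    major, minor = (int(part) for part in glib_version.split(".")[:2])
    return (major, minor) >= (2, 14)
```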

Re: [Pacemaker] Help for Master/Slave Mysql with Pacemaker

2009-03-10 Thread Andrew Beekhof
On Mon, Mar 9, 2009 at 14:37, r...@free.fr wrote: Hello, I'm trying to setup a master/slave mysql with a VIP on my master and a VIP on my slave. The idea is to use a VIP for R/W connections on the master node and a VIP for R/O connections in the slave node. The R/W VIP always run with the

Re: [Pacemaker] pacemaker-mgmt-client-1.99.0-2.1.x86_64 dependency nightmare

2009-03-11 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 17:08, lllact...@gmx.net lllact...@gmx.net wrote: Yan Gao wrote: On Wed, 2009-03-11 at 11:53 +0100, lllact...@gmx.net wrote: Hi all, I installed all these rpm's for SLES 10 SP2 (http://ftp5.gwdg.de/pub/opensuse/repositories/server:/ha-clustering/SLES_10/x86_64/)

Re: [Pacemaker] iLO2 stonith device

2009-03-12 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 17:13, Adrian Chapela achapela.rexist...@gmail.com wrote: One more time. I have decided to use external/riloe as my stonith device but I have some doubts. My system will be a cluster of two nodes. First, Do I need to config riloe stonith as a clone ? not required,

Re: [Pacemaker] do we have nodeid?

2009-03-18 Thread Andrew Beekhof
2009/3/18 Priyanka Ranjan priyanka3rd...@gmail.com: Thanks a lot Andrew for your reply, We decided not to expose the nodeid to normal users - time will tell if that was a good decision. The easiest way to check what value is being used is simply grep the log files. I tried to grep node id 

Re: [Pacemaker] host came online, but is ignored

2009-03-18 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 13:29, Juha Heinanen j...@tutpro.com wrote: i kept on testing the example configuration and found a failure situation when i rebooted the host (lenny1) that was online, but was not master. starting situation: r...@lenny2:~# crm_mon -1 Last updated: Wed

Re: [Pacemaker] host came online, but is ignored

2009-03-18 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 17:05, Juha Heinanen j...@tutpro.com wrote: Andrew Beekhof writes:   crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring   HA message (op=join_announce) from lenny1: not in our membership list   (size=1)     apparently some part of the cluster

Re: [Pacemaker] Three questions...

2009-03-19 Thread Andrew Beekhof
hand, if a monitor for the instance returns a success code, Pacemaker cannot be aware of the trouble. In the latter case, when the node that the instance cannot control must be shot, Pacemaker will request the useless instance to shoot the node. Andrew Beekhof wrote: 2009/3/16 Romi Verma

Re: [Pacemaker] Monitor a resource without the cluster reacting to the result...

2009-03-26 Thread Andrew Beekhof
On Tue, Mar 24, 2009 at 10:45, foxyc...@yahoo.com wrote: I've been wanting this for some time now and expecting pacemaker would include it in its newer versions. But I've checked the latest pacemaker 1.0 distribution fresh of the day, and unfortunately have found nothing in it indicating

Re: [Pacemaker] SLES 11 GM is available now - was Re: H.A. on SLES 11?

2009-03-26 Thread Andrew Beekhof
On Tue, Mar 24, 2009 at 17:04, lllact...@gmx.net lllact...@gmx.net wrote: The downloads for SLES 11 GM is now available at: http://download.novell.com/Download?buildid=hwRS9NNA004~ SLE HAE is *_not_* available as Evaluation at: http://www.novell.com/products/highavailability/. Hopefully it

Re: [Pacemaker] H.A. on SLES 11?

2009-03-26 Thread Andrew Beekhof
On Mon, Mar 23, 2009 at 10:40, lllact...@gmx.net lllact...@gmx.net wrote: Martin Gerhard Loschwitz wrote: ... There was the need to invent a packaging design that would allow Pacemaker to be built with OpenAIS and yet have the heartbeat-parts it needs -- without having to install the complete

Re: [Pacemaker] Monitor a resource without the cluster reacting to the result...

2009-03-27 Thread Andrew Beekhof
On Fri, Mar 27, 2009 at 10:30, Joe Bill foxyc...@yahoo.com wrote: --- On Thu, 3/26/09, Andrew Beekhof beek...@gmail.com wrote: what command should I type to cause the cluster to perform a monitor operation at a specific check level on that resource, and return the appropriate OCF

Re: [Pacemaker] Monitor a resource without the cluster reacting to the result...

2009-03-27 Thread Andrew Beekhof
On Fri, Mar 27, 2009 at 13:54, Joe Bill foxyc...@yahoo.com wrote: --- On Fri, 3/27/09, Andrew Beekhof beek...@gmail.com wrote: I saw later that you want to prevent the cluster from doing anything for the resource, simply set is-managed=false for the resource in question. This is incorrect

Re: [Pacemaker] difference between anonymous and globally unique clone

2009-03-30 Thread Andrew Beekhof
2009/3/30 Glory Smith xx2gl...@gmail.com: Hi All, can any one help me in understanding the difference between anonymous and globally unique clone.  anonymous Clones are supposed to behave identically everywhere they are running but globally unique clone can behave differently . Right. The

Re: [Pacemaker] difference between anonymous and globally unique clone

2009-03-31 Thread Andrew Beekhof
2009/3/30 Glory Smith xx2gl...@gmail.com: No. Neither type of resource allows you to place a specific instance on a particular node, this is by design. That means we cant control any clone  resource on individual node. either all instances can be started or  all can be stopped. Not

Re: [Pacemaker] migration-threshold question

2009-04-03 Thread Andrew Beekhof
On Sat, Mar 21, 2009 at 17:04, Juha Heinanen j...@tutpro.com wrote: i have a resource that used to have this crm definition: primitive test lsb:test \        op monitor interval=30s timeout=5s \        meta target-role=Started if i stopped the resource by /etc/init.d/test stop pacemaker

Re: [Pacemaker] compilation problem with last source

2009-04-06 Thread Andrew Beekhof
Do you have libtool installed? What version? On Mon, Apr 6, 2009 at 16:46, Infos E-Blokos in...@e-blokos.com wrote: Hi, /usr/bin/ld: cannot find -lltdl collect2: ld returned 1 exit status gmake[1]: *** [pengine] Error 1 gmake[1]: Leaving directory

Re: [Pacemaker] Fedora 8 and 10 compilation problem

2009-04-07 Thread Andrew Beekhof
On Mon, Apr 6, 2009 at 23:15, Infos E-Blokos in...@e-blokos.com wrote: - Original Message - From: Andrew Beekhof beek...@gmail.com To: pacema...@clusterlabs.org Sent: Monday, April 06, 2009 2:46 PM Subject: Re: [Pacemaker] Fedora 8 and 10 compilation problem On Mon, Apr 6, 2009

Re: [Pacemaker] Resource Scheduler Parameters

2009-04-08 Thread Andrew Beekhof
On Wed, Apr 8, 2009 at 18:40, btinsley btins...@gmail.com wrote: AIS guys said to upgrade to the latest Whitetank :-)  I did and the behavior is the same, but it's not necessarily incorrect. The aisexec process sets itself to the realtime scheduling class, which does the same for all of the

Re: [Pacemaker] Pacemaker on Fedora 10 -- OpenAIS/Corosync version question

2009-04-09 Thread Andrew Beekhof
On Wed, Apr 8, 2009 at 20:54, Ty! Boyack t...@nrel.colostate.edu wrote: I'm trying to get Pacemaker running on a set of Fedora 10 boxes, but I've seen some conflicting/confusing information regarding the state of Pacemaker and how it integrates with OpenAIS/Corosync. It looks like the

Re: [Pacemaker] newbie question

2009-04-09 Thread Andrew Beekhof
On Sat, Apr 4, 2009 at 10:01, Infos E-Blokos in...@e-blokos.com wrote: Hi Andrew, congratulations for your pacemaker work and docs, very useful and clear. also I d like to know how to manage ip aliases onto an openais cluster, I really don't know how to do it and no example on the net. is

Re: [Pacemaker] Pacemaker on Fedora 10 -- OpenAIS/Corosync version question

2009-04-09 Thread Andrew Beekhof
excited to see it become even more stable and ubiquitous on a variety of distributions. -Ty! Andrew Beekhof wrote: On Wed, Apr 8, 2009 at 20:54, Ty! Boyack t...@nrel.colostate.edu wrote: I'm trying to get Pacemaker running on a set of Fedora 10 boxes, but I've seen some conflicting

Re: [Pacemaker] No STONITH resources have been defined

2009-04-14 Thread Andrew Beekhof
On Sat, Apr 11, 2009 at 17:34, Juha Heinanen j...@tutpro.com wrote: when i commit my example crm configuration, i get warning: crm_verify[20553]: 2009/04/11_18:31:08 WARN: unpack_resources: No STONITH resources have been defined Warnings found during check: config may not be valid how to

[Pacemaker] ANNOUNCE: Pacemaker 1.0.3 now available (maintenance release)

2009-04-15 Thread Andrew Beekhof
Hi all, Now that the SLES11 crunch is over, we're back on the regular monthly release schedule. Accordingly, 1.0.3 is now ready and certified to work with both stacks :-) Major points of interest for this release include: - Compact display of clones in crm_mon - Fixed memory leaks -

Re: [Pacemaker] ANNOUNCE: Pacemaker 1.0.3 now available (maintenance release)

2009-04-15 Thread Andrew Beekhof
On Wed, Apr 15, 2009 at 11:51, Juha Heinanen j...@tutpro.com wrote: Rvm writes:   I've just build pacemaker with dpkg-buildpackage -rfakeroot -uc -us   according to the wiki. The generated .deb is in version 1.0.2-1 not   1.0.3. I think the DEBIAN/control is not up to date ? i think that

Re: [Pacemaker] pacemaker 1.0.3 compilation problem on Fedora 8

2009-04-15 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 06:33, Infos E-Blokos in...@e-blokos.com wrote: xml.Tpo -c xml.c  -fPIC -DPIC -o .libs/xml.o cc1: warnings being treated as errors xml.c: In function 'string2xml': xml.c:486: warning: argument 2 of 'xmlSetGenericErrorFunc' might be a candidate for a format attribute

Re: [Pacemaker] pacemaker 1.0.3 compilation problem on Fedora 8

2009-04-16 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 07:30, Infos E-Blokos in...@e-blokos.com wrote: ok, but as Fedora 8 is not updated anymore... what the solution ? Re-configure with the --disable-fatal-warnings option Or, change the relevant libxml2 header yourself

Re: [Pacemaker] pacemaker 1.0.3 compilation problem on Fedora 8

2009-04-16 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 09:29, Infos E-Blokos in...@e-blokos.com wrote: - Original Message - From: Andrew Beekhof beek...@gmail.com To: pacema...@clusterlabs.org Sent: Thursday, April 16, 2009 3:22 AM Subject: Re: [Pacemaker] pacemaker 1.0.3 compilation problem on Fedora 8 On Thu

[Pacemaker] Rolling upgrades from 0.6.x to 1.0.x

2009-04-22 Thread Andrew Beekhof
It appears that, despite the best of intentions, rolling upgrades from 0.6.x to 1.0.x simply don't work. The problems are particularly obvious for Heartbeat users, but also exist for those running OpenAIS. Measures have been put in place (a new automated test) so that we don't encounter

Re: [Pacemaker] New System Health feature

2009-04-28 Thread Andrew Beekhof
On Mon, Apr 27, 2009 at 22:25, Mark Hamzy ha...@us.ibm.com wrote: beek...@gmail.com wrote on 04/24/2009 11:00:01 AM: On Thu, Apr 23, 2009 at 17:49, Mark Hamzy ha...@us.ibm.com wrote: Health Attribute-value Meaning green 1000 server is happy, capable of running any resource yellow 0

Re: [Pacemaker] Xen DomU live migration

2009-04-28 Thread Andrew Beekhof
In 1.0.x most underscores were replaced with dashes. So try changing allow_migrate to allow-migrate 2009/4/27 Ivars Strazdiņš ivars.strazd...@gmail.com: Hi Pacemaker experts, sorry for my newbie questions. Here are two. I've setup two node Pacemaker/Heartbeat cluster. Pacemaker is 1.0.3 and
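The renaming described here can be sketched as a one-line helper. Note the word "most" in the reply: this is an assumption-laden illustration, not a guaranteed mapping for every option, and `to_dashed_name` is a hypothetical name rather than part of any Pacemaker tool. Check each option against the 1.0 documentation before relying on it.

```python
def to_dashed_name(option):
    """Convert an underscore-style option name to the dashed 1.0.x spelling.

    Illustration only: the reply says *most* underscores were replaced,
    so the result should be checked against the documentation.
    """
    return option.replace("_", "-")
```

Applied to the option from this thread, `to_dashed_name("allow_migrate")` gives `allow-migrate`.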

Re: [Pacemaker] New System Health feature

2009-05-07 Thread Andrew Beekhof
On Wed, May 6, 2009 at 11:32 PM, Mark Hamzy ha...@us.ibm.com wrote: beek...@gmail.com wrote on 04/28/2009 10:31:43 AM: Actually, it would still work if the entity responsible for updating the node health combined the readings from the different sources into a single value. However, then you

Re: [Pacemaker] New patch for System Health feature

2009-05-13 Thread Andrew Beekhof
On Wed, May 13, 2009 at 5:07 PM, Mark Hamzy ha...@us.ibm.com wrote: Okay, here is attempt #2: (See attached file: pacemaker.mark.patch) Questions/comments? This is missing the modification to char2score that i mentioned (which would also simplify calculate_system_health()). You'd only want

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Andrew Beekhof
On Thu, May 14, 2009 at 3:58 PM, Nikola Ciprich extmaill...@linuxbox.cz wrote: Hi, Dejan, thanks a lot, I compiled Your version, but crmd with shipped pacemaker keeps segfaulting with it, and unable to rebuild pacemaker with this heartbeat to get the -debug package. compilation fails with:

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Andrew Beekhof
2009/5/14 Ivars Strazdiņš ivars.strazd...@gmail.com: Hi there, could anyone enlighten me why in a two node cluster one (and only one) node is spitting these messages (below) regularly every 15 minutes? Its to facilitate time-based rules and the expiration of resource failures. You can disable

Re: [Pacemaker] New patch for System Health feature

2009-05-18 Thread Andrew Beekhof
On Mon, May 18, 2009 at 3:03 PM, Lars Marowsky-Bree l...@suse.de wrote: On 2009-05-18T14:15:36, Andrew Beekhof and...@beekhof.net wrote: I've applied a modified version of your patch as http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/deb967617b9e Without sounding like a bore, I disagree

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-18 Thread Andrew Beekhof
On Sat, May 16, 2009 at 10:33 PM, Nikola Ciprich extmaill...@linuxbox.cz wrote: Hi guys, I was able to enable valgrind on our production cluster today, but unfortunately only on the secondary node, I'll be allowed to enable it on primary node hopefully during next weekend. Unfortunately it

Re: [Pacemaker] Pacemaker Digest, Vol 18, Issue 27

2009-05-19 Thread Andrew Beekhof
On Mon, May 18, 2009 at 10:53 PM, Mark Hamzy ha...@us.ibm.com wrote: l...@suse.de wrote on 05/18/2009 15:28:34 AM: Can you provide a description and example of how this final version would be used? Maybe a wiki page or something? Sure thing.  On http://clusterlabs.org/wiki/ somewhere?  What

Re: [Pacemaker] Clone config question

2009-05-19 Thread Andrew Beekhof
On Tue, May 19, 2009 at 1:20 PM, Mark Schenk m.m.a.sch...@tudelft.nl wrote: Hi Andrew,  thanks for the pointer. It turns out that running the Filesystem-monitor operation on the nfs server killed it since it too had a /data filesystem mounted, which then (mistakenly) was being unmounted

Re: [Pacemaker] Clone config question

2009-05-19 Thread Andrew Beekhof
On Tue, May 19, 2009 at 1:39 PM, Mark Schenk m.m.a.sch...@tudelft.nl wrote: Andrew Beekhof wrote: Check out the globally-unique option for clones (hint: you'll want it set to false in this scenario) Hi,  I did this, but it still tries to start two resources on one node! I'm very

Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-19 Thread Andrew Beekhof
On Mon, May 18, 2009 at 8:12 PM, Bob Haxo bh...@sgi.com wrote: Any suggestions as to what needs changing so that the stonith deathmarch can be avoided? If you only have two nodes, the only two ways have already been discussed: use poweroff, or don't start the cluster at boot. If you don't want to

Re: [Pacemaker] Clone config question

2009-05-20 Thread Andrew Beekhof
On Wed, May 20, 2009 at 8:55 AM, Mark Schenk m.m.a.sch...@tudelft.nl wrote: Hello Andrew, thanks for the offer, however I'm pretty sure that it's my lack of knowledge that's the problem here and not pacemaker :-) I'll experiment on and repost here when I'm really stuck... honestly, its

Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-20 Thread Andrew Beekhof
. Cheers, Bob Haxo On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote: On Mon, May 18, 2009 at 8:12 PM, Bob Haxo bh...@sgi.com wrote: Any suggestions as to what needs changing so that the stonith deathmarch can be avoided? If you only have two nodes, the only two ways have already

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-20 Thread Andrew Beekhof
+++ b/cib/callbacks.c Wed May 20 14:01:30 2009 +0200
@@ -1064,6 +1064,7 @@ cib_ha_peer_callback(HA_Message * msg, v
 {
     xmlNode *xml = convert_ha_message(NULL, msg, __FUNCTION__);
     cib_peer_callback(xml, private_data);
+    free_xml(xml);
 }
 
 void
On Tue, May 19, 2009 at 8:24 PM, Andrew

Re: [Pacemaker] globally-unique clone question

2009-05-22 Thread Andrew Beekhof
The idea behind unique vs. non-unique is best illustrated by example. Take a CLUSTERIP resource... Based on some criteria (usually the source address), it allocates all requests into a bucket from 0..(N-1), where N ::= clone-max So when we ask is the resource running here, we're really asking:
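The bucket allocation described in this post can be sketched as follows. The post only says that requests are sorted into buckets 0..(N-1) based on some criteria such as the source address; it does not say which hash is used, so `zlib.crc32` below is a stand-in chosen for the example, and `bucket_for` is a hypothetical name.

```python
import zlib


def bucket_for(source_address, clone_max):
    """Map a source address to a bucket in 0..clone_max-1.

    zlib.crc32 is a stand-in hash for illustration; the hash that
    CLUSTERIP really uses is not given in the post.
    """
    if clone_max < 1:
        raise ValueError("clone_max must be at least 1")
    return zlib.crc32(source_address.encode("utf-8")) % clone_max
```

Each clone instance would then answer only for the bucket numbers it owns, which is why the instances are not interchangeable.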

Re: [Pacemaker] trigger STONITH for testing purposes

2009-05-22 Thread Andrew Beekhof
On Wed, May 20, 2009 at 6:39 PM, Bob Haxo bh...@sgi.com wrote: Hi Andrew, I'd say you removed no-quorum-policy=ignore Actually, the pair of no_quorum_policy and no-quorum-policy are set to ignore, and expected-quorum-votes is set to 2:   crm_config     cluster_property_set

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-22 Thread Andrew Beekhof
{ crmd_ha_msg_filter(msg); - return; } bail: On Wed, May 20, 2009 at 2:47 PM, Nikola Ciprich extmaill...@linuxbox.cz wrote: On Wed, May 20, 2009 at 02:02:52PM +0200, Andrew Beekhof wrote: Ah, well that was pretty obvious. /me humbly apologizes for such a stupid error. Hi and thanks

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Andrew Beekhof
Looks like a bug, can you post a hb_report archive of the scenario please? On Fri, May 22, 2009 at 5:51 PM, Joe Armstrong jarmstr...@postpath.com wrote: Hi All, I am playing around with the crm command line tool to create an HA config for pacemaker and am bumping into a problem. If I have

Re: [Pacemaker] PingD Failure-Timeout

2009-05-25 Thread Andrew Beekhof
On Thu, May 21, 2009 at 10:20 PM, Eliot Gable ega...@broadvox.net wrote: Is there a way to time-out the failure of PingD? Yes, but you need version >= 1.0.0. I assume you're not running it as a clone, right? In my configuration, I cannot run PingD all the time on every node. Only one node

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 5:51 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Fri, May 22, 2009 at 08:51:45AM -0700, Joe Armstrong wrote: Hi All, I am playing around with the crm command line tool to create an HA config for pacemaker and am bumping into a problem. If I have a

Re: [Pacemaker] Redesigned Debian HA packages

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 6:05 PM, Juha Heinanen j...@tutpro.com wrote: i replaced my older packages with the new debian packages (heartbeat and pacemaker-heartbeat) and my cluster came up automatically without a need to change anything. regarding crm_mon, i would like to start it automatically

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-26 Thread Andrew Beekhof
On Tue, May 26, 2009 at 12:08 PM, Florian Haas flor...@linbit.com wrote: Steve, On 2009-05-25 19:56, Juha Heinanen wrote: Steven Dake writes:   The only options I see is to periodically try the failed ring for   liveness.  The problem with this approach is it is hard to implement. try all

Re: [Pacemaker] [Fwd: Re: [RfC] Redesigned Debian HA packages, try 2 (was: try 1)]]

2009-05-27 Thread Andrew Beekhof
On Wed, May 27, 2009 at 2:14 AM, Martin Gerhard Loschwitz martin.loschw...@linbit.com wrote: Raoul Bhatia [IPAX] schrieb: 7. i am not finding /usr/sbin/ocf-tester in any package. cheers, raoul It's in the heartbeat package; is this another candidate for move over to heartbeat-common?

Re: [Pacemaker] pacemaker-mgmt-client: broken dependency in CentOS 5 repo

2009-05-27 Thread Andrew Beekhof
I'll have to defer to Yan Gao on this one. I'm not familiar with the GUI's requirements. On Tue, May 26, 2009 at 1:05 PM, Florian Haas flor...@linbit.com wrote: Andrew, this is probably of minor importance, but the CentOS 5 repo hosted on download.opensuse.org seems to have a dependency issue

Re: [Pacemaker] Changes in the colocation constraints?

2009-05-27 Thread Andrew Beekhof
2009/5/27 Димитър Бойн db...@postpath.com: Hi, I was trying to follow http://wiki.linux-ha.org/DRBD/HowTov2 In the configuration for Floating Peers there is an idea for the following constraints: <rsc_colocation id="colo_drbd0_ip0" to="ip0" from="drbd0:0" score="infinity"/>

Re: [Pacemaker] lvm2-clvm RPMs in opensuse.org package repo?

2009-05-27 Thread Andrew Beekhof
-27 15:26, Andrew Beekhof wrote: openSUSE 11.0 SLE11 thats about it until the next fedora version comes out with a 2.6.26 (or later) kernel ___ Pacemaker mailing list Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo

Re: [Pacemaker] mysql ocf error 5

2009-05-27 Thread Andrew Beekhof
On Wed, May 27, 2009 at 6:39 PM, Jason Woodward jason.woodw...@joslin.harvard.edu wrote: I am trying to configure my cluster to run mysql.  I am using the following CRM command: primitive mysql ocf:heartbeat:mysql \ params binary=/usr/local/mysql/bin/mysqld_safe \

Re: [Pacemaker] 'crm configure delete' behavior changed in 1.0.3?

2009-05-28 Thread Andrew Beekhof
On Tue, May 26, 2009 at 6:10 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Tue, May 26, 2009 at 10:01:04AM -0600, Nicholas Dronen wrote: On Tue, May 26, 2009 at 9:55 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Tue, May 26, 2009 at 08:48:26AM -0600, Nicholas Dronen

Re: [Pacemaker] Managing resources - classes

2009-05-28 Thread Andrew Beekhof
On Wed, May 27, 2009 at 4:22 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Wed, May 27, 2009 at 08:42:06AM -0400, Eliot Gable wrote: http://clusterlabs.org/wiki/Documentation First, read this (probably ten times or so, since it won't make complete sense and you will miss

Re: [Pacemaker] Bug in crm_verify

2009-05-29 Thread Andrew Beekhof
On Thu, May 28, 2009 at 10:56 PM, Dan Urist dur...@ucar.edu wrote: This is minor, but crm_verify from v.1.0.3 of pacemaker apparently doesn't support --verbose as stated. The short option (-V) works. My apologies if this isn't the right place to report a bug-- I didn't see a bug tracker at

Re: [Pacemaker] OCF_RESKEY_CRM_meta* envars

2009-05-29 Thread Andrew Beekhof
On Thu, May 28, 2009 at 10:32 PM, Florian Haas flor...@linbit.com wrote: Andrew, would you mind pointing me to wherever in the code the OCF_RESKEY_CRM_meta* variables are set and passed into the RA environment? I'd like to understand where and how this happens, have been unable to find

Re: [Pacemaker] Managing resources - classes

2009-05-29 Thread Andrew Beekhof
On Thu, May 28, 2009 at 7:33 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Thu, May 28, 2009 at 03:59:49PM +0200, Andrew Beekhof wrote: On Wed, May 27, 2009 at 4:22 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Wed, May 27, 2009 at 08:42:06AM -0400, Eliot Gable

Re: [Pacemaker] eth0:0: warning: name may be invalid

2009-05-29 Thread Andrew Beekhof
2009/5/29 Димитър Бойн db...@postpath.com: Thank you, Neil! I have already tried this but eth0 has IP Address on boot. The resource does not start if I change the nic value to eth0 only :( and I indeed need it to be just additional eth0:0. That's what it would normally do. It adds an alias to

Re: [Pacemaker] Managing resources - classes

2009-05-29 Thread Andrew Beekhof
On Fri, May 29, 2009 at 12:16 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Another option would be to implement an alternative parser. At the shell level or in the cib? Something like named.conf (C like). That one looks more pleasing to me, but I may be biased ;-) Agreed, its also quite

Re: [Pacemaker] clone resource examples

2009-06-02 Thread Andrew Beekhof
Try the stonith section of the 1.0 configuration explained pdf. On Tue, Jun 2, 2009 at 7:37 AM, Infos E-Blokos in...@e-blokos.com wrote: Hi, Does anyone know where I can find some clone resource examples (with xml)? Thanks Franck

Re: [Pacemaker] Resource does not migrate to another cluster (node2)

2009-06-02 Thread Andrew Beekhof
it by default. Do you have any stonith resources defined? Thanks in advance, George Gomes On Tue, Jun 2, 2009 at 9:12 AM, Andrew Beekhof and...@beekhof.net wrote: Which pacemaker version? Do you have stonith resources defined? On Mon, Jun 1, 2009 at 11:03 PM, George Gomes geoun...@gmail.com

Re: [Pacemaker] System Health backend part

2009-06-02 Thread Andrew Beekhof
Do you think this should live in pacemaker or with the RAs? I'm inclined to think the latter but am open to persuasion. On Sat, May 30, 2009 at 1:26 AM, Mark Hamzy ha...@us.ibm.com wrote: I would like to see a complete solution for system health shipped with pacemaker. Would you be opposed to

Re: [Pacemaker] trigger STONITH for testing purposes

2009-06-03 Thread Andrew Beekhof
2009/6/3 Yan Gao y...@novell.com: On Fri, 2009-05-22 at 12:33 +0200, Andrew Beekhof wrote: On Wed, May 20, 2009 at 6:39 PM, Bob Haxo bh...@sgi.com wrote: Hi Andrew, I'd say you removed no-quorum-policy=ignore Actually, the pair of no_quorum_policy and no-quorum-policy are set

Re: [Pacemaker] reliable way to cib SEGFAULT -- how is cibadmin -Q --xpath supposed to work?

2009-06-03 Thread Andrew Beekhof
On Wed, Jun 3, 2009 at 6:10 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: current mercurial pacemaker stable-1.0 do cibadmin -Q --xpath //@id and watch your cib segfault: WARN: Managed /usr/lib/heartbeat/cib process 15295 killed by signal 11 [SIGSEGV - Segmentation violation] crap

Re: [Pacemaker] Do we have a repository for config files?

2009-06-04 Thread Andrew Beekhof
On Wed, Jun 3, 2009 at 9:40 PM, Shaffin Bhanji shaffin.bha...@gmail.com wrote: Hello, I am new to this list but do we have a repository of config files (resources) that enable various HA capabilities yet? Do you mean the scripts or xml fragments that go in the cib? -- Andrew

Re: [Pacemaker] kernel.core_uses_pid and ulimit -c

2009-06-04 Thread Andrew Beekhof
On Thu, Jun 4, 2009 at 8:40 AM, Florian Haas flor...@linbit.com wrote: Andrew, Dejan et al., The TODO page at http://clusterlabs.org/wiki/TODO states that Pacemaker now automagically sets the kernel.core_uses_pid sysctl to ease debugging. Wouldn't it make sense to check the currently set

Re: [Pacemaker] System Health backend part

2009-06-04 Thread Andrew Beekhof
On Tue, Jun 2, 2009 at 11:35 PM, Mark Hamzy ha...@us.ibm.com wrote: I believe that general purpose solutions that follow standards should live in pacemaker. Just returning to this for a moment, if it is a truly general purpose solution, then it could be useful for those not running Pacemaker.

Re: [Pacemaker] pingd comments and metadata

2009-06-04 Thread Andrew Beekhof
On Thu, Jun 4, 2009 at 8:47 AM, Florian Haas flor...@linbit.com wrote: Andrew, Dejan, Dominik, I am by no means a pingd expert, but the current incarnation in stable-1.0 seems to have some outdated and misleading comments and meta data. Examples: parameter name=host_list unique=0 longdesc

Re: [Pacemaker] reliable way to cib SEGFAULT -- how is cibadmin -Q --xpath supposed to work?

2009-06-04 Thread Andrew Beekhof
On Thu, Jun 4, 2009 at 1:33 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Wed, Jun 03, 2009 at 10:45:41PM +0200, Andrew Beekhof wrote: fixed: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/cf478ed1269f thanks. though that simply returns the parent XML_ELEMENT_NODE, which

Re: [Pacemaker] ESX guest having SLE HA

2009-06-05 Thread Andrew Beekhof
On Fri, Jun 5, 2009 at 7:19 AM, Priyanka Ranjan priyanka3rd...@gmail.com wrote: Hi All, Is SLE HA supported as an OS on a VMware ESX 3.5 guest? I'd be surprised if it wasn't.

Re: [Pacemaker] cibadmin -R

2009-06-08 Thread Andrew Beekhof
On Sun, Jun 7, 2009 at 5:29 AM, Infos E-Blokos in...@e-blokos.com wrote: Hi, after verifying my new cib.xml with crm_verify -V -x newcib.xml (OK), I try to update it with cibadmin -R -o cib -x cib.xml and it says: Call cib_replace failed (-47): Update does not conform to the configured schema/DTD null

Re: [Pacemaker] ipAddr2 OCF

2009-06-08 Thread Andrew Beekhof
On Sat, Jun 6, 2009 at 10:52 PM, Infos E-Blokos in...@e-blokos.com wrote: Hi, is it possible to create an IP range (30 IPs) in a single primitive ipAddr2 resource, as a clusterip clone, rather than creating a primitive resource for each IP? Possible, yes. Implemented, no.

Re: [Pacemaker] Spreading the Cluster Accross Datacenters

2009-06-08 Thread Andrew Beekhof
Pacemaker itself doesn't care. What you need to worry about is the requirements of the underlying cluster stack. Generally this involves low latency, high throughput and high reliability. If you have all three, you should be fine. 2009/6/8 Димитър Бойн db...@postpath.com: Hi, Has anybody done

Re: [Pacemaker] Spreading the Cluster Accross Datacenters

2009-06-09 Thread Andrew Beekhof
- From: Andrew Beekhof [mailto:and...@beekhof.net] Sent: Monday, June 08, 2009 2:54 PM To: pacema...@clusterlabs.org Cc: pacema...@clusterlabs.org Subject: Re: [Pacemaker] Spreading the Cluster Accross Datacenters Pacemaker itself doesn't care. What you need to worry about is the requirements

Re: [Pacemaker] Clone resource management

2009-06-09 Thread Andrew Beekhof
You need clone-max=2 and if you're using openais, read http://clusterlabs.org/wiki/FAQ#I_Killed_a_Node_but_the_Cluster_Didn.27t_Recover On Tue, Jun 9, 2009 at 2:08 PM, Shaffin Bhanji shaffin.bha...@gmail.com wrote: Hello, I have setup a clone filesystem resource involving 2 nodes. When one
