Re: [Linux-HA] Documentation for constraints

2007-03-23 Thread Andrew Beekhof
On Mar 22, 2007, at 7:40 PM, Dejan Muhamedagic wrote: On Thu, Mar 22, 2007 at 06:15:41PM +0100, Ragnar Kjørstad wrote: On Thu, Mar 22, 2007 at 02:05:57PM +0100, Dejan Muhamedagic wrote: On Thu, Mar 22, 2007 at 02:21:23AM +0100, Ragnar Kjørstad wrote: BTW: I just noticed that the DTD is

Re: [Linux-HA] why number of nodes 16?

2007-03-26 Thread Andrew Beekhof
On Mar 26, 2007, at 12:10 PM, tanghy wrote: Hi, I am reading Alan's tutorial of Linux-HA. I am wondering why it is said in the tutorial that number of supported cluster nodes is less than 16? What causes it not able to extend to larger size? basically just the amount of testing we've

Re: [Linux-HA] Documentation for constraints

2007-03-27 Thread Andrew Beekhof
On Mar 24, 2007, at 2:03 PM, Dejan Muhamedagic wrote: On Fri, Mar 23, 2007 at 02:08:17PM +0100, Andrew Beekhof wrote: On Mar 22, 2007, at 7:40 PM, Dejan Muhamedagic wrote: On Thu, Mar 22, 2007 at 06:15:41PM +0100, Ragnar Kjørstad wrote: On Thu, Mar 22, 2007 at 02:05:57PM +0100, Dejan

Re: [Linux-HA] Fail-count, ping-pong

2007-04-03 Thread Andrew Beekhof
search the archives from recent days - someone asked the same question On 3/31/07, Massi [EMAIL PROTECTED] wrote: ok then no new codes, just bugfixes. Back to my first post, I know how to calculate the scores and stickinesses to setup the retry before failing over but I still don't know how

Re: [Linux-HA] resolving Dependency loop error

2007-04-03 Thread Andrew Beekhof
On 4/2/07, kisalay [EMAIL PROTECTED] wrote: Hi, I have recently upgraded my system from linux-ha 2.0.7 to 2.0.8. Since I have upgraded, i have been seeing some errors/ warnings from pengine. I assume that these errors were not checked for in 2.0.7 and more checks have been added in 2.0.8. Below

Re: [Linux-HA] Repeatable simple colocation bug

2007-04-03 Thread Andrew Beekhof
Certainly not optimal, but not actually a bug... (starting example_cAB somewhere would have been a bug) What we need to remember is that the PE will always be substantially less complex than a human brain (which is basically what it's trying to model most of the time)... So there will be times

Re: [Linux-HA] New Command Line Tools: Resource Scripts

2007-04-03 Thread Andrew Beekhof
On 4/2/07, Martin Fick [EMAIL PROTECTED] wrote: Hi, I have been using heartbeat 2 for a while and tend to prefer scripts over GUIs or XML so I have created some helper scripts (that I call Resource Scripts) to configure/modify resources and their constraints from the command line that I would

Re: [Linux-HA] cib.xml races on initialization

2007-04-03 Thread Andrew Beekhof
On 4/3/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 4/3/07, Alan Robertson [EMAIL PROTECTED] wrote: Yan Fitterer wrote: Manual manipulation of cib through /var filesystem is explicitly discouraged. Use the cibadmin tool. Heartbeat will synchronize the cib

Re: [Linux-HA] cib.xml races on initialization

2007-04-04 Thread Andrew Beekhof
On 4/3/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 4/3/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 4/3/07, Alan Robertson [EMAIL PROTECTED] wrote: Yan Fitterer wrote: Manual manipulation of cib through /var filesystem is explicitly

Re: [Linux-HA] cib.xml races on initialization

2007-04-04 Thread Andrew Beekhof
On 4/4/07, Bernd Schubert [EMAIL PROTECTED] wrote: Or make the default epoch of an existing CIB file with a missing epoch to be 1 instead of zero? possibly, but the simplest answer is really to just set a proper value for admin_epoch As of coding certainly, as of writing config files I
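To illustrate why a missing epoch matters here: as far as I understand it, the cluster decides which copy of the CIB is newer by comparing admin_epoch first, then epoch, then num_updates, and a missing attribute counts as zero. The Python sketch below is only a simplified model of that comparison, not Heartbeat's own code; the helper names cib_version and newer_cib are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed comparison order: most significant field first.
VERSION_FIELDS = ("admin_epoch", "epoch", "num_updates")


def cib_version(cib_xml):
    """Return (admin_epoch, epoch, num_updates) from a <cib> root element.

    Attributes that are missing are treated as 0 (an assumption that matches
    the "missing epoch is zero" behaviour discussed in the thread).
    """
    root = ET.fromstring(cib_xml)
    return tuple(int(root.get(field, "0")) for field in VERSION_FIELDS)


def newer_cib(a_xml, b_xml):
    """Return whichever document has the higher version tuple (a wins ties)."""
    return a_xml if cib_version(a_xml) >= cib_version(b_xml) else b_xml


# A hand-written file with only admin_epoch set beats a generated one,
# because admin_epoch is compared before epoch and num_updates.
hand_written = '<cib admin_epoch="1"/>'
generated = '<cib epoch="5" num_updates="12"/>'
```

With these two samples, newer_cib(hand_written, generated) returns the hand-written document, which is the effect of "setting a proper value for admin_epoch".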

Re: [Linux-HA] How to make a colocation rule between a Master/Slave resource's Master and another resource?

2007-04-05 Thread Andrew Beekhof
On 4/5/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On Mar 20, 2007 at 4:37 PM Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 3/18/07, Alan Robertson [EMAIL PROTECTED] wrote: Lars Marowsky-Bree wrote: On 2007-03-16T10:38:25, Alan Robertson [EMAIL

Re: [Linux-HA] Can a RA know if a clone resource is ordered or interleave is true?

2007-04-05 Thread Andrew Beekhof
On 4/5/07, Michael Schwartzkopff [EMAIL PROTECTED] wrote: Hi, Can a RA script know if the clone resource has set ordered=true or interleave=true? Is this information somewhere set in a variable, like the OCF_RESKEY_CRM_meta_clone_max for the information about maximum number of clones in a

Re: [Linux-HA] crm_verfify cib.xml verification error

2007-04-10 Thread Andrew Beekhof
pretty sure i commented on this recently i'll patch it today On Apr 6, 2007, at 2:40 PM, Alan Robertson wrote: kisalay wrote: Hi, I recently migrated from 2.0.7 to 2.0.8. when I run my old ( 2.0.7 ) cib.xml through crm_verify now, I receive following warns / errors: element cib: validity

Re: [Linux-HA] 2.0.7 Failover Behavior Question

2007-04-10 Thread Andrew Beekhof
On 3/29/07, Mohler, Eric (EMOHLER) [EMAIL PROTECTED] wrote: Andrew, Thanks for your reply. Please refer to --'s below. The resulting behavior is that the app only restarts on the same node, never ping-pong. ** i assume ON and OFF

Re: [Linux-HA] Heartbeat stop hangs

2007-04-10 Thread Andrew Beekhof
On 4/9/07, Kevin Jamieson [EMAIL PROTECTED] wrote: kisalay wrote: I have a 2 node 2.0.8 Linux HA setup. I have observed that when stop is issued on my setup, as soon as the start returns, the stop hangs indefinitely, and the only way to stop heartbeat is to do killall. or wait for the

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Andrew Beekhof
On 4/10/07, Bernd Schubert [EMAIL PROTECTED] wrote: On Tuesday 10 April 2007 14:07:54 Lars Marowsky-Bree wrote: On 2007-04-10T12:02:30, Peter Kruse [EMAIL PROTECTED] wrote: But when you return the proper status - running, failed, not running -, heartbeat should do the right thing

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-11 Thread Andrew Beekhof
On 4/11/07, Bernd Schubert [EMAIL PROTECTED] wrote: Hi Alan, On Wednesday 11 April 2007 00:41:13 Alan Robertson wrote: Bernd Schubert wrote: On Thursday 05 April 2007 20:11:51 Alan Robertson wrote: This particular document had a couple of other errors too, which I believe I've corrected.

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Andrew Beekhof
On 4/11/07, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-04-10T18:29:10, Andrew Beekhof [EMAIL PROTECTED] wrote: Apr 10 17:30:25 ha-test-1 process[26425]: Returnig 7 Apr 10 17:30:40 ha-test-1 process[26493]: Maintainance = Apr 10 17:30:40 ha-test-1 process[26493]: OCF_RESKEY_probe = 1

Re: [Linux-HA] pingd not failing over

2007-04-12 Thread Andrew Beekhof
i hate to pester, but where are the fail counts kept track of and what maintains them? they are stored in the status section and are maintained by the tengine process (which increases it whenever a monitor action fails) there is also a CLI tool called crm_failcount that can be used to view

Re: [Linux-HA] set of apache servers + a service IP

2007-04-19 Thread Andrew Beekhof
On 4/18/07, Jose Jerez [EMAIL PROTECTED] wrote: I have been using heartbeat v2 for some time now and a happy customer I am :-) but I need your help for a configuration a little bit more complex. The system is SLES-10 and heartbeat 2.0.7 We have a group of apache servers each one of them in a

Re: [Linux-HA] status check

2007-04-19 Thread Andrew Beekhof
On 4/18/07, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-04-17T19:40:13, [EMAIL PROTECTED] wrote: Easiest way is to model after an existing resource agent, Xen for example. I've found the Dummy one a good start in the past. Simple, and shows the basic required components. Yeah, but

Re: [Linux-HA] status check

2007-04-19 Thread Andrew Beekhof
On 4/19/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Thu, Apr 19, 2007 at 10:07:09AM +0200, Andrew Beekhof wrote: On 4/18/07, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-04-17T19:40:13, [EMAIL PROTECTED] wrote: Easiest way is to model after an existing resource agent, Xen

Re: [Linux-HA] BadThingsHappen with v2.0.5.

2007-04-19 Thread Andrew Beekhof
On 4/19/07, Peter Kruse [EMAIL PROTECTED] wrote: Hello, thanks for reading this, as it's with ancient v2.0.5., please tell me that this problem can not happen with recent version of heartbeat. Problem description: yesterday in one of our 2node HA-Clusters a successful takeover happened, where

Re: [Linux-HA] Score, Resource stickiness Problems

2007-04-19 Thread Andrew Beekhof
On 4/18/07, Serge Dewailly [EMAIL PROTECTED] wrote: Hi all, I think I'm doing something wrong, but after much searching I can't see where I'm going wrong... I'm working on a two-node setup with drbd + filesystem + xen virtual machines. I made a group for each xen resource: group1 = drbd0 +

Re: [Linux-HA] BadThingsHappen with v2.0.5.

2007-04-19 Thread Andrew Beekhof
On 4/19/07, Peter Kruse [EMAIL PROTECTED] wrote: Hi Andrew! Andrew Beekhof wrote: beosrv-c-2 is the failed node right? it was beosrv-c-1 that failed, beosrv-c-2 took over. then i'm afraid your use of the dont fence nodes on startup option has come back to haunt you beosrv-c-1 came up

Re: [Linux-HA] IPv6, service fail on start behaviour

2007-04-19 Thread Andrew Beekhof
On 4/18/07, Benjamin Watine [EMAIL PROTECTED] wrote: Hi all I have two questions about Heartbeat v2 configuration : 1. IPv6addr : I've tried to configure virtual IPv6 address for a resource group. Because I didn't find documentation about this script, I did it like IPaddr, but it don't seems

Re: [Linux-HA] Restarting a resource that failed to start

2007-04-19 Thread Andrew Beekhof
On 4/13/07, Piotr Kaczmarzyk [EMAIL PROTECTED] wrote: Hi, I'm using version 2.0.8 and I tried to provide a highly-available squid service. I wrote my own OCF script which was tested in two versions: ver 1. 'Start' function started squid, waited a few seconds, then tried to connect to port

Re: [Linux-HA] Cannot create group containing drbd using HB GUI

2007-04-20 Thread Andrew Beekhof
On 4/19/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I had a question about the master location constraint you provided. Are the prefixes on the rsc_location (loc:) and rule (rule:) ids just a convention you used or are they required? convention only Doug On Tue, 2007-04-17 at 12:24

Re: [Linux-HA] Cannot create group containing drbd using HB GUI

2007-04-20 Thread Andrew Beekhof
On 4/20/07, Knight, Doug [EMAIL PROTECTED] wrote: OK, here's what happened. The drbd resources were both successfully running in Secondary mode on both servers, and both partitions were synched. My Filesystem resource was stopped, with the colocation, order, and place constraints in place. When

Re: [Linux-HA] Cannot create group containing drbd using HB GUI

2007-04-20 Thread Andrew Beekhof
On 4/19/07, Doug Knight [EMAIL PROTECTED] wrote: After a closer look at the DTD and my xml file, I found two things: 1) I had rsc_location where I should have had rsc_colocation, which is why crm_verify was choking on the lack of an rsc; and 2) I can only have one constraint (rsc_colocation,

Re: [Linux-HA] how to restart a single resource?

2007-04-20 Thread Andrew Beekhof
On 4/19/07, Francesco Ciocchetti [EMAIL PROTECTED] wrote: Hi all, I'm setting up a HA 2.0.8 cluster of 2 nodes with a drbd data replication. I set up a group containing following resources: primitive class=ocf id=IPaddr_10_237_84_226 provider=heartbeat type=IPaddr primitive class=heartbeat

Re: [Linux-HA] error cleaning

2007-04-20 Thread Andrew Beekhof
On 4/20/07, Bernd Schubert [EMAIL PROTECTED] wrote: Hi, when upgrading from heartbeat-2.0.5 to 2.0.8 another problem occurred. We have a script the admin can call to clean up errors that have occurred. At the end of the script the date of the last cleaning action is set, using the command crm_attribute -n

Re: [Linux-HA] Difference between resource_failure_stickiness and resource_stickiness

2007-04-20 Thread Andrew Beekhof
On 4/19/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi all, After many tests, I can't understand what the difference between the default_resource_failure_stickiness how badly you want resources to move after their monitor action fails default_resource_stickiness. how badly you want
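A rough way to picture how the two settings interact, offered as a sketch under assumptions rather than the policy engine's real algorithm: assume the score for keeping a resource on its current node is its location score plus the stickiness plus the fail count times the (negative) failure stickiness, and that the resource moves once that total drops below the score of another node. The function names and the numbers are invented for the example.

```python
def effective_score(base, stickiness, failure_stickiness, fail_count):
    """Assumed score for staying on the current node after fail_count failures."""
    return base + stickiness + fail_count * failure_stickiness


def failures_before_move(base_here, base_there, stickiness,
                         failure_stickiness, limit=1000):
    """Return the smallest fail count at which the other node scores higher.

    Returns None if that never happens within `limit` failures, for example
    when failure_stickiness is zero or positive.
    """
    for fail_count in range(limit + 1):
        here = effective_score(base_here, stickiness,
                               failure_stickiness, fail_count)
        if here < base_there:
            return fail_count
    return None
```

Under this model, with both nodes at a location score of 100, a stickiness of 100 and a failure stickiness of -30, failures_before_move(100, 100, 100, -30) gives 4: the first three monitor failures are retried in place and the fourth pushes the resource to the other node.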

Re: [Linux-HA] resource start not respecting location constraint

2007-04-20 Thread Andrew Beekhof
On 4/20/07, Yan Fitterer [EMAIL PROTECTED] wrote: In the attached pe- warn, why is resource R_audit being started on idm01 when there is an INFINITY constraint with uname eq idm04? BTW - idm04 is in standby at the moment. That should hardly matter. I expect the resource to be cannot run

Re: [Linux-HA] mysql drbd and SAN all together

2007-04-25 Thread Andrew Beekhof
On 4/25/07, Jan Kalcic [EMAIL PROTECTED] wrote: Hi, After some tests in my lab I have now a two nodes cluster working perfectly where I create a virtual ip resource using hb_gui. I also created a drbd partition which is correctly working but not yet included as heartbeat resource. This is the

Re: [Linux-HA] Location constraints

2007-04-25 Thread Andrew Beekhof
it's probably worth creating a bug for both of these: http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi On 4/25/07, Benjamin Watine [EMAIL PROTECTED] wrote: Dejan Muhamedagic wrote: On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote: You were right, it wasn't a

Re: [Linux-HA] How to add op to existing master/slave tag from command line

2007-04-26 Thread Andrew Beekhof
On 4/25/07, Doug Knight [EMAIL PROTECTED] wrote: Can someone provide an example of xml to be used with cibadmin to add an op tag to an existing master/slave resource? Here's my master/slave definition: master_slave notify=true id=ms_drbd_7788 instance_attributes

Re: [Linux-HA] Should user declared monitor operation activity be seen in log files?

2007-04-30 Thread Andrew Beekhof
On 4/27/07, Serge Dubrouski [EMAIL PROTECTED] wrote: It should not. Unless you wrote your own pgsql RA and put some logging for monitor operation there. you might see some logging from the LRM depends what log level i guess On 4/27/07, Doug Knight [EMAIL PROTECTED] wrote: If I define a

Re: [Linux-HA] Xen-HA on SLES x86_64

2007-05-03 Thread Andrew Beekhof
hard to comment without seeing your config On 5/2/07, Rene Purcell [EMAIL PROTECTED] wrote: Hi all, just want to know if this kind of setup is possible with heartbeat. - There's two nodes. ( node1 and node2 ) - On each nodes there's two DomU ( vm01 on node 1 and vm01 on node2 ) they all have a

Re: [Linux-HA] Possible bug? Heartbeat not assigning Slave status on resource startup

2007-05-03 Thread Andrew Beekhof
Started and Slave are basically the same state - so nothing is wrong as such - though it might be nice if it did in fact show Slave instead of Started. On 5/2/07, Doug Knight [EMAIL PROTECTED] wrote: When I initially start up a master_slave drbd resource (ms_dbrd_7788), using a Place constraint

Re: [Linux-HA] Compilation of 2.0.8 failure.

2007-05-04 Thread Andrew Beekhof
you'll need: e2fsprogs-devel ncurses-devel On 5/4/07, Ben Clewett [EMAIL PROTECTED] wrote: Morning. My first posting, I hope it's relevant... Installed a brand new SUSE 10.1, downloaded heatbeat-2.0.8 and compilation failed: # ./ConfigureMe configure First off, I could not find requested

Re: [Linux-HA] Compilation of 2.0.8 failure.

2007-05-04 Thread Andrew Beekhof
On 5/4/07, Ben Clewett [EMAIL PROTECTED] wrote: Andrew, Thanks for quick reply. No problem finding those, now trying another clean configure and make. I'll watch my compilation and let you know... --- I noticed another dependency I can't work out: Cimom No package in SUSE

Re: [Linux-HA] How many copies of attrd should be running?

2007-05-04 Thread Andrew Beekhof
On 5/3/07, Doug Knight [EMAIL PROTECTED] wrote: Hi all, Should there be more than one copy of attrd running on a node at the same time? no - there was an issue with attrd not exiting if heartbeat was killed but i think thats been fixed for the next release. I've discovered the two nodes in

Re: [Linux-HA] Compilation of 2.0.8 failure.

2007-05-04 Thread Andrew Beekhof
gnutls-devel iputils libnet libxml2-devel lynx lzo-devel net-snmp-devel openwbem-devel pam-devel python-devel python-gtk python-xml swig tcpd-devel Thanks again, Ben Andrew Beekhof wrote: On 5/4/07, Ben Clewett [EMAIL PROTECTED] wrote: Andrew, Thanks for quick reply. No problem finding

Re: [Linux-HA] Problem installing heartbeat on Solaris 10

2007-05-04 Thread Andrew Beekhof
do you have the ncurses development package installed? On 5/4/07, Torbjörn Sjölander [EMAIL PROTECTED] wrote: Hi, I would like to install Heartbeat 2.0.8 on Solaris 10, but always encounter compiling error when running make. Seems to get stuck on native.c: #sh ./ConfigureMe make ... ... ...

Re: [Linux-HA] failed to get the value of field lrm_opstatus from a ha_msg

2007-05-04 Thread Andrew Beekhof
On 5/4/07, Max Hofer [EMAIL PROTECTED] wrote: I tried some power-off tests and after running on both cluster nodes at the same time they sometimes go havoc. I run into 2 problems: 1.) on one cluster heartbeat shutdown with ERROR: Cannot write to media pipe 0: Resource temporarily unavailable

Re: [Linux-HA] Help required for installing hearbeta 2.0.8 for Fedora Core 5!!

2007-05-07 Thread Andrew Beekhof
you can find some fedora rpms at: http://software.opensuse.org/download/server:/ha-clustering/ Let me know if you find any problems with them (I don't have fedora to test them on - so the pre-reqs or install commands might not be 100% correct) On 5/6/07, Ramsurrun Visham [EMAIL PROTECTED]

Re: [Linux-HA] Cannot create group containing drbd using HB GUI

2007-05-07 Thread Andrew Beekhof
can you open a bug for this and include the _complete_ logs as well as which version you're running (as I no longer recall) On 5/4/07, Doug Knight [EMAIL PROTECTED] wrote: It seems the two nodes in my cluster are behaving differently from each other. First, some simplification/mapping for node

Re: [Linux-HA] How to add resources into crm through crm admin tools?

2007-05-07 Thread Andrew Beekhof
On 5/6/07, Yan Fitterer [EMAIL PROTECTED] wrote: One reason I like to order the resources, is that when one has more resources than can be displayed by crm_mon in a screenful (at least whilst remaining readable...), it is useful to be able to put things like STONITH resources at the bottom.

Re: [Linux-HA] setting is_managed to true triggers restart

2007-05-08 Thread Andrew Beekhof
On 5/8/07, Peter Kruse [EMAIL PROTECTED] wrote: Hello all, with Heartbeat v2.0.8 I have a configuration with the cib.xml as attached. After I started the resource groups I did: crm_resource -p is_managed -r IPaddr1 -t primitive -v false crm_mon shows: IPaddr1 (q-leap::ocf:IP_address):

Re: [Linux-HA] Xen-HA - SLES X86_64

2007-05-08 Thread Andrew Beekhof
grep ERROR logfile try this for starters: May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist. May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain

Re: [Linux-HA] Advice with heartbeat on RHAS4

2007-05-08 Thread Andrew Beekhof
you could try one of the fedora rpms at: http://software.opensuse.org/download/server:/ha-clustering even just having a peek at the spec file might help. On 5/8/07, Peter Sørensen [EMAIL PROTECTED] wrote: Hi, I have been playing around with heartbeat version 2.04-1 and drbd-8.0.0 to setup a

Re: [Linux-HA] Xen-HA - SLES X86_64

2007-05-08 Thread Andrew Beekhof
On 5/8/07, Rene Purcell [EMAIL PROTECTED] wrote: On 5/8/07, Rene Purcell [EMAIL PROTECTED] wrote: On 5/8/07, Andrew Beekhof [EMAIL PROTECTED] wrote: grep ERROR logfile try this for starters: May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02

Re: [Linux-HA] Advice with heartbeat on RHAS4

2007-05-08 Thread Andrew Beekhof
On 5/8/07, Peter Sørensen [EMAIL PROTECTED] wrote: Hi, I tried to download the fedora rpms but get the following: error: Failed dependencies: fedora-usermgmt is needed by heartbeat-2.0.9-43.1.i386 libc.so.6(GLIBC_2.4) is needed by heartbeat-2.0.9-43.1.i386

Re: [Linux-HA] What does Resource fs_mirror does not support reloads mean?

2007-05-10 Thread Andrew Beekhof
On 5/10/07, Yan Fitterer [EMAIL PROTECTED] wrote: Last I heard, reload was an optional OCF action that was not quite fully implemented in Heartbeat (someone please correct me here if need be!) The benefit would be that configuration changes could be picked up without having to interrupt a

Re: [Linux-HA] nodes stays offline after communication is restored

2007-05-10 Thread Andrew Beekhof
PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof Sent: 25 April 2007 14:40 To: General Linux-HA mailing list Subject: Re: [Linux-HA] nodes stays offline after communication is restored On 4/23/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Mon, Apr 23, 2007 at 10:27:55AM

Re: [Linux-HA] What does Resource fs_mirror does not support reloads mean?

2007-05-11 Thread Andrew Beekhof
On 5/11/07, Yan Fitterer [EMAIL PROTECTED] wrote: On Thu, May 10, 2007 at 3:38 PM, in message [EMAIL PROTECTED], Andrew Beekhof [EMAIL PROTECTED] wrote: On 5/10/07, Yan Fitterer [EMAIL PROTECTED] wrote: Last I heard, reload was an optional OCF action that was not quite fully implemented

Re: [Linux-HA] Help required for installing hearbeta 2.0.8 for Fedora Core 5!!

2007-05-14 Thread Andrew Beekhof
On 5/10/07, Sam Tran [EMAIL PROTECTED] wrote: On 5/7/07, Andrew Beekhof [EMAIL PROTECTED] wrote: you can find some fedora rpms at: http://software.opensuse.org/download/server:/ha-clustering/ Let me know if you find any problems with them (I don't have fedora to test them on - so the pre

Re: [Linux-HA] STONITH Module for Dell DRAC5

2007-05-14 Thread Andrew Beekhof
On 5/14/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On May 14, 2007, at 2:36 PM, Alan Robertson wrote: Dejan Muhamedagic wrote: On Fri, May 11, 2007 at 10:04:11AM +0200, Th.Paschy, hepasoft oHG wrote: Hi all, I am a new user of heartbeat. I configure an active

Re: [Linux-HA] Re: Fwd: Linux-HA Bug report

2007-05-15 Thread Andrew Beekhof
On 5/10/07, Alan Robertson [EMAIL PROTECTED] wrote: There is no 2.0.9. Suse put out something they call 2.0.9. In fairness to SUSE, it was solely my doing while I learnt how to use the openSUSE build service. The version there isnt an official SUSE release either. To be honest I didnt know

Re: [Linux-HA] Remove node from cluster

2007-05-15 Thread Andrew Beekhof
On 5/12/07, Mariusz Mart [EMAIL PROTECTED] wrote: Hi! I have 3 node cluster and I would like to remove one node from CRM database. Is there a simple solution for that? I can just stop it and not use, but it will be better if there is no extra node in hb_gui;) you need to stop the node then

Re: [Linux-HA] A problem in one node(DC node)

2007-05-15 Thread Andrew Beekhof
On 5/14/07, 山内 英生 [EMAIL PROTECTED] wrote: Hi When a problem happened for a process of Heartbeat, will this action be right? 1.When a problem happens in pengine/tengine after resource start in one node, pengine/tengine does not restart. (example. -9 pengine or tengine) are you saying that if

Re: [Linux-HA] two monitor

2007-05-15 Thread Andrew Beekhof
On 5/9/07, benjamin [EMAIL PROTECTED] wrote: Hi, I see here, http://www.linux-ha.org/ClusterInformationBase/Actions, that we can set different interval and parameters to a monitor but is it possible to have two (or more) monitors for the same resource? For example: is it possible to have a

Re: [Linux-HA] Coredump on active node when other node joins in

2007-05-15 Thread Andrew Beekhof
Hi Bernhard, can you create a bugzilla entry for this please? i'm no expert on this part of the code, but even _if_ you're doing something wrong we shouldn't be dumping core. On 5/14/07, Bernhard Limbach [EMAIL PROTECTED] wrote: Hi, Update: The runlevel thing was not a solution (I assumed

Re: [Linux-HA] What heartbeat version to install now ?

2007-05-15 Thread Andrew Beekhof
On 5/15/07, Benjamin Watine [EMAIL PROTECTED] wrote: Hi I'm about to install Heartbeat on a fresh Debian 4 system, and I was wondering which version of heartbeat I should install? Some bugs have been corrected since v2.0.8, and I would like to install these corrections too. So should I install

Re: [Linux-HA] What heartbeat version to install now ?

2007-05-16 Thread Andrew Beekhof
the situation rectified soon. On 5/15/07, Andrew Beekhof [EMAIL PROTECTED] wrote: On 5/15/07, Benjamin Watine [EMAIL PROTECTED] wrote: Hi I'm about to install Heartbeat on a fresh Debian 4 system, and I was wondering which version of heartbeat I should install? Some bugs have been corrected

Re: [Linux-HA] What heartbeat version to install now ?

2007-05-16 Thread Andrew Beekhof
Original Message Date: Wed, 16 May 2007 09:47:22 +0200 From: Andrew Beekhof [EMAIL PROTECTED] To: General Linux-HA mailing list linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] What heartbeat version to install now ? On 5/15/07, Serge Dubrouski [EMAIL PROTECTED] wrote: Andrew, you used

Re: [Linux-HA] A problem in one node(DC node)

2007-05-16 Thread Andrew Beekhof
-foundation.org/developer_bugzilla/show_bug.cgi?id=1575 Regards, Yamauchi -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Andrew Beekhof Sent: Tuesday, May 15, 2007 6:21 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] A problem in one node(DC node) On 5

Re: [Linux-HA] What heartbeat version to install now ?

2007-05-16 Thread Andrew Beekhof
On 5/16/07, Benjamin Watine [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 5/15/07, Benjamin Watine [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 5/15/07, Benjamin Watine [EMAIL PROTECTED] wrote: Hi I'm about to install Heartbeat on a fresh Debian 4 system, and I

Re: [Linux-HA] What heartbeat version to install now ?

2007-05-16 Thread Andrew Beekhof
On 5/16/07, Serge Dubrouski [EMAIL PROTECTED] wrote: On 5/16/07, Andrew Beekhof [EMAIL PROTECTED] wrote: On 5/15/07, Serge Dubrouski [EMAIL PROTECTED] wrote: Andrew, you used to periodically update the stable Mercury repository but it looks like you aren't doing this anymore. Why? I

Re: [Linux-HA] Rsc_location based on a clone monitoring

2007-05-18 Thread Andrew Beekhof
On 4/20/07, Benjamin Lawetz [EMAIL PROTECTED] wrote: I have a 2 node setup giving HA for an IP for mysql servers. The Mysql servers are replicating and therefore should always be running. So I don't want them to be managed by heartbeat. But I do need the monitoring. So I made a classic 2 node IP

Re: [Linux-HA] Compiling error on Debian Etch

2007-05-18 Thread Andrew Beekhof
On 5/18/07, Benjamin Watine [EMAIL PROTECTED] wrote: Benjamin Watine wrote: Hi the list I'm trying to compile heartbeat 2 on Debian 4 (Etch). As Andrew advised me, I use the SLE10-SP1 package, with patches for stonith suicide. The configure is ok, but when I try to make heartbeat, the

Re: [Linux-HA] A problem in one node(DC node)

2007-05-19 Thread Andrew Beekhof
will be reported tomorrow. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Andrew Beekhof Sent: Wednesday, May 16, 2007 6:06 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] A problem in one node(DC node) On 5/16/07, YAMAUCHI HIDEO [EMAIL PROTECTED

Re: [Linux-HA] Problem when group has no resource?

2007-05-22 Thread Andrew Beekhof
On 5/22/07, Tao Yu [EMAIL PROTECTED] wrote: I am using heartbeat 2.0.8. Two scenarios: 1. start heartbeat with no resource defined in cib.xml. Then try to add an empty group by sending message add_grp group_1 The response message never comes back. In this case, the cib.xml is like the

Re: [Linux-HA] heartbeat2 groups

2007-05-24 Thread Andrew Beekhof
On 5/23/07, Jacob Leaver [EMAIL PROTECTED] wrote: Hello all, I am using heartbeat 2 with crm, since it seemed to be the thing to do. I'm only using a small fraction of the feature set, since this is my loadbalancer, heartbeat2 handles failover and ldirectord is managing the ipvs stuff. My

Re: [Linux-HA] Re: Writing thirdparty program to manage heartbeat

2007-05-24 Thread Andrew Beekhof
On 5/23/07, Tao Yu [EMAIL PROTECTED] wrote: Is it a practical way to call cibadmin/crm_resource/etc.. from the other program? sure. or you can connect directly to the CIB (from any host over a TLS connection) and send XML instructions On 5/23/07, Tao Yu [EMAIL PROTECTED] wrote: Hi, I
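For the first option (calling the command line tools from another program), a minimal Python sketch might look like the following. It assumes cibadmin is on the PATH and uses only the -Q and -o options that appear elsewhere in this archive (cibadmin -Q -o resources); build_query_command and query_cib are hypothetical wrapper names, and the 30 second timeout is an arbitrary choice.

```python
import subprocess


def build_query_command(section=None):
    """Build the argument list for a CIB query, optionally limited to a section."""
    cmd = ["cibadmin", "-Q"]
    if section:
        cmd += ["-o", section]
    return cmd


def query_cib(section=None, run=subprocess.run, timeout=30):
    """Run the query and return the XML that cibadmin prints on stdout.

    `run` is injectable so the wrapper can be exercised without a cluster.
    With the default runner, a non-zero exit raises CalledProcessError and
    a hung query raises TimeoutExpired.
    """
    result = run(build_query_command(section), capture_output=True,
                 text=True, timeout=timeout, check=True)
    return result.stdout
```

A caller on a cluster node would use query_cib("resources") and parse the returned XML; passing the arguments as a list avoids any shell quoting problems.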

Re: [Linux-HA] Problems to add a monitor operation

2007-05-24 Thread Andrew Beekhof
On 5/23/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Tue, May 22, 2007 at 04:43:49PM -0300, [EMAIL PROTECTED] wrote: Hi Everyone, I'm trying to add monitor operations for some resources I've configured in the cib.xml, but when I try to do it using the GUI, I'm getting some errors:

Re: [Linux-HA] Problems to add a monitor operation

2007-05-24 Thread Andrew Beekhof
On 5/23/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi Dejan, Thanks for your help! I've tried to add the monitor operation as you explained, but got the following error: 1) cibadmin -Q -o resources rsc.xml s0580crmdb2pr1:~ # cat rsc.xml resources group ordered=true description=Grupo

Re: [Linux-HA] Re: Writing thirdparty program to manage heartbeat

2007-05-24 Thread Andrew Beekhof
On 5/24/07, Max Hofer [EMAIL PROTECTED] wrote: On Wednesday 23 May 2007, Tao Yu wrote: Is it a practical way to call cibadmin/crm_resource/etc.. from the other program? On 5/23/07, Tao Yu [EMAIL PROTECTED] wrote: Hi, I am trying to write some thirdparty programs to manage heartbeat.

Re: [Linux-HA] issue with management of heartbeat.pid file

2007-05-24 Thread Andrew Beekhof
On 5/24/07, Kevin Jamieson [EMAIL PROTECTED] wrote: Brian Reichert wrote: What I tracked down was that if the box powered down too quickly for heartbeat to clean up, a PID file was left in place: ... But, there's no check to assure the recorded PID is not stale. Have others seen this?
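The kind of staleness check being asked for can be sketched in Python as below. This is an illustration of the general technique on a POSIX system (sending signal 0 to test whether a process exists), not what the heartbeat init script does; the function names are invented for the example.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def pidfile_is_stale(path):
    """Return True if the PID file is missing, unreadable, or names a dead PID."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return True
    return not pid_is_running(pid)
```

One caveat: after a reboot the recorded PID may have been reused by an unrelated process, so a careful check would also confirm that the process is really heartbeat before trusting the file.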

Re: [Linux-HA] Bonding and simplify

2007-05-24 Thread Andrew Beekhof
my eyes! my eyes! please, no html emails :-) On 5/24/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, everybody, I've studied heartbeat for several weeks, and found it has perfect and overall functions. But it is a little big for me. System description: Two node, Redhat

Re: [Linux-HA] issue with management of heartbeat.pid file

2007-05-24 Thread Andrew Beekhof
On 5/24/07, Brian Reichert [EMAIL PROTECTED] wrote: On Thu, May 24, 2007 at 12:21:13AM -0700, Kevin Jamieson wrote: Brian Reichert wrote: What I tracked down was that if the box powered down too quickly for heartbeat to clean up, a PID file was left in place: ... But, there's no check

Re: [Linux-HA] issue with management of heartbeat.pid file

2007-05-24 Thread Andrew Beekhof
On 5/24/07, Carson Gaspar [EMAIL PROTECTED] wrote: David Lee wrote: Andrew: Thanks for doing that, especially the concern for the non-Linux systems. That concern is much appreciated. Alas: The test -e that attempts to do this is itself non-portable (so bad shell syntax etc.). I think

Re: [Solved] RE: [Linux-HA] Problem with CRM multiple failovers in testing

2007-05-25 Thread Andrew Beekhof
On 5/25/07, Peter Mueller [EMAIL PROTECTED] wrote: a v2-style cluster with a 19200bps serial link? I think that in itself is an interesting attempt ;-) baud 230400 now. Thnx for documentation http://www.linux-ha.org/ha.cf#baud. i think his point is that most v2-style clusters (for now) are

Re: [Linux-HA] Problems to add a monitor operation

2007-05-25 Thread Andrew Beekhof
On 5/25/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Thu, May 24, 2007 at 09:13:07AM +0200, Andrew Beekhof wrote: On 5/23/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Tue, May 22, 2007 at 04:43:49PM -0300, [EMAIL PROTECTED] wrote: Hi Everyone, I'm trying to add monitor

Re: [Linux-HA] Can I switch master/slave roles?

2007-05-25 Thread Andrew Beekhof
using v2 with the crm and master/slave resources - yes On 5/24/07, Michael Dengler [EMAIL PROTECTED] wrote: Hi, I'm new to Heartbeat so please excuse any ignorance. I want to use heartbeat to failover a postgresql 8.2 server. I'm using the WAL shipping scheme to replicate postgresql (the DB's

Re: [Linux-HA] Can I switch master/slave roles?

2007-05-29 Thread Andrew Beekhof
, Andrew Beekhof [EMAIL PROTECTED] wrote: using v2 with the crm and master/slave resources - yes On 5/24/07, Michael Dengler [EMAIL PROTECTED] wrote: Hi, I'm new to Heartbeat so please excuse any ignorance. I want to use heartbeat to failover a postgresql 8.2 server. I'm using the WAL

Re: [Linux-HA] cluster node won't take resources anymore after resource failure (Heartbeat 2)

2007-05-29 Thread Andrew Beekhof
failure to start is a critical condition and if it occurs we will not attempt to start the resource there again until whatever problem has been fixed and the admin tells us so. http://linux-ha.org/v2/faq/manual_recovery On 5/21/07, Hans Koller [EMAIL PROTECTED] wrote: hi, I have a problem

Re: [Linux-HA] More simple solution

2007-05-29 Thread Andrew Beekhof
On 5/24/07, huijun lu [EMAIL PROTECTED] wrote: I know the CRM can do much as I think, but CRM is too complex for me. Now I have 8 processes from heartbeat if I use serial cable and mcast eth1 for heartbeat. But if I set CRM on, then 9 more processes start. Is it possible for me to simplify the

Re: [Linux-HA] cibadmin -Q times out

2007-05-29 Thread Andrew Beekhof
On 5/29/07, Morten Lømo (MLM) [EMAIL PROTECTED] wrote: Hello, cibadmin -Q times out: cibadmin -Q No messages received in 30 seconds.. Aborting Also, I am not able to get access to Heartbeat via hb_gui. It says: can not get information from the cluster. From what I understand my problem is

Re: [Linux-HA] Problem when group has no resource?

2007-05-29 Thread Andrew Beekhof
the heartbeat keeps electing DC and the CIB version keeps increasing. On 5/22/07, Andrew Beekhof [EMAIL PROTECTED] wrote: On 5/22/07, Tao Yu [EMAIL PROTECTED] wrote: I am using heartbeat 2.0.8. Two scenarios: 1. start heartbeat with no resource defined in cib.xml. Then try to add an empty

Re: [Linux-HA] CentOS 5 ?

2007-05-30 Thread Andrew Beekhof
you could try the fedora packages at: http://software.opensuse.org/download/server:/ha-clustering/ On 5/30/07, Alan Debijađi [EMAIL PROTECTED] wrote: I tried to install the latest heartbeat version on CentOS 5, but I can't, not so far. Is it possible to install it on CentOS 5 or not? If not I'll

Re: [Linux-HA] running haresources2cib.py tool doesn't generate cib.xml

2007-05-31 Thread Andrew Beekhof
please update to the latest version available from suse via the update service On 5/30/07, Tom Vile [EMAIL PROTECTED] wrote: I am attempting to run the tool haresources2cib.py but I keep getting the error below even though I have stopped heartbeat and removed the files in question. The one

Re: [Linux-HA] failing cluster after adding group

2007-05-31 Thread Andrew Beekhof
logs? version? come on people, we can't read minds. On 5/30/07, John Moerenhout [EMAIL PROTECTED] wrote: Hi all, I configured a cluster in a test environment. Two SLES10 boxes as Xen guest on a SLES10 host. I managed to create 2 resources, one for a secondary ipaddress and one for a

Re: [Linux-HA] running haresources2cib.py tool doesn't generate cib.xml

2007-05-31 Thread Andrew Beekhof
On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: please update to the latest version available from suse via the update service On SLES 10 updated I don't have the problem mentioned here but I can't generate a cib.xml which works correctly for me. The cib.xml generated

Re: [Linux-HA] running haresources2cib.py tool doesn't generate cib.xml

2007-05-31 Thread Andrew Beekhof
On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: please update to the latest version available from suse via the update service On SLES 10 updated I don't have the problem mentioned here but I

Re: [Linux-HA] running haresources2cib.py tool doesn't generate cib.xml

2007-06-01 Thread Andrew Beekhof
On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 5/31/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: please update to the latest version available from suse via the update

Re: [Linux-HA] failing cluster after adding group

2007-06-01 Thread Andrew Beekhof
PROTECTED] wrote: Andrew, this is such a common issue (people not giving us version...), is there any way we could include the hb version in the cib? We already have cib_feature_revision, but maybe we should have heartbeat_version as well? Feedback anyone? Yan Andrew Beekhof wrote: logs? version

Re: [Linux-HA] standby does not take over on multiple power failure

2007-06-04 Thread Andrew Beekhof
On 6/4/07, Thomas Åkerblom (HF/EBC) [EMAIL PROTECTED] wrote: Hi Andrew. I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7. In this case ha-9 is DC and also the standby server. ha-8 has no power, but the standby server does not take over. The logs begin right before I pulled the

Re: [Linux-HA] running haresources2cib.py tool doesn't generate cib.xml

2007-06-04 Thread Andrew Beekhof
On 6/1/07, Jan Kalcic [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: But you're wasting my time and that annoys me. I don't have all day to pry relevant information out of people. it works for him but not me, yet you didn't think to mention the version difference? That's pretty relevant
