[Linux-ha-dev] Release 2.1.1 entering final testing - looking for help

2007-07-13 Thread Alan Robertson
Hi, I would like to complete release 2.1.1 before next Monday (my time). I have selected what I hope is the final set of patches. If you can help us test, and send emails to the lists, CCing me, that would be great! Here are the source tar balls for the release:

Re: [Linux-ha-dev] Should <br/> tag be <br>?

2007-07-13 Thread Andrew Beekhof
On 7/13/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 7/11/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On 7/11/07, DAIKI MATSUDA [EMAIL PROTECTED] wrote: Hello, all. I found a little bug in crm_mon source code (crm/admin/crm_mon.c). It

Re: [Linux-ha-dev] Release 2.1.1 entering final testing - looking for help

2007-07-13 Thread Lars Marowsky-Bree
On 2007-07-13T01:02:37, Alan Robertson [EMAIL PROTECTED] wrote: Note that these are not being built nightly - at least not as of this writing :-(. At the moment it's the same as the /dev build, but that will change. If you do want a nightly build, just let me know. We've got the

Re: [Linux-ha-dev] ping vs. ping_group

2007-07-13 Thread Lars Marowsky-Bree
On 2007-07-13T13:39:12, David Lee [EMAIL PROTECTED] wrote: I've recently had a look inside ping.c and ping_group.c. It wasn't a great surprise to find some commonality. I guess one was derived from the other. It means that we've got considerable code duplication, potential unnecessary

Re: [Linux-ha-dev] 2.1.1 is imminent

2007-07-13 Thread Max Hofer
On Friday 13 July 2007, Alan Robertson wrote: Andrew Beekhof wrote: On 7/12/07, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-07-12T12:26:35, Andrew Beekhof [EMAIL PROTECTED] wrote: Andrew has some more critical bugfixes pending / under development (one of them is a STONITH

Re: [Linux-ha-dev] 2.1.1 is imminent

2007-07-13 Thread Lars Marowsky-Bree
On 2007-07-13T10:13:28, Max Hofer [EMAIL PROTECTED] wrote: But what is it good for running regression tests (which i hope you guys do before releasing a new version) if you pull code from dev to test branch 3 days before the release should come out? Well, dev is being constantly tested

[Linux-ha-dev] More thoughts about 2.1.1 - SCHEDULE CHANGE - 23 July, 2007

2007-07-13 Thread Alan Robertson
Hi, I have a few things to say here, and a change of mind, which I'll explain in more detail. First, I appreciate the work that Andrew and Dejan and Dave and others have gone to to get us what looks like a really solid release. Awesome work guys! Second, I want to apologize for not fully

Re: [Linux-ha-dev] 2.1.1 is imminent

2007-07-13 Thread Andrew Beekhof
On 7/13/07, Max Hofer [EMAIL PROTECTED] wrote: On Friday 13 July 2007, Alan Robertson wrote: Andrew Beekhof wrote: On 7/12/07, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-07-12T12:26:35, Andrew Beekhof [EMAIL PROTECTED] wrote: Andrew has some more critical bugfixes pending /

Re: [Linux-ha-dev] ping vs. ping_group

2007-07-13 Thread Alan Robertson
David Lee wrote: I've recently had a look inside ping.c and ping_group.c. It wasn't a great surprise to find some commonality. I guess one was derived from the other. It means that we've got considerable code duplication, potential unnecessary divergence and potential one-but-miss-the-other

Re: [Linux-ha-dev] ping vs. ping_group

2007-07-13 Thread Lars Marowsky-Bree
On 2007-07-13T13:34:59, Alan Robertson [EMAIL PROTECTED] wrote: Here's the bugzilla for that capability: http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1497 This would greatly improve the situation for 2-node clusters and in most cases would eliminate the possibility of

Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread Max Hofer
I agree with the experience described by Eddie. For the 'monitor' keep in mind that the timeout should be lower than the interval. It does not make sense to start a 2nd monitor cycle when the first one did not finish. In the end it boils down to: * you have to know what kind of
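The rule of thumb in this message (a monitor timeout lower than its interval) can be checked mechanically. The following is a minimal sketch in Python, assuming each operation is described by a plain dictionary with `resource`, `name`, `interval` and `timeout` keys in seconds. That layout and the function name `find_risky_monitors` are illustrative assumptions, not part of heartbeat or its CIB format.

```python
def find_risky_monitors(operations):
    """Return the resources whose monitor timeout is not lower than the interval.

    `operations` is a list of dicts with the hypothetical keys
    'resource', 'name', 'interval' and 'timeout' (both in seconds).
    """
    risky = []
    for op in operations:
        if op["name"] != "monitor":
            continue  # the rule of thumb only concerns monitor operations
        if op["timeout"] >= op["interval"]:
            risky.append(op["resource"])
    return risky


# Hypothetical example data, not taken from any real cluster configuration.
example_ops = [
    {"resource": "ip_1", "name": "monitor", "interval": 10, "timeout": 5},
    {"resource": "fs_1", "name": "monitor", "interval": 20, "timeout": 30},
    {"resource": "fs_1", "name": "start", "interval": 0, "timeout": 60},
]
```

Calling `find_risky_monitors(example_ops)` on this sample flags only `fs_1`, whose monitor could still be running when the next cycle is due.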

[Linux-HA] RE: Failover of resource

2007-07-13 Thread Taldevkar, Chetan
Message: 4 Hi Andrew, Please ignore my earlier 2 mails as they were with node 1 and node 2 having same score (1500). The below cib.xml is the result of exact forced_failedover configuration given on the site. 1. I tried with default_resource_failure_stickiness = -INFINITY and

Re: [Linux-HA] active/passive and SAN

2007-07-13 Thread Fabrice Grelaud
Robert Wipfel wrote: On Wed, Jun 27, 2007 at 9:51 AM, in message [EMAIL PROTECTED], Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Wed, Jun 27, 2007 at 12:37:37PM +0200, FG wrote: Hi all, [...] Yes, you should use STONITH. Or perhaps somebody knows

[Linux-HA] DRBD/Heartbeat2 with CRM

2007-07-13 Thread Adrian Overbury
I have a cluster on my hands to setup, and I'm a complete newbie to configuring heartbeat. Not a good combination, huh? But, we've all got to start somewhere. The cluster's a fairly simple drbd/nfs cluster. Two hosts, server1 and server2. drbd is already installed, configured and working

[Linux-HA] active/passive and SAN

2007-07-13 Thread Fabrice Grelaud
Hi all, I need some advice about my configuration for high-availability. I would like to set up an active/passive configuration for our POP/IMAP server. We have two servers, each with two HBA cards (FC) attached to an IBM DS4000 SAN where we store the mailboxes. My concern: In case of a

Re: [Linux-HA] MYSQL HA with Linux HA

2007-07-13 Thread Dave Dykstra
What does it mean to be a master server when there is only one machine? Do you mean you want it to be able to do updates and send the changes back to the other when the other comes back up? MySQL has (at least) 2 ways of doing replication. One is master/slave where everything that changes on a

Re: [Linux-HA] File synchronization using heartbeat

2007-07-13 Thread Dominik Klein
Can anybody tell whether heartbeat can synchronize files between primary and secondary during failover? And if so is it possible to control this synchronization (Sync only when necessary). Take a look at DRBD (www.drbd.org) and drbdlinks (http://www.tummy.com/Community/software/drbdlinks/)

Re: [Linux-HA] nfs-kernel-server in heartbeat 2

2007-07-13 Thread Adrian Overbury
It's Heartbeat 2.0.2. I know there's a more recent one (several, in fact) but I can't use them in this environment. This is the Ubuntu package for Dapper Drake, and my boss isn't willing for us to use a self-compiled or backported version, so 2.0.2 is what I've got to work with. Regards,

[Linux-HA] File synchronization using heartbeat

2007-07-13 Thread Gokak, Arun Madhukar
Hi, Can anybody tell whether heartbeat can synchronize files between primary and secondary during failover? And if so is it possible to control this synchronization (Sync only when necessary). Regards, Arun.

Re: [Linux-HA] Porting Linux-HA

2007-07-13 Thread Alan Robertson
Andrew Beekhof wrote: On 7/10/07, Pankaj [EMAIL PROTECTED] wrote: Hello, We are trying to port Linux-HA on an old OS that does not support shared / dynamic libraries. We went through module.c code where the hash table is created dynamically. We basically need to create the same statically.

Re: [Linux-HA] iLo2 stonith problem

2007-07-13 Thread Tijl Van den Broeck
Afaik, the LOCFG thing wasn't really necessary (but it could be on the newest ILO version and apparently it's in CPQLOCFG.EXE as Alain pointed out). I suggest you give the external/riloe script from the development trunk a try (it can be placed in a stable older 2.0.7 for instance). It's

Re: [Linux-HA] DRBD/Heartbeat2 with CRM

2007-07-13 Thread Dominik Klein
This can help you set it up: http://wiki.linux-ha.org/DRBD/HowTov2 This can get you started to understand HAv2 http://www.linux-ha.org/HeartbeatTutorials Watch the video, read the sheets, take some time and test! Please. Is there any help you guys can give me? All of this confuses me

Antw: Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread matilda matilda
Hi Andrew, you wrote: Andrew Beekhof [EMAIL PROTECTED] 13.07.2007 09:43 i _think_ that the interval is the time between one action ending and the next one starting (rather than between both starting) at least i hope that This is a big difference. So, I would be interested if the interval
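The two readings of "interval" discussed here lead to different schedules. The sketch below, in Python, only illustrates that difference; the thread itself does not settle which reading heartbeat implements, and the function name `start_times` and the sample numbers are assumptions made for illustration.

```python
def start_times(interval, duration, count, measured_from_end):
    """Start time of each of `count` runs, in seconds after the first start.

    measured_from_end=True:  the interval runs from the end of one action
                             to the start of the next.
    measured_from_end=False: the interval runs from start to start.
    """
    starts = []
    t = 0
    for _ in range(count):
        starts.append(t)
        t += (duration + interval) if measured_from_end else interval
    return starts


# Assumed example: a monitor that takes 3 seconds, with a 10 second interval.
end_to_start = start_times(10, 3, 3, measured_from_end=True)     # [0, 13, 26]
start_to_start = start_times(10, 3, 3, measured_from_end=False)  # [0, 10, 20]
```

Under the end-to-start reading a slow action simply pushes the next one back; under the start-to-start reading an action that takes longer than the interval would overlap the next cycle, which is why the distinction matters for choosing timeouts.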

Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread Max Hofer
On Friday 13 July 2007, Andrew Beekhof wrote: On 7/13/07, Max Hofer [EMAIL PROTECTED] wrote: I agree with the experience described by Eddie. For the 'monitor' keep in mind that the timeout should be lower than the interval. It does not make sense to start a 2nd monitor cycle when the

Re: Re: Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread Andrew Beekhof
On 7/12/07, Eddie C [EMAIL PROTECTED] wrote: I have found a few things: 1) A status or monitor function: I would set a timeout of more than 30 seconds. Why? Sometimes developers/administrators do not understand the heartbeat capability. They only want to to/restart a service quickly. If you

Re: [Linux-HA] /usr/lib/heartbeat/findif segmentation fault

2007-07-13 Thread Lars Marowsky-Bree
On 2007-07-13T10:07:19, claudemirf [EMAIL PROTECTED] wrote: I have a new problem...(hehehe). Currently I installed heartbeat2 in my box Suse Linux Enterprise Server 10, I configured a resource “Virtual IP Address”, but when I started my cluster, this resource isn't running. I was analyzing

[Linux-HA] Online extention of the cluster (hopefully not a duplicate mail)

2007-07-13 Thread maloja01
I hope my email is not shipped twice, but my last mail seems not to have reached the list. My message was: Is it possible to extend a running cluster with new cluster nodes? The extension should be done without stopping any resource placed on nodes that were running in the cluster before we extend

Re: [Linux-HA] iLo2 stonith problem

2007-07-13 Thread Dejan Muhamedagic
On Fri, Jul 13, 2007 at 09:13:43AM +0200, Tijl Van den Broeck wrote: Afaik, the LOCFG thing wasn't really necessary (but it could be on the newest ILO version and apparently it's in CPQLOCFG.EXE as Alain pointed out). I suggest you give the external/riloe script from the development trunk a

Re: [Linux-HA] Civility on the mailing lists

2007-07-13 Thread Andrew Beekhof
Alan, Given some of the emails you yourself have written in the last few days (and many times prior to that) as well as your penchant for making disparaging remarks in commit messages, I find this development highly hypocritical and disturbing. I will neither condone such biased and

Re: [Linux-HA] iLo2 stonith problem

2007-07-13 Thread Tijl Van den Broeck
On 7/13/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote: Just tested yesterday: the power method definitely requires ACPI whereas the button method works in any case, as Guy Coates suggested. Updated the xml info in external/riloe to that effect. Ah yes, my bad, mixed up those two :-) Thanks

Re: [Linux-HA] restarting stonithd

2007-07-13 Thread Andrew Beekhof
On 7/13/07, Bernd Schubert [EMAIL PROTECTED] wrote: Hi, on a customer system is running an ancient version of heartbeat (2.0.3). While looking into an entirely different problem I also see that stonithd is heavily leaking memory (~200MB over the last 7 days). No idea why this hasn't been a

Re: [Linux-HA] RE: Failover of resource

2007-07-13 Thread Andrew Beekhof
On 7/13/07, Taldevkar, Chetan [EMAIL PROTECTED] wrote: Message: 4 Hi Andrew, Please ignore my earlier 2 mails as they were with node 1 and node 2 having same score (1500). The below cib.xml is the result of exact forced_failedover configuration given on the site. 1. I tried with

[Linux-HA] RE: Failover of resource

2007-07-13 Thread Taldevkar, Chetan
Message: 4 Hi Andrew, Please find the attachment with the mail. This attachment has output generated from cibadmin -Q for forced_failover. It continues to run monitor part of the script on failed node even after failing over to another node. The crm_failcount -G -U on failed node return

Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread Andrew Beekhof
On 7/13/07, Max Hofer [EMAIL PROTECTED] wrote: On Friday 13 July 2007, Andrew Beekhof wrote: On 7/13/07, Max Hofer [EMAIL PROTECTED] wrote: I agree with the experience described by Eddie. For the 'monitor' keep in mind that the timeout should be lower than the interval. It does not

[Linux-HA] heartbeat resource migration policy problem

2007-07-13 Thread Maxim Veksler
Hello list, I have two nodes (rnd-dev1, rnd-dev2) running RH4 with a CentOS-based rpm install of heartbeat 2.0.7. I'm seeing a problem where heartbeat prefers to always run resources on node2, even when I deliberately make them return 1 on monitor action. The only scenario where heartbeat chooses

[Linux-HA] Online extention of the cluster

2007-07-13 Thread maloja01
Is it possible to extend a running cluster with new cluster nodes? The extension should be done without stopping any resource placed on nodes that were running in the cluster before we extend the cluster. If it is possible, can I use the is_managed attribute to leave the resources

Re: [Linux-HA] More thoughts about 2.1.1 - SCHEDULE CHANGE - 23 July, 2007

2007-07-13 Thread Andrew Beekhof
I thoroughly agree with what you are proposing. You may find the attached script (my own work) useful for cherry-picking patches from one mercurial repo to another. In particular, it preserves the log, timestamp and author which is useful when trawling through changesets at a later date. It

Re: [Linux-HA] More thoughts about 2.1.1 - SCHEDULE CHANGE - 23 July, 2007

2007-07-13 Thread Andrew Beekhof
And attaching the script would probably be helpful... On 7/13/07, Andrew Beekhof [EMAIL PROTECTED] wrote: I thoroughly agree with what you are proposing. You may find the attached script (my own work) useful for cherry-picking patches from one mercurial repo to another. In particular, it

Re: [Linux-HA] Online extention of the cluster

2007-07-13 Thread Andreas Kurz
On 7/13/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Is it possible to extend a running cluster with new cluster nodes? The easiest way is to configure autojoin any in your ha.cf file, which is safe if you use sha1 or md5 keys to authenticate in your authkeys file. This allows every new node