Re: [Linux-ha-dev] start-delay parameter for monitor operation and Split-Brain

2007-04-12 Thread Andrew Beekhof
On 4/12/07, 池田淳子 wrote: Hi all, I'm newbie, and trying to understand how or when Split-Brain happens. when 1 or more nodes can't communicate with each other there is no connection between this and an operation's start-delay what you *might* be seeing is an old bug that

Re: [Linux-ha-dev] start-delay parameter for monitor operation and Split-Brain

2007-04-12 Thread Alan Robertson
池田淳子 wrote: Hi all, I'm newbie, and trying to understand how or when Split-Brain happens. As trial, I run Dummy resource for now. Split-brain occurs when there is a total communication failure (from the heartbeat perspective) between at least two different cluster nodes. Heartbeat version

[Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Sander van Vugt
Hi, Just like to know your opinion about the following. A pure Linux shop would of course definitely go for Heartbeat as the solution for high availability. However, in an environment that comes from Novell's NetWare, Novell Cluster Services (NCS) would be the best choice, especially if running

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Andrew Beekhof
On 4/11/07, Lars Marowsky-Bree wrote: On 2007-04-10T18:29:10, Andrew Beekhof wrote: Apr 10 17:30:25 ha-test-1 process[26425]: Returnig 7 Apr 10 17:30:40 ha-test-1 process[26493]: Maintainance = Apr 10 17:30:40 ha-test-1 process[26493]: OCF_RESKEY_probe = 1

Re: [Linux-HA] pingd not failing over

2007-04-12 Thread Andrew Beekhof
i hate to pester, but where are the fail counts kept track of and what maintains them? they are stored in the status section and are maintained by the tengine process (which increases it whenever a monitor action fails) there is also a CLI tool called crm_failcount that can be used to view

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Yan Fitterer
NCS has better integration with EVMS, and has data-network heartbeat. It does not therefore require STONITH. It has had much more testing than HB for large clusters as well. 20+ node clusters are not uncommon. Yan Sander van Vugt wrote: Hi, Just like to know your opinion about the

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Yan Fitterer
Oh - and I forgot... UI was _much_ nicer last time I looked. And nowhere near as buggy as the HB GUI. Yan Fitterer wrote: NCS has better integration with EVMS, and has data-network heartbeat. It does not therefore require STONITH. It has had much more testing than HB for large clusters as

[Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Alan Robertson
Andrew Beekhof wrote: On 4/11/07, Terry L. Inzauro wrote: list, this is a continuation of another thread that was started a few weeks back. the original thread was started in regards to the setup of pingd. this thread is in regards to pingd not being able to start for

[Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Terry L. Inzauro
Andrew Beekhof wrote: On 4/11/07, Terry L. Inzauro wrote: list, this is a continuation of another thread that was started a few weeks back. the original thread was started in regards to the setup of pingd. this thread is in regards to pingd not being able to start for

Re: [Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Alan Robertson
Terry L. Inzauro wrote: Andrew Beekhof wrote: On 4/11/07, Terry L. Inzauro wrote: list, this is a continuation of another thread that was started a few weeks back. the original thread was started in regards to the setup of pingd. this thread is in regards to pingd not

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Alan Robertson
Sander van Vugt wrote: Hi, Just like to know your opinion about the following. A pure Linux shop would of course definitely go for Heartbeat as the solution for high availability. However, in an environment that comes from Novell's NetWare, Novell Cluster Services (NCS) would be the best

Re: [Linux-HA] no failback

2007-04-12 Thread Alan Robertson
Bernd Eichenberg wrote: Hi at all, I've to warn you, because I'm a newby at HA and I'm german with very small english skillz. Hope you understand my problem and thank for that. My Problem is, there is no failback after the 1.node is valid again. 1. node is Suse 8 with heartbeat that

Re: [Linux-HA] UDP Checksum error in heartbeat packets

2007-04-12 Thread Alan Robertson
Dominik Klein wrote: Hi I use heartbeat 2.0.7 from openSuSE 10.2 10.250.250.27 is master 10.250.250.28 is backup They are connected with direct ethernet cable When watching UDP heartbeats with tshark, my master machine says this (checksum errors): 16:27:24.140509 10.250.250.28 -

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Lars Marowsky-Bree
On 2007-04-12T08:58:15, Andrew Beekhof wrote: So I thought that probe is maybe never unset correct http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=1479 lmb - not the same thing. the resource is not deleted in between the two types of monitor calls. this

[Linux-HA] Memory Leaks

2007-04-12 Thread Hariharan Jayaraman
Hi All, We are using linux ha for achieving HA solution for a 2 node system. We have observed memory leaks in the crmd process and seen it grow to beyond 700MB. I am currently running 2.0.8 with a patch that fixes a few memory leak issues in crmd. I am trying to use valgrind and efence to nail