Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote: > On Tue, Aug 10, 2010 at 7:05 PM, David Lang > wrote: >> On Tue, 10 Aug 2010, Igor Chudov wrote: >> >>> On Tue, Aug 10, 2010 at 6:41 PM, David Lang >>> wrote: On Tue, 10 Aug 2010, Igor Chudov wrote: As I noted in a prior e-mail, to work around i

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 7:05 PM, David Lang wrote: > On Tue, 10 Aug 2010, Igor Chudov wrote: > >> On Tue, Aug 10, 2010 at 6:41 PM, David Lang >> wrote: >>> On Tue, 10 Aug 2010, Igor Chudov wrote: >>> Guys, I have a bit of clarification. In an attempt to avoid the timing issues, an hour

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote:
> On Tue, Aug 10, 2010 at 6:41 PM, David Lang
> wrote:
>> On Tue, 10 Aug 2010, Igor Chudov wrote:
>>
>>> Guys, I have a bit of clarification. In an attempt to avoid the timing
>>> issues, an hour ago I tried adding a configuration change to
>>> /etc/init.d/

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 6:41 PM, David Lang wrote:
> On Tue, 10 Aug 2010, Igor Chudov wrote:
>
>> Guys, I have a bit of clarification. In an attempt to avoid the timing
>> issues, an hour ago I tried adding a configuration change to
>> /etc/init.d/heartbeat to delay starting it by 2 minutes on one

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote:
> Guys, I have a bit of clarification. In an attempt to avoid the timing
> issues, an hour ago I tried adding a configuration change to
> /etc/init.d/heartbeat to delay starting it by 2 minutes on one box. So
> logs with takeover succeeding, and heartbeat sh

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
Guys, I have a bit of clarification. In an attempt to avoid the timing issues, an hour ago I tried adding a configuration change to /etc/init.d/heartbeat to delay starting it by 2 minutes on one box. So logs with takeover succeeding, and heartbeat shutting down are partly an artifact of this change

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
Actually, what catches my attention is a little before that:

Aug 10 17:49:08 pfs-srv3 heartbeat: [1162]: WARN: Shutdown delayed until current resource activity finishes.
Aug 10 17:49:08 pfs-srv3 heartbeat: [1162]: info: Heartbeat shutdown in progress. (1162)
Aug 10 17:49:08 pfs-srv3 heartbeat: [

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
David and Dmitri,

Here's one more try and one more set of log files. I now see that heartbeat is shutting down, which is beyond what used to happen.

Some interesting lines I saw:

Aug 10 17:49:09 pfs-srv4 heartbeat: [1276]: info: Received shutdown notice from 'pfs-srv3'.
Aug 10 17:49:08 pfs-srv3

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Dimitri Maziuk
On Tuesday 10 August 2010 17:19, Igor Chudov wrote:
> Guys, I just sent ha-log, ha.cf, haresources from both machines.

These look like shutdown logs, not startup logs. FWIW here's what mine's like (sanitized):

** secondary **
heartbeat: [8356]: info: Configuration validated. Starting heartbeat 2.

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote:
> Guys, I just sent ha-log, ha.cf, haresources from both machines.
>
> At this point, I of course greatly appreciate your help and your
> generous assistance.
>
> But I wonder if our attention is going in the wrong direction of "try
> this and try that".
>
> W

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
Guys, I just sent ha-log, ha.cf, haresources from both machines.

At this point, I of course greatly appreciate your help and your generous assistance.

But I wonder if our attention is going in the wrong direction of "try this and try that".

What if right now, I need to systematically understand wh

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 3:25 PM, David Lang wrote:
> could you re-post the files (log files, ha.cf and haresources from each box)
>

Log file from pfs-srv3

Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: other_holds_resources: 0
Aug 10 17:08:28 pfs-srv3 heartbeat: [1216]: info: Received shutd

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
Could you re-post the files (log files, ha.cf and haresources from each box)?

David Lang

On Tue, 10 Aug 2010, Igor Chudov wrote:
> Date: Tue, 10 Aug 2010 15:23:44 -0500
> From: Igor Chudov
> Reply-To: General Linux-HA mailing list
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] H

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 2:59 PM, David Lang wrote:
> Ok, just checking again, the two haresources files are truly identical.
>
> you didn't put different system names in the first line of each file or
> something like that? (this is a common mistake)
>
> I would also remove the second host from t

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
OK, just checking again: are the two haresources files truly identical?

You didn't put different system names in the first line of each file or something like that? (This is a common mistake.)

I would also remove the second host from the haresources file. Having it there with no resources on it
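
The "identical haresources" check suggested above can be sketched in a few lines of Python. This is only an illustration, not anything posted in the thread: the function names are made up, the sample resource argument `r0` is hypothetical, and the parsing is deliberately simplified (it drops `#` comments and blank lines and ignores backslash line continuations).

```python
def haresources_entries(text):
    """Return the meaningful lines of a haresources file.

    Comments (from '#' to end of line) and blank lines are dropped, and
    runs of whitespace are collapsed so cosmetic differences are ignored.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            entries.append(" ".join(fields))
    return entries


def haresources_match(text_a, text_b):
    """True when both files carry the same entries in the same order."""
    return haresources_entries(text_a) == haresources_entries(text_b)


# Hypothetical contents: the second file names a different preferred node,
# which is the mistake described above.
on_srv3 = "# preferred node first\npfs-srv3 drbddisk::r0\n"
on_srv4 = "pfs-srv4 drbddisk::r0\n"
print(haresources_match(on_srv3, on_srv4))
```

In practice the two strings would be read from the copy of haresources on each box; any mismatch in the first field (the node name) is the thing to look for.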

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 2:28 PM, David Lang wrote:
> On Tue, 10 Aug 2010, Igor Chudov wrote:
>
>> Dmitri, you are right.
>>
>> In any case the name change did nothing.
>
> did it eliminate the error from the log? does the log say anything else after
> that point?

It eliminated the error from the

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote:
> Dmitri, you are right.
>
> In any case the name change did nothing.

Did it eliminate the error from the log? Does the log say anything else after that point?

David Lang

> They still refuse to take over when rebooted simultaneously.
>
> The symptoms

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
Dmitri, you are right.

In any case the name change did nothing. They still refuse to take over when rebooted simultaneously. The symptoms are the same as usual.

I am thinking, should I perhaps put a little statement in /etc/init.d/heartbeat on one of the boxes and add "sleep 100" in it? i

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Dimitri Maziuk
On Tuesday 10 August 2010 13:14, Igor Chudov wrote:
>
> Haresources refers to "drbddisk", however, the resource in
> /usr/lib/ocf/resource.d/heartbeat is called "drbd".

Heartbeat 2.1.4 on centos 5 comes with /etc/ha.d/resource.d/drbddisk. Looks like the docs you read don't match the version you h
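
A quick way to see which script a name like "drbddisk" would resolve to is to walk a list of candidate directories and report the first executable match. The sketch below is an illustration only: `find_resource_script` is a made-up helper, and the directory list is an assumption pieced together from the paths mentioned in this thread, not necessarily the order heartbeat itself searches.

```python
import os


def find_resource_script(name, search_dirs):
    """Return the first executable file called `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Assumed candidate directories, taken from paths quoted in the thread.
candidate_dirs = ["/etc/ha.d/resource.d", "/usr/lib/ocf/resource.d/heartbeat"]
for script in ("drbddisk", "drbd"):
    print(script, "->", find_resource_script(script, candidate_dirs))
```

If the name used in haresources comes back as None on either box, that would be consistent with a "Cannot locate resource script" error in ha-log.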

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 1:08 PM, David Lang wrote:
> On Tue, 10 Aug 2010, Igor Chudov wrote:
>
>> On Tue, Aug 10, 2010 at 12:51 PM, David Lang
>> wrote:
>>>
>>> one problem I see in ha-log-2.txt is the lines
>>>
>>> Aug 10 10:38:06 pfs-srv4 ResourceManager[1241]: [1253]: ERROR: Cannot
>>> locate

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
On Tue, 10 Aug 2010, Igor Chudov wrote:
On Tue, Aug 10, 2010 at 12:51 PM, David Lang wrote:
one problem I see in ha-log-2.txt is the lines
Aug 10 10:38:06 pfs-srv4 ResourceManager[1241]: [1253]: ERROR: Cannot locate resource script
Aug 10 10:38:06 pfs-srv4 req_resource[1236]: [1256]: debug:

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 12:51 PM, David Lang wrote:
> one problem I see in ha-log-2.txt is the lines
>
> Aug 10 10:38:06 pfs-srv4 ResourceManager[1241]: [1253]: ERROR: Cannot locate
> resource script
> Aug 10 10:38:06 pfs-srv4 req_resource[1236]: [1256]: debug: in
> /usr/share/heartbeat/req_reso

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread David Lang
One problem I see in ha-log-2.txt is the lines

Aug 10 10:38:06 pfs-srv4 ResourceManager[1241]: [1253]: ERROR: Cannot locate resource script
Aug 10 10:38:06 pfs-srv4 req_resource[1236]: [1256]: debug: in /usr/share/heartbeat/req_resource
Aug 10 10:38:06 pfs-srv4 req_resource[1236]: [1258]: debug:

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Tue, Aug 10, 2010 at 10:21 AM, Pushkar Pradhan wrote:
> David, I did a fresh restart today (without changing to mcast, yet, as
> I want to do one thing at a time).
>
> Again, neither server took over.
>
> Here's the ha-logs from them:
>
> http://igor.chudov.com/tmp/ha-log-1.txt
> http://igor.ch

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Pushkar Pradhan
From: linux-ha-boun...@lists.linux-ha.org on behalf of Igor Chudov
Sent: Tue 8/10/2010 6:50 AM
To: General Linux-HA mailing list
Cc: david.l...@digitalinsight.com
Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Igor Chudov
On Mon, Aug 9, 2010 at 5:07 PM, David Lang wrote:
> ha-log should give you a detailed picture of what each box is thinking as they
> startup. I've always been able to track down the problem with that info for my
> systems.
>

David, I did a fresh restart today (without changing to mcast, yet, as I