Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-11 Dejan Muhamedagic
On Wed, Mar 09, 2011 at 11:51:20AM -0600, Dimitri Maziuk wrote: Dejan Muhamedagic wrote: On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote: Well, realistically, if the link is a foot of crossover cable and gremlins have not been pulling on it, and the NICs aren't falling out of ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-11 Dimitri Maziuk
Dejan Muhamedagic wrote: I guess that in some shops you'd need to clone yourself or something else, otherwise you just wouldn't scale with demand. Yeah, I keep telling my boss that. ... split-brain So, did you have stonith in place then? ;-) I have users instead; they come and tell me the ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-09 Dejan Muhamedagic
On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote: Lars Ellenberg wrote: Oh, that's easy. external/ssh pings the victim, and if it does not answer, which will be the case for a down node as well as a down link, stonith is considered to have been successful ;-) In the ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-09 Dimitri Maziuk
Dejan Muhamedagic wrote: On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote: Well, realistically, if the link is a foot of crossover cable and gremlins have not been pulling on it, and the NICs aren't falling out of their slots, and are half-decent quality hardware, and the drivers ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-08 Sascha Hagedorn
... Sent: Monday, 7 March 2011 16:43 To: General Linux-HA mailing list Subject: Re: [Linux-HA] Server becomes unresponsive after node failure Hi, On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote: Hello everyone, I am evaluating a two-node cluster setup and I am running into some problems ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-08 Dejan Muhamedagic
... Muhamedagic Sent: Monday, 7 March 2011 16:43 To: General Linux-HA mailing list Subject: Re: [Linux-HA] Server becomes unresponsive after node failure Hi, On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote: Hello everyone, I am evaluating a two-node cluster setup and I am ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-08 Lars Ellenberg
On Tue, Mar 08, 2011 at 05:43:17PM +0100, Dejan Muhamedagic wrote: Hi, On Tue, Mar 08, 2011 at 05:32:44PM +0100, Sascha Hagedorn wrote: Hi Dejan, thank you for your answer. I added an external/ssh stonith resource to test this and it resolved the problem. It wasn't clear to me that ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-08 Dimitri Maziuk
Lars Ellenberg wrote: Oh, that's easy. external/ssh pings the victim, and if it does not answer, which will be the case for a down node as well as a down link, stonith is considered to have been successful ;-) In the node down case, this will allow the cluster to proceed, and all is ...
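Lars's point can be made concrete with a minimal sketch. The function `ssh_fence` below is hypothetical and is not the external/ssh plugin's code. It assumes a simplified model in which the surviving node can deliver the ssh reset, and receive a ping reply, only when the peer is running and the link between the nodes is up. Under the success rule quoted above (a victim that does not answer a ping counts as fenced), a dead peer and a cut link look identical from the survivor's side.

```python
from typing import Tuple


def ssh_fence(peer_running: bool, link_up: bool) -> Tuple[bool, bool]:
    """Hypothetical model of an ssh-based fence attempt.

    Returns (reported_success, peer_still_running).
    Simplifying assumption: both the ssh reset and the ping reply
    need a running peer AND a working link between the nodes.
    """
    if peer_running and link_up:
        # The ssh reset reaches the peer and takes it down.
        peer_running = False
    # Success rule as described in the thread: if the victim does not
    # answer a ping, the fence is considered successful.
    answers_ping = peer_running and link_up
    return (not answers_ping, peer_running)


if __name__ == "__main__":
    cases = {
        "node down": (False, True),
        "link down": (True, False),
    }
    for name, (running, link) in cases.items():
        ok, still_running = ssh_fence(running, link)
        status = "UNSAFE" if ok and still_running else "safe"
        print(f"{name}: reported success={ok}, "
              f"peer still running={still_running} -> {status}")
```

In the link-down case the sketch reports a successful fence while the peer is still running. With a dual-master DRBD device and OCFS2 on top, both nodes could then keep writing, which is the split-brain risk discussed in this thread.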

[Linux-HA] Server becomes unresponsive after node failure

2011-03-07 Sascha Hagedorn
Hello everyone, I am evaluating a two-node cluster setup and I am running into some problems. The cluster runs a dual-master DRBD disk with an OCFS2 filesystem. Here are the software versions used:
- SLES11 + HAE Extension
- DRBD 8.3.7
- OCFS2 1.4.2
- ...

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-07 Dejan Muhamedagic
Hi, On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote: Hello everyone, I am evaluating a two-node cluster setup and I am running into some problems. The cluster runs a dual-master DRBD disk with an OCFS2 filesystem. Here are the software versions used:
- SLES11 ...