Re: [DRBD-user] New fence handler for RHCS; rhcs_fence

2012-01-15 Thread Lars Ellenberg
On Sun, Jan 15, 2012 at 12:48:05AM -0500, Digimer wrote:
> Hi all, I spoke to Lon, the author of obliterate-peer.sh, about updating/rewriting his script to add a few features. From that, I decided to use Perl as it's the language I am most comfortable with, so I did a full rewrite. I

[DRBD-user] Master - master and split brain if network is saturated

2012-01-15 Thread Steve Kieu
Hello everyone, I am still testing DRBD + OCFS2 with both nodes as master, and noticed that if the network interface is saturated, DRBD drops the connection. Node A and B are both CentOS 6 with drbd-8.3.12, built from vanilla source kernel 2.6.37.6 with vserver patch vs2.3.0.37-rc5. It has

Re: [DRBD-user] Master - master and split brain if network is saturated

2012-01-15 Thread Digimer
On 01/15/2012 07:36 PM, Steve Kieu wrote:
> Hello everyone, I am still testing DRBD + OCFS2 with both nodes as master, and noticed that if the network interface is saturated, DRBD drops the connection.
This is not a surprise. If the packets don't arrive in time, be it from a failed

Re: [DRBD-user] Master - master and split brain if network is saturated

2012-01-15 Thread Steve Kieu
> This is not a surprise. If the packets don't arrive in time, be it from a failed link or a saturated link, it appears the same to DRBD and will trigger the fence handler.
All right, so I should use a dedicated interface in production; I guess that is the only way. Thanks for the info.
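
The point quoted above, that a failed link and a saturated link look the same to the receiver, can be illustrated with a minimal sketch of a timeout-based liveness check. This is not DRBD's code: the function name, the delay lists, and the timeout value are all hypothetical and only model the idea that the decision depends on whether a reply arrives in time, not on why it is late.

```python
from typing import Optional, Sequence


def peer_declared_lost(ack_delays_s: Sequence[Optional[float]],
                       timeout_s: float) -> bool:
    """Return True if any expected acknowledgement is missing or too late.

    None models a packet that never arrives (failed link).
    A large float models a packet delayed by congestion (saturated link).
    """
    return any(delay is None or delay > timeout_s for delay in ack_delays_s)


# Illustrative timeout only; an assumption for this sketch.
TIMEOUT_S = 6.0

failed_link = [0.01, 0.02, None]     # third ack never arrives
saturated_link = [0.01, 0.02, 9.5]   # third ack arrives after the timeout
healthy_link = [0.01, 0.02, 0.03]    # all acks arrive in time

# Both failure modes lead to the same decision.
failed_result = peer_declared_lost(failed_link, TIMEOUT_S)
saturated_result = peer_declared_lost(saturated_link, TIMEOUT_S)
healthy_result = peer_declared_lost(healthy_link, TIMEOUT_S)
```

In this model the only way to keep the saturated case from tripping the check is to keep replication traffic from being delayed past the timeout, which is what a dedicated interface aims for.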

Re: [DRBD-user] Master - master and split brain if network is saturated

2012-01-15 Thread Digimer
On 01/15/2012 08:01 PM, Steve Kieu wrote:
>> This is not a surprise. If the packets don't arrive in time, be it from a failed link or a saturated link, it appears the same to DRBD and will trigger the fence handler.
> All right - I must/should use a dedicated interface in

Re: [DRBD-user] New fence handler for RHCS; rhcs_fence

2012-01-15 Thread Digimer
On 01/15/2012 08:18 AM, Lars Ellenberg wrote:
> Some comments on where I think that script's logic is incomplete, still: First, if you manage to get a simultaneous cluster crash, and then only one node comes back, you'll be offline, and need admin intervention to get online again. There is