Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-24 Thread Andrew Beekhof
On 25 Jun 2014, at 12:03 am, Lars Ellenberg wrote:
> On Tue, Jun 24, 2014 at 12:23:30PM +1000, Andrew Beekhof wrote:
>>
>> On 24 Jun 2014, at 1:52 am, f...@vmware.com wrote:
>>
>>> Hi,
>>>
>>> I understand that initially the split-brain is caused by heartbeat
>>> messaging layer and there is

[Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)

2014-06-24 Thread Pasi Kärkkäinen
Hello!

I've been seeing heartbeat cluster problems in Linux-based Vyatta and more recent VyOS networking/router appliances. These are currently based on Debian Squeeze, and thus are using:

Package: heartbeat
Version: 1:3.0.3-2

VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244

The p

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-24 Thread fank
Hi Lars, Thanks for pointing out the patch. It is not in the heartbeat version on the system (it is using Heartbeat-3-0-7e3a82377fa8). I'll try that out. As for ccm_testclient, the system has stripped out unnecessary files that won't be used during normal operation, including gcc. So ccm_testcl

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-24 Thread fank
Hi Andrew, on node-1 I do see the following last status update from crmd, but crm_mon -1 still shows node-0 offline:
crmd_ha_status_callback: Status update: Node node-0 now has status [active] [DC=false]
The same happens on node-0: it shows node-1 now has status active, but crm_mon -1 shows it

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-24 Thread Lars Ellenberg
On Tue, Jun 24, 2014 at 12:23:30PM +1000, Andrew Beekhof wrote:
>
> On 24 Jun 2014, at 1:52 am, f...@vmware.com wrote:
>
> > Hi,
> >
> > I understand that initially the split-brain is caused by heartbeat
> > messaging layer and there is nothing much can be done when packets are
> > dropped. Ho