Hi, I've seen a number of people with this problem (me too!), but nobody seems to have a solution. Any help would be greatly appreciated.
Two nodes work fine with DRBD/OCFS2, but when I add a third using GNBD I run into problems. I'm running an RH 2.6.21 kernel with Xen 3.2, OCFS2 version 1.3.3, tools 1.2.4.

I have three nodes with the following config:

    node:
        ip_port = 7777
        ip_address = 10.0.0.1
        number = 0
        name = nodea
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 10.0.0.2
        number = 1
        name = nodeb
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 10.0.0.20
        number = 3
        name = mgm
        cluster = ocfs2

    cluster:
        node_count = 3
        name = ocfs2

nodea is running a 400G filesystem on /dev/drbd1; nodeb is running a 400G filesystem on /dev/drbd2 (mirroring drbd1, using DRBD 8). I can boot nodes a and b and things work with no problem: both systems can mount their respective drbd devices. I then run gnbd_serv on both machines and export the drbd devices.

On booting "mgm", I load drbd-client, then /etc/init.d/o2cb. So far so good:

    [EMAIL PROTECTED]:~# /etc/init.d/o2cb status
    Module "configfs": Loaded
    Filesystem "configfs": Mounted
    Module "ocfs2_nodemanager": Loaded
    Module "ocfs2_dlm": Loaded
    Module "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster ocfs2: Online
      Heartbeat dead threshold = 7
      Network idle timeout: 10000
      Network keepalive delay: 5000
      Network reconnect delay: 2000
    Checking O2CB heartbeat: Not active

    [EMAIL PROTECTED]:~# mounted.ocfs2 -f
    Device      FS     Nodes
    /dev/gnbd0  ocfs2  nodea, nodeb
    /dev/gnbd1  ocfs2  nodea, nodeb

    [EMAIL PROTECTED]:~# mounted.ocfs2 -d
    Device      FS     UUID                                  Label
    /dev/gnbd0  ocfs2  35fff639-0ec2-4a8d-8849-2b9ef078a40a  brick
    /dev/gnbd1  ocfs2  35fff639-0ec2-4a8d-8849-2b9ef078a40a  brick

Slots (identical on both devices):

    Slot#  Node#
        0      0
        1      1

Now I come to try and mount a device on host "mgm":

    mount -t ocfs2 /dev/gnbd0 /cluster

In the kernel log on nodea I see:

    Feb  9 17:37:01 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!
    Feb  9 17:37:03 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!

On nodeb I see:

    Feb  9 17:37:00 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!
    Feb  9 17:37:02 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!

Within 10 seconds or so, both machines fence themselves off and reboot. It "seems" as though mgm is not recognising that slots 0 and 1 are already taken, but everything "looks" OK to me.

Can anyone spot any glaring mistakes, or suggest a way I can debug this or provide more information to the list?

Many thanks,
Gareth.
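P.S. One thing I noticed while writing this up: my node numbers are 0, 1 and 3, i.e. there is no node 2. I have no idea whether O2CB actually cares about gaps like that, but in case it helps anyone, this is the throwaway Python script I used to double-check the config. It's entirely my own sanity check, not part of ocfs2-tools, and the checks it makes (contiguous numbering from 0, node_count matching the stanza count) are my guesses at what might matter:

```python
# Throwaway sanity check for /etc/ocfs2/cluster.conf. My own script,
# NOT part of ocfs2-tools; the rules it enforces are assumptions.
import re

def check_cluster_conf(text):
    """Return a list of human-readable problems found in cluster.conf text."""
    problems = []
    # Node numbers from each "number = N" line in the node stanzas.
    numbers = [int(m) for m in re.findall(r"number\s*=\s*(\d+)", text)]
    # Declared node_count from the cluster stanza.
    counts = [int(m) for m in re.findall(r"node_count\s*=\s*(\d+)", text)]
    if counts and counts[0] != len(numbers):
        problems.append("node_count=%d but %d node stanzas"
                        % (counts[0], len(numbers)))
    # Assumption: node numbers should run 0..N-1 with no gaps.
    if sorted(numbers) != list(range(len(numbers))):
        problems.append("node numbers %s are not contiguous from 0"
                        % sorted(numbers))
    return problems

if __name__ == "__main__":
    with open("/etc/ocfs2/cluster.conf") as f:
        for p in check_cluster_conf(f.read()):
            print("WARNING: " + p)
```

Run against my config above, it warns that the numbers 0, 1, 3 are not contiguous. Whether that is actually my problem, I don't know.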
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
