Hi, I've seen a number of people with this problem (me too!), but nobody seems to have a solution. Any help would be greatly appreciated.
Two nodes work fine with DRBD/OCFS2, but when I add a third using GNBD I run into problems. I'm running an RH 2.6.21 kernel with Xen 3.2, OCFS2 version 1.3.3, tools 1.2.4.

I have three nodes with the following config:

    node:
        ip_port = 7777
        ip_address = 10.0.0.1
        number = 0
        name = nodea
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 10.0.0.2
        number = 1
        name = nodeb
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 10.0.0.20
        number = 3
        name = mgm
        cluster = ocfs2

    cluster:
        node_count = 3
        name = ocfs2

nodea is running a 400G filesystem on /dev/drbd1; nodeb is running a 400G filesystem on /dev/drbd2 (mirroring drbd1, using DRBD 8). I can boot nodes a and b and things work with no problem: both systems can mount their respective drbd devices. I then run gnbd_serv on both machines and export the drbd devices.

On booting "mgm", I load drbd-client, then /etc/init.d/o2cb. So far so good:

    [EMAIL PROTECTED]:~# /etc/init.d/o2cb status
    Module "configfs": Loaded
    Filesystem "configfs": Mounted
    Module "ocfs2_nodemanager": Loaded
    Module "ocfs2_dlm": Loaded
    Module "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster ocfs2: Online
      Heartbeat dead threshold = 7
      Network idle timeout: 10000
      Network keepalive delay: 5000
      Network reconnect delay: 2000
    Checking O2CB heartbeat: Not active

    [EMAIL PROTECTED]:~# mounted.ocfs2 -f
    Device      FS     Nodes
    /dev/gnbd0  ocfs2  nodea, nodeb
    /dev/gnbd1  ocfs2  nodea, nodeb

    [EMAIL PROTECTED]:~# mounted.ocfs2 -d
    Device      FS     UUID                                  Label
    /dev/gnbd0  ocfs2  35fff639-0ec2-4a8d-8849-2b9ef078a40a  brick
    /dev/gnbd1  ocfs2  35fff639-0ec2-4a8d-8849-2b9ef078a40a  brick

Slots (identical on both devices):

    Slot#  Node#
        0      0
        1      1

Now I come to try and mount a device on host "mgm":

    mount -t ocfs2 /dev/gnbd0 /cluster

In the kernel log on nodea I see:

    Feb  9 17:37:01 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!
    Feb  9 17:37:03 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!

On nodeb I see:

    Feb  9 17:37:00 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!
    Feb  9 17:37:02 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!

Within 10 seconds or so, both machines fence themselves off and reboot. It "seems" as though mgm is not recognising that slots 0 and 1 are already taken, but everything "looks" OK to me.

Can anyone spot any glaring mistakes, or suggest a way I can debug this or provide more information to the list?

Many thanks,
Gareth.
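P.S. One thing I noticed while writing this up: my node numbers are 0, 1 and 3, i.e. there is no node 2. I have no idea whether O2CB actually cares about gaps like that, but in case it helps anyone, this is the throwaway Python script I used to double-check the config. It's entirely my own sanity check, not part of ocfs2-tools, and the checks it makes (contiguous numbering from 0, node_count matching the stanza count) are my guesses at what might matter:

```python
# Throwaway sanity check for /etc/ocfs2/cluster.conf. My own script,
# NOT part of ocfs2-tools; the rules it enforces are assumptions.
import re

def check_cluster_conf(text):
    """Return a list of human-readable problems found in cluster.conf text."""
    problems = []
    # Node numbers from each "number = N" line in the node stanzas.
    numbers = [int(m) for m in re.findall(r"number\s*=\s*(\d+)", text)]
    # Declared node_count from the cluster stanza.
    counts = [int(m) for m in re.findall(r"node_count\s*=\s*(\d+)", text)]
    if counts and counts[0] != len(numbers):
        problems.append("node_count=%d but %d node stanzas"
                        % (counts[0], len(numbers)))
    # Assumption: node numbers should run 0..N-1 with no gaps.
    if sorted(numbers) != list(range(len(numbers))):
        problems.append("node numbers %s are not contiguous from 0"
                        % sorted(numbers))
    return problems

if __name__ == "__main__":
    with open("/etc/ocfs2/cluster.conf") as f:
        for p in check_cluster_conf(f.read()):
            print("WARNING: " + p)
```

Run against my config above, it warns that the numbers 0, 1, 3 are not contiguous. Whether that is actually my problem, I don't know.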
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
