Patrick Donker wrote:
Hi everybody,
First of all, I am new to this list and ocfs2, so forgive my ignorance.
Anyhow, what I'm doing is this:
I'm experimenting with a two-node shared fs on Debian etch and have installed ocfs2-tools 1.2.1. The Debian VMs run on a VMware ESX 3.0.0 server and are clones of a default template.
This is my cluster.conf:
cluster:
        node_count = 2
        name = san

node:
        ip_port = 7777
        ip_address = 192.168.100.2
        number = 0
        name = mail
        cluster = san

node:
        ip_port = 7777
        ip_address = 192.168.100.5
        number = 1
        name = san
        cluster = san

If I start and mount the fs on one of the nodes, everything goes fine. However, as soon as I mount the fs on the other node I get a kernel panic with this message:

Dec 17 13:06:01 san kernel: (2797,0):o2hb_do_disk_heartbeat:854 ERROR: Device "sdb": another node is heartbeating in our slot!

mounted.ocfs2 -d on both nodes tells me this:

/dev/sdb              ocfs2  6616a964-f474-4c5e-94b9-3a20343a7178

and fsck.ocfs2 -n /dev/sdb gives:

Checking OCFS2 filesystem in /dev/sdb:
 label:              <NONE>
 uuid:               66 16 a9 64 f4 74 4c 5e 94 b9 3a 20 34 3a 71 78
 number of blocks:   26214400
 bytes per block:    4096
 number of clusters: 3276800
 bytes per cluster:  32768
 max slots:  16

Somehow both nodes are using the same slot to heartbeat in. I'm not sure what causes this or how to change it. Please help me debug this problem, because I'm stuck.
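In case it helps, this is how I dump the heartbeat region on the shared device (read-only, piping the 'hb' request into debugfs.ocfs2):

# echo "hb" | debugfs.ocfs2 -n /dev/sdb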

Thanks
Patrick



Sunil Mushran wrote:
As per the config, your node names are 'san' and 'mail'.
Are these names the same as the hostnames on the respective machines?

Do on both nodes:
# for i in /config/cluster/san/node/*/local ; do LOCAL=`cat $i`; if [ $LOCAL -eq 1 ] ; then echo $i; fi; done;

You should see /config/cluster/san/node/mail/local on mail and
/config/cluster/san/node/san/local on san.
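If the local flags look right, it may also be worth checking which node number each box registered, since as far as I recall each node heartbeats in the slot matching its node number. Something like this should show it, assuming the same /config configfs mount and a 'num' attribute per node directory:

# for i in /config/cluster/san/node/* ; do echo $i: `cat $i/num` ; done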

For more, refer to the user's guide, FAQ and the mount/umount support
guide in the doc section on http://oss.oracle.com/projects/ocfs2.

Thanks for the suggestion, but if I enter:

for i in /config/cluster/san/node/*/local ; do LOCAL=`cat $i`; if [ $LOCAL -eq 1 ] ; then echo $i; fi; done;

as you suggested, I get

cat: /config/cluster/san/node/*/local: No such file or directory
-bash: [: -eq: unary operator expected
So I guess there is either a typo in your query, or there is an issue with my setup. I don't have enough Linux knowledge (yet ;) to tell which one it is...
The hostnames match the node names.
Last night I added another node, just to see what happens, and to my surprise all went well. Now, these are my thoughts, please bear with me:

'san' is the VM on which the iSCSI target runs; it is a Debian installation with two additional virtual disks (sdb and sdc), which I export. On sdb and sdc I have created an ocfs2 filesystem. On the other nodes, 'mail' being one of them, I connect to the target using an iSCSI initiator, which works fine. As soon as I mount the new iSCSI drive and monitor activity on 'san' using watch -d -n 1 "echo \"hb\" | debugfs.ocfs2 -n /dev/sdb", I see a heartbeat originating from node 0. If I do the same from another node I added, 'deb01', I see another heartbeat appear from that node.
Everything works fine so far.
Now, as soon as I mount /dev/sdb on 'san' itself, I get the 'another node is heartbeating in our slot!' message, and the system fences all nodes, which results in kernel panics. Apparently 'san' is trying to heartbeat in slot 0, which is already occupied by 'mail'. Looking at cluster.conf, 'san' should select slot 1, so why is it trying to use slot 0? Am I correct in assuming that I cannot mount the ocfs2 fs on the system that is running the cluster?
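Would it make sense to compare the hostname on 'san' with what o2cb actually registered in configfs? For example (using the same /config paths as in your one-liner; the 'num' attribute name is my guess):

san# uname -n
san# cat /config/cluster/san/node/san/num
san# cat /config/cluster/san/node/san/local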

-Patrick