More details:

I am attempting to add a node (node 2) to an existing 2-node cluster (node 0 and node 1). All nodes are currently running SLES9 (2.6.5-7.283-bigsmp i686) + ocfs 1.2.1-4.2. This is the ocfs package that ships with SLES9. Node 2 is not part of the RAC cluster yet; I have only installed ocfs on it. I can mount the ocfs file system on all nodes, and it is accessible from all of them.

Node 0 is always the node that gets fenced, and it gets fenced very frequently. Before I added the kernel.panic parameter, node 0 would get fenced, panic and hang. Only a power cycle would make it responsive again.
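
For readers unfamiliar with the parameter: kernel.panic is the number of seconds the kernel waits after a panic before rebooting, and 0 means it does not reboot at all, which matches the hang described above. A minimal Python sketch for checking the current value on a node follows. The helper names are hypothetical, and the default path assumes a Linux system that exposes the setting under /proc/sys.

```python
from pathlib import Path


def panic_reboot_delay(path="/proc/sys/kernel/panic"):
    """Read the kernel.panic setting (seconds to wait before rebooting)."""
    return int(Path(path).read_text().strip())


def describe(delay):
    """Explain in words what a given kernel.panic value means."""
    if delay == 0:
        return "no automatic reboot after a panic (manual reset needed)"
    if delay > 0:
        return f"automatic reboot {delay} s after a panic"
    # Negative values are not covered by this sketch.
    raise ValueError("negative kernel.panic values are not modeled here")


if __name__ == "__main__":
    print(describe(panic_reboot_delay()))
```

Running this on each node shows whether a fenced node would come back on its own or stay down until someone resets it.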

My issue is how often node 0 gets fenced: it has happened at least once a day over the last 2 days.

This is what happened this morning.

I was remotely connected to node 0 via ssh. Then I suddenly lost the connection. I tried to ssh again but node 0 refused the connection.

Checking dmesg on node 1, I found:
ocfs2_dlm: Nodes in domain ("A7AE746FB3D34479A4B04C0535A0A341"): 0 1 2
o2net: connection to node ora1 (num 0) at 10.12.1.34:7777 has been idle for 10 seconds, shutting it down.
(0,3):o2net_idle_timer:1310 here are some times that might help debug the situation: (tmr 1176207822.713473 now 1176207832.712008 dr 1176207822.713466 adv 1176207822.713475:1176207822.713476 func (1459c2a9:504) 1176196519.600486:1176196519.600489)
o2net: no longer connected to node ora1 (num 0) at 10.12.1.34:7777

Checking dmesg on node 2, I found:
ocfs2_dlm: Nodes in domain ("A7AE746FB3D34479A4B04C0535A0A341"): 0 1 2
o2net: connection to node ora1 (num 0) at 10.12.1.34:7777 has been idle for 10 seconds, shutting it down.
(0,0):o2net_idle_timer:1310 here are some times that might help debug the situation: (tmr 1176207823.774296 now 1176207833.772712 dr 1176207823.774293 adv 1176207823.774297:1176207823.774297 func (1459c2a9:504) 1176196505.704238:1176196505.704240)
o2net: no longer connected to node ora1 (num 0) at 10.12.1.34:7777
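
The timer values in the two o2net_idle_timer lines can be checked against the "idle for 10 seconds" message. The sketch below assumes that "tmr" is the time the idle timer was armed and "now" is the time it fired; that reading is an assumption on my part, not something the log states. The function name and regular expression are illustrative only, and Decimal is used so the microsecond digits are not lost to floating-point rounding.

```python
import re
from decimal import Decimal

# Matches the first two fields of the o2net_idle_timer debug line.
TIMER_RE = re.compile(r"tmr (\d+\.\d+) now (\d+\.\d+)")


def idle_seconds(line):
    """Return now - tmr (in seconds) from an o2net_idle_timer line."""
    match = TIMER_RE.search(line)
    if match is None:
        raise ValueError("no 'tmr ... now ...' fields found")
    tmr, now = (Decimal(value) for value in match.groups())
    return now - tmr


# Timer fields copied from the dmesg output of node 1 and node 2.
node1 = "(tmr 1176207822.713473 now 1176207832.712008 dr 1176207822.713466"
node2 = "(tmr 1176207823.774296 now 1176207833.772712 dr 1176207823.774293"

print(idle_seconds(node1))  # 9.998535
print(idle_seconds(node2))  # 9.998416
```

Both nodes saw roughly 10 seconds of silence from node 0, and the two timers fired about one second apart.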

Since I had reboot on panic enabled on node 0, node 0 restarted. Checking /var/log/messages I found:
Apr 10 09:39:50 ora1 kernel: (12,2):o2quo_make_decision:121 ERROR: fencing this node because it is only connected to 1 nodes and 2 is needed to make a quorum out of 3 heartbeating nodes
Apr 10 09:39:50 ora1 kernel: (12,2):o2hb_stop_all_regions:1909 ERROR: stopping heartbeat on all active regions.
Apr 10 09:39:50 ora1 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
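
The numbers in the o2quo_make_decision message are consistent with a simple majority rule, which can be sketched as below. This is only an illustration assuming a strict majority of the heartbeating nodes is required. It is not the actual o2quo code, and it does not model any tie-breaking the real implementation applies when the node count is even. The function names are hypothetical.

```python
def quorum_needed(heartbeating):
    """Smallest strict majority of the heartbeating nodes."""
    return heartbeating // 2 + 1


def should_fence(connected, heartbeating):
    """True if a node with this many connections lacks a majority."""
    return connected < quorum_needed(heartbeating)


# Values from the log: connected to 1 node, 3 nodes heartbeating.
print(quorum_needed(3))    # 2
print(should_fence(1, 3))  # True
print(should_fence(2, 3))  # False
```

Under this rule, once node 0 lost its network connections to both node 1 and node 2 while all three were still heartbeating to disk, it was the minority and fenced itself.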



----Original Message Follows----
From: Sunil Mushran <[EMAIL PROTECTED]>
To: enohi ibekwe <[EMAIL PROTECTED]>
CC: [email protected]
Subject: Re: [Ocfs2-users] OCFS2 Fencing, then panic
Date: Fri, 06 Apr 2007 09:31:17 -0700

You will have to provide more information. If you
have a netconsole server configured, it would have the details.
Else, I would recommend you configure one to catch the
messages during fence. We have to see the reason for the fence
to determine the actual problem.

enohi ibekwe wrote:
Is this also an issue on SLES9?

I see this exact issue on my SLES9 + ocfs 1.2.1-4.2 RAC cluster. I see the error on the same box on the cluster.



_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
