Regarding the upgrade: you must upgrade both the kernel and the utilities to SLES9 SP3, then apply online updates to reach at least kernel 257 and the current ocfs tools.
Oracle supports OCFSv2 starting with SLES9 kernel 255, which is 'SP3 + a few online updates'.

Another piece of advice: always run a third node, even if you don't use it, just for quorum. Also update the heartbeat parameters (see /etc/sysconfig/o2cb): the default results in a 12-second timeout, which is not practical because network convergence on Ethernet takes 40 seconds by the STP standard.

----- Original Message -----
From: "Mark Maiden" <[EMAIL PROTECTED]>
To: "ocfs2-users" <[email protected]>
Sent: Monday, July 10, 2006 4:02 AM
Subject: [Ocfs2-users] 2 Node cluster crashing

> Hi,
>
> We have a two node cluster running SLES 9 SP2 connecting directly to an
> EMC CX300 for storage.
>
> We are using OCFS(OCFS2 DLM 0.99.15-SLES) for the voting disk etc, and
> ASM for data files.
>
> The system has been running until last Friday when the whole cluster
> went down with the following error messages in the /var/log/messages
> files :
>
> rac1:
>
> Jul 7 14:56:23 rac1 kernel: (0,3):o2net_state_change:512 connection to node
> rac2.globoforce.com num 1 at 198.87.235.246:7777 has been idle for 10
> seconds,
> shutting it down.
> Jul 7 14:56:23 rac1 kernel: (10042,0):o2net_set_nn_state:414 no longer
> connected to node rac2.globoforce.com at 198.87.235.246:7777
> Jul 7 14:56:56 rac1 kernel: (14410,3):ocfs2_replay_journal:1123 Recovering
> node 1 from slot 1 on device (8,65)
>
> rac2:
>
> Jul 7 14:56:24 rac2 kernel: (0,0):o2net_state_change:512 connection to node
> rac1.globoforce.com num 0 at 198.87.235.244:7777 has been idle for 10
> seconds,
> shutting it down.
> Jul 7 14:56:24 rac2 kernel: (10201,0):o2net_set_nn_state:414 no longer
> connected to node rac1.globoforce.com at 198.87.235.244:7777
> Jul 7 14:56:42 rac2 kernel: (10201,0):o2net_check_quorum:1468 ERROR: fencing
> this node because it is connected to a half-quorum of 1 out of 2 nodes which
> doesn't include the lowest active node 0
> Jul 7 14:56:42 rac2 kernel: (10201,0):o2hb_stop_all_regions:1589 ERROR:
> stopping heartbeat on all active regions.
> Jul 7 14:56:42 rac2 kernel: Kernel panic: ocfs2 is very sorry to be fencing
> this system by panicing
>
>
> I opened up an SR with Oracle and they recommended that we upgrade to
> SLES 9 SP3 because they don't support the OCFS version that we are
> running. I inquired as to whether this will sort out the problem, but
> they replied with a very vague answer.
>
> Can somebody please shed some light on this : is this version of OCFS
> that we are running very buggy and causes lots of problems like this?
> And if we upgrade is it going to sort out the problem, or are we just
> brining ourselves into "Supported-land" and we can get fixed from there?
>
> Also(sorry for all the questions :), when we upgrade, is it just a case
> of upgrading the kernel and the OCFS rpm's?
>
> Thank you for your help in advance...much appreciated!!
> --
>
> Mark Maiden
> Systems Administrator
> Globoforce, Ltd
> 6 Beckett Way Parkwest
> Dublin 12
> Ireland
> t: +353 1 625 8812
> f: +353 1 625 8880
> e: [EMAIL PROTECTED]
> www.globoforce.com
>
> http://guidance.gospelcom.net/answer.htm
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
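
A note on the heartbeat timeout mentioned in the reply at the top of this message. The sketch below shows, roughly, how a 12-second default could arise and what threshold a longer timeout would need. It rests on assumptions that are not stated in this thread: that the relevant setting in /etc/sysconfig/o2cb is O2CB_HEARTBEAT_THRESHOLD, that its default is 7, and that the timeout is (threshold - 1) * 2 seconds, as the OCFS2 documentation describes. The function names are hypothetical, so please check the values against the documentation for your installed version before changing anything.

```python
import math

# Assumption: o2hb writes one heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_S = 2


def timeout_for_threshold(threshold: int) -> int:
    """Seconds without a heartbeat before a node is declared dead (assumed formula)."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def threshold_for_timeout(timeout_s: float) -> int:
    """Smallest threshold whose timeout is at least timeout_s seconds."""
    return math.ceil(timeout_s / HEARTBEAT_INTERVAL_S) + 1


if __name__ == "__main__":
    # Assumed default threshold of 7 gives the 12 seconds mentioned in the reply.
    print("threshold 7  ->", timeout_for_threshold(7), "s")
    # Threshold needed to ride out a 40-second STP convergence.
    print("40 s timeout -> threshold", threshold_for_timeout(40))
    # A more conservative 60-second timeout.
    print("60 s timeout -> threshold", threshold_for_timeout(60))
```

Under these assumptions, covering a 40-second network convergence needs a threshold of at least 21, and the same value would have to be set on every node in the cluster.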
