Stop o2cb and swap the two node numbers in /etc/ocfs2/cluster.conf. After making the change on both nodes, restart o2cb on both.
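
To make the reasoning explicit, here is a small sketch of why the current numbering takes down both nodes. It assumes, as stated further down in this thread, that on a two-node split each stack fences the node with the higher node number (for css that is only "afaik"). The function names are made up for illustration, and the "after" numbering (byaz05 = 1, byaz10 = 0) is simply what swapping the ocfs2 node numbers would give.

```python
# Sketch only: models the tie-break rule quoted in this thread
# ("fence the higher node number"), not the real o2cb or CSS logic.

def fenced_node(node_numbers):
    """Return the node that loses a two-node split (highest node number)."""
    return max(node_numbers, key=node_numbers.get)

def nodes_lost(*stacks):
    """Return every node that at least one stack would fence."""
    return {fenced_node(stack) for stack in stacks}

# Node numbers as reported in the thread.
css = {"byaz05": 2, "byaz10": 1}
ocfs2_before = {"byaz05": 0, "byaz10": 1}

# Assumed result of swapping the ocfs2 node numbers.
ocfs2_after = {"byaz05": 1, "byaz10": 0}

print("before swap:", sorted(nodes_lost(ocfs2_before, css)))
print("after swap: ", sorted(nodes_lost(ocfs2_after, css)))
```

Before the swap the two stacks pick different victims (ocfs2 picks byaz10, css picks byaz05), so both nodes reboot. After the swap both stacks pick byaz05 and byaz10 stays up.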

[EMAIL PROTECTED] wrote:
> Hi Sunil,
>
> my lotus notes choked on the table from excel... So the two nodes have
> the following node numbers:
>
> Node     ocfs2   crs/css
> byaz05   0       2
> byaz10   1       1
>
> Greets,
> Alex
>
> > In such a situation, ocfs2 fences the higher node number. afaik,
> > css does the same. What are the css node numbers for the two nodes?
> >
> > alexandra.strauss at bayerbbs.com wrote:
> > > Hello,
> > >
> > > I refer to you hoping you may help me with my problem... We have
> > > an issue here and opened an SR at Metalink, but until now we got no
> > > useful information for solving our problem. SR-Number is 6855815.994...
> > >
> > > We wanted to protect 9i Single-Instance Databases with 10g Clusterware
> > > following the third-party-tool approach. There are no RAC-databases
> > > involved. But we want to achieve high availability as the databases
> > > are business critical systems. We want to make the systems able to
> > > relocate to another machine in case of failure to keep downtimes
> > > low... To achieve this we want to use OCFS2 for the filesystem.
> > > Relocate is done by script with help of CRS.
> > >
> > > So we took two systems (byaz05 and byaz10) and installed the following
> > > software: 10g CRS (10.2.0.3) and Oracle Software 9.2.0.8 and OCFS2 1.2.8
> > >
> > > We found the following Metalink notes and adjusted the heartbeat and
> > > timeouts for OCFS2:
> > >
> > > Metalink Note 395878.1: Heartbeat/Voting/Quorum Related Timeout
> > >   Configuration for Linux, OCFS2, RAC Stack to avoid unnecessary node
> > >   fencing, panic and reboot
> > > Metalink Note 391771.1: OCFS2 - FREQUENTLY ASKED QUESTIONS (here
> > >   especially the section on fencing and quorum)
> > > Metalink Note 434255.1: Common reasons for OCFS2 Kernel Panic or
> > >   Reboot Issues
> > > Metalink Note 457423.1: OCFS2 Fencing, Network, and Disk Heartbeat
> > >   Timeout Configuration
> > >
> > > We did no changes to the CRS/CSS default settings until now.
> > >
> > > During HA-testing we watched unexpected behaviour of the system. We
> > > deactivated the bond for private interconnect and expected only one
> > > node to go down. But we faced both nodes going down. As it seems to me
> > > one node was rebooted from OCFS2 and the other one from CRS/CSS.
> > >
> > > Timestamp
> > > ------------------------------------------------------------------------
> > >
> > > 10:21:06 bond1 disabled (eth1)
> > > */var/log/messages byaz05*
> > > Apr 25 10:21:06 byaz05 kernel: bonding: bond1: link status definitely down for interface eth1, disabling it
> > > Apr 25 10:21:06 byaz05 kernel: bonding: bond1: making interface eth5 the new active one.
> > >
> > > 10:21:09 bond1 disabled (eth5)
> > > */var/log/messages byaz05*
> > > Apr 25 10:21:09 byaz05 kernel: bonding: bond1: link status definitely down for interface eth5, disabling it
> > > Apr 25 10:21:09 byaz05 kernel: bonding: bond1: now running without any active interface !
> > >
> > > 10:21:23 o2net – no longer connected
> > > */var/log/messages byaz05*
> > > Apr 25 10:21:23 byaz05 kernel: o2net: no longer connected to node byaz10.bayer-ag.com (num 1) at 10.190.59.6:7777
> > > */var/log/messages byaz10*
> > > Apr 25 10:21:23 byaz10 kernel: o2net: no longer connected to node byaz05.bayer-ag.com (num 0) at 10.190.59.5:7777
> > >
> > > 10:21:27 CSSD failure 134
> > > 10:21:29 Reboot initiated by CRS
> > > */var/log/messages byaz05*
> > > Apr 25 10:21:27 byaz05 logger: Oracle clsomon failed with fatal status 12.
> > > Apr 25 10:21:27 byaz05 logger: Oracle CSSD failure 134.
> > > Apr 25 10:21:27 byaz05 su(pam_unix)[25839]: session closed for user oracle
> > > Apr 25 10:21:27 byaz05 logger: Oracle CRS failure. Rebooting for cluster integrity.
> > > Apr 25 10:21:27 byaz05 kernel: md: stopping all md devices.
> > > Apr 25 10:21:27 byaz05 kernel: md: md0 switched to read-only mode.
> > > Apr 25 10:21:29 byaz05 logger: Oracle CRS failure. Rebooting for cluster integrity.
> > > Apr 25 10:21:29 byaz05 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
> > > Apr 25 10:21:29 byaz05 logger: Oracle init script ceding reboot to sibling 27383.
> > >
> > > 10:21:58 Reboot initiated by OCFS2(?)
> > > */var/log/messages byaz10*
> > > Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session opened for user oracle by (uid=0)
> > > Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session closed for user oracle
> > > Apr 25 10:25:58 byaz10 syslogd 1.4.1: restart.
> > > Apr 25 10:25:58 byaz10 syslog: syslogd startup succeeded
> > > Apr 25 10:25:58 byaz10 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> > > Apr 25 10:25:58 byaz10 kernel: Bootdata ok (command line is ro root=/dev/vgroot/_)
> > >
> > > We supposed all the time this is a timing problem.
> > > But we don't know which settings raise the problem and which steps
> > > to take to correct them. Otherwise we'll have to rework the complete
> > > concept for the business critical systems.
> > > Can anyone help me?
> > >
> > > Regards,
> > > Alexandra
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
