RE: Can't fire up 2nd ORACM of 2-node Linux RAC

Nick Wagner Fri, 29 Aug 2003 17:03:29 +0000

By any chance, do you have an Oracle 7 or 8 instance (or even a listener) running
on that second machine?  I think I've seen something similar to this when we had
the wrong libraries linked to the database on one of the nodes.


Nick

-----Original Message-----
Sent: Friday, August 29, 2003 10:39 AM
To: Multiple recipients of list ORACLE-L


Hey all,

We're testing the usability of 9iRAC.  Our test hardware is a 2-node Intel
RedHat9 "cluster" (AS2.1 won't recognize our hardware) with OCFS on a shared
SCSI drive, set up using the "How to Build a $1000 RAC" from www.bradmark.com.
Following the "Step-By-Step Installation of 9.2.0.4 RAC on Linux" MetaLink
doc (with adjustments for doc inaccuracies), I'm about ready to create the
DB.  However, when I try to fire up the ORACM on the second node, it errors
out.  The OCFS mount point is "/spice" and all seems to be OK with it from
the OS level of either node.  The start of the cm.log is:

oracm, version[ 9.2.0.2.0.47 ] started {Fri Aug 29 09:43:35 2003 }
KernelModuleName is hangcheck-timer {Fri Aug 29 09:43:35 2003 }
OemNodeConfig(): Network Address of node0: 192.168.1.241 (port 9998)
 {Fri Aug 29 09:43:35 2003 }
OemNodeConfig(): Network Address of node1: 192.168.1.242 (port 9998)
 {Fri Aug 29 09:43:35 2003 }
>WARNING:  OemInit2: Opened file(/spice/RAC_quorum.dbf 8), tid = main:16384
file = oem.c, line = 491 {Fri Aug 29 09:43:35 2003 }
Debug Hang : ClusterListener (PID=22194) Registered withwatchdog daemon.
{Fri Aug 29 09:43:37 2003 }
InitializeCM: ModuleName = hangcheck-timer  {Fri Aug 29 09:43:37 2003 }
InitializeCM: Kernel module hangcheck-timer is already loaded {Fri Aug 29
09:43:37 2003 }
Debug Hang : CmConnectListener (PID=22195):Registered with watchdog daemon.
{Fri Aug 29 09:43:37 2003 }
Debug Hang :StartNMMon (PID=22187) Registered with watchdog daemon. {Fri Aug
29 09:43:37 2003 }
CreateLocalEndpoint(): Network Address: 192.168.1.242
 {Fri Aug 29 09:43:37 2003 }
Debug Hang :PollingThread (PID=135159137): Registered with  {Fri Aug 29
09:43:37 2003 }
Debug Hang : DiskPingThread (PID=135159137): Registered with  {Fri Aug 29
09:43:37 2003 }
Debug Hang :SendingThread (PID=135159137): Registered with  {Fri Aug 29
09:43:37 2003 }
--- DUMP GROUP STATE DB ---
--- END OF GROUP STATE DUMP ---

All looks OK there.  At least it looks the same as the first node that was
successful in starting.  The trace part is long and wouldn't look nice here,
but here's the end of it (hopefully the pertinent part):

>TRACE:    SendingThread: Spawned with tid 0x1c008, 0x0., tid = 114696 file
= nmmember.c, line = 511 {Fri Aug 29 09:43:37 2003 }
Debug Hang :SendingThread (PID=135159137): Registered with  {Fri Aug 29
09:43:37 2003 }
>TRACE:    SendingThread (pid=22198, tid=114696): Registered with watchdog
daemon., tid = 114696 file = nmmember.c, line = 576 {Fri Aug 29 09:43:37
2003 }
>TRACE:    HandleJoin(): src[1] dest[1] dom[0] seq[1] sync[0], tid =
ClusterListener:49156 file = nmlisten.c, line = 346 {Fri Aug 29 09:43:37
2003 }
>TRACE:    HandleJoin(): JOIN from node(1)->(1), tid = ClusterListener:49156
file = nmlisten.c, line = 362 {Fri Aug 29 09:43:37 2003 }
>TRACE:    HandleSync(): src[0] dest[1] dom[0] seq[6] sync[1], tid =
ClusterListener:49156 file = nmlisten.c, line = 506 {Fri Aug 29 09:43:37
2003 }
>TRACE:    SendAck(): node(0) domain(0) syncSeqNo(1) type(11), tid =
ClusterListener:49156 file = nmmember.c, line = 1913 {Fri Aug 29 09:43:37
2003 }
>TRACE:    HandleVote(): src[0] dest[1] dom[0] seq[7] sync[1], tid =
ClusterListener:49156 file = nmlisten.c, line = 643 {Fri Aug 29 09:43:38
2003 }
>TRACE:    SendVoteInfo(): node(0) domain(0) syncSeqNo(1), tid =
ClusterListener:49156 file = nmmember.c, line = 1727 {Fri Aug 29 09:43:38
2003 }
>TRACE:    HandleShutdown(): src[0] dest[1] dom[0] seq[0] sync[1] type[4],
tid = ClusterListener:49156 file = nmlisten.c, line = 1087 {Fri Aug 29
09:43:39 2003 }
>TRACE:    IncrementEventValue: *(80f2900) = (1, 1), tid =
ClusterListener:49156 file = unixinc.c, line = 253 {Fri Aug 29 09:43:39 2003
}
--- End Dump ---
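
Because the trace entries above are wrapped by the mail client, the order of the messages is easier to follow once they are flattened. The following is a minimal Python sketch, not an Oracle tool: the function name `trace_events` and the regular expressions are made up for illustration, and the sample is just the last two message entries copied from the trace above.

```python
import re

# Matches entries such as ">TRACE:    HandleVote(): src[0] dest[1] ..., tid ="
TRACE_RE = re.compile(r">TRACE:\s+(\w+)\(\):\s+(.*?), tid")
# Matches bracketed fields such as "src[0]" or "seq[7]"
FIELD_RE = re.compile(r"(\w+)\[(\d+)\]")


def trace_events(log_text):
    """Return (function_name, fields) for each 'Name():' trace entry."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    events = []
    for name, detail in TRACE_RE.findall(flat):
        fields = {key: int(value) for key, value in FIELD_RE.findall(detail)}
        events.append((name, fields))
    return events


sample = """\
>TRACE:    HandleVote(): src[0] dest[1] dom[0] seq[7] sync[1], tid =
ClusterListener:49156 file = nmlisten.c, line = 643 {Fri Aug 29 09:43:38
2003 }
>TRACE:    HandleShutdown(): src[0] dest[1] dom[0] seq[0] sync[1] type[4],
tid = ClusterListener:49156 file = nmlisten.c, line = 1087 {Fri Aug 29
09:43:39 2003 }
"""

for name, fields in trace_events(sample):
    print(name, fields)
```

Run against the whole trace, this lists the handlers in order, ending with the HandleShutdown entry carrying src[0]. What that entry means is not something the sketch can tell you.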

There's no ERROR or WARNING listed in the trace part.  Hmmmm.  Also, here's
my cmcfg.ora:

HeartBeat=15000
KernelModuleName=hangcheck-timer
ClusterName=Oracle Cluster Manager, version 9i
PollInterval=1000
MissCount=210
PrivateNodeNames=rac1-private rac2-private 
PublicNodeNames=rac1 rac2 
ServicePort=9998
#WatchdogSafetyMargin=5000
#WatchdogTimerMargin=60000
CmDiskFile=/spice/RAC_quorum.dbf
HostName=rac1
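
When comparing the cmcfg.ora on each node, it can help to load the file into a dictionary and diff the values rather than eyeball them. The following is a minimal Python sketch under stated assumptions: `parse_cmcfg` and `node_pairs` are hypothetical helper names, and pairing private and public names by position is an assumption made for illustration, not documented ORACM behavior.

```python
def parse_cmcfg(text):
    """Parse key=value lines of a cmcfg.ora-style file, skipping '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def node_pairs(params):
    """Pair private and public node names by position (an assumption)."""
    private = params.get("PrivateNodeNames", "").split()
    public = params.get("PublicNodeNames", "").split()
    if len(private) != len(public):
        raise ValueError("private and public node lists differ in length")
    return list(zip(private, public))


# Values copied from the cmcfg.ora quoted above.
sample = """\
HeartBeat=15000
PrivateNodeNames=rac1-private rac2-private
PublicNodeNames=rac1 rac2
ServicePort=9998
#WatchdogSafetyMargin=5000
CmDiskFile=/spice/RAC_quorum.dbf
HostName=rac1
"""

params = parse_cmcfg(sample)
print(node_pairs(params))
print("HostName:", params["HostName"])
```

Parsing the file from each node and comparing the two dictionaries shows at a glance which parameters differ between them.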

I've installed and verified the hangcheck-timer kernel module in place of the
Watchdog timer, as the docs say to do.  I've tried blowing away the shared
quorum file, recreating it with touch, recreating it with dd, and opening up
permissions on the file and directory, all to no avail.  The one problem I
know I had was that the local node was aliased to localhost in my /etc/hosts.
Everything seemed to work, but instead of having a "cluster", I had two
separate nodes sharing a disk.  Once I changed /etc/hosts, I started getting
this problem.  There are a few MetaLink Forum posts just like this, but none
have been resolved (and I just lost my net connection).
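
The /etc/hosts aliasing problem described above can be checked for mechanically. The following is a minimal Python sketch: `loopback_names` is a hypothetical helper, and the sample hosts content is invented for illustration (the two addresses are the ones in cm.log, but which node name maps to which address is an assumption).

```python
import ipaddress


def loopback_names(hosts_text):
    """Return every host name that a hosts-format text maps to a loopback address."""
    names = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *aliases = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # not a parseable IP address; skip the line
        if is_loopback:
            names.update(aliases)
    return names


# Invented example of the misconfiguration: rac1 is aliased to 127.0.0.1.
sample_hosts = """\
127.0.0.1       rac1 localhost.localdomain localhost
192.168.1.241   rac1-private
192.168.1.242   rac2-private
"""

node_names = {"rac1", "rac1-private"}
print(sorted(loopback_names(sample_hosts) & node_names))
```

Any node name this prints resolves to the loopback interface on that machine, which matches the "two separate nodes sharing a disk" symptom.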

Anyone care to take a stab at it?  TIA!
Rich

Rich Jesse                           System/Database Administrator
[EMAIL PROTECTED]                  Quad/Tech Inc, Sussex, WI USA
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Jesse, Rich
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Author: Nick Wagner
  INET: [EMAIL PROTECTED]

