Hi Andrey,

Apart from what Manuel has mentioned, you might also want to look at the 'FaultOnMonitorTimeouts' and 'MonitorTimeout' attributes. These are type-level attributes (hatype -display Oracle, or hatype -display Netlsnr for the listener resource). MonitorTimeout is the number of seconds the monitor is allowed to run (or hang) before the agent kills it. If 'FaultOnMonitorTimeouts' consecutive monitor runs time out, the agent calls clean and forcefully faults the resource.

The message you are seeing (V-16-2-13026) means that the monitor timed out (ran for too long) 4 consecutive times and, on the 5th attempt, ran and completed successfully before 'MonitorTimeout' seconds elapsed.

More details are in the VCS User's Guide.

HTH,
Anand
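Anand's description of the two attributes can be modelled as a simple counter of consecutive monitor timeouts. The Python sketch below is a hypothetical model, not the VCS agent's actual code: the class and method names are invented for illustration, a run counts as a timeout when it lasts longer than MonitorTimeout seconds, and a FaultOnMonitorTimeouts value of 0 is assumed to disable faulting. The example reproduces the V-16-2-13026 situation: four consecutive timeouts followed by a run that completes in time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MonitorTimeoutTracker:
    """Hypothetical model of consecutive monitor-timeout counting.

    Follows the description in this thread; it is NOT the VCS agent code.
    """
    monitor_timeout: float          # seconds a monitor may run (MonitorTimeout)
    fault_on_monitor_timeouts: int  # consecutive timeouts that fault the resource
    consecutive_timeouts: int = 0
    messages: List[str] = field(default_factory=list)

    def record(self, duration: float) -> str:
        """Record one monitor run lasting `duration` seconds.

        Returns 'ok', 'timeout', 'recovered' or 'fault'.
        """
        if duration > self.monitor_timeout:
            # The agent would kill the hung monitor; count the timeout.
            self.consecutive_timeouts += 1
            # Assumption: a threshold of 0 disables faulting on timeouts.
            if 0 < self.fault_on_monitor_timeouts <= self.consecutive_timeouts:
                # Threshold reached: call clean and fault the resource.
                self.consecutive_timeouts = 0
                return "fault"
            return "timeout"
        if self.consecutive_timeouts > 0:
            # Completed in time after earlier timeouts (cf. V-16-2-13026).
            self.messages.append(
                "monitor procedure finished successfully after failing to "
                "complete within the expected time for "
                f"({self.consecutive_timeouts}) consecutive times"
            )
            self.consecutive_timeouts = 0
            return "recovered"
        return "ok"


# Four runs longer than a 60 s MonitorTimeout, then one that finishes in 12 s.
tracker = MonitorTimeoutTracker(monitor_timeout=60, fault_on_monitor_timeouts=5)
print([tracker.record(d) for d in (75, 80, 90, 70, 12)])
# ['timeout', 'timeout', 'timeout', 'timeout', 'recovered']
print(tracker.messages[0])
```

In this model, setting fault_on_monitor_timeouts=4 makes the fourth slow run in the same sequence return 'fault' instead, which is the point where the agent would call clean; raising monitor_timeout means fewer slow runs count as timeouts at all.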
________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel Braun
Sent: Thursday, April 10, 2008 10:19 AM
To: Andrey Dmitriev; [email protected]
Subject: Re: [Veritas-ha] Tweaking monitors

Hi Andrey,

are you looking for "ToleranceLimit"? Here is an excerpt from the VCS User's Guide:

"About the ToleranceLimit attribute
The ToleranceLimit attribute defines the number of times the Monitor routine should return an offline status before declaring a resource offline. This attribute is typically used when a resource is busy and appears to be offline. Setting the attribute to a non-zero value instructs VCS to allow multiple failing monitor cycles with the expectation that the resource will eventually respond. Setting a non-zero ToleranceLimit also extends the time required to respond to an actual fault."

Regards
Manuel

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrey Dmitriev
Sent: Thursday, April 10, 2008 18:21
To: [email protected]
Subject: [Veritas-ha] Tweaking monitors

All,

The cluster keeps 'spawning' (Oracle) listeners under high load, because it detects the listener as offline (it is not):

2008/04/10 09:59:31 VCS INFO V-16-2-13026 (pluto) Resource(base_listener) - monitor procedure finished successfully after failing to complete within the expected time for (4) consecutive times.

I'd like it not to declare the resource faulted until X monitor failures have occurred, and I'd also like to increase the timeout, but I don't remember where to change those settings. Relevant configs attached. Could someone either point me in the right direction, or tell me a parameter off the top of their head to look into?
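Manuel's excerpt can be read as a counter of consecutive offline results. The Python sketch below is one hypothetical reading, not VCS code: the function names are invented, it assumes an online result resets the count (the excerpt does not say), and it assumes ToleranceLimit offline results are tolerated so that the next one declares the resource offline. The monitor_interval parameter stands in for the time between monitor cycles (VCS's MonitorInterval attribute); the 60-second value in the example is only an illustration.

```python
from typing import Optional, Sequence


def cycle_declared_offline(results: Sequence[str], tolerance_limit: int) -> Optional[int]:
    """Return the 0-based monitor cycle at which the resource would be
    declared offline, or None if it never is.

    Sketch assumptions: each result is "online" or "offline"; an "online"
    result resets the count; the resource is declared offline once more than
    `tolerance_limit` offline results occur in a row, so a ToleranceLimit of
    0 reacts to the very first offline result.
    """
    consecutive_offline = 0
    for cycle, state in enumerate(results):
        if state == "offline":
            consecutive_offline += 1
            if consecutive_offline > tolerance_limit:
                return cycle
        else:
            consecutive_offline = 0
    return None


def added_detection_delay(tolerance_limit: int, monitor_interval: float) -> float:
    """Extra seconds before a genuine fault is acted on, assuming one
    monitor cycle every `monitor_interval` seconds."""
    return tolerance_limit * monitor_interval


busy = ["online", "offline", "offline", "online", "offline"]  # busy, then answers
real_fault = ["online", "offline", "offline", "offline"]       # really gone
print(cycle_declared_offline(busy, 0))        # 1
print(cycle_declared_offline(busy, 2))        # None
print(cycle_declared_offline(real_fault, 2))  # 3
print(added_detection_delay(2, 60))           # 120
```

The last line is the trade-off the excerpt warns about: under these assumptions, tolerating two offline cycles on a busy listener also delays the reaction to a real failure by two monitor intervals.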
# hares -display base_listener
#Resource     Attribute        System  Value
base_listener Group            global  pluto_gp
base_listener Type             global  Netlsnr
base_listener AutoStart        global  1
base_listener Critical         global  1
base_listener Enabled          global  1
base_listener LastOnline       global  pluto
base_listener MonitorOnly      global  0
base_listener ResourceOwner    global  unknown
base_listener TriggerEvent     global  0
base_listener ArgListValues    mars    oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ArgListValues    pluto   oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ArgListValues    sun     oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ConfidenceLevel  mars    0
base_listener ConfidenceLevel  pluto   100
base_listener ConfidenceLevel  sun     0
base_listener Flags            mars
base_listener Flags            pluto
base_listener Flags            sun
base_listener IState           mars    not waiting
base_listener IState           pluto   not waiting
base_listener IState           sun     not waiting
base_listener Probed           mars    1
base_listener Probed           pluto   1
base_listener Probed           sun     1
base_listener Start            mars    0
base_listener Start            pluto   1
base_listener Start            sun     0
base_listener State            mars    OFFLINE
base_listener State            pluto   ONLINE
base_listener State            sun     OFFLINE
base_listener AgentDebug       global  0
base_listener ComputeStats     global  0
base_listener Encoding         global
base_listener EnvFile          global
base_listener Home             global  /u81/app/oracle/product/102
base_listener Listener         global  base_listener
base_listener LsnrPwd          global
base_listener MonScript        global  ./bin/Netlsnr/LsnrTest.pl
base_listener Owner            global  oracle
base_listener ResourceInfo     global  State Stale Msg TS
base_listener TnsAdmin         global  /u81/app/oracle/product/102/network/admin
base_listener MonitorTimeStats mars    Avg 0 TS
base_listener MonitorTimeStats pluto   Avg 0 TS
base_listener MonitorTimeStats sun     Avg 0 TS

OracleTypes.cf

type Netlsnr (
    static keylist SupportedActions = { VRTS_GetInstanceName, VRTS_GetRunningServices }
    static int OnlineRetryLimit = 2
    static int RestartLimit = 2
    static str ArgList[] = { Owner, Home, TnsAdmin, Listener, EnvFile, MonScript, LsnrPwd, AgentDebug, Encoding }
    str Owner
    str Home
    str TnsAdmin
    str Listener
    str EnvFile
    str MonScript = "./bin/Netlsnr/LsnrTest.pl"
    str LsnrPwd
    boolean AgentDebug = 0
    str Encoding
)
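The listing is easier to read once you notice that each per-system ArgListValues row lines up, position by position, with the ArgList declared in the Netlsnr type (Owner, Home, TnsAdmin, ...). The Python helper below is a hypothetical sketch for pairing them up; the function names are invented, and it assumes one row per line with whitespace-separated Resource, Attribute and System columns followed by the Value, as in the listing above.

```python
import shlex
from typing import Dict, Tuple

# ArgList order copied from the Netlsnr type definition above.
NETLSNR_ARGLIST = ["Owner", "Home", "TnsAdmin", "Listener", "EnvFile",
                   "MonScript", "LsnrPwd", "AgentDebug", "Encoding"]


def parse_display(text: str) -> Dict[Tuple[str, str], str]:
    """Parse `hares -display`-style rows into {(attribute, system): value}.

    Sketch assumption: one row per line, whitespace-separated Resource,
    Attribute and System columns followed by the Value; lines starting
    with '#' are headers and are skipped.
    """
    rows: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 3)
        if len(parts) < 3:
            continue  # not a Resource/Attribute/System row
        attribute, system = parts[1], parts[2]
        rows[(attribute, system)] = parts[3].strip() if len(parts) == 4 else ""
    return rows


def arglist_for(rows: Dict[Tuple[str, str], str], system: str) -> Dict[str, str]:
    """Pair the ArgListValues row for `system` with the ArgList names.
    shlex turns each "" placeholder into an empty string."""
    values = shlex.split(rows[("ArgListValues", system)])
    return dict(zip(NETLSNR_ARGLIST, values))


sample = '''#Resource     Attribute      System  Value
base_listener State          pluto   ONLINE
base_listener Encoding       global
base_listener ArgListValues  pluto   oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
'''
rows = parse_display(sample)
args = arglist_for(rows, "pluto")
print(rows[("State", "pluto")])  # ONLINE
print(args["MonScript"])         # ./bin/Netlsnr/LsnrTest.pl
print(repr(args["EnvFile"]))     # ''
```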
_______________________________________________
Veritas-ha maillist  -  [email protected]
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
