Hi Andrey,
 
Apart from what Manuel has mentioned, you might also want to look at
the 'FaultOnMonitorTimeouts' and 'MonitorTimeout' attributes. These are
type-level attributes; since your listener resource is of type Netlsnr,
you can view them with 'hatype -display Netlsnr'.
 
MonitorTimeout is the number of seconds the monitor is allowed to run
(or hang) before the agent kills it.
 
If 'FaultOnMonitorTimeouts' consecutive monitor runs time out, the
agent calls clean and forcefully faults the resource. The message
you are seeing (V-16-2-13026) means that the monitor timed out (ran for
too long) 4 consecutive times and, on the 5th attempt, ran and completed
successfully before 'MonitorTimeout' seconds elapsed.
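
A rough sketch of how these two attributes could be inspected and raised for the Netlsnr type from the thread below. The values 120 and 8 are illustrative assumptions, not recommendations, and 'vcs' is a hypothetical wrapper (not a VCS command) that only prints each command unless APPLY=1 is set, so the sequence can be reviewed before it touches a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: print each VCS command; execute it only when APPLY=1.
vcs() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Sketch: raise the monitor timeout settings for a whole resource type.
# Arguments: type name, MonitorTimeout (seconds), FaultOnMonitorTimeouts (count).
tune_monitor_timeouts() {
    type_name=$1
    timeout_secs=$2
    fault_after=$3

    vcs hatype -display "$type_name"        # show the current type-level values
    vcs haconf -makerw                      # open the configuration for writing
    vcs hatype -modify "$type_name" MonitorTimeout "$timeout_secs"
    vcs hatype -modify "$type_name" FaultOnMonitorTimeouts "$fault_after"
    vcs haconf -dump -makero                # save and return to read-only
}

# Illustrative values only (assumptions, not taken from the thread).
tune_monitor_timeouts Netlsnr 120 8
```

Run as is, this only prints the five commands. Changing the type affects every Netlsnr resource in the cluster, so review the printed commands before running with APPLY=1 set in the environment.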
 
There is more on these attributes in the VCS User's Guide.
 
HTH,
Anand

________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel Braun
Sent: Thursday, April 10, 2008 10:19 AM
To: Andrey Dmitriev; [email protected]
Subject: Re: [Veritas-ha] Tweaking monitors


Hi Andrey,
 
Are you looking for "ToleranceLimit"?
 
Here is an excerpt from the VCS user guide:
 
"About the ToleranceLimit attribute
The ToleranceLimit attribute defines the number of times the Monitor
routine should return an offline status before declaring a resource
offline. This attribute is typically used when a resource is busy and
appears to be offline. Setting the attribute to a non-zero value
instructs VCS to allow multiple failing monitor cycles with the
expectation that the resource will eventually respond. Setting a
non-zero ToleranceLimit also extends the time required to respond to an
actual fault."
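
A rough sketch of how ToleranceLimit could be set for just the listener resource from the original question, by overriding the type-level attribute on that one resource. The value 2 is an illustrative assumption, and 'vcs' is a hypothetical wrapper (not a VCS command) that only prints each command unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each VCS command; execute it only when APPLY=1.
vcs() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Sketch: override a static (type-level) attribute on a single resource,
# so other resources of the same type keep the type default.
set_tolerance_limit() {
    resource=$1
    limit=$2

    vcs haconf -makerw                              # open the configuration for writing
    vcs hares -override "$resource" ToleranceLimit  # allow a per-resource value
    vcs hares -modify "$resource" ToleranceLimit "$limit"
    vcs haconf -dump -makero                        # save and return to read-only
}

# Illustrative value only (an assumption, not taken from the thread).
set_tolerance_limit base_listener 2
```

As the excerpt notes, this trades responsiveness for stability: each tolerated offline result delays the reaction to a real listener outage by roughly one monitor interval.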
 
Regards
 
Manuel

________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrey Dmitriev
Sent: Thursday, April 10, 2008 6:21 PM
To: [email protected]
Subject: [Veritas-ha] Tweaking monitors


All,
The cluster keeps 'spawning' Oracle listeners under high load, because
it detects the listener as offline (which it is not).

2008/04/10 09:59:31 VCS INFO V-16-2-13026 (pluto)
Resource(base_listener) - monitor procedure finished successfully after
failing to complete within the expected time for (4) consecutive times.

I'd like it not to declare the resource faulted until X monitor
failures, and I'd also like to increase the timeout, but I don't
remember where to change those things. Relevant configs follow.
Could someone either point me in the right direction or name a
parameter off the top of their head to look into?
# hares -display base_listener
#Resource     Attribute        System     Value
base_listener Group            global     pluto_gp
base_listener Type             global     Netlsnr
base_listener AutoStart        global     1
base_listener Critical         global     1
base_listener Enabled          global     1
base_listener LastOnline       global     pluto
base_listener MonitorOnly      global     0
base_listener ResourceOwner    global     unknown
base_listener TriggerEvent     global     0
base_listener ArgListValues    mars       oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ArgListValues    pluto      oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ArgListValues    sun        oracle /u81/app/oracle/product/102 /u81/app/oracle/product/102/network/admin base_listener "" ./bin/Netlsnr/LsnrTest.pl "" 0 ""
base_listener ConfidenceLevel  mars       0
base_listener ConfidenceLevel  pluto      100
base_listener ConfidenceLevel  sun        0
base_listener Flags            mars
base_listener Flags            pluto
base_listener Flags            sun
base_listener IState           mars       not waiting
base_listener IState           pluto      not waiting
base_listener IState           sun        not waiting
base_listener Probed           mars       1
base_listener Probed           pluto      1
base_listener Probed           sun        1
base_listener Start            mars       0
base_listener Start            pluto      1
base_listener Start            sun        0
base_listener State            mars       OFFLINE
base_listener State            pluto      ONLINE
base_listener State            sun        OFFLINE
base_listener AgentDebug       global     0
base_listener ComputeStats     global     0
base_listener Encoding         global
base_listener EnvFile          global
base_listener Home             global     /u81/app/oracle/product/102
base_listener Listener         global     base_listener
base_listener LsnrPwd          global
base_listener MonScript        global     ./bin/Netlsnr/LsnrTest.pl
base_listener Owner            global     oracle
base_listener ResourceInfo     global     State Stale   Msg   TS
base_listener TnsAdmin         global     /u81/app/oracle/product/102/network/admin
base_listener MonitorTimeStats mars       Avg   0       TS
base_listener MonitorTimeStats pluto      Avg   0       TS
base_listener MonitorTimeStats sun        Avg   0       TS

OracleTypes.cf:
type Netlsnr (
        static keylist SupportedActions = { VRTS_GetInstanceName, VRTS_GetRunningServices }
        static int OnlineRetryLimit = 2
        static int RestartLimit = 2
        static str ArgList[] = { Owner, Home, TnsAdmin, Listener, EnvFile, MonScript, LsnrPwd, AgentDebug, Encoding }
        str Owner
        str Home
        str TnsAdmin
        str Listener
        str EnvFile
        str MonScript = "./bin/Netlsnr/LsnrTest.pl"
        str LsnrPwd
        boolean AgentDebug = 0
        str Encoding
) 
_______________________________________________
Veritas-ha maillist  -  [email protected]
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
