Re: [Veritas-ha] Private communication problem

2008-04-10 Thread Manuel Braun
Hi Shashi,

Why didn't you configure the second required cluster interconnect?
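
As a quick way to check this, here is a minimal Python sketch that lists the `link` directives in llttab-style text. The sample text is copied from the /etc/llttab quoted below; the function name `llt_link_tags` is hypothetical and not part of any VCS tooling.

```python
def llt_link_tags(llttab_text):
    """Return the tag of every `link` directive found in llttab-style text."""
    tags = []
    for line in llttab_text.splitlines():
        fields = line.split()
        # A link directive starts with the keyword `link`, followed by its tag.
        if len(fields) >= 2 and fields[0] == "link":
            tags.append(fields[1])
    return tags


# Contents of /etc/llttab on node 1, as posted in the original message.
SAMPLE = """\
set-node 0
set-cluster 9
link lan1 /dev/lan:1 - ether - -
"""

links = llt_link_tags(SAMPLE)
if len(links) < 2:
    print(f"only {len(links)} LLT link(s) configured: {links}")
```

With the posted file this reports a single link, lan1, so there is no second interconnect for LLT to fall back on.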

Regards

Manuel

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Shashi
Kanth Boddula
Sent: Thursday, 10 April 2008 14:50
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Private communication problem

I have a 2-node VCS 3.5 cluster running on HP-UX 11.11. There is some
problem with private communication, and I am not able to work out what
exactly the problem is or how to solve it. Each node has 2 LAN cards,
and I am using LAN1 for private communication.

Node 1

#cat /etc/llttab
set-node 0
set-cluster 9
link lan1 /dev/lan:1 - ether - -

#cat /etc/llthosts
0 solstice
1 express

#lltstat -n
LLT node information:
    Node        State     Links
   * 0 solstice OPEN      2

Node 2
---

#cat /etc/llttab
set-node 1
set-cluster 9
link lan1 /dev/lan:1 - ether - -
#cat /etc/llthosts
0 solstice
1 express
#lltstat -n
LLT node information:
    Node        State     Links
     0 solstice CONNWAIT  1
   * 1 express  OPEN      2



On each node I have assigned a private IP to LAN1 to check whether there
is any network problem.

node 1
--
#ifconfig lan1
lan1: flags=843<UP,BROADCAST,RUNNING,MULTICAST>
inet 172.31.0.1 netmask ff00 broadcast 172.31.0.255

#ping 172.31.0.2
PING 172.31.0.2: 64 byte packets
64 bytes from 172.31.0.2: icmp_seq=0. time=0. ms

node 2

#ifconfig lan1
lan1: flags=1843<UP,BROADCAST,RUNNING,MULTICAST,CKO>
inet 172.31.0.2 netmask ff00 broadcast 172.31.0.255

#ping 172.31.0.1
PING 172.31.0.1: 64 byte packets
64 bytes from 172.31.0.1: icmp_seq=0. time=0. ms

So the above indicates that there is no network issue.

Node 1


#lltstat -nvv
LLT node information:
    Node        State     Link  Status  Address
   * 0 solstice OPEN
                          lan1  UP      00:30:6E:37:09:85
     1 express  CONNWAIT
                          lan1  DOWN

Node 2
---
#lltstat -nvv
LLT node information:
    Node        State     Link  Status  Address
     0 solstice CONNWAIT
                          lan1  UP      00:30:6E:37:09:85
   * 1 express  OPEN
                          lan1  UP      00:30:6E:49:D6:97


One point to mention: on the first node the LAN1 card type is Fast
Ethernet, and on the second node the LAN1 card type is 1000Base-T.
Does this cause any problem?



___
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha



[Veritas-ha] Tweaking monitors

2008-04-10 Thread Andrey Dmitriev
All,
 
The cluster keeps 'spawning' Oracle listeners under high load, because it
detects the listener as offline (which is not true).

2008/04/10 09:59:31 VCS INFO V-16-2-13026 (pluto)
Resource(base_listener) - monitor procedure finished successfully after
failing to complete within the expected time for (4) consecutive times.

 
I'd like it not to declare the resource faulted until X monitor
failures, and I'd also like to increase the timeout, but I don't remember
where to change those things. Relevant configs attached.
 
Could someone either point me in the right direction, or tell me a
param off the top of their head to look into?
 
 
# hares -display base_listener
#Resource     Attribute        System Value
base_listener Group            global pluto_gp
base_listener Type             global Netlsnr
base_listener AutoStart        global 1
base_listener Critical         global 1
base_listener Enabled          global 1
base_listener LastOnline       global pluto
base_listener MonitorOnly      global 0
base_listener ResourceOwner    global unknown
base_listener TriggerEvent     global 0
base_listener ArgListValues    mars   oracle /u81/app/oracle/product/102
                                      /u81/app/oracle/product/102/network/admin base_listener
                                      ./bin/Netlsnr/LsnrTest.pl 0
base_listener ArgListValues    pluto  oracle /u81/app/oracle/product/102
                                      /u81/app/oracle/product/102/network/admin base_listener
                                      ./bin/Netlsnr/LsnrTest.pl 0
base_listener ArgListValues    sun    oracle /u81/app/oracle/product/102
                                      /u81/app/oracle/product/102/network/admin base_listener
                                      ./bin/Netlsnr/LsnrTest.pl 0
base_listener ConfidenceLevel  mars   0
base_listener ConfidenceLevel  pluto  100
base_listener ConfidenceLevel  sun    0
base_listener Flags            mars
base_listener Flags            pluto
base_listener Flags            sun
base_listener IState           mars   not waiting
base_listener IState           pluto  not waiting
base_listener IState           sun    not waiting
base_listener Probed           mars   1
base_listener Probed           pluto  1
base_listener Probed           sun    1
base_listener Start            mars   0
base_listener Start            pluto  1
base_listener Start            sun    0
base_listener State            mars   OFFLINE
base_listener State            pluto  ONLINE
base_listener State            sun    OFFLINE
base_listener AgentDebug       global 0
base_listener ComputeStats     global 0
base_listener Encoding         global
base_listener EnvFile          global
base_listener Home             global /u81/app/oracle/product/102
base_listener Listener         global base_listener
base_listener LsnrPwd          global
base_listener MonScript        global ./bin/Netlsnr/LsnrTest.pl
base_listener Owner            global oracle
base_listener ResourceInfo     global State Stale Msg TS
base_listener TnsAdmin         global /u81/app/oracle/product/102/network/admin
base_listener MonitorTimeStats mars   Avg 0 TS
base_listener MonitorTimeStats pluto  Avg 0 TS
base_listener MonitorTimeStats sun    Avg 0 TS

OracleTypes.cf:

type Netlsnr (
    static keylist SupportedActions = { VRTS_GetInstanceName,
        VRTS_GetRunningServices }
    static int OnlineRetryLimit = 2
    static int RestartLimit = 2
    static str ArgList[] = { Owner, Home, TnsAdmin, Listener,
        EnvFile, MonScript, LsnrPwd, AgentDebug, Encoding }
    str Owner
    str Home
    str TnsAdmin
    str Listener
    str EnvFile
    str MonScript = ./bin/Netlsnr/LsnrTest.pl
    str LsnrPwd
    boolean AgentDebug = 0
    str Encoding
)
 
 



Re: [Veritas-ha] Private communication problem

2008-04-10 Thread Manuel Braun
Did you connect the two different cards via a cross-over cable?

Regards

Manuel



Re: [Veritas-ha] Tweaking monitors

2008-04-10 Thread Manuel Braun
Hi Andrey,
 
Are you looking for ToleranceLimit?
 
Here is an excerpt from the VCS user guide:
 
About the ToleranceLimit attribute
The ToleranceLimit attribute defines the number of times the Monitor
routine should return an offline status before declaring a resource
offline. This attribute is typically used when a resource is busy and
appears to be offline. Setting the attribute to a non-zero value
instructs VCS to allow multiple failing monitor cycles with the
expectation that the resource will eventually respond. Setting a
non-zero ToleranceLimit also extends the time required to respond to an
actual fault.
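
The trade-off described in the excerpt can be illustrated with a small Python model. This is only a sketch, not agent code: the function name is hypothetical, and it assumes the resource is declared offline once the run of consecutive offline reports exceeds ToleranceLimit, with any online report resetting the run.

```python
def first_offline_cycle(monitor_results, tolerance_limit=0):
    """Return the 1-based monitor cycle at which the resource is declared
    offline, or None if it never is.

    monitor_results: iterable of booleans, True = monitor reported online.
    Assumption: more than `tolerance_limit` consecutive offline reports
    declare the resource offline; an online report resets the count.
    """
    consecutive_offline = 0
    for cycle, online in enumerate(monitor_results, start=1):
        if online:
            consecutive_offline = 0
            continue
        consecutive_offline += 1
        if consecutive_offline > tolerance_limit:
            return cycle
    return None


# A busy listener that misses two monitor cycles and then answers again.
busy = [True, False, False, True, True]
print(first_offline_cycle(busy, tolerance_limit=0))  # declared offline at cycle 2
print(first_offline_cycle(busy, tolerance_limit=2))  # tolerated: None

# A listener that really died: detection is delayed by the tolerated cycles.
dead = [True, False, False, False, False]
print(first_offline_cycle(dead, tolerance_limit=2))  # declared offline at cycle 4
```

The last call shows the cost the guide mentions: with a non-zero limit, a genuine fault is only acted on after the tolerated cycles have passed.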
 
Regards
 
Manuel





Re: [Veritas-ha] Tweaking monitors

2008-04-10 Thread Gene Henriksen
FaultOnMonitorTimeouts is evidently set to 4 and you are having more than
4 timeouts. That is 4 minutes of non-monitoring, assuming the default
60-second monitor settings. You could increase FaultOnMonitorTimeouts or
the monitor interval, or both.




Re: [Veritas-ha] Tweaking monitors

2008-04-10 Thread Anand Ganesh
Hi Andrey,
 
Apart from what Manuel has mentioned, you might also want to look at the
'FaultOnMonitorTimeouts' and 'MonitorTimeout' attributes. These are
type-level attributes (see hatype -display Netlsnr for this listener's type).
 
MonitorTimeout is the number of seconds the monitor is allowed to run
(or hang) before the agent kills it.
 
If 'FaultOnMonitorTimeouts' monitor cycles time out consecutively, the
agent will call clean and forcefully fault the resource. The message
you are seeing (2-13026) means that the monitor timed out (ran for too
long) 4 consecutive times and, at the 5th attempt, ran and completed
successfully before 'MonitorTimeout' seconds elapsed.
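
A rough Python model of the behaviour described above may help when picking values. It is a sketch under assumptions, not the agent's actual logic: the function name, the event strings and the sample durations are made up for illustration, and a monitor run is counted as a timeout when it takes longer than MonitorTimeout seconds.

```python
def replay_monitor(durations, monitor_timeout, fault_on_monitor_timeouts):
    """Replay monitor run times (in seconds) and return the notable events.

    Assumptions: a run longer than `monitor_timeout` counts as a timeout;
    reaching `fault_on_monitor_timeouts` consecutive timeouts faults the
    resource; a run that completes in time resets the counter.
    """
    events = []
    consecutive_timeouts = 0
    for seconds in durations:
        if seconds > monitor_timeout:
            consecutive_timeouts += 1
            if consecutive_timeouts >= fault_on_monitor_timeouts:
                events.append(f"fault after {consecutive_timeouts} timeouts")
                break
        else:
            if consecutive_timeouts:
                # Corresponds to the "finished successfully after failing
                # to complete ... for (N) consecutive times" log message.
                events.append(f"recovered after {consecutive_timeouts} timeouts")
            consecutive_timeouts = 0
    return events


# Four slow monitor runs followed by a fast one (hypothetical durations).
runs = [90, 90, 90, 90, 5]
print(replay_monitor(runs, monitor_timeout=60, fault_on_monitor_timeouts=6))
print(replay_monitor(runs, monitor_timeout=60, fault_on_monitor_timeouts=4))
print(replay_monitor(runs, monitor_timeout=120, fault_on_monitor_timeouts=4))
```

In this model the first call only logs a recovery, the second faults the resource on the fourth slow run, and the third sees no timeouts at all because the longer MonitorTimeout lets every run finish.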
 
More in User's Guide.
 
HTH,
Anand


