---

**[tickets:#1866] Cluster reset happened because of CLMNA healthcheck timeout in headless state**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Jun 08, 2016 10:22 AM UTC by Ritu Raj
**Last Updated:** Wed Jun 08, 2016 10:22 AM UTC
**Owner:** nobody
**Attachments:**

- [SCALE_SLOT-71.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1866/attachment/SCALE_SLOT-71.tar.bz2) (4.1 MB; application/x-bzip)
- [SCALE_SLOT-72.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1866/attachment/SCALE_SLOT-72.tar.bz2) (2.1 MB; application/x-bzip)
- [SCALE_SLOT-73.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1866/attachment/SCALE_SLOT-73.tar.bz2) (655.7 kB; application/x-bzip)
- [SCALE_SLOT-76.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1866/attachment/SCALE_SLOT-76.tar.bz2) (385.5 kB; application/x-bzip)


Setup:
Version - OpenSAF 5.0.GA
6-node cluster (SC-1: Active, SC-2: Standby, SC-3: Spare; PL-4, PL-5 & PL-6: Payloads)


Issue Observed:
Cluster reset happened because of CLMNA healthcheck timeout in headless state


Steps Performed:
(1). Started OpenSAF on a 6-node cluster with Active, Standby and Spare 
controllers and 3 payloads.
(2). Performed failover operations in order, killing the Active controller 
first, followed by the Standby and Spare controllers.
(3). After a few successful failovers, CLMNA crashed because of a healthcheck 
timeout and a cluster reset happened.


Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: WA saClmInitialize_4 returned 31
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO SU failover probation timer 
started (timeout: 1200000000000 ns)
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO Performing failover of 
'safSu=PL-6,safSg=NoRed,safApp=OpenSAF' (SU failover count: 1)
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO 
'safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: NO 
'safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: ER 
**safComp=CLMNA,safSu=PL-6,safSg=NoRed,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout Recovery** is:suFailover
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: Rebooting OpenSAF NodeId = 
132623 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 132623, SupervisionTime = 60
Jun 16 16:26:18 SCALE_SLOT-76 osafamfnd[5292]: WA saClmInitialize_4 returned 31


> Syslogs of PL-6 and of the Active, Standby and Spare controllers are attached.
> clmd and amfnd traces of the controllers are attached.
> amfnd trace of PL-6 is attached.


* Timestamp on PL-6 at which the issue was observed:
Jun 16 16:26:18


Note:
There is a time gap between the system clocks of the nodes:
SC-1: Wed Jun 15 13:51:39 IST 2016
SC-2: Fri Jun 10 18:51:43 IST 2016
SC-3: Thu Jun 16 19:38:05 IST 2016
PL-6: Thu Jun 16 19:41:51 IST 2016
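
For correlating the attached logs, the clock readings above can be turned into per-node offsets. The following is only a sketch: it assumes the four `date` readings were taken at roughly the same moment (the ticket does not say so) and that the PL-6 syslog timestamp belongs to 2016. The helper names `parse_clock`, `offsets_from` and `translate` are made up for illustration.

```python
from datetime import datetime, timedelta

# `date` output per node, copied from the note above.
NODE_CLOCKS = {
    "SC-1": "Wed Jun 15 13:51:39 IST 2016",
    "SC-2": "Fri Jun 10 18:51:43 IST 2016",
    "SC-3": "Thu Jun 16 19:38:05 IST 2016",
    "PL-6": "Thu Jun 16 19:41:51 IST 2016",
}

FMT = "%a %b %d %H:%M:%S %Y"


def parse_clock(text):
    """Parse a `date` line, ignoring the zone token (all nodes report IST)."""
    fields = text.split()
    fields.remove("IST")
    return datetime.strptime(" ".join(fields), FMT)


def offsets_from(reference, clocks=NODE_CLOCKS):
    """Return each node's clock minus the reference node's clock.

    ASSUMPTION: all readings were taken at about the same instant.
    """
    ref = parse_clock(clocks[reference])
    return {node: parse_clock(text) - ref for node, text in clocks.items()}


def translate(event, offsets):
    """Map an event time on the reference node to every node's local clock."""
    return {node: event + delta for node, delta in offsets.items()}


if __name__ == "__main__":
    offsets = offsets_from("PL-6")
    # PL-6 syslog timestamp of the fault; the year 2016 is assumed.
    event = datetime(2016, 6, 16, 16, 26, 18)
    for node, when in translate(event, offsets).items():
        print(node, when.strftime("%b %d %H:%M:%S"))
```

Under those assumptions, the PL-6 event at Jun 16 16:26:18 would correspond to roughly Jun 15 10:36:06 on SC-1, Jun 10 15:36:10 on SC-2 and Jun 16 16:22:32 on SC-3.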


---
