- **summary**: Opensaf cluster went for reset while invoking failover --> Node rebooted as saImmOiInitialize_2 failed during middleware active assignment
- Attachments have changed:

Diff:

~~~~

--- old
+++ new
@@ -1,2 +1,3 @@
 SC1_syslog.txt (436.4 kB; text/plain)
 SC2_syslog.txt (425.6 kB; text/plain)
+1529.tgz (586.3 kB; application/x-compressed-tar)

~~~~

- **Comment**:

A similar issue is observed while invoking switchover:

After some switchovers, IMM initialization on the newly promoted controller SC-1 failed with ERR_TIMEOUT, and CLMD faulted due to avaDown.
 
Oct  9 14:22:16 SOFO-64BIT-S1 osafntfimcnd[30122]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct  9 14:22:16 SOFO-64BIT-S1 osafntfimcnd[30122]: ER ntfimcn_imm_init() Fail
Oct  9 14:22:17 SOFO-64BIT-S1 osaflogd[5406]: NO conf_runtime_obj_create: Cannot create config runtime object SA_AIS_ERR_TIMEOUT (5)
Oct  9 14:22:17 SOFO-64BIT-S1 osafclmd[5431]: ER saImmOiClassImplementerSet failed for class SaClmCluster, rc = 5,
Oct  9 14:22:17 SOFO-64BIT-S1 osafamfnd[5460]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'


The following mds.log snippet is from SC-1 at that time.

Oct  9 14:22:16.232286 osafimmnd[5396] ERR  |MDS_SND_RCV: Timeout or Error occured
Oct  9 14:22:16.822751 osafntfimcnd[30122] ERR  |MDS_SND_RCV: Timeout or Error occured
Oct  9 14:22:16.822899 osafntfimcnd[30122] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message
Oct  9 14:22:17.213508 osafclmd[5431] ERR  |MDS_SND_RCV: Timeout or Error occured
Oct  9 14:22:17.213625 osafclmd[5431] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message
Oct  9 14:22:17.213871 osaflogd[5406] ERR  |MDS_SND_RCV: Timeout or Error occured
Oct  9 14:22:17.213949 osaflogd[5406] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message
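
To see how many daemons on SC-1 hit MDS timeouts and how closely they cluster in time, the mds.log snippet can be tallied per process. This is a minimal sketch, not an OpenSAF tool: the helper name `mds_timeouts_by_process` and its regex are assumptions based only on the mds.log line format quoted here, with each entry unwrapped onto a single line.

```python
import re
from collections import Counter
from datetime import datetime

# mds.log line format as quoted in this ticket (assumed, one entry per line), e.g.
# "Oct  9 14:22:16.232286 osafimmnd[5396] ERR  |MDS_SND_RCV: Timeout or Error occured"
MDS_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<proc>[^\[\s]+)\[\d+\] ERR\s+\|MDS_SND_RCV: "
)


def mds_timeouts_by_process(lines):
    """Hypothetical helper: count MDS_SND_RCV error lines per process.

    Returns (counts, span_seconds), where span_seconds is the time between
    the first and last matching line (0.0 if nothing matched).
    """
    counts = Counter()
    stamps = []
    for line in lines:
        m = MDS_RE.match(line)
        if not m:
            continue  # skip anything that is not an MDS_SND_RCV error line
        counts[m.group("proc")] += 1
        # The log has no year; collapse "Oct  9" to "Oct 9" before parsing.
        stamp = " ".join(m.group("stamp").split())
        stamps.append(datetime.strptime(stamp, "%b %d %H:%M:%S.%f"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return dict(counts), span


if __name__ == "__main__":
    sample = [
        "Oct  9 14:22:16.232286 osafimmnd[5396] ERR  |MDS_SND_RCV: Timeout or Error occured",
        "Oct  9 14:22:16.822751 osafntfimcnd[30122] ERR  |MDS_SND_RCV: Timeout or Error occured",
        "Oct  9 14:22:16.822899 osafntfimcnd[30122] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message",
        "Oct  9 14:22:17.213508 osafclmd[5431] ERR  |MDS_SND_RCV: Timeout or Error occured",
        "Oct  9 14:22:17.213625 osafclmd[5431] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message",
        "Oct  9 14:22:17.213871 osaflogd[5406] ERR  |MDS_SND_RCV: Timeout or Error occured",
        "Oct  9 14:22:17.213949 osaflogd[5406] ERR  |MDS_SND_RCV: Timeout occured on sndrsp message",
    ]
    counts, span = mds_timeouts_by_process(sample)
    print(counts)  # {'osafimmnd': 1, 'osafntfimcnd': 2, 'osafclmd': 2, 'osaflogd': 2}
    print(round(span, 6))  # 0.981663
```

Run on the seven lines above, it reports one error from osafimmnd and two each from osafntfimcnd, osafclmd and osaflogd, spread over about 0.98 s, i.e. the timeouts in these daemons occurred within roughly one second of each other.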

The quiesced (old active) controller was promoted back to active, and the rest of the cluster remained healthy.



---

**[tickets:#1529] Node rebooted as saImmOiInitialize_2 failed during middleware active assignment**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Thu Oct 08, 2015 07:53 AM UTC by Chani Srivastava
**Last Updated:** Fri Oct 09, 2015 10:33 AM UTC
**Owner:** nobody
**Attachments:**

- [SC1_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC1_syslog.txt) (436.4 kB; text/plain)
- [SC2_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC2_syslog.txt) (425.6 kB; text/plain)
- [1529.tgz](https://sourceforge.net/p/opensaf/tickets/1529/attachment/1529.tgz) (586.3 kB; application/x-compressed-tar)


Setup:
Changeset-6901
Invoked continuous failovers on a 4-node cluster with 2 controllers and 2 payloads. All nodes are 64-bit.
2PBE enabled with 25K objects

Issue Observed:
A cluster reset occurred while invoking continuous failovers.

Attachments:
Syslogs for SC-1 and SC-2 are attached.
Traces for immnd and immd can be shared separately if required.

Steps:
* Initially SC-1 is active and SC-2 standby
* A test script invoked failover by killing osafclmd on SC-1
* SC-2 became active

Oct  7 18:23:32 OSAF-SC1 root: killing osafclmd from invoke_failover.sh
Oct  7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active

* On the new active controller, saImmOiInitialize_2 failed

Oct  7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct  7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init() Fail
Oct  7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 333 (safLckService) <299, 2020f>
Oct  7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 334 (safEvtService) <298, 2020f>
Oct  7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct  7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init() Fail
Oct  7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA MDS Send Failed
Oct  7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA Error code 2 returned for message type 4 - ignoring

* Other services also failed to initialize with IMM on the new active controller, i.e. SC-2

* Finally, SMF hit a CSI set callback timeout (csiSetcallbackTimeout)
* SC-2 rebooted, and because it was the only active controller at the time, the entire cluster reset

Oct  7 19:25:51 OSAF-SC2 osafamfnd[2205]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Oct  7 19:25:51 OSAF-SC2 osafamfnd[2205]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Oct  7 19:25:51 OSAF-SC2 osafamfnd[2205]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Oct  7 19:25:51 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60
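
To compare repeated runs, the delay between the failover and each symptom can be measured directly from the syslog. This is a minimal sketch, assuming the classic syslog line format used in this ticket (no year in the timestamp, one entry per line); `failover_timeline` and the marker strings are hypothetical choices for illustration, not part of OpenSAF.

```python
import re
from datetime import datetime

# Classic syslog prefix as quoted in this ticket (assumed), e.g.
# "Oct  7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active"
# The "[pid]" part is optional because opensaf_reboot logs without one.
SYSLOG_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^\[:\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Hypothetical markers, copied from messages quoted in this ticket.
MARKERS = {
    "failover": "FAILOVER StandBy --> Active",
    "imm_init_failed": "saImmOiInitialize_2 failed",
    "failfast": "Recovery is 'nodeFailfast'",
}


def parse_stamp(stamp):
    # Syslog omits the year; collapse the day padding ("Oct  7" -> "Oct 7").
    return datetime.strptime(" ".join(stamp.split()), "%b %d %H:%M:%S")


def failover_timeline(lines):
    """Seconds from the first failover line to the first hit of each marker.

    Returns {} if no failover line is found. Year rollover is not handled.
    """
    first_seen = {}
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue  # wrapped fragments or non-syslog text
        for name, needle in MARKERS.items():
            if name not in first_seen and needle in m.group("msg"):
                first_seen[name] = parse_stamp(m.group("stamp"))
    if "failover" not in first_seen:
        return {}
    start = first_seen["failover"]
    # Markers logged before the failover would come out negative.
    return {name: (ts - start).total_seconds() for name, ts in first_seen.items()}


if __name__ == "__main__":
    sample = [
        "Oct  7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active",
        "Oct  7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)",
        "Oct  7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)",
        "Oct  7 19:25:51 OSAF-SC2 osafamfnd[2205]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'",
        "Oct  7 19:25:51 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60",
    ]
    print(failover_timeline(sample))
    # {'failover': 0.0, 'imm_init_failed': 2.0, 'failfast': 31.0}
```

For the SC-2 lines quoted above, this gives 2 s from the failover to the first saImmOiInitialize_2 failure and 31 s to the nodeFailfast recovery of SMF. Only the first occurrence of each marker is kept, so the second ntfimcnd failure does not change the result.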




---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
