- **Milestone**: 4.5.2 --> 4.6.2


---

**[tickets:#1464] Cluster reset triggered, after middleware si-swap ( one of controller in disabled )**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Aug 28, 2015 10:00 AM UTC by Srikanth R
**Last Updated:** Fri Aug 28, 2015 10:00 AM UTC
**Owner:** nobody
**Attachments:**

- [clusterReset.tgz](https://sourceforge.net/p/opensaf/tickets/1464/attachment/clusterReset.tgz) (5.4 MB; application/x-compressed)


*Setup*
4.7M0 with changeset 6770.
4 nodes configured, with no PBE configured and a 2N application hosted.
SC-1 is the active controller and SC-2 is the standby controller. Both controllers host application SUs configured with the 2N redundancy model.

*Issues*

The cluster was reset after an si-swap operation on the middleware SI. The active controller (SC-1) was already in the disabled state before the si-swap operation was invoked.
 
 
 *Steps Performed*
 
-> Because of a faulty application, SC-1 moved to the disabled state. The NodeAutorepair feature is disabled for SC-1, so no reboot was ordered.
 
Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfnd[4650]: NO 'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailover'
Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered

-> Invoked the si-swap operation on the middleware SI ('safSi=SC-2N,safApp=OpenSAF').

-> The standby controller (SC-2) was rebooted because the IMM implementer-set calls failed with ERR_EXIST (14).


Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafntfimcnd[4726]: NO exiting on signal 15
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafimmd[4686]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2010f)
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafmsgd[4882]: ER mqd_imm_declare_implementer failed: err = 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osaflogd[4707]: ER saImmOiClassImplementerSet (safLogService) failed: 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafckptd[4780]: ER cpd immOiImplmenterSet failed with err = 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO 'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
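
The numeric return codes in these log lines are SaAisErrorT values. As a reading aid, here is a minimal sketch that maps the codes appearing in this ticket to their symbolic names. `SA_AIS_ERRORS` and `decode_ais_error` are hypothetical helper names, not part of OpenSAF, and the table is only a subset of the enum defined in `saAis.h`.

```python
# Subset of SaAisErrorT values from the SA Forum AIS header (saAis.h).
# Only codes relevant to the logs in this ticket are listed.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}


def decode_ais_error(code: int) -> str:
    """Return the symbolic name for an AIS return code, if it is in the table."""
    return SA_AIS_ERRORS.get(code, f"unknown ({code})")


if __name__ == "__main__":
    # Codes logged by the implementer-set failures and later IMM/NTF calls.
    for code in (14, 6, 5):
        print(code, decode_ais_error(code))
```

With this table, `err = 14` in the implementer-set failures reads as SA_AIS_ERR_EXIST, which matches the ERR_EXIST noted in the step above.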

 
-> SC-1 was also rebooted after the SC-2 reboot.
  
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO Node 'SC-2' left the cluster
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: WA State change notification lost for 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: ER Failed to start cluster tracking 6
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered
Aug 28 15:03:58 SYSTEST-CNTLR-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Aug 28 15:04:03 SYSTEST-CNTLR-1 osafclmd[4621]: ER saNtfNotificationSend() returned: SA_AIS_ERR_TRY_AGAIN (6)
Aug 28 15:04:08 SYSTEST-CNTLR-1 osaflogd[4596]: WA saImmOiRtObjectDelete returned 5 for safLgStr=TWONLOGSTREAM
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally disconnected. Marking it as doomed 4 <17, 2010f> (safAmfService)
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafamfd[4640]: NO Re-initializing with IMM
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get Failed for cli_hdl 73014575375
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA Timeout on syncronous admin operation 1
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally disconnected. Marking it as doomed 3 <12, 2010f> (safClmService)
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get Failed for cli_hdl 51539738895
Aug 28 15:04:22 SYSTEST-CNTLR-1 osafclmd[4621]: ER saImmOiImplementerSet failed rc:6, exiting
Aug 28 15:04:22 SYSTEST-CNTLR-1 osafamfnd[4650]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

-> Because both controllers rebooted, the payloads rebooted as well.
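
The escalation in the steps above can be traced through the amfnd "faulted due to" lines. Here is a rough sketch for extracting them from the syslog excerpts. `FAULT_RE` and `find_faults` are hypothetical names, and the pattern is inferred only from the three fault lines quoted in this ticket, so it may not cover other amfnd message variants.

```python
import re

# Pattern inferred from the amfnd fault lines quoted in this ticket.
# \s+ is used between words so that lines wrapped by the mailer still match.
FAULT_RE = re.compile(
    r"'(?P<component>[^']+)'\s+faulted\s+due\s+to\s+'(?P<cause>[^']+)'\s*:\s*"
    r"Recovery\s+is\s+'(?P<recovery>[^']+)'"
)


def find_faults(log_text):
    """Return (component DN, fault cause, recovery action) for each fault line."""
    return [
        (m.group("component"), m.group("cause"), m.group("recovery"))
        for m in FAULT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # One fault line from this ticket, wrapped as it appears in the email.
    sample = (
        "Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO \n"
        "'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : \n"
        "Recovery is 'nodeFailfast'"
    )
    for component, cause, recovery in find_faults(sample):
        print(component, cause, recovery)
```

Run over the logs in this ticket, the sketch would list the application component fault with recovery 'nodeFailover' on SC-1, followed by the CPD and CLM faults with recovery 'nodeFailfast' on SC-2 and SC-1.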

