- **Milestone**: 4.5.2 --> 4.6.2
---
**[tickets:#1464] Cluster reset triggered after middleware si-swap (one of the
controllers is disabled)**
**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Aug 28, 2015 10:00 AM UTC by Srikanth R
**Last Updated:** Fri Aug 28, 2015 10:00 AM UTC
**Owner:** nobody
**Attachments:**
- [clusterReset.tgz](https://sourceforge.net/p/opensaf/tickets/1464/attachment/clusterReset.tgz) (5.4 MB; application/x-compressed)
*Setup*
4.7M0 with changeset 6770.
4 nodes configured, with no PBE configured and a 2N application hosted.
SC-1 is the active controller and SC-2 is the standby controller; both
controllers host application SUs configured with the 2N redundancy model.
*Issues*
The cluster was reset after an si-swap operation on the middleware. The active
controller was already in the disabled state before the si-swap operation was
invoked.
*Steps Performed*
-> Because of a faulty application, SC-1 moved to the disabled state. The
NodeAutorepair feature is disabled for SC-1.
Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfnd[4650]: NO
'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to
'csiSetcallbackTimeout' : Recovery is 'nodeFailover'
Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for
'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered
-> Invoked the si-swap operation on the middleware SI.
-> The standby controller (SC-2) rebooted because the implementer set failed
with ERR_EXIST.
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafntfimcnd[4726]: NO exiting on signal 15
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafimmd[4686]: WA IMMD not re-electing coord
for switch-over (si-swap) coord at (2010f)
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafmsgd[4882]: ER mqd_imm_declare_implementer
failed: err = 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osaflogd[4707]: ER saImmOiClassImplementerSet
(safLogService) failed: 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafckptd[4780]: ER cpd immOiImplmenterSet
failed with err = 14
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
-> SC-1 also rebooted after the SC-2 reboot.
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO Node 'SC-2' left the cluster
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: WA State change notification
lost for 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: ER Failed to start cluster
tracking 6
Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for
'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered
Aug 28 15:03:58 SYSTEST-CNTLR-1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Aug 28 15:04:03 SYSTEST-CNTLR-1 osafclmd[4621]: ER saNtfNotificationSend()
returned: SA_AIS_ERR_TRY_AGAIN (6)
Aug 28 15:04:08 SYSTEST-CNTLR-1 osaflogd[4596]: WA saImmOiRtObjectDelete
returned 5 for safLgStr=TWONLOGSTREAM
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use
is blocked by pending reply on syncronous call
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally
disconnected. Marking it as doomed 4 <17, 2010f> (safAmfService)
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafamfd[4640]: NO Re-initializing with IMM
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get
Failed for cli_hdl 73014575375
Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA Timeout on syncronous admin
operation 1
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use
is blocked by pending reply on syncronous call
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally
disconnected. Marking it as doomed 3 <12, 2010f> (safClmService)
Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get
Failed for cli_hdl 51539738895
Aug 28 15:04:22 SYSTEST-CNTLR-1 osafclmd[4621]: ER saImmOiImplementerSet failed
rc:6, exiting
Aug 28 15:04:22 SYSTEST-CNTLR-1 osafamfnd[4650]: NO
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
-> Because both controllers rebooted, the payloads rebooted as well.
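
For anyone reading the logs above, the numeric return codes can be mapped back to their `SaAisErrorT` names. The following is only a minimal sketch for triage, not part of OpenSAF: the helper name `decode`, the regular expression, and the choice of which codes to include are assumptions made for illustration. The numeric values are a subset of the `SaAisErrorT` enum from the SA Forum AIS header `saAis.h`, and the input is expected to be a log message with its wrapped lines already joined.

```python
import re

# Subset of SaAisErrorT values (SA Forum AIS, saAis.h).
# Only the codes relevant to this ticket are listed; extend as needed.
SA_AIS_ERRORS = {
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}

# Hypothetical pattern: the first number that follows "failed" or "returned".
# This covers the message shapes seen in this ticket, not every OpenSAF log.
CODE_RE = re.compile(r"(?:failed|returned)\b[^0-9]*?(\d+)", re.IGNORECASE)


def decode(message):
    """Return the SaAisErrorT name for the code in a log message, or None."""
    match = CODE_RE.search(message)
    if match is None:
        return None
    return SA_AIS_ERRORS.get(int(match.group(1)))


if __name__ == "__main__":
    samples = [
        "ER mqd_imm_declare_implementer failed: err = 14",
        "ER saImmOiImplementerSet failed rc:6, exiting",
        "WA saImmOiRtObjectDelete returned 5 for safLgStr=TWONLOGSTREAM",
    ]
    for sample in samples:
        print(decode(sample), "<-", sample)
```

Read this way, the SC-2 failures are all `SA_AIS_ERR_EXIST` on implementer set, while the later SC-1 failures are `SA_AIS_ERR_TRY_AGAIN` and `SA_AIS_ERR_TIMEOUT`.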