We recently saw a situation where OpenSAF got stuck in a restart loop. In this case there are two nodes, SCM1 and SCM2. SCM1 experienced a problem (unfortunately we don't have any logs for SCM1); SCM2 noticed and tried to make its instances active, but got the following errors. The only way to recover was to power-cycle SCM1.
Any insights appreciated.

a) The master went down in an unclean manner, and TIPC did not time out on the standby.
b) All processes on the standby went active, but OpenSAF still seems to think there is a coordinator at 10f0f (the previous controller).

5522:2014-02-18T11:15:09.427198-08:00 scm2 osafamfnd[2234]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SCM2,safSg=2N,safApp=OpenSAF'
5594:2014-02-18T11:15:09.465620-08:00 scm2 osafntfd[2145]: WA Error when logging (6), queue for relogging
5595:2014-02-18T11:15:09.471155-08:00 scm2 osafntfd[2145]: last message repeated 2 times
5596:2014-02-18T11:15:09.471042-08:00 scm2 osafimmd[2078]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (10f0f)
5598:2014-02-18T11:15:09.479064-08:00 scm2 osafsmfd[2369]: ER amf_active_state_handler oi activate FAILED
5599:2014-02-18T11:15:09.487476-08:00 scm2 osafamfnd[2234]: NO 'safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
5600:2014-02-18T11:15:09.487540-08:00 scm2 osafamfnd[2234]: ER safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
5601:2014-02-18T11:15:09.487566-08:00 scm2 osafamfnd[2234]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: Component faulted: recovery is node failfast
5602:2014-02-18T11:15:09.487590-08:00 scm2 osafmsgd[2401]: ER mqd_imm_declare_implementer failed: err = 14
5603:2014-02-18T11:15:09.487611-08:00 scm2 osafckptd[2465]: ER cpd immOiImplmenterSet failed with err = 14
5614:2014-02-18T11:15:09.504500-08:00 scm2 osafmsgd[2401]: ER saImmOiImplementerSet failed with return value=14
5619:2014-02-18T11:15:09.507404-08:00 scm2 osafntfd[2145]: WA Error when logging (6), queue for relogging
5620:2014-02-18T11:15:09.519830-08:00 scm2 osafntfd[2145]: last message repeated 2 times
5621:2014-02-18T11:15:09.507821-08:00 scm2 osafimmnd[2090]: NO Implementer disconnected 10 <3, 1100f> (@safLogService)
5622:2014-02-18T11:15:09.508353-08:00 scm2 osaflckd[2434]: ER saImmOiImplementerSet FAILED, rc = 14
5625:2014-02-18T11:15:09.514932-08:00 scm2 osafntfd[2145]: WA Error when logging (6), queue for relogging
5626:2014-02-18T11:15:09.520078-08:00 scm2 osafntfd[2145]: last message repeated 2 times
5628:2014-02-18T11:15:09.523500-08:00 scm2 osafclmd[2155]: ER saImmOiImplementerSet failed rc:14, exiting
5629:2014-02-18T11:15:09.524049-08:00 scm2 osafevtd[2449]: ER saImmOiImplementerSet failed with error: 14
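
For anyone reading the excerpt, here is a small sketch that decodes the numeric return codes and the node ID in these log lines. It is only a reading aid, not a diagnosis. The code-to-name table is a partial copy of the SaAisErrorT enum as I understand it from the SAF AIS headers (14 as SA_AIS_ERR_EXIST, 6 as SA_AIS_ERR_TRY_AGAIN), so please check it against saAis.h on your system. The regular expressions and the helper names `decode_line` and `node_id_hex` are my own and are not part of OpenSAF.

```python
import re

# Partial SaAisErrorT table (assumption: values as in the SAF AIS saAis.h
# header; verify against the header shipped with your OpenSAF build).
SA_AIS_ERRORS = {
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}

# Hypothetical patterns, written only to fit the log lines quoted above.
DAEMON_RE = re.compile(r"\s(osaf\w+)\[\d+\]:")
CODE_RE = re.compile(r"\b(?:error|err|rc|return value)\s*[:=]\s*(\d+)")


def decode_line(line):
    """Return (daemon, code, name) for a line carrying a return code, else None."""
    daemon = DAEMON_RE.search(line)
    code = CODE_RE.search(line)
    if not daemon or not code:
        return None
    value = int(code.group(1))
    return daemon.group(1), value, SA_AIS_ERRORS.get(value, "UNKNOWN")


def node_id_hex(node_id):
    """Convert a decimal NodeId, as printed by osafamfnd, to hex."""
    return format(node_id, "x")


if __name__ == "__main__":
    sample = ("5628:2014-02-18T11:15:09.523500-08:00 scm2 osafclmd[2155]: "
              "ER saImmOiImplementerSet failed rc:14, exiting")
    print(decode_line(sample))
    print(node_id_hex(69647))
```

Read this way, the rebooting NodeId 69647 is 1100f, which would be SCM2 itself, while IMMD still reports the coordinator at 10f0f. If the table above is right, the repeated 14 from saImmOiImplementerSet means the implementer names were still registered when the SCM2 services tried to claim them.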
