Hi,

Please share the complete syslog from both controllers.
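For anyone reading the quoted excerpt alongside the full syslog, a small helper for decoding the numbers may be useful. This is only a sketch: it assumes the numeric return codes in the messages (14 from the ImplementerSet calls, 6 from the ntfd logging warning) are SaAisErrorT values as defined in the SA Forum AIS header saAis.h, and it lists only a few of them. The function names are made up for illustration and are not part of OpenSAF.

```python
# Partial SaAisErrorT table (assumption: codes follow the AIS saAis.h enum).
SA_AIS_ERRORS = {
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}


def ais_error_name(code: int) -> str:
    """Return the symbolic name for a numeric AIS return code, if listed."""
    return SA_AIS_ERRORS.get(code, f"unknown ({code})")


def node_id_hex(node_id: int) -> str:
    """Format a decimal node id the way it appears in hex in the IMM messages."""
    return format(node_id, "x")


if __name__ == "__main__":
    # amfnd prints the rebooting node id in decimal; IMM prints node ids in hex.
    print(node_id_hex(69647))      # 1100f
    print(int("10f0f", 16))        # 69391
    print(ais_error_name(14))      # SA_AIS_ERR_EXIST
    print(ais_error_name(6))       # SA_AIS_ERR_TRY_AGAIN
```

Under that assumption, the decimal NodeId 69647 in the amfnd reboot message is the same node as 1100f in the "Implementer disconnected" line, which is a different node from the coord at 10f0f.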
/Neel.

On Wednesday 19 February 2014 07:02 PM, Tony Hart wrote:
> We recently saw a situation where osaf got stuck in a restart loop. In this
> case there are two nodes SCM1 and SCM2. SCM1 experienced a problem
> (unfortunately don’t have any logs for SCM1); SCM2 noticed and tried to make
> its instances active but got the following errors. The only way to recover
> was to power-cycle SCM1.
>
> Any insights appreciated.
>
> a) The master went down in an unclean manner, and TIPC did not time out on
> the standby.
> b) All processes on the standby went active, but openSAF still seems to think
> there is a co-ordinator at 10f0f (previous controller).
>
> 5522:2014-02-18T11:15:09.427198-08:00 scm2 osafamfnd[2234]: NO Assigning
> 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SCM2,safSg=2N,safApp=OpenSAF'
> 5594:2014-02-18T11:15:09.465620-08:00 scm2 osafntfd[2145]: WA Error when
> logging (6), queue for relogging
> 5595:2014-02-18T11:15:09.471155-08:00 scm2 osafntfd[2145]: last message
> repeated 2 times
> 5596:2014-02-18T11:15:09.471042-08:00 scm2 osafimmd[2078]: WA IMMD not
> re-electing coord for switch-over (si-swap) coord at (10f0f)
> 5598:2014-02-18T11:15:09.479064-08:00 scm2 osafsmfd[2369]: ER
> amf_active_state_handler oi activate FAILED
> 5599:2014-02-18T11:15:09.487476-08:00 scm2 osafamfnd[2234]: NO
> 'safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF' faulted due to
> 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
> 5600:2014-02-18T11:15:09.487540-08:00 scm2 osafamfnd[2234]: ER
> safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF Faulted due
> to:csiSetcallbackFailed Recovery is:nodeFailfast
> 5601:2014-02-18T11:15:09.487566-08:00 scm2 osafamfnd[2234]: Rebooting
> OpenSAF NodeId = 69647 EE Name = , Reason: Component faulted: recovery is
> node failfast
> 5602:2014-02-18T11:15:09.487590-08:00 scm2 osafmsgd[2401]: ER
> mqd_imm_declare_implementer failed: err = 14
> 5603:2014-02-18T11:15:09.487611-08:00 scm2 osafckptd[2465]: ER cpd
> immOiImplmenterSet failed with err = 14
> 5614:2014-02-18T11:15:09.504500-08:00 scm2 osafmsgd[2401]: ER
> saImmOiImplementerSet failed with return value=14
> 5619:2014-02-18T11:15:09.507404-08:00 scm2 osafntfd[2145]: WA Error when
> logging (6), queue for relogging
> 5620:2014-02-18T11:15:09.519830-08:00 scm2 osafntfd[2145]: last message
> repeated 2 times
> 5621:2014-02-18T11:15:09.507821-08:00 scm2 osafimmnd[2090]: NO
> Implementer disconnected 10 <3, 1100f> (@safLogService)
> 5622:2014-02-18T11:15:09.508353-08:00 scm2 osaflckd[2434]: ER
> saImmOiImplementerSet FAILED, rc = 14
> 5625:2014-02-18T11:15:09.514932-08:00 scm2 osafntfd[2145]: WA Error when
> logging (6), queue for relogging
> 5626:2014-02-18T11:15:09.520078-08:00 scm2 osafntfd[2145]: last message
> repeated 2 times
> 5628:2014-02-18T11:15:09.523500-08:00 scm2 osafclmd[2155]: ER
> saImmOiImplementerSet failed rc:14, exiting
> 5629:2014-02-18T11:15:09.524049-08:00 scm2 osafevtd[2449]: ER
> saImmOiImplementerSet failed with error: 14

_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users