We recently saw a situation where OpenSAF got stuck in a restart loop. In this
case there are two nodes, SCM1 and SCM2. SCM1 experienced a problem
(unfortunately we don’t have any logs for SCM1); SCM2 noticed and tried to make
its instances active but got the following errors. The only way to recover was
to power-cycle SCM1.

Any insights appreciated.

a) The master went down in an unclean manner, and TIPC did not time out on the
standby.
b) All processes on the standby went active, but OpenSAF still seems to think
there is a coordinator at 10f0f (the previous controller).

  5522:2014-02-18T11:15:09.427198-08:00 scm2 osafamfnd[2234]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SCM2,safSg=2N,safApp=OpenSAF'
   5594:2014-02-18T11:15:09.465620-08:00 scm2 osafntfd[2145]: WA Error when 
logging (6), queue for relogging
   5595:2014-02-18T11:15:09.471155-08:00 scm2 osafntfd[2145]: last message 
repeated 2 times
   5596:2014-02-18T11:15:09.471042-08:00 scm2 osafimmd[2078]: WA IMMD not 
re-electing coord for switch-over (si-swap) coord at (10f0f)
   5598:2014-02-18T11:15:09.479064-08:00 scm2 osafsmfd[2369]: ER 
amf_active_state_handler oi activate FAILED
   5599:2014-02-18T11:15:09.487476-08:00 scm2 osafamfnd[2234]: NO 
'safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF' faulted due to 
'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
   5600:2014-02-18T11:15:09.487540-08:00 scm2 osafamfnd[2234]: ER 
safComp=SMF,safSu=SCM2,safSg=2N,safApp=OpenSAF Faulted due 
to:csiSetcallbackFailed Recovery is:nodeFailfast
   5601:2014-02-18T11:15:09.487566-08:00 scm2 osafamfnd[2234]: Rebooting 
OpenSAF NodeId = 69647 EE Name = , Reason: Component faulted: recovery is node 
failfast
   5602:2014-02-18T11:15:09.487590-08:00 scm2 osafmsgd[2401]: ER 
mqd_imm_declare_implementer failed: err = 14
   5603:2014-02-18T11:15:09.487611-08:00 scm2 osafckptd[2465]: ER cpd 
immOiImplmenterSet failed with err = 14
   5614:2014-02-18T11:15:09.504500-08:00 scm2 osafmsgd[2401]: ER 
saImmOiImplementerSet failed with return value=14
   5619:2014-02-18T11:15:09.507404-08:00 scm2 osafntfd[2145]: WA Error when 
logging (6), queue for relogging
   5620:2014-02-18T11:15:09.519830-08:00 scm2 osafntfd[2145]: last message 
repeated 2 times
   5621:2014-02-18T11:15:09.507821-08:00 scm2 osafimmnd[2090]: NO Implementer 
disconnected 10 <3, 1100f> (@safLogService)
   5622:2014-02-18T11:15:09.508353-08:00 scm2 osaflckd[2434]: ER 
saImmOiImplementerSet FAILED, rc = 14
   5625:2014-02-18T11:15:09.514932-08:00 scm2 osafntfd[2145]: WA Error when 
logging (6), queue for relogging
   5626:2014-02-18T11:15:09.520078-08:00 scm2 osafntfd[2145]: last message 
repeated 2 times
   5628:2014-02-18T11:15:09.523500-08:00 scm2 osafclmd[2155]: ER 
saImmOiImplementerSet failed rc:14, exiting
   5629:2014-02-18T11:15:09.524049-08:00 scm2 osafevtd[2449]: ER 
saImmOiImplementerSet failed with error: 14
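
For reference, here is a minimal sketch for decoding the numeric codes in the log above. It assumes the numbers printed by the services (14 in the saImmOiImplementerSet failures, 6 in the osafntfd warnings) are SaAisErrorT values as defined in the SA Forum AIS header saAis.h; the helper names decode_error and node_id_hex are made up for illustration and are not part of OpenSAF.

```python
import re

# Subset of SaAisErrorT values from the SA Forum AIS specification (saAis.h).
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}

# Matches the spellings seen in the log: "err = 14", "rc:14", "rc = 14",
# "return value=14", "error: 14" and "logging (6)".
CODE_RE = re.compile(
    r"(?:err|rc|return value|error)\s*[:=]\s*(\d+)|logging \((\d+)\)"
)


def decode_error(line):
    """Return (code, name) for the first error code in a log line, else None.

    Assumption: the number is an SaAisErrorT value.
    """
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1) or match.group(2))
    return code, SA_AIS_ERRORS.get(code, "unknown")


def node_id_hex(node_id):
    """Format a decimal NodeId the way the IMM log lines print node addresses."""
    return format(node_id, "x")


if __name__ == "__main__":
    print(decode_error("osafclmd[2155]: ER saImmOiImplementerSet failed rc:14, exiting"))
    print(decode_error("osafntfd[2145]: WA Error when logging (6), queue for relogging"))
    print(node_id_hex(69647))
```

Under that assumption, 14 decodes to SA_AIS_ERR_EXIST and 6 to SA_AIS_ERR_TRY_AGAIN, and NodeId 69647 is 1100f in hex, the same address that appears in the "Implementer disconnected" line.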



_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
