> 
> You mean there was an immnd crash?
> Resurrect only deals with crashes of the local immnd.
>

Something like an immnd crash, yes. In this case the IMMND was restarted by
the AMF 'restart' admin operation on the IMMND component.
The general use case is an admin restart of a standby or payload IMMND,
triggered from any node in the cluster.
The particular case in this mail thread is an OM client on a standby
controller invoking the 'admin restart' command for the IMMND on that same
standby controller.
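
To make the scenario concrete, here is a rough C sketch of why the admin-op
reply ends up being discarded. It is only an illustration under my own
assumptions, not OpenSAF code: dest_t, pending_admin_op, peer_table and
route_admin_reply() are hypothetical names, and a plain integer stands in
for the real MDS destination. The idea is that the IMMND hosting the
implementer remembers the destination the request came from, and a restarted
IMMND comes back with a different one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an MDS destination (not the real MDS_DEST). */
typedef uint64_t dest_t;

/* What the IMMND at the implementer remembers about an admin operation. */
struct pending_admin_op {
    uint64_t invocation; /* identifies the admin operation */
    dest_t reply_dest;   /* IMMND that forwarded the request */
};

/* Destinations of the peer IMMNDs that are currently reachable. */
struct peer_table {
    const dest_t *live;
    size_t count;
};

static bool dest_is_live(const struct peer_table *peers, dest_t d)
{
    for (size_t i = 0; i < peers->count; i++) {
        if (peers->live[i] == d)
            return true;
    }
    return false;
}

/* Returns 0 if the reply can be sent, -1 if it has to be discarded
 * because the remembered destination no longer exists. */
int route_admin_reply(const struct peer_table *peers,
                      const struct pending_admin_op *op)
{
    return dest_is_live(peers, op->reply_dest) ? 0 : -1;
}
```

With a table that still contains the old destination the reply goes through;
once the standby IMMND has been restarted under a new destination, the same
pending operation can no longer be answered, which would match the
"Discarding admin op reply" message seen on the active controller.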
-Mathi.
 
> /AndersBj
> 
> -----Original Message-----
> From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> Sent: den 18 juli 2013 16:38
> To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen
> Malviya
> Cc: [email protected]
> Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501
> amf: No node directors register to AMF within time after "#7 cleanup instead
> of terminate used at component restart")
> 
> But, the OM client handle would have got resurrected here and that must be
> the reason why the imm-adm client is waiting/blocked until it eventually
> times out!?
> -Mathi.
> 
> 
> > -----Original Message-----
> > From: Anders Björnerstedt [mailto:[email protected]]
> > Sent: Thursday, July 18, 2013 6:03 PM
> > To: Mathivanan Naickan Palanivelu; Reddy Neelakanta Reddy Peddavandla;
> > Praveen Malviya
> > Cc: [email protected]
> > Subject: RE: [devel] Early patch for #501 for review/testing (was Re:
> > #501
> > amf: No node directors register to AMF within time after "#7 cleanup
> > instead of terminate used at component restart")
> >
> > It's not as simple as that.
> > In this case the invoking om-client has moved.
> > Thus the reply is to be sent to a different om-handle (as seen by the
> > immsv).
> >
> > /AndersBj
> >
> > -----Original Message-----
> > From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> > Sent: den 18 juli 2013 14:36
> > To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen
> > Malviya
> > Cc: [email protected]
> > Subject: RE: [devel] Early patch for #501 for review/testing (was Re:
> > #501
> > amf: No node directors register to AMF within time after "#7 cleanup
> > instead of terminate used at component restart")
> >
> > Can't IMMSv subscribe for IMMND dests?
> > Thanks,
> > Mathi.
> >
> > > -----Original Message-----
> > > From: Anders Björnerstedt [mailto:[email protected]]
> > > Sent: Thursday, July 18, 2013 5:30 PM
> > > To: Neelakanta Reddy; praveen malviya
> > > Cc: [email protected]
> > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re:
> > > #501
> > > amf: No node directors register to AMF within time after "#7 cleanup
> > > instead of terminate used at component restart")
> > >
> > > Sounds like you could have needed to use the "continuationId"
> > > parameter to saImmOmAdminOperationInvoke().
> > > Unfortunately this A.2.1 feature is not yet implemented in the immsv.
> > >
> > > https://sourceforge.net/p/opensaf/tickets/51/
> > >
> > >
> > > /AndersBj
> > >
> > >
> > > -----Original Message-----
> > > From: Neelakanta Reddy [mailto:[email protected]]
> > > Sent: den 18 juli 2013 13:41
> > > To: praveen malviya
> > > Cc: [email protected]
> > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re:
> > > #501
> > > amf: No node directors register to AMF within time after "#7 cleanup
> > > instead of terminate used at component restart")
> > >
> > > > Hi Mathi/Praveen,
> > > >
> > > > I misunderstood the flow of the admin operation related to the
> > > > component.
> > > >
> > > > After analyzing the logs, the following is the reason why the reply
> > > > cannot be sent:
> > > >
> > > > The admin operation to terminate IMMND is invoked at the standby. The
> > > > implementer is the active amfd.
> > > >
> > > > The active amfd sends the admin operation result to the local active
> > > > IMMND. The active IMMND tries to send the result to the IMMND (standby)
> > > > where the admin operation was invoked, but the MDS adest stored in
> > > > the active IMMND is the adest of the old IMMND (standby).
> > > >
> > > > Because of this, the following error message appears at the active
> > > > controller:
> > > >
> > > > ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
> > >
> > >
> > > Thanks,
> > > Neel.
> > > On Thursday 18 July 2013 04:52 PM, praveen malviya wrote:
> > > > Hi,
> > > > > For a restart admin operation on any component, AMFD sends an admin
> > > > > operation message to the corresponding AMFND.
> > > > > AMFND will restart the component. While the operation is in
> > > > > progress, the presence state of the component transitions from
> > > > > INSTANTIATED to RESTARTING and then from RESTARTING to INSTANTIATED.
> > > > > AMFND updates AMFD whenever the presence state changes, but AMFD
> > > > > responds to IMM for the completion of the operation only when the
> > > > > component presence state becomes INSTANTIATED.
> > > >
> > > > Thanks,
> > > > Praveen.
> > > > On 17-Jul-13 7:09 PM, Neelakanta Reddy wrote:
> > > >> Hi Mathi,
> > > >>
> > > > >> After giving the terminate message to the local amfnd, amfd
> > > > >> immediately sends the admin operation result.
> > > > >>
> > > > >> The amfnd sends the message to the IMMND, and the IMMND processes
> > > > >> it in immnd_amf_comp_terminate_callback, which terminates the IMMND.
> > > > >> The admin operation result also arrives at the local IMMND. Since the
> > > > >> terminate callback is executed first, the IMMND does not get the
> > > > >> chance to process the admin operation result.
> > > > >>
> > > > >> The admin operation initiated for terminating immnd will
> > > > >> eventually lead to TIMEOUT.
> > > >>
> > > >> Thanks,
> > > >> Neel.
> > > >>
> > > >>
> > > > >> On Wednesday 17 July 2013 01:22 PM, Mathivanan Naickan Palanivelu wrote:
> > > >>> Hi,
> > > >>>
> > > > >>> The attached patch works for this ticket. (Note: the AMF
> > > > >>> terminate callback has to be corrected for the directors
> > > > >>> also; I will do that in a separate patch.)
> > > > >>>
> > > > >>> Please note that when running this test for IMM, the immadm or
> > > > >>> amf-adm commands do not return to the command prompt, even
> > > > >>> though the command had functionally succeeded, i.e. IMM got
> > > > >>> successfully restarted.
> > > > >>>
> > > > >>> I suspect that the reason could be either that AMF is not
> > > > >>> sending the admin-op result to IMM, or that the result is being
> > > > >>> discarded by IMM.
> > > > >>>
> > > > >>> Neel/Nagendra, could you please confirm whether the issue
> > > > >>> (response to admin op) is with IMM or AMF?
> > > >>>
> > > >>> See snapshot below:
> > > >>>
> > > >>> Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested
> > > >>> for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
> > > >>>
> > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component
> > > >>> terminate callback, exiting
> > > >>>
> > > >>> Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
> > > >>>
> > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
> > > >>>
> > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE:
> > > >>> IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> > > >>>
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE:
> > > > >>> IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> > > > >>>
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE:
> > > > >>> IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> > > >>>
> > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE->
> > > >>> IMM_NODE_ISOLATED
> > > >>>
> > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10
> > > >>> on IMMD standby
> > > >>>
> > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > >>>
> > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE->
> > > >>> IMM_NODE_W_AVAILABLE
> > > >>>
> > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE:
> > > >>> IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE->
> > > >>> IMM_NODE_FULLY_AVAILABLE 2171
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is
> > > >>> SA_IMM_INIT_FROM_FILE
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in
> > > >>> ImmModel
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND
> > > >>> process at node 2010f old epoch: 9  new epoch:10
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND
> > > >>> process at node 2020f old epoch: 0  new epoch:10
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected:
> > > >>> 33
> > > >>> (MsgQueueService131599) <283, 2020f>
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE:
> > > >>> IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier)
> > > >>> connected: 34 (@safLogService) <511, 2020f>
> > > >>>
> > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier)
> > > >>> connected: 35 (@safAmfService2020f) <512, 2020f>
> > > >>>
> > > >>> Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing
> > > >>> with IMM
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Mathi.
> > > >>>
> > > >>> *From:*Mathi Naickan [mailto:[email protected]]
> > > >>> *Sent:* Tuesday, July 16, 2013 12:36 PM
> > > >>> *To:* [opensaf:tickets]
> > > >>> *Subject:* [opensaf:tickets] Re: #501 amf: No node directors
> > > >>> register to AMF within time after "#7 cleanup instead of
> > > >>> terminate used at component restart"
> > > >>>
> > > > >>> I checked the NDs. I think we should remove these (legacy) sleeps.
> > > > >>>
> > > > >>> Also, the exits should be styled like the daemon_exit()s.
> > > > >>>
> > > > >>> We also need to test such exits from the terminate callback for
> > > > >>> directors as well, and consider special classes like NTF where we
> > > > >>> ought to call the likes of stop_ntfimcn().
> > > >>>
> > > >>> Will get back on this.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Mathi.
> > > >>>
> > > >>> From: Praveen [mailto:[email protected]]
> > > >>> Sent: Monday, July 15, 2013 9:35 AM
> > > >>> To: [opensaf:tickets]
> > > >>> Subject: [opensaf:tickets] Re: #501 amf: No node directors
> > > >>> register to AMF within time after "#7 cleanup instead of
> > > >>> terminate used at component restart"
> > > >>>
> > > >>> Can sleep(1) be added before giving response to AMF?
> > > >>>
> > > >>> Thanks
> > > >>> Praveen
> > > >>> On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:
> > > >>>
> > > > >>> There is no problem with AMF, as amf is running the instantiate
> > > > >>> script for all the services (cpnd, glnd, mqnd, smfnd).
> > > > >>> The problem resides in these services, because they sleep for
> > > > >>> 1 second after giving the amf response in the terminate callback.
> > > > >>> Example: cpnd_amf_comp_terminate_callback
> > > >>>
> > > > >>> saAmfResponse(cb->amf_hdl, invocation, saErr);
> > > > >>> ncshm_give_hdl(gl_cpnd_cb_hdl);
> > > > >>> sleep(1);
> > > > >>> LOG_NO("Received AMF component terminate callback, exiting");
> > > > >>> exit(0);
> > > >>>
> > > > >>> When the instantiate script is executed by amf, the process is
> > > > >>> still up and running (because of the 1-second sleep), so
> > > > >>> 'start_daemon -p $pidfile $binary $args' becomes ineffective and
> > > > >>> the processes (e.g. cpnd) don't start.
> > > > >>>
> > > > >>> I tested by removing the sleep and all worked as expected.
> > > > >>>
> > > > >>> So the other services are advised to find out why the sleep(1)
> > > > >>> was introduced and how to get rid of it.
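
For illustration only, the effect described above can be sketched in C. This
is not the real start_daemon implementation; pid_is_running() and
should_launch() are hypothetical helpers, and probing with kill(pid, 0) is
my assumption about how such a liveness check could be done. The point is
that as long as the old process is still alive (for example sleeping before
exit), a launcher that consults the pidfile will decline to start a new copy.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* True if a process with this pid currently exists.
 * kill() with signal 0 performs only the existence/permission check. */
bool pid_is_running(pid_t pid)
{
    if (pid <= 0)
        return false;
    if (kill(pid, 0) == 0)
        return true;
    /* EPERM means the process exists but belongs to someone else. */
    return errno == EPERM;
}

/* Hypothetical launcher decision: start the daemon only when the pid
 * recorded in the pidfile no longer refers to a live process. */
bool should_launch(pid_t pid_from_pidfile)
{
    return !pid_is_running(pid_from_pidfile);
}
```

Under this model, a component that exits right after responding to AMF is
gone by the time the instantiate script runs, so should_launch() would
return true and the restart can proceed.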
> > > >>>
> > > >>> *_*
> > > >>>
> > > > >>> [tickets:#501] <http://sourceforge.net/p/opensaf/tickets/501/> amf:
> > > > >>> No node directors register to AMF within time after "#7 cleanup
> > > > >>> instead of terminate used at component restart"
> > > >>>
> > > > >>> Status: unassigned
> > > > >>> Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
> > > > >>> Last Updated: Thu Jul 11, 2013 07:47 AM UTC
> > > > >>> Owner: nobody
> > > >>>
> > > > >>> After introduction of the patches solving "#7 cleanup instead of
> > > > >>> terminate used at component restart", no node directors
> > > > >>> register to AMF within time, according to the messages log.
> > > >>> I have tried SMFND, CPND, GLND and MQND.
> > > >>>
> > > >>> It seems however that the main routines of the node director
> > > >>> daemons are not started until 10 seconds after the terminate
> > > >>> callback (after the registration timeout).
> > > >>>
> > > > >>> It is very easy to see the fault by entering command "amf-adm
> > > > >>> restart safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"
> > > >>>
> > > >>> *_*
> > > >>>
> > > > >>> Sent from sourceforge.net because
> > > > >>> opensaf-tickets@lists.sourceforge.net
> > > >>>
> > > >>> is subscribed to
> > > >>> https://sourceforge.net/p/opensaf/tickets/
> > > >>>
> > > >>> To unsubscribe from further messages, a project admin can change
> > > > >>> settings at https://sourceforge.net/p/opensaf/admin/tickets/options.
> > > >>> Or, if this is a mailing list, you can unsubscribe from the
> > > >>> mailing list.
> > > >>>
> > > >>> *_*
> > > >>>
> > > >>> Opensaf-tickets mailing list
> > > > >>> Opensaf-tickets@lists.sourceforge.net
> > > >>>
> > > >>> https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
> > > >>>
> > > >> -----------------------------------------------------------------
> > > >> --
> > > >> --
> > > >> ---------
> > > >>
> > > >> See everything from the browser to the database with AppDynamics
> > > >> Get end-to-end visibility with application monitoring from
> > > >> AppDynamics Isolate bottlenecks and diagnose root cause in seconds.
> > > >> Start your free trial of AppDynamics Pro today!
> > > >>
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.
> > > >> clktrk
> > > >>
> > > >> _______________________________________________
> > > >> Opensaf-devel mailing list
> > > >> [email protected]
> > > >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> > > >
> > >
> > >
> > > --------------------------------------------------------------------
> > > --
> > > -------- See everything from the browser to the database with
> > > AppDynamics Get end-to-end visibility with application monitoring
> > > from AppDynamics Isolate bottlenecks and diagnose root cause in
> seconds.
> > > Start your free trial of AppDynamics Pro today!
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg
> > > .c
> > > lk
> > > trk
> > > _______________________________________________
> > > Opensaf-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> > >
> > > --------------------------------------------------------------------
> > > --
> > > -------- See everything from the browser to the database with
> > > AppDynamics Get end-to-end visibility with application monitoring
> > > from AppDynamics Isolate bottlenecks and diagnose root cause in
> seconds.
> > > Start your free trial of AppDynamics Pro today!
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg
> > > .c
> > > lk
> > > trk
> > > _______________________________________________
> > > Opensaf-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-devel
