Also, this particular case of a standby IMMND (component) restart performed by an OM client on the standby controller is of lesser significance, because SMF (an OM client) would perform such tasks from the ACTIVE controller.
We could scope this discussion to the case where such an OM client is run outside SMF, or where a campaign is written such that this admin restart command gets executed from/on that node!
-Mathi.

> > You mean there was an immnd crash?
> > Resurrect only deals with crashes of the local immnd.
>
> 'Like' an immnd crash! In this case, the IMMND was restarted by the 'restart'
> AMF admin operation on the IMMND 'component'.
> The general use case is an admin restart of a standby or payload IMMND
> triggered from any node in the cluster.
> The particular case in this mail thread is an OM client on a
> standby controller invoking the 'admin restart' command of the IMMND on
> the same standby controller!
> -Mathi.
>
> > /AndersBj
> >
> > -----Original Message-----
> > From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> > Sent: 18 July 2013 16:38
> > To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen Malviya
> > Cc: [email protected]
> > Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
> >
> > But the OM client handle would have got resurrected here, and that
> > must be the reason why the imm-adm client is waiting/blocked until it
> > eventually times out!?
> > -Mathi.
> >
> > > -----Original Message-----
> > > From: Anders Björnerstedt [mailto:[email protected]]
> > > Sent: Thursday, July 18, 2013 6:03 PM
> > > To: Mathivanan Naickan Palanivelu; Reddy Neelakanta Reddy Peddavandla; Praveen Malviya
> > > Cc: [email protected]
> > > Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
> > >
> > > It's not as simple as that.
> > > In this case the invoking om-client has moved.
> > > Thus the reply is to be sent to a different om-handle (as seen by the immsv).
> > >
> > > /AndersBj
> > >
> > > -----Original Message-----
> > > From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> > > Sent: 18 July 2013 14:36
> > > To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen Malviya
> > > Cc: [email protected]
> > > Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
> > >
> > > Can't IMMSv subscribe for IMMND dests?
> > > Thanks,
> > > Mathi.
> > >
> > > > -----Original Message-----
> > > > From: Anders Björnerstedt [mailto:[email protected]]
> > > > Sent: Thursday, July 18, 2013 5:30 PM
> > > > To: Neelakanta Reddy; praveen malviya
> > > > Cc: [email protected]
> > > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
> > > >
> > > > Sounds like you would have needed to use the "continuationId"
> > > > parameter of saImmOmAdminOperationInvoke().
> > > > Unfortunately this A.2.1 feature is not yet implemented in the immsv.
> > > >
> > > > https://sourceforge.net/p/opensaf/tickets/51/
> > > >
> > > > /AndersBj
> > > >
> > > > -----Original Message-----
> > > > From: Neelakanta Reddy [mailto:[email protected]]
> > > > Sent: 18 July 2013 13:41
> > > > To: praveen malviya
> > > > Cc: [email protected]
> > > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
> > > >
> > > > Hi Mathi/Praveen,
> > > >
> > > > I misunderstood the flow of the admin operation related to the component.
> > > >
> > > > After analyzing the logs, the following is the reason why the reply
> > > > cannot be sent:
> > > >
> > > > The admin operation to terminate IMMND is invoked at the standby. The
> > > > implementer is the active amfd.
> > > >
> > > > The active amfd sends the admin operation result to the local active
> > > > IMMND. The active IMMND tries to send the result to the IMMND (standby)
> > > > where the admin operation was invoked, but the MDS adest stored
> > > > in the active IMMND is the adest of the old IMMND (standby).
> > > >
> > > > Because of this, the following error message appears at the active
> > > > controller:
> > > >
> > > > ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
> > > >
> > > > Thanks,
> > > > Neel.
> > > >
> > > > On Thursday 18 July 2013 04:52 PM, praveen malviya wrote:
> > > > > Hi,
> > > > > For a restart admin operation on any component, AMFD sends an admin
> > > > > operation message to the corresponding AMFND.
> > > > > AMFND will restart the component. While the operation is in
> > > > > progress, the presence state of the component transitions from
> > > > > INSTANTIATED to RESTARTING and then from RESTARTING to INSTANTIATED.
> > > > > AMFND updates the presence state to AMFD whenever it changes, but AMFD
> > > > > responds to IMM for the completion of the operation only when the
> > > > > component presence state becomes INSTANTIATED.
> > > > >
> > > > > Thanks,
> > > > > Praveen.
> > > > > On 17-Jul-13 7:09 PM, Neelakanta Reddy wrote:
> > > > >> Hi Mathi,
> > > > >>
> > > > >> After giving the terminate message to the local amfnd, amfd
> > > > >> immediately sends the admin operation result.
> > > > >>
> > > > >> The amfnd sends the message to the IMMND, and the IMMND processes it
> > > > >> in immnd_amf_comp_terminate_callback, which terminates the IMMND.
> > > > >> The admin operation result also arrives at the local IMMND. Since
> > > > >> the terminate callback is executed first, the IMMND does not
> > > > >> get the chance to process the admin operation result.
> > > > >>
> > > > >> The admin operation initiated for terminating immnd will
> > > > >> eventually lead to TIMEOUT.
> > > > >>
> > > > >> Thanks,
> > > > >> Neel.
> > > > >>
> > > > >> On Wednesday 17 July 2013 01:22 PM, Mathivanan Naickan Palanivelu wrote:
> > > > >>> Hi,
> > > > >>>
> > > > >>> The attached patch works for this ticket. (Note: the
> > > > >>> amf terminate callback has to be corrected for directors also;
> > > > >>> will do that in a separate patch.)
> > > > >>>
> > > > >>> Please note that when running this test for IMM, the immadm or
> > > > >>> amf-adm commands do not return to the command prompt, even though
> > > > >>> the command had functionally succeeded, i.e. IMM got successfully restarted.
> > > > >>>
> > > > >>> I suspect that the reason could either be that AMF is not
> > > > >>> sending the admin-op result to IMM or that the result is being
> > > > >>> discarded by IMM.
> > > > >>>
> > > > >>> Neel/Nagendra, could you please confirm whether the
> > > > >>> issue (response to admin op) is with IMM or AMF?
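
For readers following the reply path, the stale-destination explanation given in this thread can be pictured with a minimal C sketch. This is a toy model, not IMMSv code: the types pending_admin_op_t and immnd_dest_table_t, the three helper functions, and the numeric destination values are all hypothetical, and plain integers stand in for MDS adests. It only illustrates why a reply addressed to the destination recorded at invoke time stops being deliverable once the IMMND on the invoking node has restarted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8

/* Hypothetical record of one pending admin operation: where the reply goes. */
typedef struct {
    uint32_t node_id;      /* node where the OM client invoked the operation */
    uint64_t reply_adest;  /* destination of that node's IMMND at invoke time */
} pending_admin_op_t;

/* Hypothetical table of the IMMND destinations that are currently up. */
typedef struct {
    uint32_t node_id[MAX_NODES];
    uint64_t adest[MAX_NODES];
    size_t count;
} immnd_dest_table_t;

/* Record (or overwrite) the current IMMND destination for a node. */
void dest_table_set(immnd_dest_table_t *t, uint32_t node_id, uint64_t adest)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->node_id[i] == node_id) {
            t->adest[i] = adest;   /* a restart replaces the old destination */
            return;
        }
    }
    assert(t->count < MAX_NODES);
    t->node_id[t->count] = node_id;
    t->adest[t->count] = adest;
    t->count++;
}

/* Reply using the destination stored at invoke time.
 * Returns false when that destination no longer exists (reply discarded). */
bool send_reply_to_stored_dest(const immnd_dest_table_t *t,
                               const pending_admin_op_t *op)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->adest[i] == op->reply_adest)
            return true;
    }
    return false;
}

/* Reply by looking up the node's current destination instead.
 * On success, *used_adest is set to the destination that was chosen. */
bool send_reply_via_node_lookup(const immnd_dest_table_t *t,
                                const pending_admin_op_t *op,
                                uint64_t *used_adest)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->node_id[i] == op->node_id) {
            *used_adest = t->adest[i];
            return true;
        }
    }
    return false;
}
```

The second helper only shows that a per-node lookup would reach the restarted IMMND. As noted elsewhere in the thread, the invoking om-client has a different om-handle after the restart, so reaching the node is not by itself enough to complete the operation.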
> > > > >>>
> > > > >>> See snapshot below:
> > > > >>>
> > > > >>> Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
> > > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component terminate callback, exiting
> > > > >>> Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
> > > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_ISOLATED
> > > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10 on IMMD standby
> > > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> > > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in ImmModel
> > > > >>> Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 9 new epoch:10
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:10
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected: 33 (MsgQueueService131599) <283, 2020f>
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 34 (@safLogService) <511, 2020f>
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 35 (@safAmfService2020f) <512, 2020f>
> > > > >>> Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing with IMM
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Mathi.
> > > > >>>
> > > > >>> From: Mathi Naickan [mailto:[email protected]]
> > > > >>> Sent: Tuesday, July 16, 2013 12:36 PM
> > > > >>> To: [opensaf:tickets]
> > > > >>> Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"
> > > > >>>
> > > > >>> I checked the NDs. I think we should remove these sleeps (legacy).
> > > > >>> Also, the exits should be styled like the daemon_exit()s.
> > > > >>> We also need to test such 'exit's from the terminate callback
> > > > >>> for directors as well, and consider special classes like NTF
> > > > >>> where we ought to call the likes of stop_ntfimcn().
> > > > >>>
> > > > >>> Will get back on this.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Mathi.
> > > > >>>
> > > > >>> From: Praveen [mailto:[email protected]]
> > > > >>> Sent: Monday, July 15, 2013 9:35 AM
> > > > >>> To: [opensaf:tickets]
> > > > >>> Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"
> > > > >>>
> > > > >>> Can sleep(1) be added before giving response to AMF?
> > > > >>>
> > > > >>> Thanks
> > > > >>> Praveen
> > > > >>> On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:
> > > > >>>
> > > > >>> There is no problem with AMF, as amf is running the instantiate
> > > > >>> script for all the services (cpnd, glnd, mqnd, smfnd).
> > > > >>> The problem resides in these services, because each of them sleeps
> > > > >>> for 1 second after giving the amf response in the terminate callback.
> > > > >>> Ex: cpnd_amf_comp_terminate_callback
> > > > >>>
> > > > >>>     saAmfResponse(cb->amf_hdl, invocation, saErr);
> > > > >>>     ncshm_give_hdl(gl_cpnd_cb_hdl);
> > > > >>>     sleep(1);
> > > > >>>     LOG_NO("Received AMF component terminate callback, exiting");
> > > > >>>     exit(0);
> > > > >>>
> > > > >>> When the instantiate script is executed by amf, the process
> > > > >>> is still up and running (because of the 1-second sleep), so
> > > > >>> 'start_daemon -p $pidfile $binary $args' becomes ineffective
> > > > >>> and the process (e.g. cpnd) doesn't start.
> > > > >>>
> > > > >>> I tested by removing the sleep and all worked as expected.
> > > > >>>
> > > > >>> So it is advised that the other services find out why the sleep of 1
> > > > >>> was introduced and how we can get rid of it.
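
The effect of the sleep(1) described above can be reproduced outside OpenSAF with a small C sketch. Here a forked child stands in for a node director: it writes one byte to a pipe (standing in for saAmfResponse()), optionally sleeps, and then exits. The parent measures how long the process stays alive after the response was given. The function name linger_after_response and the pipe-based setup are assumptions made for illustration only; nothing here calls the AMF API or start_daemon.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for a terminate callback.
 * Returns the number of seconds the child process remains alive after it
 * has signalled its "response" to the parent. */
double linger_after_response(unsigned int sleep_before_exit)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        /* Child: plays the component being terminated. */
        close(fds[0]);
        char ok = 1;
        ssize_t n = write(fds[1], &ok, 1);  /* stands in for saAmfResponse() */
        (void)n;
        if (sleep_before_exit > 0)
            sleep(sleep_before_exit);       /* the legacy delay before exit */
        _exit(0);
    }

    /* Parent: plays the side that receives the response. */
    close(fds[1]);
    char byte = 0;
    ssize_t got = read(fds[0], &byte, 1);   /* blocks until the response */
    assert(got == 1);
    double responded = now_seconds();

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0); /* blocks until the child exits */
    assert(waited == pid);
    double exited = now_seconds();

    close(fds[0]);
    return exited - responded;
}
```

Calling linger_after_response(1) should report roughly one second, which is the window in which a pidfile-based start attempt would still find the old process running; linger_after_response(0) should return almost immediately.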
> > > > >>>
> > > > >>> [tickets:#501] <http://sourceforge.net/p/opensaf/tickets/501/> amf:
> > > > >>> No node directors register to AMF within time after "#7
> > > > >>> cleanup instead of terminate used at component restart"
> > > > >>>
> > > > >>> Status: unassigned
> > > > >>> Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
> > > > >>> Last Updated: Thu Jul 11, 2013 07:47 AM UTC
> > > > >>> Owner: nobody
> > > > >>>
> > > > >>> After introduction of patches solving "#7 cleanup instead of
> > > > >>> terminate used at component restart", no node directors
> > > > >>> registers to AMF within time according to messages log.
> > > > >>> I have tried SMFND, CPND, GLND and MQND.
> > > > >>>
> > > > >>> It seems however that the main routines of the node director
> > > > >>> daemons are not started until 10 seconds after the terminate
> > > > >>> callback (after the registration timeout).
> > > > >>>
> > > > >>> It is very easy to see the fault by entering command "amf-adm
> > > > >>> restart safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"
> > > > >>>
> > > > >>> Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net
> > > > >>> is subscribed to https://sourceforge.net/p/opensaf/tickets/
> > > > >>>
> > > > >>> To unsubscribe from further messages, a project admin can
> > > > >>> change settings at https://sourceforge.net/p/opensaf/admin/tickets/options.
> > > > >>> Or, if this is a mailing list, you can unsubscribe from the
> > > > >>> mailing list.
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
