It's not as simple as that. In this case the invoking om-client has moved, so the reply has to be sent to a different om-handle (as seen by the immsv).
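Anders's point can be illustrated with a toy model. It ties in with his earlier suggestion, quoted further down, to use the "continuationId" parameter of saImmOmAdminOperationInvoke(). As we understand the A.2.1 IMM specification, a result parked under a continuation id can later be collected by whichever OM handle presents that id. As far as we know, the spec's call for this is saImmOmAdminOperationContinue(). The result is not pushed to the handle that made the original call. Everything below is hypothetical and is not immsv code: the table, park_invocation(), store_result() and continue_invocation(). The sketch only shows why keying the reply by continuation id, rather than by invoking handle, survives the client moving.

```c
/* Toy model (hypothetical, not immsv code): an admin-op result parked
 * under a continuation id can be collected by any OM handle that
 * presents the id, even if the original invoking handle is gone. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef struct {
    bool in_use;
    unsigned long long continuation_id; /* chosen by the invoker */
    int result;                         /* admin-op return value */
    bool done;                          /* implementer has replied */
} pending_op;

static pending_op table[MAX_PENDING];

/* Record an invocation; returns false if the table is full. */
static bool park_invocation(unsigned long long cont_id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!table[i].in_use) {
            table[i] = (pending_op){ true, cont_id, 0, false };
            return true;
        }
    }
    return false;
}

/* The implementer's reply is stored under the id instead of being
 * sent to a particular OM handle. */
static void store_result(unsigned long long cont_id, int result)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && table[i].continuation_id == cont_id) {
            table[i].result = result;
            table[i].done = true;
        }
    }
}

/* Any (possibly new) OM handle may collect the result by id.
 * Returns true and frees the slot if the result is available. */
static bool continue_invocation(unsigned long long cont_id, int *result)
{
    assert(result != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && table[i].continuation_id == cont_id &&
            table[i].done) {
            *result = table[i].result;
            table[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

In this model, a restarted client that still knows its continuation id calls continue_invocation() from its new handle. It gets the stored result once the implementer has replied, so the old handle no longer matters.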
/AndersBj

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
Sent: 18 July 2013 14:36
To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen Malviya
Cc: [email protected]
Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")

Can't IMMSv subscribe for IMMND dests?

Thanks,
Mathi.

> -----Original Message-----
> From: Anders Björnerstedt [mailto:[email protected]]
> Sent: Thursday, July 18, 2013 5:30 PM
> To: Neelakanta Reddy; praveen malviya
> Cc: [email protected]
> Subject: Re: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
>
> It sounds like you would have needed to use the "continuationId" parameter to saImmOmAdminOperationInvoke().
> Unfortunately, this A.2.1 feature is not yet implemented in the immsv.
>
> https://sourceforge.net/p/opensaf/tickets/51/
>
> /AndersBj
>
> -----Original Message-----
> From: Neelakanta Reddy [mailto:[email protected]]
> Sent: 18 July 2013 13:41
> To: praveen malviya
> Cc: [email protected]
> Subject: Re: [devel] Early patch for #501 for review/testing (was Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart")
>
> Hi Mathi/Praveen,
>
> I misunderstood the flow of the admin operation related to the component.
>
> After analyzing the logs, the following is the reason why the reply cannot be sent:
>
> The admin operation to terminate the IMMND is invoked at the standby. The implementer is the active amfd.
>
> The active amfd sends the admin operation result to the local active IMMND, and the active IMMND tries to send the result to the IMMND (standby) where the admin operation was invoked. However, the MDS adest stored in the active IMMND is the adest of the old standby IMMND.
>
> Because of this, the following error message appears at the active controller:
>
> ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
>
>
> Thanks,
> Neel.
>
> On Thursday 18 July 2013 04:52 PM, praveen malviya wrote:
> > Hi,
> > For a restart admin operation on any component, AMFD sends an admin operation message to the corresponding AMFND.
> > AMFND restarts the component. While the operation is in progress, the presence state of the component transitions from INSTANTIATED to RESTARTING and then from RESTARTING to INSTANTIATED.
> > AMFND reports the presence state to AMFD whenever it changes, but AMFD responds to IMM with the completion of the operation only when the component presence state becomes INSTANTIATED.
> >
> > Thanks,
> > Praveen.
> > On 17-Jul-13 7:09 PM, Neelakanta Reddy wrote:
> >> Hi Mathi,
> >>
> >> After giving the terminate message to the local amfnd, amfd immediately sends the admin operation result.
> >>
> >> The amfnd sends the message to the IMMND, and the IMMND processes it in immnd_amf_comp_terminate_callback, which terminates the IMMND.
> >> The admin operation result also arrives at the local IMMND. Since the terminate callback is executed first, the IMMND does not get the chance to process the admin operation result.
> >>
> >> The admin operation initiated to terminate the IMMND will eventually lead to a TIMEOUT.
> >>
> >> Thanks,
> >> Neel.
> >>
> >>
> >> On Wednesday 17 July 2013 01:22 PM, Mathivanan Naickan Palanivelu wrote:
> >>> Hi,
> >>>
> >>> The attached patch works for this ticket.
> >>> (Note: the AMF terminate callback has to be corrected for the directors as well; I will do that in a separate patch.)
> >>>
> >>> Please note that when running this test for IMM, the immadm or amf-adm commands do not return to the command prompt, even though the command has functionally succeeded, i.e. IMM was successfully restarted.
> >>>
> >>> I suspect the reason is either that AMF is not responding with the admin-op result to IMM, or that the result is being discarded by IMM.
> >>>
> >>> Neel/Nagendra, could you please confirm whether the issue (response to the admin op) is with IMM or AMF?
> >>>
> >>> See snapshot below:
> >>>
> >>> Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
> >>> Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component terminate callback, exiting
> >>> Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
> >>> Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
> >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_ISOLATED
> >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10 on IMMD standby
> >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in ImmModel
> >>> Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
> >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 9 new epoch:10
> >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:10
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected: 33 (MsgQueueService131599) <283, 2020f>
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 34 (@safLogService) <511, 2020f>
> >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 35 (@safAmfService2020f) <512, 2020f>
> >>> Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing with IMM
> >>>
> >>> Thanks,
> >>>
> >>> Mathi.
> >>>
> >>> *From:* Mathi Naickan [mailto:[email protected]]
> >>> *Sent:* Tuesday, July 16, 2013 12:36 PM
> >>> *To:* [opensaf:tickets]
> >>> *Subject:* [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"
> >>>
> >>> I checked the NDs. I think we should remove these (legacy) sleeps.
> >>>
> >>> Also, the exits should be styled like the daemon_exit()s.
> >>>
> >>> We also need to test such exits from the terminate callback for the directors as well, and consider special classes like NTF, where we ought to call the likes of stop_ntfimcn().
> >>>
> >>> Will get back on this.
> >>>
> >>> Thanks,
> >>>
> >>> Mathi.
> >>>
> >>> From: Praveen [mailto:[email protected]]
> >>> Sent: Monday, July 15, 2013 9:35 AM
> >>> To: [opensaf:tickets]
> >>> Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"
> >>>
> >>> Can sleep(1) be added before giving the response to AMF?
> >>>
> >>> Thanks
> >>> Praveen
> >>> On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:
> >>>
> >>> There is no problem with AMF, as AMF does run the instantiate script for all the services (cpnd, glnd, mqnd, smfnd).
> >>> The problem lies in these services, because they sleep for 1 second after giving the AMF response in the terminate callback.
> >>> Example, from cpnd_amf_comp_terminate_callback:
> >>>
> >>>     saAmfResponse(cb->amf_hdl, invocation, saErr);
> >>>     ncshm_give_hdl(gl_cpnd_cb_hdl);
> >>>     sleep(1);
> >>>     LOG_NO("Received AMF component terminate callback, exiting");
> >>>     exit(0);
> >>>
> >>> When AMF executes the instantiate script, the process is still up and running (because of the 1-second sleep), so 'start_daemon -p $pidfile $binary $args' becomes ineffective and the process (e.g. cpnd) doesn't start.
> >>>
> >>> I tested by removing the sleep and everything worked as expected.
> >>>
> >>> So the other services are advised to find out why the 1-second sleep was introduced and how we can get rid of it.
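Nagendra's explanation hinges on how 'start_daemon -p $pidfile' decides whether to start anything. The sketch below assumes LSB-style behaviour: if the pidfile names a process that is still alive, nothing new is started. While the old cpnd is still inside sleep(1), that liveness check succeeds and the instantiate script starts nothing. The helpers pidfile_process_running(), would_start_daemon() and write_own_pidfile() are hypothetical and written for illustration. They are neither the LSB init-functions source nor OpenSAF code.

```c
/* Minimal sketch of an LSB-style "already running?" pidfile check. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if the pidfile names a process that still exists. */
static bool pidfile_process_running(const char *pidfile)
{
    assert(pidfile != NULL);
    FILE *f = fopen(pidfile, "r");
    if (f == NULL)
        return false; /* no pidfile: treat as not running */

    long pid = 0;
    int n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return false; /* empty or malformed pidfile */

    /* Signal 0 performs only the existence/permission check. */
    if (kill((pid_t)pid, 0) == 0)
        return true;
    return errno == EPERM; /* process exists but belongs to another user */
}

/* What the instantiate script effectively does: start only if absent. */
static bool would_start_daemon(const char *pidfile)
{
    return !pidfile_process_running(pidfile);
}

/* What a daemon typically does at startup: record its own pid. */
static bool write_own_pidfile(const char *pidfile)
{
    FILE *f = fopen(pidfile, "w");
    if (f == NULL)
        return false;
    int rc = fprintf(f, "%ld\n", (long)getpid());
    return fclose(f) == 0 && rc > 0;
}
```

With the legacy sleep, the old process still matches its pidfile when the script runs, so would_start_daemon() is false and no new director is launched. Exiting immediately after saAmfResponse() closes that window.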
> >>>
> >>> [tickets:#501] <http://sourceforge.net/p/opensaf/tickets/501/> amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"
> >>>
> >>> Status: unassigned
> >>> Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
> >>> Last Updated: Thu Jul 11, 2013 07:47 AM UTC
> >>> Owner: nobody
> >>>
> >>> After the introduction of the patches solving "#7 cleanup instead of terminate used at component restart", no node directors register to AMF within time, according to the messages log.
> >>> I have tried SMFND, CPND, GLND and MQND.
> >>>
> >>> It seems, however, that the main routines of the node director daemons are not started until 10 seconds after the terminate callback (after the registration timeout).
> >>>
> >>> It is very easy to see the fault by entering the command "amf-adm restart safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF".
> >>>
> >>> Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/
> >>>
> >>> To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
> >>>
> >>> Opensaf-tickets mailing list
> >>> Opensaf-tickets@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
> >>
> >> _______________________________________________
> >> Opensaf-devel mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
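Neel's diagnosis and Mathi's question about subscribing to IMMND destinations can be modeled together. In Neel's account, the active IMMND routes the implementer's reply to the adest it stored when the admin operation was invoked. By then the standby IMMND has restarted with a new adest. The toy sketch below rests on stated assumptions: the adest values, the liveness table, on_immnd_down() and deliver_reply() are all hypothetical. The real IMMND sends over MDS rather than consulting a table. A subscription would let the active IMMND see the down event and fail the pending operation early. As Anders points out, though, it does not tell the IMMND which new om-handle should receive the reply.

```c
/* Toy model of the stale-adest reply path; all names and values are
 * hypothetical and only illustrate the failure described above. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t adest_t;

#define MAX_NODES 4

/* IMMND adests currently known to be alive. */
static adest_t live_immnd[MAX_NODES];
static size_t n_live;

static void immnd_up(adest_t a)
{
    if (n_live < MAX_NODES)
        live_immnd[n_live++] = a;
}

static bool is_live(adest_t a)
{
    for (size_t i = 0; i < n_live; i++)
        if (live_immnd[i] == a)
            return true;
    return false;
}

/* The active IMMND remembers where the admin op was invoked. */
typedef struct {
    adest_t reply_to;
    bool failed_early; /* set when a down event for reply_to was seen */
} pending_admin_op;

/* With a (hypothetical) down-event subscription, pending ops whose
 * reply_to has gone away can be failed immediately. */
static void on_immnd_down(pending_admin_op *ops, size_t n, adest_t a)
{
    for (size_t i = 0; i < n_live; i++) {
        if (live_immnd[i] == a) {
            live_immnd[i] = live_immnd[--n_live];
            break;
        }
    }
    for (size_t i = 0; i < n; i++)
        if (ops[i].reply_to == a)
            ops[i].failed_early = true;
}

/* false models "Discarding admin op reply": the stored adest is stale. */
static bool deliver_reply(const pending_admin_op *op)
{
    assert(op != NULL);
    return is_live(op->reply_to);
}
```

Even with early failure, the restarted IMMND comes back under a different adest, and its om-client under a different handle. The reply therefore still has no valid destination. This is why the thread points at continuationId rather than at subscriptions alone.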
