Hi,

 

The attached patch works for this ticket. (Note: the AMF terminate callback has 
to be corrected for the directors as well; I will do that in a separate patch.)

Please note that when running this test for IMM, the immadm or amf-adm command 
does not return to the command prompt, even though the command has functionally 
succeeded, i.e. IMM was successfully restarted.

I suspect the reason is either that AMF is not sending the admin-op result to 
IMM, or that IMM is discarding the result.

Neel/Nagendra, could you please confirm whether the issue (the response to the 
admin op) is with IMM or AMF?

See snapshot below:

 

Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component terminate callback, exiting
Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_ISOLATED
Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10 on IMMD standby
Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in ImmModel
Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 9  new epoch:10
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:10
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected: 33 (MsgQueueService131599) <283, 2020f>
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 34 (@safLogService) <511, 2020f>
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 35 (@safAmfService2020f) <512, 2020f>
Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing with IMM

 

Thanks,

Mathi.

 

From: Mathi Naickan [mailto:[email protected]] 
Sent: Tuesday, July 16, 2013 12:36 PM
To: [opensaf:tickets] 
Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF 
within time after "#7 cleanup instead of terminate used at component restart"

 

I checked the NDs. I think we should remove these (legacy) sleeps.

Also, the exits should be styled like the daemon_exit() calls.

We also need to test such exits from the terminate callback for the directors 
as well, and consider special cases like NTF, where we ought to call the likes 
of stop_ntfimcn().

Will get back on this.

Thanks,

Mathi.

From: Praveen [mailto:[email protected]] 
Sent: Monday, July 15, 2013 9:35 AM
To: [opensaf:tickets] 
Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF 
within time after "#7 cleanup instead of terminate used at component restart"

Can sleep(1) be added before giving response to AMF?

Thanks
Praveen
On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:

There is no problem with AMF, as AMF is running the instantiate script for 
all the services (cpnd, glnd, mqnd, smfnd).
The problem resides in these services, because each of them sleeps for 1 
second after giving the AMF response in the terminate callback.
Ex:
cpnd_amf_comp_terminate_callback

saAmfResponse(cb->amf_hdl,  invocation,  saErr);
ncshm_give_hdl(gl_cpnd_cb_hdl);
sleep(1);
LOG_NO("Received AMF component terminate callback, exiting");
exit(0);

When the instantiate script is executed by AMF, the process is still 
up and running (because of the 1-second sleep), so 'start_daemon -p 
$pidfile $binary $args' has no effect and the process (e.g. 
cpnd) doesn't start.

I tested by removing the sleep and everything worked as expected.

So the other services are advised to find out why the sleep(1) was 
introduced and how we can get rid of it.
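
The timing window described above can be reproduced outside OpenSAF with a small model. This is only a sketch under stated assumptions: the function name old_process_still_seen() is hypothetical, a pipe write stands in for saAmfResponse(), and waitpid() with WNOHANG stands in for the pidfile-based "already running" check that start_daemon makes. It is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical model, not OpenSAF code. The child plays a component inside
 * its terminate callback; the parent plays AMF followed by the instantiate
 * script. Returns true if the old process is still running at the moment
 * the liveness probe is made.
 */
static bool old_process_still_seen(unsigned int linger_seconds)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (pid == 0) {
        /* Child: send the "response" first (stand-in for saAmfResponse). */
        char done = 1;
        close(fds[0]);
        if (write(fds[1], &done, 1) != 1)
            _exit(1);
        sleep(linger_seconds); /* models the legacy sleep(1) before exit(0) */
        _exit(0);
    }

    /* Parent: block until the response arrives, then probe immediately. */
    char done = 0;
    close(fds[1]);
    ssize_t got = read(fds[0], &done, 1);
    close(fds[0]);

    /* waitpid(WNOHANG) returns 0 while the child has not exited yet. */
    int status = 0;
    pid_t probe = waitpid(pid, &status, WNOHANG);
    bool alive = (got == 1) && (probe == 0);

    if (probe == 0)
        waitpid(pid, &status, 0); /* reap the child so no zombie is left */
    return alive;
}
```

Calling old_process_still_seen(1) models the current callbacks: the response is already out, but the probe still finds the old process, which is the situation in which start_daemon declines to start a new copy. Passing 0 models the callback with the sleep removed; that only narrows the window, so the sketch makes no claim about the result in that case.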

_  

[tickets:#501] http://sourceforge.net/p/opensaf/tickets/501/ 
amf: No node directors register to AMF within time after "#7 cleanup 
instead of terminate used at component restart"

Status: unassigned
Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
Last Updated: Thu Jul 11, 2013 07:47 AM UTC
Owner: nobody

After the introduction of the patches solving "#7 cleanup instead of terminate 
used at component restart", no node directors register to AMF within 
time according to the messages log.
I have tried SMFND, CPND, GLND and MQND.

It seems however that the main routines of the node director daemons 
are not started until 10 seconds after the terminate callback (after 
the registration timeout).

It is very easy to see the fault by entering command "amf-adm restart 
safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"

_  

Sent from sourceforge.net because [email protected] is subscribed to 
https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change 
settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
Or, if this is a mailing list, you can unsubscribe from the mailing list.


_  

Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

_  

[tickets:#501] amf: No node directors register to AMF within time after 
"#7 cleanup instead of terminate used at component restart"

Status: unassigned
Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
Last Updated: Mon Jul 15, 2013 02:42 AM UTC
Owner: nobody

After the introduction of the patches solving "#7 cleanup instead of terminate 
used at component restart", no node directors register to AMF within time 
according to the messages log.
I have tried SMFND, CPND, GLND and MQND.

It seems however that the main routines of the node director daemons are not 
started until 10 seconds after the terminate callback (after the registration 
timeout).

It is very easy to see the fault by entering command "amf-adm restart 
safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"

_  

Sent from sourceforge.net because you indicated interest in 
https://sourceforge.net/p/opensaf/tickets/501/

To unsubscribe from further messages, please visit 
https://sourceforge.net/auth/subscriptions/


Attachment: 501_osaf.patch
Description: Binary data

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
