Hi Mathi,

Ack, code review only. Just a few comments: there is no guarantee that killproc delivering a SIGTERM to the component will succeed, i.e. the component may still be running afterwards. When the component exits, MDS sends an avaDown, but by then the component has already been removed from cb->compdb. Perhaps AMF could use the avaDown messages to track when the component has actually exited, and keep the component in cb->compdb a bit longer? Or use some retry logic in the script around the killproc.
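A rough sketch of the retry idea, not taken from the patch. To stay self-contained it uses plain kill and kill -0 rather than the LSB killproc; the function name kill_with_retry, its default of 5 retries, and the escalation to SIGKILL are all hypothetical choices for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): send SIGTERM, poll until the
# process is gone, and escalate to SIGKILL if it survives all retries.
# Usage: kill_with_retry <pid> [retries]
kill_with_retry() {
    target=$1
    tries=${2:-5}

    # If SIGTERM cannot be delivered we assume the process is already gone.
    # (Assumption: a permission error is treated the same way here.)
    kill -TERM "$target" 2>/dev/null || return 0

    while [ "$tries" -gt 0 ]; do
        # kill -0 only checks whether the pid can still be signalled.
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        tries=$((tries - 1))
    done

    # Still alive after all retries: force it, then report the final state.
    kill -KILL "$target" 2>/dev/null
    sleep 1
    ! kill -0 "$target" 2>/dev/null
}
```

In the instantiation script, something along these lines could presumably replace the bare killproc call when the term file exists, with the pid read from $pidfile; whether a forced SIGKILL is acceptable for the node directors is a separate question.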
/Thanks HansN

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: 6 May 2015 16:47
To: Anders Widell; [email protected]; Hans Nordebäck; [email protected]; [email protected]
Cc: [email protected]
Subject: [PATCH 1 of 1] osaf: During adminrestart of node directors, before re-instantiating kill them [#1326]

 osaf/services/saf/cpsv/cpnd/cpnd_amf.c              |  14 +++++++++++++-
 osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in  |  13 +++++++++++++
 osaf/services/saf/glsv/glnd/glnd_amf.c              |  14 ++++++++++++++
 osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in   |  13 +++++++++++++
 osaf/services/saf/immsv/immnd/immnd_amf.c           |  11 +++++++++++
 osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in |  18 ++++++++++++++++++
 osaf/services/saf/mqsv/mqnd/mqnd_amf.c              |  11 +++++++++++
 osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in   |  13 +++++++++++++
 osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in |  13 +++++++++++++
 osaf/services/saf/smfsv/smfnd/smfnd_amf.c           |  11 +++++++++++
 10 files changed, 130 insertions(+), 1 deletions(-)


The command
$ amf-adm restart <DN name>
is one way of administratively restarting an AMF component. As part of this admin operation, AMF sends the component terminate callback to the PI components. It is up to the component to release all its resources and report the status of its self-termination to AMF before (typically) exiting the process itself. After receiving the response from the component, AMF invokes the instantiation script of the component. At this point, the previously running instance of the process (of this component) may not yet have exited. When a new instantiation is attempted while the old daemon/process is still running, the instantiation script can return failure. This patch creates a temporary term_state_file from inside the component terminate callback of the node directors.
In the instantiation scripts, a check is done to distinguish a fresh instantiation from an instantiation after a termination. If the term_state_file exists, it is an instantiation after termination. If so, just attempt to kill (using killproc) the process again before calling start_daemon.

Note: There has been mention of using the start_daemon -f option, which will create another copy of the daemon if the previous daemon is still running. Using this option may not be ideal for us, as it can create inconsistencies between the two daemons when using any resources; also, there is no proof or documentation of start_daemon -f working successfully. This is even more significant given that some distros are really slow in becoming LSB compliant, particularly for start_daemon and the like.

diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
--- a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
+++ b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
@@ -35,7 +35,9 @@ ******************************************************************************/
 #include "cpnd.h"
+#include "configmake.h"
+static const char *term_state_file = PKGPIDDIR "/osafckptnd_termstate";
 /****************************************************************************
  * Name : cpnd_saf_health_chk_callback
  *
@@ -232,13 +234,23 @@ void cpnd_amf_comp_terminate_callback(Sa
 {
 	CPND_CB *cb = NULL;
 	SaAisErrorT saErr = SA_AIS_OK;
+	int fd;
+	TRACE_ENTER();
-	TRACE_ENTER();
 	cb = ncshm_take_hdl(NCS_SERVICE_ID_CPND, gl_cpnd_cb_hdl);
 	if (cb == NULL) {
 		LOG_ER("cpnd cb take handle failed in amf term callback");
 		return;
 	}
+
+	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+	if (fd >=0)
+		(void)close(fd);
+	else
+		LOG_NO("cannot create termstate file %s: %s",
+			term_state_file, strerror(errno));
+
 	saAmfResponse(cb->amf_hdl, invocation, saErr);
 	ncshm_give_hdl(gl_cpnd_cb_hdl);
 	LOG_NO("Received AMF component terminate callback, exiting");
diff --git a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
--- a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
+++ b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 RETVAL=0
 start() {
+	#If the term file exists, it means instantiation is
+	#attempted after a termination For eg:- during administrative
+	#restart of a component. In this case, first try to kill
+	#the component since it might be seen as still running while exiting
+	#via the termination callback or termination scripts(in case of NPI).
+	#Note: start_daemon -f may also be used to create another copy of the daemon,
+	#but the behaviour of -f option has not been tested yet!
+
+	[ -e $termfile ] && killproc -p $pidfile $binary
+
 	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
 	[ -x $binary ] || exit 5
 	echo -n "Starting $prog: "
@@ -41,6 +52,7 @@ start() {
 	RETVAL=$?
 	if [ $RETVAL -eq 0 ]; then
 		touch $lockfile
+		rm -f $termfile
 		log_success_msg
 	else
 		log_failure_msg
@@ -55,6 +67,7 @@ stop() {
 	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
 		rm -f $lockfile
 		log_success_msg
+		rm -f $termfile
 		RETVAL=0
 	else
 		log_failure_msg
diff --git a/osaf/services/saf/glsv/glnd/glnd_amf.c b/osaf/services/saf/glsv/glnd/glnd_amf.c
--- a/osaf/services/saf/glsv/glnd/glnd_amf.c
+++ b/osaf/services/saf/glsv/glnd/glnd_amf.c
@@ -35,6 +35,8 @@ ******************************************************************************/
 #include "glnd.h"
+#include "configmake.h"
+
 void glnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName);
 void glnd_saf_health_chk_callback(SaInvocationT invocation, const SaNameT *compName, const SaAmfHealthcheckKeyT *checkType);
@@ -45,6 +47,7 @@ void glnd_amf_CSI_set_callback(SaInvocat
 void glnd_amf_csi_rmv_callback(SaInvocationT invocation, const SaNameT *compName, const SaNameT *csiName, SaAmfCSIFlagsT csiFlags);
+static const char *term_state_file = PKGPIDDIR "/osaflcknd_termstate";
 /****************************************************************************
  * Name : glnd_saf_health_chk_callback
  *
@@ -114,17 +117,28 @@ void glnd_amf_comp_terminate_callback(Sa
 	GLND_CB *glnd_cb;
 	SaAisErrorT error = SA_AIS_OK;
 	TRACE_ENTER2("Component Name: %s", compName->value);
+	int fd;
 	/* take the handle */
 	glnd_cb = (GLND_CB *)m_GLND_TAKE_GLND_CB;
 	if (!glnd_cb) {
 		LOG_ER("GLND cb take handle failed");
 	} else {
+
+		fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+		if (fd >=0)
+			(void)close(fd);
+		else
+			LOG_NO("cannot create termstate file %s: %s",
+				term_state_file, strerror(errno));
+
 		if (saAmfResponse(glnd_cb->amf_hdl, invocation, error) != SA_AIS_OK)
 			LOG_ER("GLND amf response failed");
 		/* giveup the handle */
 		m_GLND_GIVEUP_GLND_CB;
 	}
+
 	LOG_NO("Received AMF component terminate callback, exiting");
 	TRACE_LEAVE();
diff --git a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
--- a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
+++ b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 RETVAL=0
 start() {
+	#If the term file exists, it means instantiation is
+	#attempted after a termination For eg:- during administrative
+	#restart of a component. In this case, first try to kill
+	#the component since it might be seen as still running while exiting
+	#via the termination callback or termination scripts(in case of NPI).
+	#Note: start_daemon -f may also be used to create another copy of the daemon,
+	#but the behaviour of -f option has not been tested yet!
+
+	[ -e $termfile ] && killproc -p $pidfile $binary
+
 	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
 	[ -x $binary ] || exit 5
 	echo -n "Starting $prog: "
@@ -41,6 +52,7 @@ start() {
 	RETVAL=$?
 	if [ $RETVAL -eq 0 ]; then
 		touch $lockfile
+		rm -f $termfile
 		log_success_msg
 	else
 		log_failure_msg
@@ -55,6 +67,7 @@ stop() {
 	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
 		rm -f $lockfile
 		log_success_msg
+		rm -f $termfile
 		RETVAL=0
 	else
 		log_failure_msg
diff --git a/osaf/services/saf/immsv/immnd/immnd_amf.c b/osaf/services/saf/immsv/immnd/immnd_amf.c
--- a/osaf/services/saf/immsv/immnd/immnd_amf.c
+++ b/osaf/services/saf/immsv/immnd/immnd_amf.c
@@ -18,7 +18,9 @@
 #include "immnd.h"
 #include <nid_start_util.h>
 #include "osaf_extended_name.h"
+#include "configmake.h"
+static const char *term_state_file = PKGPIDDIR "/osafimmnd_termstate";
 /****************************************************************************
  * Name : immnd_saf_health_chk_callback
  *
@@ -73,12 +75,21 @@ static void immnd_saf_health_chk_callbac
 static void immnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName)
 {
 	TRACE_ENTER();
+	int fd;
 	if (immnd_cb->pbePid > 0)
 		kill(immnd_cb->pbePid, SIGTERM);
 	if (immnd_cb->syncPid > 0)
 		kill(immnd_cb->syncPid, SIGTERM);
+	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+	if (fd >=0)
+		(void)close(fd);
+	else
+		LOG_NO("cannot create termstate file %s: %s",
+			term_state_file, strerror(errno));
+
 	LOG_NO("Received AMF component terminate callback, exiting");
 	saAmfResponse(immnd_cb->amf_hdl, invocation, SA_AIS_OK);
diff --git a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
--- a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
+++ b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
@@ -31,12 +31,17 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 RETVAL=0
 NIDSERV="IMMND"
 COMPNAMEFILE=$pkglocalstatedir/immnd_comp_name
 start() {
+	# remove any termination file created previously via
+	# AMF component termination callback
+	rm -f $termfile
+
 	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
 	[ -p $NIDFIFO ] || exit 1
 	if [ ! -x $binary ]; then
@@ -58,6 +63,16 @@ start() {
 }
 instantiate() {
+	#If the term file exists, it means instantiation is
+	#attempted after a termination For eg:- during administrative
+	#restart of a component. In this case, first try to kill
+	#the component since it might be seen as still running while exiting
+	#via the termination callback or termination scripts(in case of NPI).
+	#Note: start_daemon -f may also be used to create another copy of the daemon,
+	#but the behaviour of -f option has not been tested yet!
+
+	[ -e $termfile ] && killproc -p $pidfile $binary
+
 	echo -n "AMF Instantiating $prog: "
 	echo $SA_AMF_COMPONENT_NAME > $COMPNAMEFILE
 	pidofproc -p $pidfile $binary
@@ -71,9 +86,11 @@ instantiate() {
 	fi
 	if [ $RETVAL -eq 0 ]; then
 		log_success_msg
+		rm -f $termfile
 	else
 		log_failure_msg
 	fi
+
 	return $RETVAL
 }
@@ -86,6 +103,7 @@ stop() {
 	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
 		rm -f $lockfile
 		rm -f $COMPNAMEFILE
+		rm -f $termfile
 		log_success_msg
 		RETVAL=0
 	else
diff --git a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
--- a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
+++ b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
@@ -35,6 +35,7 @@ ******************************************************************************/
 #include "mqnd.h"
+#include "configmake.h"
 static void mqnd_saf_health_chk_callback(SaInvocationT invocation, const SaNameT *compName, SaAmfHealthcheckKeyT *checkType);
@@ -47,6 +48,7 @@ static void mqnd_amf_CSI_set_callback(Sa
 const SaNameT *compName, SaAmfHAStateT haState, SaAmfCSIDescriptorT csiDescriptor);
+static const char *term_state_file = PKGPIDDIR "/osafmsgnd_termstate";
 /****************************************************************************
  * Name : mqnd_saf_health_chk_callback
  *
@@ -227,6 +229,7 @@ static void mqnd_amf_comp_terminate_call
 	TRACE_ENTER();
 	uint32_t cb_hdl = m_MQND_GET_HDL();
+	int fd;
 	/* Get the CB from the handle */
 	mqnd_cb = ncshm_take_hdl(NCS_SERVICE_ID_MQND, cb_hdl);
@@ -236,6 +239,14 @@ static void mqnd_amf_comp_terminate_call
 		return;
 	}
+	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+	if (fd >=0)
+		(void)close(fd);
+	else
+		LOG_NO("cannot create termstate file %s: %s",
+			term_state_file, strerror(errno));
+
 	saAmfResponse(mqnd_cb->amf_hdl, invocation, saErr);
 	LOG_ER("Amf Terminate Callback called");
diff --git a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
--- a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
+++ b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 RETVAL=0
 start() {
+	#If the term file exists, it means instantiation is
+	#attempted after a termination For eg:- during administrative
+	#restart of a component. In this case, first try to kill
+	#the component since it might be seen as still running while exiting
+	#via the termination callback or termination scripts(in case of NPI).
+	#Note: start_daemon -f may also be used to create another copy of the daemon,
+	#but the behaviour of -f option has not been tested yet!
+
+	[ -e $termfile ] && killproc -p $pidfile $binary
+
 	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
 	[ -x $binary ] || exit 5
 	echo -n "Starting $prog: "
@@ -42,6 +53,7 @@ start() {
 	if [ $RETVAL -eq 0 ]; then
 		touch $lockfile
 		log_success_msg
+		rm -f $termfile
 	else
 		log_failure_msg
 	fi
@@ -55,6 +67,7 @@ stop() {
 	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
 		rm -f $lockfile
 		log_success_msg
+		rm -f $termfile
 		RETVAL=0
 	else
 		log_failure_msg
diff --git a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
--- a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
+++ b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 RETVAL=0
 start() {
+	#If the term file exists, it means instantiation is
+	#attempted after a termination For eg:- during administrative
+	#restart of a component. In this case, first try to kill
+	#the component since it might be seen as still running while exiting
+	#via the termination callback or termination scripts(in case of NPI).
+	#Note: start_daemon -f may also be used to create another copy of the daemon,
+	#but the behaviour of -f option has not been tested yet!
+
+	[ -e $termfile ] && killproc -p $pidfile $binary
+
 	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
 	[ -x $binary ] || exit 5
 	echo -n "Starting $prog: "
@@ -42,6 +53,7 @@ start() {
 	if [ $RETVAL -eq 0 ]; then
 		touch $lockfile
 		log_success_msg
+		rm -f $termfile
 	else
 		log_failure_msg
 	fi
@@ -55,6 +67,7 @@ stop() {
 	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
 		rm -f $lockfile
 		log_success_msg
+		rm -f $termfile
 		RETVAL=0
 	else
 		log_failure_msg
diff --git a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
--- a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
+++ b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
@@ -20,7 +20,9 @@ */
 #include "smfnd.h"
+#include "configmake.h"
+static const char *term_state_file = PKGPIDDIR "/osafsmfnd_termstate";
 /****************************************************************************
  * Name : amf_health_chk_callback
  *
@@ -107,6 +109,15 @@ static void amf_csi_set_callback(SaInvoc
 static void amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT * compName)
 {
 	TRACE_ENTER();
+	int fd;
+
+	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+	if (fd >=0)
+		(void)close(fd);
+	else
+		LOG_NO("cannot create termstate file %s: %s",
+			term_state_file, strerror(errno));
 	saAmfResponse(smfnd_cb->amf_hdl, invocation, SA_AIS_OK);

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
