Nagendra,

Comments inline:

----- [email protected] wrote:

> Tested the following scenarios on SLES SP2 (x86_64):
> 1. Did kill and admin restart on ckptnd, it came up successfully.
> 2. Kept sleep in ckptnd terminate callback and did admin restart.
>    Ckptnd came up successfully.
>
> Ack with the following comments:
> 1. The similar changes need to be done for Amfwd.

Okay, will do that.

> 2. When I was testing while keeping sleep in terminate callback, I
>    observed that start clc-cli script waits at killproc as long as
>    Ckptnd keeps sleeping and comes out when Ckptnd exits. Hope that is
>    expected. I mean it doesn't terminate the process until it is alive.

Will check that.

> 3. In all 'stop' (cleanup case), 'rm -f $termfile' could be added
>    outside check of RETVAL.

No, this has to be done only when the stop-related actions are
successful. If stop fails, this file should continue to exist so that
an explicit kill of the process is attempted during start.

Mathi.

>
> Thanks
> -Nagu
>
> > -----Original Message-----
> > From: Mathivanan Naickan Palanivelu
> > Sent: 06 May 2015 20:17
> > To: [email protected]; Ramesh Babu Betham;
> > [email protected]; Nagendra Kumar; Praveen Malviya
> > Cc: [email protected]
> > Subject: [PATCH 1 of 1] osaf: During adminrestart of node directors,
> > before re-instantiating kill them [#1326]
> >
> >  osaf/services/saf/cpsv/cpnd/cpnd_amf.c              |  14 +++++++++++++-
> >  osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in  |  13 +++++++++++++
> >  osaf/services/saf/glsv/glnd/glnd_amf.c              |  14 ++++++++++++++
> >  osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in   |  13 +++++++++++++
> >  osaf/services/saf/immsv/immnd/immnd_amf.c           |  11 +++++++++++
> >  osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in |  18 ++++++++++++++++++
> >  osaf/services/saf/mqsv/mqnd/mqnd_amf.c              |  11 +++++++++++
> >  osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in   |  13 +++++++++++++
> >  osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in |  13 +++++++++++++
> >  osaf/services/saf/smfsv/smfnd/smfnd_amf.c           |  11 +++++++++++
> >  10 files changed, 130 insertions(+), 1 deletions(-)
> >
> >
> > The command $ amf-adm restart <DN name> is one way of
> > administratively restarting an AMF component.
> > As a part of this admin operation, AMF sends the component terminate
> > callback to the PI components. It is up to the component to release
> > all its resources and respond to AMF with the status of its
> > self-termination before (typically) exiting the process itself.
> > After receiving the response from the component, AMF invokes the
> > instantiation script of the component. During this time, it is
> > possible that the previously running instance of the process (of this
> > component) has not yet exited. This situation, where there is already
> > a running daemon/process and a new instantiation is being attempted,
> > can cause the instantiation script to return failure.
> > This patch creates a temporary term_state_file from inside the
> > component terminate callback of the node directors.
> > In the instantiation scripts, a check is done to distinguish a fresh
> > instantiation from an instantiation after a termination.
> > If the term_state_file exists, it is an instantiation after a
> > termination. If so, just attempt to kill (using killproc) the process
> > again before calling start_daemon.
> >
> > Note: There has been mention of using the start_daemon -f option,
> > which will create another copy of the daemon if the previous daemon
> > is still running. Using this option may not be ideal for us, as it
> > can create inconsistency between the two daemons when they use any
> > resources, and there is no proof or documentation of start_daemon -f
> > working successfully. This is even more significant given that some
> > distros are really slow in becoming LSB compliant, particularly for
> > start_daemon and the likes of it.
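
To make the intended lifecycle of the termstate file concrete, here is a
minimal shell sketch of the three steps described above: create the marker
on terminate, kill before start if the marker exists, and remove the marker
only on success. It is an illustration under assumptions, not the patch
itself. The function names on_terminate, instantiate and stop_component,
the temporary directory standing in for $pkgpiddir, and the use of plain
kill and a background sleep in place of killproc and start_daemon are all
hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch of the termstate-file lifecycle. All names are hypothetical;
# a temporary directory stands in for $pkgpiddir.
workdir=$(mktemp -d)
termfile="$workdir/demo_termstate"
pidfile="$workdir/demo.pid"

# Terminate callback: leave a marker behind before the process exits.
on_terminate() {
    : > "$termfile"
}

# Instantiation: if the marker exists, the old process may still be
# alive, so attempt to kill it before starting a new one.
instantiate() {
    if [ -e "$termfile" ] && [ -s "$pidfile" ]; then
        kill "$(cat "$pidfile")" 2>/dev/null || true
    fi
    sleep 60 &                 # stand-in for start_daemon
    echo $! > "$pidfile"
    rm -f "$termfile"          # cleared only after the start succeeded
}

# Stop/cleanup: remove the marker only when the stop itself succeeded,
# so that a failed stop still triggers the explicit kill on next start.
stop_component() {
    if kill "$(cat "$pidfile" 2>/dev/null)" 2>/dev/null; then
        rm -f "$pidfile" "$termfile"
        return 0
    fi
    return 1
}
```

Calling on_terminate followed by instantiate mimics an admin restart: the
second start sees the marker, kills the previous instance, and clears the
marker. A failed stop_component leaves the marker in place, which is the
reason the removal sits inside the RETVAL check.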
> >
> > diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
> > --- a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
> > +++ b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
> > @@ -35,7 +35,9 @@
> >
> >  ******************************************************************************/
> >
> >  #include "cpnd.h"
> > +#include "configmake.h"
> >
> > +static const char *term_state_file = PKGPIDDIR "/osafckptnd_termstate";
> >
> >  /****************************************************************************
> >   * Name : cpnd_saf_health_chk_callback
> >   *
> > @@ -232,13 +234,23 @@ void cpnd_amf_comp_terminate_callback(Sa
> >  {
> >  	CPND_CB *cb = NULL;
> >  	SaAisErrorT saErr = SA_AIS_OK;
> > +	int fd;
> > +	TRACE_ENTER();
> >
> > -	TRACE_ENTER();
> >  	cb = ncshm_take_hdl(NCS_SERVICE_ID_CPND, gl_cpnd_cb_hdl);
> >  	if (cb == NULL) {
> >  		LOG_ER("cpnd cb take handle failed in amf term callback");
> >  		return;
> >  	}
> > +
> > +	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +
> > +	if (fd >=0)
> > +		(void)close(fd);
> > +	else
> > +		LOG_NO("cannot create termstate file %s: %s",
> > +			term_state_file, strerror(errno));
> > +
> >  	saAmfResponse(cb->amf_hdl, invocation, saErr);
> >  	ncshm_give_hdl(gl_cpnd_cb_hdl);
> >  	LOG_NO("Received AMF component terminate callback, exiting");
> > diff --git a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
> > --- a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
> > +++ b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
> > @@ -30,10 +30,21 @@ fi
> >  binary=$pkglibdir/$prog
> >  pidfile=$pkgpiddir/$prog.pid
> >  lockfile=$lockdir/$initscript
> > +termfile=$pkgpiddir/$prog"_termstate"
> >
> >  RETVAL=0
> >
> >  start() {
> > +	#If the term file exists, it means instantiation is
> > +	#attempted after a termination For eg:- during administrative
> > +	#restart of a component. In this case, first try to kill
> > +	#the component since it might be seen as still running while exiting
> > +	#via the termination callback or termination scripts(in case of NPI).
> > +	#Note: start_daemon -f may also be used to create another copy of the daemon,
> > +	#but the behaviour of -f option has not been tested yet!
> > +
> > +	[ -e $termfile ] && killproc -p $pidfile $binary
> > +
> >  	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
> >  	[ -x $binary ] || exit 5
> >  	echo -n "Starting $prog: "
> > @@ -41,6 +52,7 @@ start() {
> >  	RETVAL=$?
> >  	if [ $RETVAL -eq 0 ]; then
> >  		touch $lockfile
> > +		rm -f $termfile
> >  		log_success_msg
> >  	else
> >  		log_failure_msg
> > @@ -55,6 +67,7 @@ stop() {
> >  	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
> >  		rm -f $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  		RETVAL=0
> >  	else
> >  		log_failure_msg
> > diff --git a/osaf/services/saf/glsv/glnd/glnd_amf.c b/osaf/services/saf/glsv/glnd/glnd_amf.c
> > --- a/osaf/services/saf/glsv/glnd/glnd_amf.c
> > +++ b/osaf/services/saf/glsv/glnd/glnd_amf.c
> > @@ -35,6 +35,8 @@
> >
> >  ******************************************************************************/
> >
> >  #include "glnd.h"
> > +#include "configmake.h"
> > +
> >  void glnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName);
> >  void glnd_saf_health_chk_callback(SaInvocationT invocation,
> >  				const SaNameT *compName, const SaAmfHealthcheckKeyT *checkType);
> > @@ -45,6 +47,7 @@ void glnd_amf_CSI_set_callback(SaInvocat
> >  void glnd_amf_csi_rmv_callback(SaInvocationT invocation,
> >  				const SaNameT *compName, const SaNameT *csiName, SaAmfCSIFlagsT csiFlags);
> >
> > +static const char *term_state_file = PKGPIDDIR "/osaflcknd_termstate";
> >
> >  /****************************************************************************
> >   * Name : glnd_saf_health_chk_callback
> >   *
> > @@ -114,17 +117,28 @@ void glnd_amf_comp_terminate_callback(Sa
> >  	GLND_CB *glnd_cb;
> >  	SaAisErrorT error = SA_AIS_OK;
> >  	TRACE_ENTER2("Component Name: %s", compName->value);
> > +	int fd;
> >
> >  	/* take the handle */
> >  	glnd_cb = (GLND_CB *)m_GLND_TAKE_GLND_CB;
> >  	if (!glnd_cb) {
> >  		LOG_ER("GLND cb take handle failed");
> >  	} else {
> > +
> > +		fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +
> > +		if (fd >=0)
> > +			(void)close(fd);
> > +		else
> > +			LOG_NO("cannot create termstate file %s: %s",
> > +				term_state_file, strerror(errno));
> > +
> >  		if (saAmfResponse(glnd_cb->amf_hdl, invocation, error) != SA_AIS_OK)
> >  			LOG_ER("GLND amf response failed");
> >  		/* giveup the handle */
> >  		m_GLND_GIVEUP_GLND_CB;
> >  	}
> > +
> >  	LOG_NO("Received AMF component terminate callback, exiting");
> >  	TRACE_LEAVE();
> >
> > diff --git a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
> > --- a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
> > +++ b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
> > @@ -30,10 +30,21 @@ fi
> >  binary=$pkglibdir/$prog
> >  pidfile=$pkgpiddir/$prog.pid
> >  lockfile=$lockdir/$initscript
> > +termfile=$pkgpiddir/$prog"_termstate"
> >
> >  RETVAL=0
> >
> >  start() {
> > +	#If the term file exists, it means instantiation is
> > +	#attempted after a termination For eg:- during administrative
> > +	#restart of a component. In this case, first try to kill
> > +	#the component since it might be seen as still running while exiting
> > +	#via the termination callback or termination scripts(in case of NPI).
> > +	#Note: start_daemon -f may also be used to create another copy of the daemon,
> > +	#but the behaviour of -f option has not been tested yet!
> > +
> > +	[ -e $termfile ] && killproc -p $pidfile $binary
> > +
> >  	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
> >  	[ -x $binary ] || exit 5
> >  	echo -n "Starting $prog: "
> > @@ -41,6 +52,7 @@ start() {
> >  	RETVAL=$?
> >  	if [ $RETVAL -eq 0 ]; then
> >  		touch $lockfile
> > +		rm -f $termfile
> >  		log_success_msg
> >  	else
> >  		log_failure_msg
> > @@ -55,6 +67,7 @@ stop() {
> >  	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
> >  		rm -f $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  		RETVAL=0
> >  	else
> >  		log_failure_msg
> > diff --git a/osaf/services/saf/immsv/immnd/immnd_amf.c b/osaf/services/saf/immsv/immnd/immnd_amf.c
> > --- a/osaf/services/saf/immsv/immnd/immnd_amf.c
> > +++ b/osaf/services/saf/immsv/immnd/immnd_amf.c
> > @@ -18,7 +18,9 @@
> >  #include "immnd.h"
> >  #include <nid_start_util.h>
> >  #include "osaf_extended_name.h"
> > +#include "configmake.h"
> >
> > +static const char *term_state_file = PKGPIDDIR "/osafimmnd_termstate";
> >
> >  /****************************************************************************
> >   * Name : immnd_saf_health_chk_callback
> >   *
> > @@ -73,12 +75,21 @@ static void immnd_saf_health_chk_callbac
> >  static void immnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName)
> >  {
> >  	TRACE_ENTER();
> > +	int fd;
> >
> >  	if (immnd_cb->pbePid > 0)
> >  		kill(immnd_cb->pbePid, SIGTERM);
> >  	if (immnd_cb->syncPid > 0)
> >  		kill(immnd_cb->syncPid, SIGTERM);
> >
> > +	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +
> > +	if (fd >=0)
> > +		(void)close(fd);
> > +	else
> > +		LOG_NO("cannot create termstate file %s: %s",
> > +			term_state_file, strerror(errno));
> > +
> >  	LOG_NO("Received AMF component terminate callback, exiting");
> >  	saAmfResponse(immnd_cb->amf_hdl, invocation, SA_AIS_OK);
> >
> > diff --git a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
> > --- a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
> > +++ b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
> > @@ -31,12 +31,17 @@ fi
> >  binary=$pkglibdir/$prog
> >  pidfile=$pkgpiddir/$prog.pid
> >  lockfile=$lockdir/$initscript
> > +termfile=$pkgpiddir/$prog"_termstate"
> >
> >  RETVAL=0
> >  NIDSERV="IMMND"
> >  COMPNAMEFILE=$pkglocalstatedir/immnd_comp_name
> >
> >  start() {
> > +	# remove any termination file created previously via
> > +	# AMF component termination callback
> > +	rm -f $termfile
> > +
> >  	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
> >  	[ -p $NIDFIFO ] || exit 1
> >  	if [ ! -x $binary ]; then
> > @@ -58,6 +63,16 @@ start() {
> >  }
> >
> >  instantiate() {
> > +	#If the term file exists, it means instantiation is
> > +	#attempted after a termination For eg:- during administrative
> > +	#restart of a component. In this case, first try to kill
> > +	#the component since it might be seen as still running while exiting
> > +	#via the termination callback or termination scripts(in case of NPI).
> > +	#Note: start_daemon -f may also be used to create another copy of the daemon,
> > +	#but the behaviour of -f option has not been tested yet!
> > +
> > +	[ -e $termfile ] && killproc -p $pidfile $binary
> > +
> >  	echo -n "AMF Instantiating $prog: "
> >  	echo $SA_AMF_COMPONENT_NAME > $COMPNAMEFILE
> >  	pidofproc -p $pidfile $binary
> > @@ -71,9 +86,11 @@ instantiate() {
> >  	fi
> >  	if [ $RETVAL -eq 0 ]; then
> >  		log_success_msg
> > +		rm -f $termfile
> >  	else
> >  		log_failure_msg
> >  	fi
> > +
> >  	return $RETVAL
> >  }
> >
> > @@ -86,6 +103,7 @@ stop() {
> >  	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
> >  		rm -f $lockfile
> >  		rm -f $COMPNAMEFILE
> > +		rm -f $termfile
> >  		log_success_msg
> >  		RETVAL=0
> >  	else
> > diff --git a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
> > --- a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
> > +++ b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
> > @@ -35,6 +35,7 @@
> >
> >  ******************************************************************************/
> >
> >  #include "mqnd.h"
> > +#include "configmake.h"
> >
> >  static void mqnd_saf_health_chk_callback(SaInvocationT invocation, const SaNameT *compName,
> >  				SaAmfHealthcheckKeyT *checkType);
> > @@ -47,6 +48,7 @@ static void mqnd_amf_CSI_set_callback(Sa
> >  				const SaNameT *compName,
> >  				SaAmfHAStateT haState,
> >  				SaAmfCSIDescriptorT csiDescriptor);
> >
> > +static const char *term_state_file = PKGPIDDIR "/osafmsgnd_termstate";
> >
> >  /****************************************************************************
> >   * Name : mqnd_saf_health_chk_callback
> >   *
> > @@ -227,6 +229,7 @@ static void mqnd_amf_comp_terminate_call
> >  	TRACE_ENTER();
> >
> >  	uint32_t cb_hdl = m_MQND_GET_HDL();
> > +	int fd;
> >
> >  	/* Get the CB from the handle */
> >  	mqnd_cb = ncshm_take_hdl(NCS_SERVICE_ID_MQND, cb_hdl);
> > @@ -236,6 +239,14 @@ static void mqnd_amf_comp_terminate_call
> >  		return;
> >  	}
> >
> > +	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +
> > +	if (fd >=0)
> > +		(void)close(fd);
> > +	else
> > +		LOG_NO("cannot create termstate file %s: %s",
> > +			term_state_file, strerror(errno));
> > +
> >  	saAmfResponse(mqnd_cb->amf_hdl, invocation, saErr);
> >  	LOG_ER("Amf Terminate Callback called");
> >
> > diff --git a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
> > --- a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
> > +++ b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
> > @@ -30,10 +30,21 @@ fi
> >  binary=$pkglibdir/$prog
> >  pidfile=$pkgpiddir/$prog.pid
> >  lockfile=$lockdir/$initscript
> > +termfile=$pkgpiddir/$prog"_termstate"
> >
> >  RETVAL=0
> >
> >  start() {
> > +	#If the term file exists, it means instantiation is
> > +	#attempted after a termination For eg:- during administrative
> > +	#restart of a component. In this case, first try to kill
> > +	#the component since it might be seen as still running while exiting
> > +	#via the termination callback or termination scripts(in case of NPI).
> > +	#Note: start_daemon -f may also be used to create another copy of the daemon,
> > +	#but the behaviour of -f option has not been tested yet!
> > +
> > +	[ -e $termfile ] && killproc -p $pidfile $binary
> > +
> >  	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
> >  	[ -x $binary ] || exit 5
> >  	echo -n "Starting $prog: "
> > @@ -42,6 +53,7 @@ start() {
> >  	if [ $RETVAL -eq 0 ]; then
> >  		touch $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  	else
> >  		log_failure_msg
> >  	fi
> > @@ -55,6 +67,7 @@ stop() {
> >  	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
> >  		rm -f $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  		RETVAL=0
> >  	else
> >  		log_failure_msg
> > diff --git a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
> > --- a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
> > +++ b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
> > @@ -30,10 +30,21 @@ fi
> >  binary=$pkglibdir/$prog
> >  pidfile=$pkgpiddir/$prog.pid
> >  lockfile=$lockdir/$initscript
> > +termfile=$pkgpiddir/$prog"_termstate"
> >
> >  RETVAL=0
> >
> >  start() {
> > +	#If the term file exists, it means instantiation is
> > +	#attempted after a termination For eg:- during administrative
> > +	#restart of a component. In this case, first try to kill
> > +	#the component since it might be seen as still running while exiting
> > +	#via the termination callback or termination scripts(in case of NPI).
> > +	#Note: start_daemon -f may also be used to create another copy of the daemon,
> > +	#but the behaviour of -f option has not been tested yet!
> > +
> > +	[ -e $termfile ] && killproc -p $pidfile $binary
> > +
> >  	export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
> >  	[ -x $binary ] || exit 5
> >  	echo -n "Starting $prog: "
> > @@ -42,6 +53,7 @@ start() {
> >  	if [ $RETVAL -eq 0 ]; then
> >  		touch $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  	else
> >  		log_failure_msg
> >  	fi
> > @@ -55,6 +67,7 @@ stop() {
> >  	if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
> >  		rm -f $lockfile
> >  		log_success_msg
> > +		rm -f $termfile
> >  		RETVAL=0
> >  	else
> >  		log_failure_msg
> > diff --git a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
> > --- a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
> > +++ b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
> > @@ -20,7 +20,9 @@
> >   */
> >
> >  #include "smfnd.h"
> > +#include "configmake.h"
> >
> > +static const char *term_state_file = PKGPIDDIR "/osafsmfnd_termstate";
> >
> >  /****************************************************************************
> >   * Name : amf_health_chk_callback
> >   *
> > @@ -107,6 +109,15 @@ static void amf_csi_set_callback(SaInvoc
> >  static void amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT * compName)
> >  {
> >  	TRACE_ENTER();
> > +	int fd;
> > +
> > +	fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +
> > +	if (fd >=0)
> > +		(void)close(fd);
> > +	else
> > +		LOG_NO("cannot create termstate file %s: %s",
> > +			term_state_file, strerror(errno));
> >
> >  	saAmfResponse(smfnd_cb->amf_hdl, invocation, SA_AIS_OK);
> >
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
