Hi Mathi,

Ack, code review only. Just a few comments: there is no guarantee that killproc 
delivering a SIGTERM to the component will succeed, i.e. the
component may still be running afterwards. When the component exits, MDS sends 
an avaDown, but the component has already been removed
from the cb->compdb. Perhaps AMF could also keep track of when the component 
has exited by using the avaDown messages, and keep the 
component in the cb->compdb a bit longer? Or use some retry logic in the script 
around the killproc.
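
A rough sketch of what such retry logic might look like. This is not part of the patch: kill_with_retry is a hypothetical helper name, the retry count and the one-second poll interval are arbitrary, and plain kill(1) stands in for killproc so that the sketch runs without the LSB init functions.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch under review.
# Sends SIGTERM to a PID and polls until the process is gone.
# After max_tries unsuccessful attempts it escalates to SIGKILL.
# Plain kill(1) is used here as a stand-in for killproc.
kill_with_retry() {
    pid=$1
    max_tries=${2:-5}
    tries=0
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$tries" -ge "$max_tries" ]; then
            # Last resort: the process ignored SIGTERM.
            kill -KILL "$pid" 2>/dev/null
            sleep 1
            break
        fi
        kill -TERM "$pid" 2>/dev/null
        tries=$((tries + 1))
        sleep 1
    done
    # Report success only if the process is really gone.
    ! kill -0 "$pid" 2>/dev/null
}
```

In an init script this could be called with the PID read from $pidfile before start_daemon, and the instantiation could be failed early if the helper returns non-zero.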

/Thanks HansN

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: den 6 maj 2015 16:47
To: Anders Widell; [email protected]; Hans Nordebäck; 
[email protected]; [email protected]
Cc: [email protected]
Subject: [PATCH 1 of 1] osaf: During adminrestart of node directors, before 
re-instantiating kill them [#1326]

 osaf/services/saf/cpsv/cpnd/cpnd_amf.c              |  14 +++++++++++++-
 osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in  |  13 +++++++++++++
 osaf/services/saf/glsv/glnd/glnd_amf.c              |  14 ++++++++++++++
 osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in   |  13 +++++++++++++
 osaf/services/saf/immsv/immnd/immnd_amf.c           |  11 +++++++++++
 osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in |  18 ++++++++++++++++++
 osaf/services/saf/mqsv/mqnd/mqnd_amf.c              |  11 +++++++++++
 osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in   |  13 +++++++++++++
 osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in |  13 +++++++++++++
 osaf/services/saf/smfsv/smfnd/smfnd_amf.c           |  11 +++++++++++
 10 files changed, 130 insertions(+), 1 deletions(-)


The command $ amf-adm restart <DN name> is one way of administratively 
restarting an AMF component.
As a part of this admin operation, AMF sends the component terminate callback to 
the PI components. It is up to the component to release all its resources and 
report the status of its self-termination to AMF before (typically) exiting 
the process itself.
After receiving the response from the component, AMF invokes the instantiation 
script of the component. During this time, it is possible that the previously 
running instance of the process (of this component) has not yet exited. In this 
situation, where a daemon/process is already running and a new 
instantiation is being attempted, the instantiation script can return 
failure.
This patch creates a temporary term_state_file from inside the component 
terminate callback of the node directors.
In the instantiation scripts, a check is done to distinguish a fresh 
instantiation from an instantiation after a termination.
If the term_state_file exists, it means it is an instantiation after 
termination.
If so, just attempt to kill (using killproc) the process again before calling 
start_daemon.

Note: There has been mention of using the start_daemon -f option, which will create 
another copy of the daemon if the previous daemon is still running. Using this 
option may not be ideal for us, as it can create inconsistency between the 
two daemons when they use any resources. Also, there is no proof or 
documentation of start_daemon -f working successfully. This is even more 
significant given that some distros are really slow in becoming LSB compliant, 
particularly with start_daemon and the like.

diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
--- a/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
+++ b/osaf/services/saf/cpsv/cpnd/cpnd_amf.c
@@ -35,7 +35,9 @@
 ******************************************************************************/
 
 #include "cpnd.h"
+#include "configmake.h"
 
+static const char *term_state_file = PKGPIDDIR "/osafckptnd_termstate";
 /****************************************************************************
  * Name          : cpnd_saf_health_chk_callback
  *
@@ -232,13 +234,23 @@ void cpnd_amf_comp_terminate_callback(Sa
 {
        CPND_CB *cb = NULL;
        SaAisErrorT saErr = SA_AIS_OK;
+       int fd;
+       TRACE_ENTER();
 
-       TRACE_ENTER();
        cb = ncshm_take_hdl(NCS_SERVICE_ID_CPND, gl_cpnd_cb_hdl);
        if (cb == NULL) {
                LOG_ER("cpnd cb take handle failed in amf term callback");
                return;
        }
+
+       fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+       if (fd >=0)
+               (void)close(fd);
+       else
+               LOG_NO("cannot create termstate file %s: %s",
+                                       term_state_file, strerror(errno));
+
        saAmfResponse(cb->amf_hdl, invocation, saErr);
        ncshm_give_hdl(gl_cpnd_cb_hdl);
        LOG_NO("Received AMF component terminate callback, exiting");
diff --git a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
--- a/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
+++ b/osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 
 RETVAL=0
 
 start() {
+       #If the term file exists, it means instantiation is
+       #attempted after a termination For eg:- during administrative
+       #restart of a component. In this case, first try to kill
+       #the component since it might be seen as still running while exiting
+       #via the termination callback or termination scripts(in case of NPI).
+       #Note: start_daemon -f may also be used to create another copy of the daemon,
+       #but the behaviour of -f option has not been tested yet! 
+
+       [ -e $termfile ] && killproc -p $pidfile $binary
+
        export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
        [ -x $binary ] || exit 5
        echo -n "Starting $prog: "
@@ -41,6 +52,7 @@ start() {
        RETVAL=$?
        if [ $RETVAL -eq 0 ]; then
                touch $lockfile
+               rm -f $termfile
                log_success_msg
        else
                log_failure_msg
@@ -55,6 +67,7 @@ stop() {
        if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
                rm -f $lockfile
                log_success_msg
+               rm -f $termfile
                RETVAL=0
        else
                log_failure_msg
diff --git a/osaf/services/saf/glsv/glnd/glnd_amf.c b/osaf/services/saf/glsv/glnd/glnd_amf.c
--- a/osaf/services/saf/glsv/glnd/glnd_amf.c
+++ b/osaf/services/saf/glsv/glnd/glnd_amf.c
@@ -35,6 +35,8 @@
 ******************************************************************************/
 
 #include "glnd.h"
+#include "configmake.h"
+
 void glnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName);
 void glnd_saf_health_chk_callback(SaInvocationT invocation,
                                  const SaNameT *compName, const SaAmfHealthcheckKeyT *checkType);
@@ -45,6 +47,7 @@ void glnd_amf_CSI_set_callback(SaInvocat
 void glnd_amf_csi_rmv_callback(SaInvocationT invocation,
                               const SaNameT *compName, const SaNameT *csiName, SaAmfCSIFlagsT csiFlags);
 
+static const char *term_state_file = PKGPIDDIR "/osaflcknd_termstate";
 /****************************************************************************
  * Name          : glnd_saf_health_chk_callback
  *
@@ -114,17 +117,28 @@ void glnd_amf_comp_terminate_callback(Sa
        GLND_CB *glnd_cb;
        SaAisErrorT error = SA_AIS_OK;
        TRACE_ENTER2("Component Name: %s", compName->value);
+       int fd;
 
        /* take the handle */
        glnd_cb = (GLND_CB *)m_GLND_TAKE_GLND_CB;
        if (!glnd_cb) {
                LOG_ER("GLND cb take handle failed");
        } else {
+
+               fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+               if (fd >=0)
+                       (void)close(fd);
+               else
+                       LOG_NO("cannot create termstate file %s: %s",
+                                       term_state_file, strerror(errno));
+
                if (saAmfResponse(glnd_cb->amf_hdl, invocation, error) != SA_AIS_OK)
                        LOG_ER("GLND amf response failed");
                /* giveup the handle */
                m_GLND_GIVEUP_GLND_CB;
        }
+
        LOG_NO("Received AMF component terminate callback, exiting");
        TRACE_LEAVE();
 
diff --git a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
--- a/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
+++ b/osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 
 RETVAL=0
 
 start() {
+       #If the term file exists, it means instantiation is
+       #attempted after a termination For eg:- during administrative
+       #restart of a component. In this case, first try to kill
+       #the component since it might be seen as still running while exiting
+       #via the termination callback or termination scripts(in case of NPI).
+       #Note: start_daemon -f may also be used to create another copy of the daemon,
+       #but the behaviour of -f option has not been tested yet! 
+
+       [ -e $termfile ] && killproc -p $pidfile $binary
+
        export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
        [ -x $binary ] || exit 5
        echo -n "Starting $prog: "
@@ -41,6 +52,7 @@ start() {
        RETVAL=$?
        if [ $RETVAL -eq 0 ]; then
                touch $lockfile
+               rm -f $termfile
                log_success_msg
        else
                log_failure_msg
@@ -55,6 +67,7 @@ stop() {
        if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
                rm -f $lockfile
                log_success_msg
+               rm -f $termfile
                RETVAL=0
        else
                log_failure_msg
diff --git a/osaf/services/saf/immsv/immnd/immnd_amf.c b/osaf/services/saf/immsv/immnd/immnd_amf.c
--- a/osaf/services/saf/immsv/immnd/immnd_amf.c
+++ b/osaf/services/saf/immsv/immnd/immnd_amf.c
@@ -18,7 +18,9 @@
 #include "immnd.h"
 #include <nid_start_util.h>
 #include "osaf_extended_name.h"
+#include "configmake.h"
 
+static const char *term_state_file = PKGPIDDIR "/osafimmnd_termstate";
 /****************************************************************************
  * Name          : immnd_saf_health_chk_callback
  *
@@ -73,12 +75,21 @@ static void immnd_saf_health_chk_callbac
 static void immnd_amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT *compName)
 {
        TRACE_ENTER();
+       int fd;
        
        if (immnd_cb->pbePid > 0)
                kill(immnd_cb->pbePid, SIGTERM);
        if (immnd_cb->syncPid > 0)
                kill(immnd_cb->syncPid, SIGTERM);
 
+       fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+       
+       if (fd >=0)
+               (void)close(fd);
+       else
+               LOG_NO("cannot create termstate file %s: %s",
+                                       term_state_file, strerror(errno));
+
        LOG_NO("Received AMF component terminate callback, exiting");
        saAmfResponse(immnd_cb->amf_hdl, invocation, SA_AIS_OK);
 
diff --git a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
--- a/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
+++ b/osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in
@@ -31,12 +31,17 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 
 RETVAL=0
 NIDSERV="IMMND"
 COMPNAMEFILE=$pkglocalstatedir/immnd_comp_name
 
 start() {
+       # remove any termination file created previously via
+       # AMF component termination callback
+       rm -f $termfile
+
        export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
        [ -p $NIDFIFO ] || exit 1
         if [ ! -x $binary ]; then
@@ -58,6 +63,16 @@ start() {
 }
 
 instantiate() {
+       #If the term file exists, it means instantiation is
+       #attempted after a termination For eg:- during administrative
+       #restart of a component. In this case, first try to kill
+       #the component since it might be seen as still running while exiting
+       #via the termination callback or termination scripts(in case of NPI).
+       #Note: start_daemon -f may also be used to create another copy of the daemon,
+       #but the behaviour of -f option has not been tested yet! 
+
+       [ -e $termfile ] && killproc -p $pidfile $binary
+
        echo -n "AMF Instantiating $prog: "
        echo $SA_AMF_COMPONENT_NAME > $COMPNAMEFILE
        pidofproc -p $pidfile $binary
@@ -71,9 +86,11 @@ instantiate() {
        fi
        if [ $RETVAL -eq 0 ]; then
                log_success_msg
+               rm -f $termfile
        else
                log_failure_msg
        fi
+
        return $RETVAL
 }
 
@@ -86,6 +103,7 @@ stop() {
        if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
                rm -f $lockfile
                rm -f $COMPNAMEFILE
+               rm -f $termfile
                log_success_msg
                RETVAL=0
        else
diff --git a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
--- a/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
+++ b/osaf/services/saf/mqsv/mqnd/mqnd_amf.c
@@ -35,6 +35,7 @@
 ******************************************************************************/
 
 #include "mqnd.h"
+#include "configmake.h"
 
 static void mqnd_saf_health_chk_callback(SaInvocationT invocation, const SaNameT *compName,
                                         SaAmfHealthcheckKeyT *checkType);
@@ -47,6 +48,7 @@ static void mqnd_amf_CSI_set_callback(Sa
                                      const SaNameT *compName,
                                      SaAmfHAStateT haState, SaAmfCSIDescriptorT csiDescriptor);
 
+static const char *term_state_file = PKGPIDDIR "/osafmsgnd_termstate";
 /****************************************************************************
  * Name          : mqnd_saf_health_chk_callback
  *
@@ -227,6 +229,7 @@ static void mqnd_amf_comp_terminate_call
        TRACE_ENTER();
 
        uint32_t cb_hdl = m_MQND_GET_HDL();
+       int fd;
 
        /* Get the CB from the handle */
        mqnd_cb = ncshm_take_hdl(NCS_SERVICE_ID_MQND, cb_hdl);
@@ -236,6 +239,14 @@ static void mqnd_amf_comp_terminate_call
                return;
        }
 
+       fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+       if (fd >=0)
+               (void)close(fd);
+       else
+               LOG_NO("cannot create termstate file %s: %s",
+                                       term_state_file, strerror(errno));
+
        saAmfResponse(mqnd_cb->amf_hdl, invocation, saErr);
        LOG_ER("Amf Terminate Callback called");
 
diff --git a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
--- a/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
+++ b/osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 
 RETVAL=0
 
 start() {
+       #If the term file exists, it means instantiation is
+       #attempted after a termination For eg:- during administrative
+       #restart of a component. In this case, first try to kill
+       #the component since it might be seen as still running while exiting
+       #via the termination callback or termination scripts(in case of NPI).
+       #Note: start_daemon -f may also be used to create another copy of the daemon,
+       #but the behaviour of -f option has not been tested yet! 
+
+       [ -e $termfile ] && killproc -p $pidfile $binary
+
        export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
        [ -x $binary ] || exit 5
        echo -n "Starting $prog: "
@@ -42,6 +53,7 @@ start() {
        if [ $RETVAL -eq 0 ]; then
                touch $lockfile
                log_success_msg
+               rm -f $termfile
        else
                log_failure_msg
        fi
@@ -55,6 +67,7 @@ stop() {
        if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
                rm -f $lockfile
                log_success_msg
+               rm -f $termfile
                RETVAL=0
        else
                log_failure_msg
diff --git a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
--- a/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
+++ b/osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in
@@ -30,10 +30,21 @@ fi
 binary=$pkglibdir/$prog
 pidfile=$pkgpiddir/$prog.pid
 lockfile=$lockdir/$initscript
+termfile=$pkgpiddir/$prog"_termstate"
 
 RETVAL=0
 
 start() {
+       #If the term file exists, it means instantiation is
+       #attempted after a termination For eg:- during administrative
+       #restart of a component. In this case, first try to kill
+       #the component since it might be seen as still running while exiting
+       #via the termination callback or termination scripts(in case of NPI).
+       #Note: start_daemon -f may also be used to create another copy of the daemon,
+       #but the behaviour of -f option has not been tested yet! 
+
+       [ -e $termfile ] && killproc -p $pidfile $binary
+
        export LD_LIBRARY_PATH=$pkglibdir:$LD_LIBRARY_PATH
        [ -x $binary ] || exit 5
        echo -n "Starting $prog: "
@@ -42,6 +53,7 @@ start() {
        if [ $RETVAL -eq 0 ]; then
                touch $lockfile
                log_success_msg
+               rm -f $termfile
        else
                log_failure_msg
        fi
@@ -55,6 +67,7 @@ stop() {
        if [ $RETVAL -eq 0 ] || [ $RETVAL -eq 7 ]; then
                rm -f $lockfile
                log_success_msg
+               rm -f $termfile
                RETVAL=0
        else
                log_failure_msg
diff --git a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
--- a/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
+++ b/osaf/services/saf/smfsv/smfnd/smfnd_amf.c
@@ -20,7 +20,9 @@
  */
 
 #include "smfnd.h"
+#include "configmake.h"
 
+static const char *term_state_file = PKGPIDDIR "/osafsmfnd_termstate";
 /****************************************************************************
  * Name          : amf_health_chk_callback
  *
@@ -107,6 +109,15 @@ static void amf_csi_set_callback(SaInvoc
 static void amf_comp_terminate_callback(SaInvocationT invocation, const SaNameT * compName)
 {
        TRACE_ENTER();
+       int fd;
+
+       fd = open(term_state_file, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+       
+       if (fd >=0)
+               (void)close(fd);
+       else
+               LOG_NO("cannot create termstate file %s: %s",
+                                       term_state_file, strerror(errno));
 
        saAmfResponse(smfnd_cb->amf_hdl, invocation, SA_AIS_OK);
 

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel