You are right; I have given up on that.
As an alternative, what do you think about moving
opensaf_reboot_prepare() inside daemonize() and
calling it for programs whose basename matches "fmd", "amfwd" or "clmd"?
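One way the proposal above could look, as a minimal sketch under stated assumptions: the helper name `needs_reboot_prepare()` is hypothetical, "matches" is read as "the basename ends with", and the installed binary names are assumed (osafamfwd is the name used by the pkill in scripts/opensaf_reboot; osaffmd and osafclmd are guesses). daemonize() would call it before dropping root privileges, e.g. `if (argc > 0 && needs_reboot_prepare(argv[0])) opensaf_reboot_prepare();`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the basename of argv0 ends with one
 * of the service names that need opensaf_reboot_prepare() to be called
 * before daemonize() drops root privileges. */
static bool needs_reboot_prepare(const char *argv0)
{
    static const char *const suffixes[] = {"fmd", "amfwd", "clmd"};
    const char *slash = strrchr(argv0, '/');
    const char *base = (slash != NULL) ? slash + 1 : argv0;
    size_t base_len = strlen(base);

    for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
        size_t suffix_len = strlen(suffixes[i]);
        /* Compare only the tail of the basename with the suffix. */
        if (base_len >= suffix_len &&
            strcmp(base + base_len - suffix_len, suffixes[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

Suffix matching keeps the check independent of the installation directory. An exact comparison against a fixed list of full basenames would be stricter and would avoid accidental matches by services added later.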

Thanks,
Mathi.

> -----Original Message-----
> From: Anders Widell [mailto:[email protected]]
> Sent: Friday, June 14, 2013 5:17 PM
> To: Mathivanan Naickan Palanivelu
> Cc: Hans Feldt; [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] osaf: Add time supervision of
> opensaf_reboot [#437]
> 
> The problem with using capabilities is that we would need to use either libcap
> or libcap-ng, neither of which is included in LSB.
> 
> regards,
> Anders Widell
> 
> On 2013-06-13 17:39, Mathivanan Naickan Palanivelu wrote:
> > I'm playing with CAP_KILL, CAP_SYS_BOOT and PR_SET_KEEPCAPS.
> > Will get back on this patch tomorrow.
> >
> > Cheers.
> >
> >> -----Original Message-----
> >> From: Mathivanan Naickan Palanivelu
> >> Sent: Thursday, June 13, 2013 8:53 PM
> >> To: Anders Widell; Hans Feldt
> >> Cc: [email protected]
> >> Subject: RE: [devel] [PATCH 1 of 1] osaf: Add time supervision of
> >> opensaf_reboot [#437]
> >>
> >> I was also considering CAP_SYS_ADMIN as an alternative to
> >> opensaf_reboot_prepare(), until I ran into
> >> https://lwn.net/Articles/486306/
> >> Granting CAP_SYS_ADMIN would make us vulnerable.
> >>
> >> Cheers,
> >> Mathi.
> >>
> >>
> >>> -----Original Message-----
> >>> From: Anders Widell [mailto:[email protected]]
> >>> Sent: Tuesday, June 11, 2013 1:56 PM
> >>> To: Hans Feldt
> >>> Cc: [email protected]
> >>> Subject: Re: [devel] [PATCH 1 of 1] osaf: Add time supervision of
> >>> opensaf_reboot [#437]
> >>>
> >>> Maybe I should also point out that the case of getting file
> >>> descriptors 0, 1 or 2 is not just a hypothetical scenario that I
> >>> have dreamed up - it actually happens. The code will not work without
> >>> the retry.
> >>>
> >>> I will try to make the code comments clearer, maybe mention the
> >>> daemonize() function instead of just referring to "dropping root
> >>> privileges".
> >>>
> >>> regards,
> >>> Anders Widell
> >>>
> >>> On 2013-06-10 17:04, Anders Widell wrote:
> >>>> See comments below.
> >>>>
> >>>> regards,
> >>>> Anders Widell
> >>>>
> >>>> On 2013-06-10 15:47, Hans Feldt wrote:
> >>>>> Why is opensaf_reboot_prepare() not called from all contexts?
> >>>> What do you mean by all contexts? It is called by amfwd since it
> >>>> needs to reboot the local node without running as root. As I said
> >>>> in the review mail (but maybe it should also go into the commit
> >>>> message), amfd and fmd can simply _Exit() to reboot the local node.
> >>>> This can be a separate enhancement ticket, since it already works
> >>>> now (opensaf_reboot() will exit when the timer has expired).
> >>>>> I think the implementation of opensaf_reboot_prepare() requires
> >>>>> some comments since it uses recursion. I think I understand it, but
> >>>>> it is just a little too clever to be left uncommented...
> >>>> I did put a comment just at the point of the recursive call, but maybe
> >>>> it wasn't clear enough? :-) Basically, I don't want to get file
> >>>> descriptors 0, 1 or 2. So if I do get one of those, I try again.
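For comparison, a minimal non-recursive sketch of the same idea (never keep descriptor 0, 1 or 2). This is a swapped-in technique, not what the patch does: instead of recursing, it asks fcntl(F_DUPFD) for the lowest free descriptor that is at least 3 and then closes the low one. The function name `open_above_stdio()` is hypothetical, and the path is a parameter so it can be exercised on /dev/null rather than /proc/sysrq-trigger.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` write-only and return a descriptor that
 * is guaranteed to be >= 3, or -1 on failure (errno is preserved). */
static int open_above_stdio(const char *path)
{
    int fd;
    do {
        fd = open(path, O_WRONLY);
    } while (fd == -1 && errno == EINTR);

    if (fd >= 0 && fd <= 2) {
        /* F_DUPFD returns the lowest free descriptor >= 3. */
        int high = fcntl(fd, F_DUPFD, 3);
        int saved_errno = errno;
        close(fd); /* release the stdin/stdout/stderr slot again */
        errno = saved_errno;
        fd = high;
    }
    return fd;
}
```

The result is the same as the recursive version (a descriptor above the standard ones, with the low slot left free), but the control flow is linear and easier to comment.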
> >>>>> Why is opensaf_reboot_prepare() called before daemonize()? I guess
> >>>>> that should be commented since it is probably important.
> >>>> The comment for opensaf_reboot_prepare() says that it must be
> >>>> called before dropping root privileges. daemonize() is the function
> >>>> that drops root privileges, so I think it is fairly clear why it is
> >>>> called before daemonize().
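The ordering matters because file permissions are generally checked when open() is called, not on each later write(): a descriptor opened while still root keeps its write access after the process switches to an unprivileged user. A minimal sketch of that ordering, assuming a plain setgid()/setuid() drop; the name `open_then_drop_privileges()` is hypothetical and this is not necessarily how daemonize() drops privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for writing while still privileged, then
 * drop to uid/gid. The returned descriptor keeps its write access.
 * A real daemon would normally also reset supplementary groups (e.g. with
 * initgroups()) before calling setuid(). */
static int open_then_drop_privileges(const char *path, uid_t uid, gid_t gid)
{
    int fd;
    do {
        fd = open(path, O_WRONLY);
    } while (fd == -1 && errno == EINTR);
    if (fd == -1) {
        return -1;
    }
    /* Change the group first: after setuid() to a non-root user the
     * process is no longer allowed to change its group. */
    if (setgid(gid) == -1 || setuid(uid) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```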
> >>>>> Thanks,
> >>>>> Hans
> >>>>>
> >>>>>
> >>>>> On 06/10/2013 12:50 PM, Anders Widell wrote:
> >>>>>>  00-README.conf                                   |   5 +
> >>>>>>  osaf/libs/core/include/ncssysf_def.h             |  20 ++++-
> >>>>>>  osaf/libs/core/leap/sysf_def.c                   |  93 ++++++++++++++++++++++-
> >>>>>>  osaf/services/infrastructure/nid/config/nid.conf |   6 +
> >>>>>>  osaf/services/saf/avsv/amfwdog/amf_wdog.c        |   1 +
> >>>>>>  scripts/opensaf_reboot                           |  10 ++-
> >>>>>>  6 files changed, 126 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>>
> >>>>>> Add time supervision of the library function opensaf_reboot()
> >>>>>> as well as the shell script opensaf_reboot. If the reboot has not
> >>>>>> happened before the timeout, the OS is rebooted hard using the
> >>>>>> SysRq trigger /proc/sysrq-trigger. This makes it possible to reboot
> >>>>>> the node also when the system is in a very bad state, for example
> >>>>>> when fork() fails because the system is out of resources (no free
> >>>>>> memory, process table full etc.). It also handles the case when the
> >>>>>> ordinary reboot command hangs trying to sync the file system, for
> >>>>>> example due to a disk or NFS problem.
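The supervision described above boils down to arming alarm() before starting the ordinary reboot and letting a SIGALRM handler take over if the process is still alive when the timer fires. A minimal, harmless sketch of that pattern: `record_fallback()` and `supervise()` are hypothetical stand-ins that only set a flag instead of writing 'b' to /proc/sysrq-trigger, and the sketch uses sigaction()/sigsuspend() rather than the patch's signal()/pause().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

static volatile sig_atomic_t fallback_fired = 0;

/* Stand-in for the real fallback, which would write 'b' to
 * /proc/sysrq-trigger and then _Exit(). Only async-signal-safe work here. */
static void record_fallback(int sig_no)
{
    (void) sig_no;
    fallback_fired = 1;
}

/* Arm a SIGALRM watchdog of `seconds` and wait, as if the ordinary reboot
 * had hung. Returns true once the fallback handler has run, false if
 * supervision is disabled (0 seconds) or the handler cannot be installed. */
static bool supervise(unsigned seconds)
{
    struct sigaction sa;
    sigset_t block, old_mask, wait_mask;

    if (seconds == 0) {
        return false; /* like OPENSAF_REBOOT_TIMEOUT=0: no supervision */
    }
    sa.sa_handler = record_fallback;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) == -1) {
        return false;
    }

    /* Block SIGALRM while testing the flag so the signal cannot slip in
     * between the test and the wait; sigsuspend() unblocks it atomically. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);

    alarm(seconds); /* the ordinary reboot would be started after this */
    while (!fallback_fired) {
        sigsuspend(&wait_mask);
    }
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return true;
}
```

In the patch the handler never returns (it ends with _Exit()), so the simpler `for (;;) pause();` loop at the end of opensaf_reboot() cannot hang on a missed signal; the sigsuspend() form only matters when the handler returns.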
> >>>>>>
> >>>>>> diff --git a/00-README.conf b/00-README.conf
> >>>>>> --- a/00-README.conf
> >>>>>> +++ b/00-README.conf
> >>>>>> @@ -52,6 +52,11 @@ group/user.
> >>>>>>
> >>>>>>     - Use of MDS subslot ID needs to be enabled, add TIPC_USE_SUBSLOT_ID=YES
> >>>>>>
> >>>>>> +- Time supervision of local node reboot should be disabled or changed.  Change
> >>>>>> +  OPENSAF_REBOOT_TIMEOUT to the desired number of seconds before a reboot is
> >>>>>> +  escalated to an immediate reboot via the SysRq interface, or zero to disable
> >>>>>> +  this feature.
> >>>>>> +
> >>>>>>
> >>>>>>     *******************************************************************************
> >>>>>>     nodeinit.conf
> >>>>>>
> >>>>>> diff --git a/osaf/libs/core/include/ncssysf_def.h b/osaf/libs/core/include/ncssysf_def.h
> >>>>>> --- a/osaf/libs/core/include/ncssysf_def.h
> >>>>>> +++ b/osaf/libs/core/include/ncssysf_def.h
> >>>>>> @@ -83,7 +83,25 @@ extern "C" {
> >>>>>>     #define m_START_CRITICAL m_NCS_OS_START_TASK_LOCK
> >>>>>>     #define m_END_CRITICAL                 m_NCS_OS_END_TASK_LOCK
> >>>>>>
> >>>>>> -extern void opensaf_reboot(unsigned int node_id, char *ee_name, const char *reason);
> >>>>>> +/**
> >>>>>> + *  Prepare for a future call to opensaf_reboot() by opening the necessary
> >>>>>> + *  file (/proc/sysrq-trigger). Call this function before dropping root
> >>>>>> + *  privileges, if you later intend to call opensaf_reboot() to reboot the local
> >>>>>> + *  node without having root privileges.
> >>>>>> + */
> >>>>>> +void opensaf_reboot_prepare(void);
> >>>>>> +
> >>>>>> +/**
> >>>>>> + *  Reboot a node. Call this function with @a node_id zero to reboot the local
> >>>>>> + *  node. If you intend to use this function to reboot the local node without
> >>>>>> + *  having root privileges, you must first call opensaf_reboot_prepare() before
> >>>>>> + *  dropping root privileges.
> >>>>>> + *
> >>>>>> + *  Note that this function uses the configuration option OPENSAF_REBOOT_TIMEOUT
> >>>>>> + *  in nid.conf. Therefore, this function must only be called from services
> >>>>>> + *  that are started by NID.
> >>>>>> + */
> >>>>>> +void opensaf_reboot(unsigned node_id, const char* ee_name, const char* reason);
> >>>>>>
> >>>>>>
> >>>>>>     /*****************************************************************************
> >>>>>> ** **
> >>>>>> diff --git a/osaf/libs/core/leap/sysf_def.c b/osaf/libs/core/leap/sysf_def.c
> >>>>>> --- a/osaf/libs/core/leap/sysf_def.c
> >>>>>> +++ b/osaf/libs/core/leap/sysf_def.c
> >>>>>> @@ -26,7 +26,17 @@
> >>>>>>
> >>>>>>     #include <configmake.h>
> >>>>>>
> >>>>>> -#include <ncsgl_defs.h>
> >>>>>> +#include <stdio.h>
> >>>>>> +#include <errno.h>
> >>>>>> +#include <stdlib.h>
> >>>>>> +#include <stdbool.h>
> >>>>>> +#include <sys/stat.h>
> >>>>>> +#include <fcntl.h>
> >>>>>> +#include <unistd.h>
> >>>>>> +#include <signal.h>
> >>>>>> +#include <syslog.h>
> >>>>>> +#include "ncs_main_papi.h"
> >>>>>> +#include "ncsgl_defs.h"
> >>>>>>     #include "ncs_osprm.h"
> >>>>>>
> >>>>>>     #include "ncs_svd.h"
> >>>>>> @@ -38,6 +48,7 @@
> >>>>>>     #include "sysf_exc_scr.h"
> >>>>>>     #include "usrbuf.h"
> >>>>>>
> >>>>>> +static int sysrq_trigger_fd = -1;
> >>>>>>
> >>>>>>
> >>>>>>     /*****************************************************************************
> >>>>>>
> >>>>>> @@ -271,20 +282,88 @@ uint32_t leap_env_destroy()
> >>>>>>         return NCSCC_RC_SUCCESS;
> >>>>>>     }
> >>>>>>
> >>>>>> +void opensaf_reboot_prepare(void) {
> >>>>>> +    if (sysrq_trigger_fd != -1) return;
> >>>>>> +    int fd;
> >>>>>> +    do {
> >>>>>> +        fd = open("/proc/sysrq-trigger", O_WRONLY);
> >>>>>> +    } while (fd == -1 && errno == EINTR);
> >>>>>> +    if (fd >= 0 && fd <= 2) {
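> >>>>>> +        /* We don't want file descriptor 0, 1 or 2 because:
> >>>>>> +         *   1) it would be dangerous (e.g. anything written to stdout
> >>>>>> +         *      or stderr would end up in the SysRq trigger)
> >>>>>> +         *   2) it would be closed by daemonize()
> >>>>>> +         * Keep this low descriptor open while retrying, so that the
> >>>>>> +         * recursive call gets a higher one, then close it.
> >>>>>> +         */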
> >>>>>> +        opensaf_reboot_prepare();
> >>>>>> +        close(fd);
> >>>>>> +    } else {
> >>>>>> +        sysrq_trigger_fd = fd;
> >>>>>> +    }
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void opensaf_reboot_fallback(int sig_no) {
> >>>>>> +    (void) sig_no;
> >>>>>> +    if (sysrq_trigger_fd == -1) {
> >>>>>> +        do {
> >>>>>> +            sysrq_trigger_fd = open("/proc/sysrq-trigger", O_WRONLY);
> >>>>>> +        } while (sysrq_trigger_fd == -1 && errno == EINTR);
> >>>>>> +    }
> >>>>>> +    if (sysrq_trigger_fd != -1) {
> >>>>>> +        char buf[] = {'b'};
> >>>>>> +        ssize_t result;
> >>>>>> +        do {
> >>>>>> +            result = write(sysrq_trigger_fd, buf, sizeof(buf));
> >>>>>> +        } while (result == -1 && errno == EINTR);
> >>>>>> +    }
> >>>>>> +    _Exit(EXIT_SUCCESS);
> >>>>>> +}
> >>>>>> +
> >>>>>>     /**
> >>>>>>      *
> >>>>>>      * @param reason
> >>>>>>      */
> >>>>>> -void opensaf_reboot(unsigned int node_id, char *ee_name, const char *reason)
> >>>>>> +void opensaf_reboot(unsigned node_id, const char* ee_name, const char* reason)
> >>>>>>     {
> >>>>>> +    char* env_var = getenv("OPENSAF_REBOOT_TIMEOUT");
> >>>>>> +    unsigned long supervision_time = 0;
> >>>>>> +    if (env_var != NULL) {
> >>>>>> +        char* endptr;
> >>>>>> +        errno = 0;
> >>>>>> +        supervision_time = strtoul(env_var, &endptr, 0);
> >>>>>> +        if (errno != 0 || *env_var == '\0' || *endptr != '\0') {
> >>>>>> +            supervision_time = 0;
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    unsigned own_node_id = ncs_get_node_id();
> >>>>>> +    bool use_fallback = supervision_time > 0 && (node_id == 0 || node_id ==
> >>>>>> +        own_node_id);
> >>>>>> +    if (use_fallback) {
> >>>>>> +        if (signal(SIGALRM, opensaf_reboot_fallback) == SIG_ERR) {
> >>>>>> +            opensaf_reboot_fallback(0);
> >>>>>> +        }
> >>>>>> +        alarm(supervision_time);
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    syslog(LOG_CRIT,
> >>>>>> +        "Rebooting OpenSAF NodeId = %u EE Name = %s, Reason: %s, "
> >>>>>> +        "OwnNodeId = %u, SupervisionTime = %lu",
> >>>>>> +        node_id, ee_name == NULL ? "No EE Mapped" : ee_name, reason,
> >>>>>> +        own_node_id, supervision_time);
> >>>>>>
> >>>>>>         char str[256];
> >>>>>> -    memset(str,0,256);
> >>>>>> +    snprintf(str, sizeof(str), PKGLIBDIR "/opensaf_reboot %u %s", node_id,
> >>>>>> +        ee_name == NULL ? "" : ee_name);
> >>>>>> +    int reboot_result = system(str);
> >>>>>> +    if (reboot_result != EXIT_SUCCESS) {
> >>>>>> +            syslog(LOG_CRIT, "node reboot failure: exit code %d",
> >>>>>> +            reboot_result);
> >>>>>> +    }
> >>>>>>
> >>>>>> -    snprintf(str,255,PKGLIBDIR"/opensaf_reboot %d %s\n",node_id,((ee_name == NULL)?"":ee_name));
> >>>>>> -    syslog(LOG_CRIT,"Rebooting OpenSAF NodeId = %d EE Name = %s, Reason: %s\n",node_id,((ee_name == NULL)? "No EE Mapped":ee_name),reason);
> >>>>>> -    if(system(str) == -1){
> >>>>>> -            syslog(LOG_CRIT, "node reboot failure!");
> >>>>>> +    if (use_fallback) {
> >>>>>> +        /* Wait for the alarm signal we set up earlier. */
> >>>>>> +        for (;;) pause();
> >>>>>>         }
> >>>>>>     }
> >>>>>>
> >>>>>> diff --git a/osaf/services/infrastructure/nid/config/nid.conf b/osaf/services/infrastructure/nid/config/nid.conf
> >>>>>> --- a/osaf/services/infrastructure/nid/config/nid.conf
> >>>>>> +++ b/osaf/services/infrastructure/nid/config/nid.conf
> >>>>>> @@ -23,6 +23,12 @@ OPENSAF_MANAGE_TIPC="yes"
> >>>>>>     # Specifies how long "opensafd stop" should wait before stop has considered to fail
> >>>>>>     OPENSAF_TERMTIMEOUT=60
> >>>>>>
> >>>>>> +# Number of seconds before a reboot is escalated to an immediate reboot via the
> >>>>>> +# SysRq interface /proc/sysrq-trigger.  Comment it out or set it to zero to
> >>>>>> +# disable this feature.  Note that you must make sure the kernel allows reboot
> >>>>>> +# via SysRq for this feature to work.
> >>>>>> +export OPENSAF_REBOOT_TIMEOUT=60
> >>>>>> +
> >>>>>>     # Specify the UNIX group and user OpenSAF run as
> >>>>>>     export OPENSAF_GROUP=opensaf
> >>>>>>     export OPENSAF_USER=opensaf
> >>>>>> diff --git a/osaf/services/saf/avsv/amfwdog/amf_wdog.c b/osaf/services/saf/avsv/amfwdog/amf_wdog.c
> >>>>>> --- a/osaf/services/saf/avsv/amfwdog/amf_wdog.c
> >>>>>> +++ b/osaf/services/saf/avsv/amfwdog/amf_wdog.c
> >>>>>> @@ -137,6 +137,7 @@ int main(int argc, char *argv[])
> >>>>>>         SaAmfHealthcheckKeyT hc_key;
> >>>>>>         char *hc_key_env;
> >>>>>>
> >>>>>> +    opensaf_reboot_prepare();
> >>>>>>         daemonize(argc, argv);
> >>>>>>
> >>>>>>         ava_install_amf_down_cb(amf_down_cb);
> >>>>>> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
> >>>>>> --- a/scripts/opensaf_reboot
> >>>>>> +++ b/scripts/opensaf_reboot
> >>>>>> @@ -67,7 +67,15 @@ if [ "$self_node_id" = "$node_id" ] || [
> >>>>>>         # uncomment the following line if debugging errors that keep restarting the node
> >>>>>>         # exit 0
> >>>>>>
> >>>>>> -    logger -t "opensaf_reboot" "Rebooting local node"
> >>>>>> +    logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
> >>>>>> +
> >>>>>> +    # Start a reboot supervision background process. Note that a similar
> >>>>>> +    # supervision is also done in the opensaf_reboot() function in LEAP.
> >>>>>> +    # However, that supervision may be stopped by one of the pkill commands
> >>>>>> +    # below, if it was called from AMF or FM.
> >>>>>> +    if [ "${OPENSAF_REBOOT_TIMEOUT}0" -gt "0" ]; then
> >>>>>> +        (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
> >>>>>> +    fi
> >>>>>>
> >>>>>>         # Stop some important opensaf processes to prevent bad things from happening
> >>>>>>         $icmd pkill -STOP osafamfwd
> >>>>>>
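The OPENSAF_REBOOT_TIMEOUT parsing in opensaf_reboot() is easiest to check in isolation. A minimal sketch with the same rules (missing, empty, non-numeric or out-of-range values disable supervision), written as a hypothetical helper `parse_reboot_timeout()`. Deliberate deviations from the patch, labeled as such: the patch passes base 0 to strtoul(), which reads "060" as octal (48 seconds) and "0x3c" as hex, so this sketch uses base 10; it also rejects a leading '-', which strtoul() otherwise accepts and wraps to a huge value, and values that do not fit alarm()'s unsigned int argument.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the reboot supervision time in seconds, or 0
 * (supervision disabled) if the value is missing, empty, negative,
 * non-numeric, has trailing text, or is too large for alarm(). */
static unsigned long parse_reboot_timeout(const char *value)
{
    char *endptr;
    unsigned long seconds;

    /* strtoul() would silently accept and negate a leading '-'. */
    if (value == NULL || *value == '\0' || strchr(value, '-') != NULL) {
        return 0;
    }
    errno = 0;
    seconds = strtoul(value, &endptr, 10); /* base 10; the patch uses base 0 */
    if (errno != 0 || *endptr != '\0' || seconds > UINT_MAX) {
        return 0;
    }
    return seconds;
}
```

With `export OPENSAF_REBOOT_TIMEOUT=60` from nid.conf this returns 60, and opensaf_reboot() would then arm the alarm only when rebooting the local node.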
> >>>>>> _______________________________________________
> >>>>>> Opensaf-devel mailing list
> >>>>>> [email protected]
> >>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>>>>>
> >>>>>>
> >>>
> >
> 
