The problem with using capabilities is that we would need to use either 
libcap or libcap-ng, neither of which is included in LSB.

regards,
Anders Widell

On 2013-06-13 17:39, Mathivanan Naickan Palanivelu wrote:
> I'm playing with CAP_KILL, CAP_SYS_BOOT and PR_SET_KEEPCAPS.
> Will get back on this patch tomorrow.
>
> Cheers.
>
>> -----Original Message-----
>> From: Mathivanan Naickan Palanivelu
>> Sent: Thursday, June 13, 2013 8:53 PM
>> To: Anders Widell; Hans Feldt
>> Cc: [email protected]
>> Subject: RE: [devel] [PATCH 1 of 1] osaf: Add time supervision of
>> opensaf_reboot [#437]
>>
>> I was also looking at CAP_SYS_ADMIN (as an alternative to
>> opensaf_reboot_prepare()) as an option, until I ran into
>> https://lwn.net/Articles/486306/
>> CAP_SYS_ADMIN would make us vulnerable.
>>
>> Cheers,
>> Mathi.
>>
>>
>>> -----Original Message-----
>>> From: Anders Widell [mailto:[email protected]]
>>> Sent: Tuesday, June 11, 2013 1:56 PM
>>> To: Hans Feldt
>>> Cc: [email protected]
>>> Subject: Re: [devel] [PATCH 1 of 1] osaf: Add time supervision of
>>> opensaf_reboot [#437]
>>>
>>> Maybe I should also point out that the case of getting file
>>> descriptors 0, 1 or 2 is not just a hypothetical scenario that I have
>>> dreamed up - it actually happens. The code will not work without the retry.
>>>
>>> I will try to make the code comments more clear, maybe mention the
>>> daemonize() function instead of just referring to "dropping root 
>>> privileges".
>>>
>>> regards,
>>> Anders Widell
>>>
>>> On 2013-06-10 17:04, Anders Widell wrote:
>>>> See comments below.
>>>>
>>>> regards,
>>>> Anders Widell
>>>>
>>>> On 2013-06-10 15:47, Hans Feldt wrote:
>>>>> Why is opensaf_reboot_prepare() not called from all contexts?
>>>> What do you mean by all contexts? It is called by amfwd since it
>>>> needs to reboot the local node without running as root. As I said in
>>>> the review mail (but maybe also should go into the commit message),
>>>> amfd and fmd can simply _Exit() to reboot the local node. This can
>>>> be a separate enhancement ticket, since it works already now
>>>> (opensaf_reboot() will exit when the timer has expired).
>>>>> I think the implementation of opensaf_reboot_prepare() requires
>>>>> some comments since it does recursion. I think I understand it but
>>>>> it is just a little too clever to be uncommented...
>>>> I did put a comment just at the point of recursive call, but maybe
>>>> it wasn't clear enough? :-) Basically, I don't want to get file
>>>> descriptors 0, 1 or 2. So if I do get one of those I try again.
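A minimal sketch of an alternative to the retry described above, assuming it is acceptable to duplicate the descriptor rather than open the file a second time. The helper name open_above_stdio() is hypothetical and is not part of the patch. It relies on fcntl(F_DUPFD), which returns the lowest free descriptor greater than or equal to its argument:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): open `path` for writing and
 * return a descriptor that is 3 or higher, or -1 on error. */
int open_above_stdio(const char* path)
{
    assert(path != NULL);
    int fd;
    do {
        fd = open(path, O_WRONLY);
    } while (fd == -1 && errno == EINTR);
    if (fd >= 0 && fd <= 2) {
        /* Duplicate onto the lowest free descriptor >= 3, then release the
         * low one so it cannot be mistaken for stdin, stdout or stderr. */
        int high_fd = fcntl(fd, F_DUPFD, 3);
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        fd = high_fd;
    }
    return fd;
}
```

Compared with the recursive retry, this avoids holding several low descriptors open at once, at the cost of one extra fcntl() call.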
>>>>> Why is opensaf_reboot_prepare() called before daemonize()? I guess
>>>>> that should be commented since it is probably important.
>>>> The comment for opensaf_reboot_prepare() says that it must be called
>>>> before dropping root privileges. daemonize() is the function that
>>>> drops root privileges, so I think it is fairly clear why it is
>>>> called before daemonize().
>>>>> Thanks,
>>>>> Hans
>>>>>
>>>>>
>>>>> On 06/10/2013 12:50 PM, Anders Widell wrote:
>>>>>> 00-README.conf                                   |   5 +
>>>>>>     osaf/libs/core/include/ncssysf_def.h             |  20 ++++-
>>>>>>     osaf/libs/core/leap/sysf_def.c                   |  93
>>>>>> ++++++++++++++++++++++-
>>>>>>     osaf/services/infrastructure/nid/config/nid.conf |   6 +
>>>>>>     osaf/services/saf/avsv/amfwdog/amf_wdog.c        |   1 +
>>>>>>     scripts/opensaf_reboot                           |  10 ++-
>>>>>>     6 files changed, 126 insertions(+), 9 deletions(-)
>>>>>>
>>>>>>
>>>>>> Add a time supervision of the library function opensaf_reboot() as
>>>>>> well as the shell script opensaf_reboot. If the reboot has not
>>>>>> happened before the timeout, the OS is rebooted hard using the
>>>>>> SysRq trigger /proc/sysrq-trigger.
>>>>>> This makes
>>>>>> it possible to reboot the node also when the system is in a very
>>>>>> bad state, for example when fork() fails because the system is out
>>>>>> of resources (no free memory, process table full etc.).  It also
>>>>>> handles the case when the ordinary reboot command hangs trying to
>>>>>> sync the file system, for example due to a disk or NFS problem.
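The supervision idea in the description above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch itself: the function name wait_for_supervision_timeout() is hypothetical, the handler only sets a flag instead of triggering a reboot, and sigaction() with sigsuspend() is used in place of the patch's signal() and pause() so that the alarm cannot slip in between the check and the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

/* Signal handler: only records that the supervision timer expired. */
static void on_alarm(int sig_no)
{
    (void) sig_no;
    alarm_fired = 1;
}

/* Hypothetical helper: arm a timer of `seconds` and block until it
 * expires. Returns 0 once the alarm has fired, -1 on setup failure. */
int wait_for_supervision_timeout(unsigned seconds)
{
    assert(seconds > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1) return -1;

    /* Block SIGALRM so it can only be delivered inside sigsuspend(). */
    sigset_t block_mask, old_mask;
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block_mask, &old_mask) == -1) return -1;

    alarm_fired = 0;
    alarm(seconds);

    sigset_t wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired) sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return 0;
}
```

In the patch, the handler is where the escalation to /proc/sysrq-trigger happens; here it is deliberately left as a flag so the sketch is safe to run.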
>>>>>>
>>>>>> diff --git a/00-README.conf b/00-README.conf
>>>>>> --- a/00-README.conf
>>>>>> +++ b/00-README.conf
>>>>>> @@ -52,6 +52,11 @@ group/user.
>>>>>>
>>>>>>     - Use of MDS subslot ID needs to be enabled, add
>>>>>> TIPC_USE_SUBSLOT_ID=YES
>>>>>>
>>>>>> +- Time supervision of local node reboot should be disabled or changed.  Change
>>>>>> +  OPENSAF_REBOOT_TIMEOUT to the desired number of seconds before a reboot is
>>>>>> +  escalated to an immediate reboot via the SysRq interface, or zero to disable
>>>>>> +  this feature.
>>>>>> +
>>>>>>
>>>>>>     *******************************************************************************
>>>>>>     nodeinit.conf
>>>>>>
>>>>>> diff --git a/osaf/libs/core/include/ncssysf_def.h b/osaf/libs/core/include/ncssysf_def.h
>>>>>> --- a/osaf/libs/core/include/ncssysf_def.h
>>>>>> +++ b/osaf/libs/core/include/ncssysf_def.h
>>>>>> @@ -83,7 +83,25 @@ extern "C" {
>>>>>>     #define m_START_CRITICAL m_NCS_OS_START_TASK_LOCK
>>>>>>     #define m_END_CRITICAL                 m_NCS_OS_END_TASK_LOCK
>>>>>>
>>>>>> -extern void opensaf_reboot(unsigned int node_id, char *ee_name, const char *reason);
>>>>>> +/**
>>>>>> + *  Prepare for a future call to opensaf_reboot() by opening the necessary
>>>>>> + *  file (/proc/sysrq-trigger). Call this function before dropping root
>>>>>> + *  privileges, if you later intend to call opensaf_reboot() to reboot the local
>>>>>> + *  node without having root privileges.
>>>>>> + */
>>>>>> +void opensaf_reboot_prepare(void);
>>>>>> +
>>>>>> +/**
>>>>>> + *  Reboot a node. Call this function with @a node_id zero to reboot the local
>>>>>> + *  node. If you intend to use this function to reboot the local node without
>>>>>> + *  having root privileges, you must first call opensaf_reboot_prepare() before
>>>>>> + *  dropping root privileges.
>>>>>> + *
>>>>>> + *  Note that this function uses the configuration option OPENSAF_REBOOT_TIMEOUT
>>>>>> + *  in nid.conf. Therefore, this function must only be called from services
>>>>>> + *  that are started by NID.
>>>>>> + */
>>>>>> +void opensaf_reboot(unsigned node_id, const char* ee_name, const char* reason);
>>>>>>
>>>>>>
>>>>>>     /*****************************************************************************
>>>>>> ** **
>>>>>> diff --git a/osaf/libs/core/leap/sysf_def.c b/osaf/libs/core/leap/sysf_def.c
>>>>>> --- a/osaf/libs/core/leap/sysf_def.c
>>>>>> +++ b/osaf/libs/core/leap/sysf_def.c
>>>>>> @@ -26,7 +26,17 @@
>>>>>>
>>>>>>     #include <configmake.h>
>>>>>>
>>>>>> -#include <ncsgl_defs.h>
>>>>>> +#include <stdio.h>
>>>>>> +#include <errno.h>
>>>>>> +#include <stdlib.h>
>>>>>> +#include <stdbool.h>
>>>>>> +#include <sys/stat.h>
>>>>>> +#include <fcntl.h>
>>>>>> +#include <unistd.h>
>>>>>> +#include <signal.h>
>>>>>> +#include <syslog.h>
>>>>>> +#include "ncs_main_papi.h"
>>>>>> +#include "ncsgl_defs.h"
>>>>>>     #include "ncs_osprm.h"
>>>>>>
>>>>>>     #include "ncs_svd.h"
>>>>>> @@ -38,6 +48,7 @@
>>>>>>     #include "sysf_exc_scr.h"
>>>>>>     #include "usrbuf.h"
>>>>>>
>>>>>> +static int sysrq_trigger_fd = -1;
>>>>>>
>>>>>>
>>>>>>     /*****************************************************************************
>>>>>>
>>>>>> @@ -271,20 +282,88 @@ uint32_t leap_env_destroy()
>>>>>>         return NCSCC_RC_SUCCESS;
>>>>>>     }
>>>>>>
>>>>>> +void opensaf_reboot_prepare(void) {
>>>>>> +    if (sysrq_trigger_fd != -1) return;
>>>>>> +    int fd;
>>>>>> +    do {
>>>>>> +        fd = open("/proc/sysrq-trigger", O_WRONLY);
>>>>>> +    } while (fd == -1 && errno == EINTR);
>>>>>> +    if (fd >= 0 && fd <= 2) {
>>>>>> +        /* We don't want to get file descriptors 0, 1 or 2 because:
>>>>>> +         *   1) it would be dangerous
>>>>>> +         *   2) it would be closed by daemonize()
>>>>>> +         */
>>>>>> +        opensaf_reboot_prepare();
>>>>>> +        close(fd);
>>>>>> +    } else {
>>>>>> +        sysrq_trigger_fd = fd;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void opensaf_reboot_fallback(int sig_no) {
>>>>>> +    (void) sig_no;
>>>>>> +    if (sysrq_trigger_fd == -1) {
>>>>>> +        do {
>>>>>> +            sysrq_trigger_fd = open("/proc/sysrq-trigger", O_WRONLY);
>>>>>> +        } while (sysrq_trigger_fd == -1 && errno == EINTR);
>>>>>> +    }
>>>>>> +    if (sysrq_trigger_fd != -1) {
>>>>>> +        char buf[] = {'b'};
>>>>>> +        ssize_t result;
>>>>>> +        do {
>>>>>> +            result = write(sysrq_trigger_fd, buf, sizeof(buf));
>>>>>> +        } while (result == -1 && errno == EINTR);
>>>>>> +    }
>>>>>> +    _Exit(EXIT_SUCCESS);
>>>>>> +}
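The write loop in the fallback handler above follows the usual pattern of retrying a system call that was interrupted by a signal. A minimal, self-contained sketch of that pattern, assuming a hypothetical helper name write_byte_retry() that is not in the patch. It is meant to be exercised against a pipe or a scratch file; writing 'b' to the real /proc/sysrq-trigger as root reboots the machine immediately.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write a single byte to `fd`, retrying when the
 * call is interrupted by a signal (EINTR). Returns 0 on success and
 * -1 on any other failure. */
int write_byte_retry(int fd, char byte)
{
    assert(fd >= 0);
    ssize_t result;
    do {
        result = write(fd, &byte, 1);
    } while (result == -1 && errno == EINTR);
    return result == 1 ? 0 : -1;
}
```

Only calls that are async-signal-safe (write() is one of them) should be made from a handler such as opensaf_reboot_fallback().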
>>>>>> +
>>>>>>     /**
>>>>>>      *
>>>>>>      * @param reason
>>>>>>      */
>>>>>> -void opensaf_reboot(unsigned int node_id, char *ee_name, const char *reason)
>>>>>> +void opensaf_reboot(unsigned node_id, const char* ee_name, const char* reason)
>>>>>>     {
>>>>>> +    char* env_var = getenv("OPENSAF_REBOOT_TIMEOUT");
>>>>>> +    unsigned long supervision_time = 0;
>>>>>> +    if (env_var != NULL) {
>>>>>> +        char* endptr;
>>>>>> +        errno = 0;
>>>>>> +        supervision_time = strtoul(env_var, &endptr, 0);
>>>>>> +        if (errno != 0 || *env_var == '\0' || *endptr != '\0') {
>>>>>> +            supervision_time = 0;
>>>>>> +        }
>>>>>> +    }
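The parsing rules used above (reject empty strings, trailing garbage and out-of-range values, and fall back to zero) can be sketched as a standalone function. The name parse_timeout() is hypothetical and not part of the patch. One thing to keep in mind is that base 0 makes strtoul() accept hexadecimal and octal prefixes, so a value written as "010" means 8 seconds, not 10.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert `text` to a timeout in seconds.
 * Returns 0 (supervision disabled) when the text is NULL, empty,
 * not fully numeric, or out of range for unsigned long. */
unsigned long parse_timeout(const char* text)
{
    if (text == NULL || *text == '\0') return 0;
    char* endptr;
    errno = 0;
    /* Base 0: "60" is decimal, "0x10" is hexadecimal, "010" is octal. */
    unsigned long value = strtoul(text, &endptr, 0);
    if (errno != 0 || *endptr != '\0') return 0;
    return value;
}
```

A caller would typically pass the result of getenv("OPENSAF_REBOOT_TIMEOUT") straight in, since a NULL pointer is handled.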
>>>>>> +
>>>>>> +    unsigned own_node_id = ncs_get_node_id();
>>>>>> +    bool use_fallback = supervision_time > 0 && (node_id == 0 || node_id ==
>>>>>> +        own_node_id);
>>>>>> +    if (use_fallback) {
>>>>>> +        if (signal(SIGALRM, opensaf_reboot_fallback) == SIG_ERR) {
>>>>>> +            opensaf_reboot_fallback(0);
>>>>>> +        }
>>>>>> +        alarm(supervision_time);
>>>>>> +    }
>>>>>> +
>>>>>> +    syslog(LOG_CRIT,
>>>>>> +        "Rebooting OpenSAF NodeId = %u EE Name = %s, Reason: %s, "
>>>>>> +        "OwnNodeId = %u, SupervisionTime = %lu",
>>>>>> +        node_id, ee_name == NULL ? "No EE Mapped" : ee_name, reason,
>>>>>> +        own_node_id, supervision_time);
>>>>>>
>>>>>>         char str[256];
>>>>>> -    memset(str,0,256);
>>>>>> +    snprintf(str, sizeof(str), PKGLIBDIR "/opensaf_reboot %u %s", node_id,
>>>>>> +        ee_name == NULL ? "" : ee_name);
>>>>>> +    int reboot_result = system(str);
>>>>>> +    if (reboot_result != EXIT_SUCCESS) {
>>>>>> +            syslog(LOG_CRIT, "node reboot failure: exit code %d",
>>>>>> +            reboot_result);
>>>>>> +    }
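A side note on the log message above: system() returns a wait status in the format used by waitpid(), not the command's exit code, so the value logged as "exit code" is the raw status. A minimal sketch of decoding it, assuming a hypothetical helper name run_and_get_exit_code() that is not part of the patch:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `command` through the shell and return its
 * exit code (0..255). Returns -1 if the child could not be created or
 * did not terminate normally (for example, it was killed by a signal). */
int run_and_get_exit_code(const char* command)
{
    assert(command != NULL);
    int status = system(command);
    if (status == -1) return -1;
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return -1;
}
```

Comparing the raw status against EXIT_SUCCESS still detects failure correctly, since a status of zero means a normal exit with code zero; the decoding only matters for what gets logged.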
>>>>>>
>>>>>> -    snprintf(str,255,PKGLIBDIR"/opensaf_reboot %d %s\n",node_id,((ee_name == NULL)?"":ee_name));
>>>>>> -    syslog(LOG_CRIT,"Rebooting OpenSAF NodeId = %d EE Name = %s, Reason: %s\n",node_id,((ee_name == NULL)? "No EE Mapped":ee_name),reason);
>>>>>> -    if(system(str) == -1){
>>>>>> -            syslog(LOG_CRIT, "node reboot failure!");
>>>>>> +    if (use_fallback) {
>>>>>> +        /* Wait for the alarm signal we set up earlier. */
>>>>>> +        for (;;) pause();
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> diff --git a/osaf/services/infrastructure/nid/config/nid.conf b/osaf/services/infrastructure/nid/config/nid.conf
>>>>>> --- a/osaf/services/infrastructure/nid/config/nid.conf
>>>>>> +++ b/osaf/services/infrastructure/nid/config/nid.conf
>>>>>> @@ -23,6 +23,12 @@ OPENSAF_MANAGE_TIPC="yes"
>>>>>>     # Specifies how long "opensafd stop" should wait before stop has considered to fail
>>>>>>     OPENSAF_TERMTIMEOUT=60
>>>>>>
>>>>>> +# Number of seconds before a reboot is escalated to an immediate reboot via the
>>>>>> +# SysRq interface /proc/sysrq-trigger.  Comment it out or set it to zero to
>>>>>> +# disable this feature.  Note that you must make sure the kernel allows reboot
>>>>>> +# via SysRq for this feature to work.
>>>>>> +export OPENSAF_REBOOT_TIMEOUT=60
>>>>>> +
>>>>>>     # Specify the UNIX group and user OpenSAF run as
>>>>>>     export OPENSAF_GROUP=opensaf
>>>>>>     export OPENSAF_USER=opensaf
>>>>>> diff --git a/osaf/services/saf/avsv/amfwdog/amf_wdog.c b/osaf/services/saf/avsv/amfwdog/amf_wdog.c
>>>>>> --- a/osaf/services/saf/avsv/amfwdog/amf_wdog.c
>>>>>> +++ b/osaf/services/saf/avsv/amfwdog/amf_wdog.c
>>>>>> @@ -137,6 +137,7 @@ int main(int argc, char *argv[])
>>>>>>         SaAmfHealthcheckKeyT hc_key;
>>>>>>         char *hc_key_env;
>>>>>>
>>>>>> +    opensaf_reboot_prepare();
>>>>>>         daemonize(argc, argv);
>>>>>>
>>>>>>         ava_install_amf_down_cb(amf_down_cb);
>>>>>> diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
>>>>>> --- a/scripts/opensaf_reboot
>>>>>> +++ b/scripts/opensaf_reboot
>>>>>> @@ -67,7 +67,15 @@ if [ "$self_node_id" = "$node_id" ] || [
>>>>>>         # uncomment the following line if debugging errors that keep restarting the node
>>>>>>         # exit 0
>>>>>>
>>>>>> -    logger -t "opensaf_reboot" "Rebooting local node"
>>>>>> +    logger -t "opensaf_reboot" "Rebooting local node; timeout=$OPENSAF_REBOOT_TIMEOUT"
>>>>>> +
>>>>>> +    # Start a reboot supervision background process. Note that a similar
>>>>>> +    # supervision is also done in the opensaf_reboot() function in LEAP.
>>>>>> +    # However, that supervision may be stopped by one of the pkill commands
>>>>>> +    # below, if it was called from AMF or FM.
>>>>>> +    if [ "${OPENSAF_REBOOT_TIMEOUT}0" -gt "0" ]; then
>>>>>> +        (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > "/proc/sysrq-trigger") &
>>>>>> +    fi
>>>>>>
>>>>>>         # Stop some important opensaf processes to prevent bad things from happening
>>>>>>         $icmd pkill -STOP osafamfwd
>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>> --
>>>>>> ----------
>>>>>>
>>>>>> How ServiceNow helps IT people transform IT departments:
>>>>>> 1. A cloud service to automate IT design, transition and
>>>>>> operations 2. Dashboards that offer high-level views of enterprise
>> services 3.
>>>>>> A single system of record for all IT processes
>>>>>> http://p.sf.net/sfu/servicenow-d2d-j
>>>>>> _______________________________________________
>>>>>> Opensaf-devel mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>>>>>
>>>>>>
>>>
>>> ----------------------------------------------------------------------
>>> -------- This SF.net email is sponsored by Windows:
>>>
>>> Build for Windows Store.
>>>
>>> http://p.sf.net/sfu/windows-dev2dev
>>> _______________________________________________
>>> Opensaf-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>

