Hi Alex,

OK, I'll check whether there is a problem, but immnd is restartable and should be restarted after the nid phase is finished.

After the nid phase the system should be in a "well defined" state. That was one of the reasons FIFO monitoring was added to the nid phase.

/HansN


On 04/25/2018 03:23 PM, Alex Jones wrote:

Hi Hans,

    I understand. But, what if it doesn't fail in the nid phase?

    If you run this command in your setup: "systemctl start opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And does opensafd successfully come up according to systemd?
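The test above can be wrapped in a small script so that both setups report the result the same way. This is only a sketch: the variable names, the settle time, and the use of "systemctl is-active" and pgrep to inspect the outcome are assumptions for illustration, not part of OpenSAF.

```shell
#!/bin/sh
# Sketch only: UNIT, VICTIM, DELAY and SETTLE are illustrative defaults,
# not values taken from OpenSAF itself.
UNIT="${UNIT:-opensafd}"
VICTIM="${VICTIM:-immnd}"
DELAY="${DELAY:-2}"      # seconds to wait before the kill
SETTLE="${SETTLE:-10}"   # seconds to wait before inspecting the result

# report_state STATE PIDS: print a one-line summary of what was observed.
report_state() {
    if [ -n "$2" ]; then
        echo "$VICTIM running (pid $2), $UNIT is $1"
    else
        echo "$VICTIM not running, $UNIT is $1"
    fi
}

# run_kill_test: start the unit, kill the victim, then report what is left.
run_kill_test() {
    systemctl start "$UNIT"
    sleep "$DELAY"
    pkill -KILL "$VICTIM"
    sleep "$SETTLE"
    state=$(systemctl is-active "$UNIT")
    pids=$(pgrep "$VICTIM" | tr '\n' ' ')
    report_state "$state" "${pids% }"
}
```

Run as root, calling run_kill_test prints whether a process matching immnd is present again and what state systemd reports for opensafd.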

Alex


On 04/25/2018 09:19 AM, Hans Nordebäck wrote:
------------------------------------------------------------------------
NOTICE: This email was received from an EXTERNAL sender
------------------------------------------------------------------------

Hi Alex,

The reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set (i.e. not 0).

I checked the latest version: the reboot works fine if e.g. immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.

/Thanks HansN

*From:* Alex Jones [mailto:[email protected]]
*Sent:* 25 April 2018 15:05
*To:* Hans Nordebäck <[email protected]>; Anders Widell <[email protected]>
*Cc:* [email protected]
*Subject:* Re: SV: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

Hi Hans,

    There must be a hole here, then, because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot and the executables are not restarted. If I set "Restart=on-failure", it works fine.

    Can you test this in your setup to see if you see the same thing?

Alex

On 04/24/2018 05:04 AM, Hans Nordeback wrote:

    Hi Alex,

    please see comment below.

    /Thanks HansN

    On 04/23/2018 03:56 PM, Alex Jones wrote:

        Hi Hans,

            I just did some tests. Maybe there is a bug in nid, but
        when I do not have "Restart=on-failure", the node does not
        reboot when I run the command "systemctl start opensafd;
        sleep 3; pkill -KILL immnd", and opensafd times out and
        fails, with REBOOT_ON_FAIL_TIMEOUT=30.

    [HansN] Isn't the nid phase finished before the "sleep 3" command?
    REBOOT_ON_FAIL_TIMEOUT is only used during the nid phase.
    After the nid phase opensaf enters "normal" operation, and no reboot
    will be performed, as immnd is restartable. Instead of the sleep 3,
    you can edit the nodeinit.conf.controller file and change the
    immnd line to e.g.
    "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... "; then
    nid should fail to start and REBOOT_ON_FAIL_TIMEOUT should work.
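The edit to nodeinit.conf.controller suggested above could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, the file path is passed in by the caller because its location depends on the installation, and the sed pattern assumes the immnd line contains "/osaf-immnd:IMMND" as in the example path.

```shell
#!/bin/sh
# break_immnd_line FILE: keep a backup as FILE.orig, then rename the immnd
# script on its line to "osaf-immndx" so that nid cannot start IMMND.
# (Hypothetical helper, not part of OpenSAF.)
break_immnd_line() {
    conf="$1"
    cp "$conf" "$conf.orig" || return 1
    sed 's|/osaf-immnd:IMMND|/osaf-immndx:IMMND|' "$conf.orig" > "$conf"
}

# restore_immnd_line FILE: put the backup made above back in place.
restore_immnd_line() {
    mv "$1.orig" "$1"
}
```

Calling break_immnd_line with the path to nodeinit.conf.controller before starting opensafd should make the failure happen inside the nid phase; restore_immnd_line undoes the change afterwards.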


            But, opensafd restarts every time when I run that command
        with "Restart=on-failure" set.

        Alex

        On 04/19/2018 04:02 PM, Hans Nordebäck wrote:

            
            Hi Alex,

            A question: if opensafd fails (assert or non-zero exit code),
            a reboot of the node will be performed if
            REBOOT_ON_FAIL_TIMEOUT is configured. I have not checked, but
            how does systemd handle the reboot request if
            Restart=on-failure is set?

            /BR HansN

            
------------------------------------------------------------------------

            *From:* Alex Jones <[email protected]> <mailto:[email protected]>
            *Sent:* 19 April 2018 17:27:27
            *To:* Hans Nordebäck; Anders Widell
            *Cc:* [email protected]
            <mailto:[email protected]>; Alex Jones
            *Subject:* [PATCH 1/1] nid: restart opensafd on failure when
            systemd enabled [#2839]

            Under certain circumstances opensafd fails to start
            (immnd or dtmd crashes,
            etc).

            Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]:
            src/dtm/dtmnd/dtm_intra_svc.cc:1778:
            dtm_process_internode_service_up_msg: Assertion '0' failed.

            We can tell systemd to restart opensafd if it fails to start.
            ---
             src/nid/opensafd.service.in | 2 ++
             1 file changed, 2 insertions(+)

            diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
            index 7f4d75ee3..6050f5e88 100644
            --- a/src/nid/opensafd.service.in
            +++ b/src/nid/opensafd.service.in
            @@ -12,5 +12,7 @@ ControlGroup=cpu:/
             TimeoutStartSec=3hours
             KillMode=none
             @systemdtasksmax@
            +Restart=on-failure
            +
             [Install]
             WantedBy=multi-user.target
-- 2.13.6
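For trying the same setting without rebuilding from the patched opensafd.service.in, a systemd drop-in file can carry it instead. This is a sketch of an alternative, not what the patch does: the function name and the file name restart.conf are made up for illustration, and the default directory assumes the standard systemd drop-in location for a unit named opensafd.service.

```shell
#!/bin/sh
# write_restart_dropin [DIR]: write a drop-in that sets Restart=on-failure.
# DIR defaults to the usual drop-in directory for opensafd.service
# (an assumption; adjust it if the unit is installed under another name).
write_restart_dropin() {
    dir="${1:-/etc/systemd/system/opensafd.service.d}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nRestart=on-failure\n' > "$dir/restart.conf"
}
```

Run as root, "write_restart_dropin && systemctl daemon-reload" makes systemd pick up the setting; removing restart.conf and reloading again reverts it.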



_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
