Hi Hans,

       I understand. But what if it doesn't fail in the nid phase?

       If you run this command in your setup: "systemctl start opensafd;
   sleep 2; pkill -KILL immnd", does immnd get restarted? And does
   opensafd successfully come up according to systemd?
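
The check above can be scripted so that both answers come out of one run. This is a minimal sketch, assuming a systemd host, root privileges, and that pgrep/pkill are installed. The helper names report_state and run_kill_test and the 5 second settle time are illustrative and not part of OpenSAF.

```shell
#!/bin/sh
# Sketch only: reproduces the manual test from this mail and prints what
# systemd and the process table report afterwards.
# Assumptions: systemd host, root privileges, pgrep/pkill installed.
# report_state and run_kill_test are hypothetical helper names.

# Print the unit state as seen by systemd and whether immnd is running.
report_state() {
    # "systemctl is-active" prints the state (active, failed, ...) and
    # returns non-zero when the unit is not active, hence the "|| true".
    state=$(systemctl is-active opensafd 2>/dev/null) || true
    if pgrep immnd >/dev/null 2>&1; then
        immnd=running
    else
        immnd=missing
    fi
    echo "opensafd=${state:-unknown} immnd=${immnd}"
}

# Start opensafd, kill immnd shortly afterwards, then report the outcome.
run_kill_test() {
    systemctl start opensafd
    sleep 2
    pkill -KILL immnd
    sleep "${1:-5}"   # settle time in seconds before checking (arbitrary)
    report_state
}

# Nothing runs unless the script is called as: sh killtest.sh run
if [ "${1:-}" = "run" ]; then
    run_kill_test
fi
```

Running it once with and once without "Restart=on-failure" in the unit should show whether the two setups really behave differently.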

   Alex

   On 04/25/2018 09:19 AM, Hans Nordebäck wrote:
     __________________________________________________________________

   NOTICE: This email was received from an EXTERNAL sender
     __________________________________________________________________

   Hi Alex,


   The reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set (i.e.
   not 0).

   I checked the latest version; the reboot works fine if, e.g., immnd
   fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.


   /Thanks HansN


   From: Alex Jones [[1]mailto:ajo...@rbbn.com]
   Sent: 25 April 2018 15:05
   To: Hans Nordebäck [2]<hans.nordeb...@ericsson.com>; Anders Widell
   [3]<anders.wid...@ericsson.com>
   Cc: [4]opensaf-devel@lists.sourceforge.net
   Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when
   systemd enabled [#2839]


   Hi Hans,


       There must be a hole here, then, because in our setup, if dtmd or
   immnd crashes early in the startup process, the node doesn't reboot
   and the executables are not restarted. If I set "Restart=on-failure",
   it works fine.


       Can you test this in your setup to see if you see the same thing?
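
One way to try the setting without patching the unit file is a systemd drop-in. This is a minimal sketch, assuming the unit is installed as opensafd.service and that drop-ins are read from /etc/systemd/system/opensafd.service.d/. The function name write_restart_dropin and the file name restart.conf are illustrative.

```shell
#!/bin/sh
# Sketch only: add Restart=on-failure through a systemd drop-in instead of
# editing opensafd.service itself.
# Assumptions: the unit is named opensafd.service; write_restart_dropin and
# restart.conf are hypothetical names chosen for this example.

# Write the drop-in into the directory given as $1 (created if missing).
write_restart_dropin() {
    dir=${1:-/etc/systemd/system/opensafd.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
EOF
}

# Nothing runs unless the script is called as: sh dropin.sh apply
if [ "${1:-}" = "apply" ]; then
    write_restart_dropin && systemctl daemon-reload
fi
```

"systemctl cat opensafd" should then list the drop-in below the packaged unit. Removing the file and running "systemctl daemon-reload" again undoes the change.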


   Alex


   On 04/24/2018 05:04 AM, Hans Nordebäck wrote:


     Hi Alex,


     Please see comment below.


     /Thanks HansN


   On 04/23/2018 03:56 PM, Alex Jones wrote:

     Hi Hans,


         I just did some tests. Maybe there is a bug in nid, but when I
     do not have "Restart=on-failure", the node does not reboot when I
     run the command "systemctl start opensafd; sleep 3; pkill -KILL
     immnd", and opensafd times out and fails, with
     REBOOT_ON_FAIL_TIMEOUT=30.

     [HansN] Isn't the nid phase finished before the sleep 3 command?
     REBOOT_ON_FAIL_TIMEOUT is only used during the nid phase. After
     the nid phase opensaf enters "normal" operation, and no reboot will
     be performed, as immnd is restartable. Instead of the sleep 3, you
     can edit the nodeinit.conf.controller file and change the immnd
     line to e.g. "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... ";
     then nid should fail to start and REBOOT_ON_FAIL_TIMEOUT should
     work.
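
The edit described above can be scripted. This is a minimal sketch, assuming the original entry contains "/osaf-immnd:IMMND" and that the path to nodeinit.conf.controller is passed in, since its location depends on the installation. break_immnd_entry is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: point the IMMND entry at a binary name that does not exist
# (osaf-immndx) so that nid fails during startup.
# Assumptions: the original entry contains "/osaf-immnd:IMMND";
# break_immnd_entry is a hypothetical helper name.

# $1 is the path to nodeinit.conf.controller; a .orig backup is kept.
break_immnd_entry() {
    conf=$1
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.orig" || return 1
    # Rewrite from the backup, so the original stays available as $conf.orig.
    sed 's|/osaf-immnd:IMMND|/osaf-immndx:IMMND|' "$conf.orig" > "$conf"
}

# Nothing runs unless called as: sh break.sh apply <path to nodeinit.conf.controller>
if [ "${1:-}" = "apply" ] && [ -n "${2:-}" ]; then
    break_immnd_entry "$2"
fi
```

Copying the .orig file back over the edited one restores the working configuration after the test.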


         But opensafd restarts every time I run that command with
     "Restart=on-failure" set.


     Alex


   On 04/19/2018 04:02 PM, Hans Nordebäck wrote:


   Hi Alex,


   A question: if opensafd fails (assert or non-zero exit code), a reboot
   of the node will be performed if REBOOT_ON_FAIL_TIMEOUT is configured.
   I have not checked, but how does systemd handle the reboot request if
   Restart=on-failure is set?


   /BR HansN
    _____________________________________________________________________

   From: Alex Jones [5]<ajo...@rbbn.com>
   Sent: 19 April 2018 17:27:27
   To: Hans Nordebäck; Anders Widell
   Cc: [6]opensaf-devel@lists.sourceforge.net; Alex Jones
   Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled
   [#2839]


   Under certain circumstances opensafd fails to start (immnd or dtmd
   crashes, etc).

   Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: Assertion '0' failed.

   We can tell systemd to restart opensafd if it fails to start.
   ---
    src/nid/opensafd.service.in | 2 ++
    1 file changed, 2 insertions(+)
   diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
   index 7f4d75ee3..6050f5e88 100644
   --- a/src/nid/opensafd.service.in
   +++ b/src/nid/opensafd.service.in
   @@ -12,5 +12,7 @@ ControlGroup=cpu:/
    TimeoutStartSec=3hours
    KillMode=none
    @systemdtasksmax@
   +Restart=on-failure
   +
    [Install]
    WantedBy=multi-user.target
   --
   2.13.6

References

   1. mailto:ajo...@rbbn.com
   2. mailto:hans.nordeb...@ericsson.com
   3. mailto:anders.wid...@ericsson.com
   4. mailto:opensaf-devel@lists.sourceforge.net
   5. mailto:ajo...@rbbn.com
   6. mailto:opensaf-devel@lists.sourceforge.net

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
