Added SRU Template for Xenial
- Yakkety/Zesty do not show the issue (likely due to the systemd version)
- Trusty is pre-systemd, and even the old version uses a --pidfile in its call 
to start-stop-daemon

** Description changed:

+ [Impact]
+ 
+  * Restarts of keepalived can leave stale processes with the old
+ configuration around.
+ 
+  * systemd's detection of the MainPID is suboptimal, and combined with
+ not waiting for signals to be handled, a second restart can fail to
+ kill the (still) remaining process of the first start.
+ 
+  * Upstream has a PIDFile statement, which has proven to avoid the
+ issue in the MainPID guessing code of systemd.
+ 
+ [Test Case]
+ 
+  * Set up keepalived. The more complex the config, the "bigger" the
+ race window; the trivial sample config in the description below works
+ well.
+  
+  * As a test, run this loop, which restarts the service back-to-back
+ while staying under the max-restart limit:
+ $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo 
systemctl restart keepalived; sudo systemctl status keepalived | egrep 
'Main.*exited'; done; done
+ 
+  Expectation: no output other than timing
+  Without fix: sometimes the MainPID no longer exists; in these cases
+ the child processes are the "old" ones from the last execution with
+ the old config.
+ 
+ [Regression Potential]
+ 
+  * Low, because:
+    * A PIDFile statement is recommended by systemd for Type=forking
+ services anyway.
+    * Upstream keepalived has this statement in its service file.
+    * Given the kind of change, it should have no functional impact on
+ the service other than on systemd's PID detection for the job.
+ 
+  * Yet regression potential is never zero. There might be unlikely
+ cases that were considered working before only because a new config was
+ not properly picked up. After the fix they will behave correctly and
+ might then show up as false positives, e.g. if the config was bad.
+ 
+ [Other Info]
+  
+  * Usually a fix has to be in at least the latest development release
+ before SRUing it. But as outlined below, in releases later than Xenial
+ systemd seems to have improved, making this change unnecessary there.
+ We haven't identified the responsible bits (there is a bug task here),
+ and they might well be very complex. I think it is correct to fix
+ Xenial with the simple change to the service file for now.
+ 
+  * To eventually match, I created a Debian bug task asking for
+ inclusion of the PIDFile so it can slowly trickle back down to newer
+ Ubuntu releases. People there also more often run backports, where the
+ issue might occur on older systemd versions (just as it does for us on
+ Xenial).
+ 
+ ---
+ 
  Because the "PIDFile=" directive is missing in the systemd unit file,
  keepalived sometimes fails to kill all old processes. The old processes
  remain with old settings and cause unexpected behavior. The details of
  this bug are described in this upstream ticket:
  https://github.com/acassen/keepalived/issues/443.
  
  The official systemd unit file has been available since version 1.2.24,
  added by this commit:
  
  
https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15
  
  This correctly includes the "PIDFile" directive:
  
  PIDFile=/var/run/keepalived.pid
  
  We should go the same way.
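
One way to try the directive on an affected Xenial system without editing the packaged unit is a systemd drop-in. The sketch below is only an illustration under assumptions: the drop-in file name pidfile.conf and the DROPIN_DIR variable are hypothetical, and it writes to a scratch directory by default so it can be run without root. The PID file path is the one quoted above from the upstream unit.

```shell
#!/bin/sh
# Sketch: generate a drop-in that adds PIDFile= to keepalived.service.
# DROPIN_DIR and the file name pidfile.conf are illustrative choices.
# For a real system, set DROPIN_DIR=/etc/systemd/system/keepalived.service.d
# and run as root.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/pidfile.conf" <<'EOF'
[Service]
# Path taken from the upstream unit file quoted in this report.
PIDFile=/var/run/keepalived.pid
EOF

echo "wrote $DROPIN_DIR/pidfile.conf"
# After installing under /etc/systemd/system/keepalived.service.d/, run:
#   sudo systemctl daemon-reload && sudo systemctl restart keepalived
```

A drop-in leaves the packaged unit untouched, so it is easy to remove again once a fixed package is installed.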
  
  I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.
  
  Package: keepalived
  Version: 1.2.19-1
  
  =======================================================================
  
  How to reproduce:
  
  I used two instances of Ubuntu 16.04.2 on DigitalOcean:
  
  Configurations
  --------------
  
  MASTER server's /etc/keepalived/keepalived.conf:
  
    vrrp_script chk_nothing {
       script "/bin/true"
       interval 2
    }
  
    vrrp_instance G1 {
      interface eth1
      state BACKUP
      priority 100
  
      virtual_router_id 123
      unicast_src_ip <primal IP>
      unicast_peer {
        <secondal IP>
      }
      track_script {
        chk_nothing
      }
    }
  
  BACKUP server's /etc/keepalived/keepalived.conf:
  
    vrrp_script chk_nothing {
       script "/bin/true"
       interval 2
    }
  
    vrrp_instance G1 {
      interface eth1
      state MASTER
      priority 200
  
      virtual_router_id 123
      unicast_src_ip <secondal IP>
      unicast_peer {
        <primal IP>
      }
      track_script {
        chk_nothing
      }
    }
  
  Loop-based probing for the error:
  ---------------------------------
  After the setup above, start keepalived on both servers:
-     $ sudo systemctl start keepalived.service
+     $ sudo systemctl start keepalived.service
  Then run the following loop:
-     $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo 
systemctl restart keepalived; sudo systemctl status keepalived | egrep 
'Main.*exited'; done; done
+     $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo 
systemctl restart keepalived; sudo systemctl status keepalived | egrep 
'Main.*exited'; done; done
  
  Expected: no error, only time reports
  Error case: the status shows the Main PID as exited; details below
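
The egrep in the loop above matches any line containing "Main" followed by "exited". A slightly stricter check can be sketched as a small helper. The function name main_pid_exited is hypothetical; it only inspects status text on stdin and is an illustration, not part of the proposed fix.

```shell
#!/bin/sh
# Sketch: report whether `systemctl status` output shows an exited Main PID.
# main_pid_exited is a hypothetical helper; it reads status text on stdin
# and succeeds (exit 0) when the "Main PID:" line carries "code=exited".
main_pid_exited() {
    grep -Eq '^[[:space:]]*Main PID: [0-9]+ \(code=exited'
}

# Live usage on a systemd host (not run here):
#   systemctl status -n0 keepalived | main_pid_exited && echo "stale Main PID"
```

Because the helper works on text, it can also be run against saved status output such as the excerpts in the Result section below.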
  
  Step by Step Procedures
  -----------------------
  
  1) Start keepalived on both servers
  
    $ sudo systemctl start keepalived.service
  
  2) Restart keepalived on either one
  
    $ sudo systemctl restart keepalived.service
  
  3) Check status and PID
  
    $ systemctl status -n0 keepalived.service
  
  Result
  ------
  
  0) Before restart
  
  The ExecStart process was PID 3402 (already exited after forking), Main
  PID is 3403, and the other subprocesses are 3405 and 3406. So far so
  good.
  
    root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
    ● keepalived.service - Keepalive Daemon (LVS and VRRP)
       Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor 
preset: enabled)
       Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
      Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, 
status=0/SUCCESS)
     Main PID: 3403 (keepalived)
        Tasks: 3
       Memory: 1.7M
          CPU: 1.900s
       CGroup: /system.slice/keepalived.service
               ├─3403 /usr/sbin/keepalived
               ├─3405 /usr/sbin/keepalived
               └─3406 /usr/sbin/keepalived
  
  1) First restart
  
  Now Main PID is still 3403, which belongs to the previous run and has
  actually exited. Something is wrong. Yet the previous processes have all
  exited, so we are unlikely to see any weird behavior here.
  
    root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
    root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
    ● keepalived.service - Keepalive Daemon (LVS and VRRP)
       Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor 
preset: enabled)
       Active: active (running) since Sat 2017-03-04 01:51:45 UTC; 1s ago
      Process: 4782 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, 
status=0/SUCCESS)
     Main PID: 3403 (code=exited, status=0/SUCCESS)
        Tasks: 3
       Memory: 1.7M
          CPU: 11ms
       CGroup: /system.slice/keepalived.service
               ├─4783 /usr/sbin/keepalived
               ├─4784 /usr/sbin/keepalived
               └─4785 /usr/sbin/keepalived
  
  2) Second restart
  
  Now Main PID is 4783 and the processes in the cgroup are 4783-4785. This
  is problematic because these are the old processes from the first
  restart, which should have exited before new processes arose. Therefore,
  keepalived keeps running with the old settings while users believe it
  uses the new ones.
  
    root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
    root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
    ● keepalived.service - Keepalive Daemon (LVS and VRRP)
       Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor 
preset: enabled)
       Active: active (running) since Sat 2017-03-04 01:51:49 UTC; 1s ago
      Process: 4796 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, 
status=0/SUCCESS)
     Main PID: 4783 (keepalived)
        Tasks: 3
       Memory: 1.7M
          CPU: 6ms
       CGroup: /system.slice/keepalived.service
               ├─4783 /usr/sbin/keepalived
               ├─4784 /usr/sbin/keepalived
               └─4785 /usr/sbin/keepalived
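
The second restart is harder to spot from the status output alone, because the Main PID looks healthy. Comparing the PID sets before and after a restart makes it visible. The following is a minimal sketch; surviving_pids is a hypothetical helper, and the pgrep-based usage in the comment assumes no other processes named keepalived are running.

```shell
#!/bin/sh
# Sketch: list PIDs that appear both before and after a restart.
# surviving_pids is a hypothetical helper taking two whitespace-separated
# PID lists; any PID it prints belonged to a process that outlived the restart.
surviving_pids() {
    for pid in $1; do
        for new in $2; do
            if [ "$pid" = "$new" ]; then
                echo "$pid"
            fi
        done
    done
}

# Live usage on an affected host (not run here):
#   before=$(pgrep -x keepalived); sudo systemctl restart keepalived
#   after=$(pgrep -x keepalived); surviving_pids "$before" "$after"
```

With the PIDs from the first and second restart above (4783, 4784, 4785 in both cases), the helper prints all three, while comparing the state before the first restart (3403, 3405, 3406) against the state after it prints nothing.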

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1644530

Title:
  keepalived fails to restart cleanly due to the wrong systemd settings

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1644530/+subscriptions
