Uploaded to Bionic-unapproved
** Description changed:
[Impact]
+ * The master process will exit with the status of the last worker.
+ When the worker is killed with SIGTERM, it is expected to get 143 as an
+ exit status. Therefore, we consider this exit status as normal from a
+ systemd point of view. If it happens when not stopping, the systemd
+ unit is configured to always restart, so it has no adverse effect.
+ * Backport upstream fix - adding another accepted RC to the systemd
+ service
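For context, the accepted-RC change boils down to telling systemd that exit status 143 is a clean exit. A minimal sketch as a drop-in (the path is illustrative; the actual fix patches the unit file shipped by the package, whose exact contents may differ):

```ini
# /etc/systemd/system/haproxy.service.d/exit-status.conf (example path)
# Treat 143 (128 + SIGTERM) as a successful exit, so systemd
# restarts the service instead of marking it failed.
[Service]
SuccessExitStatus=143
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for it to take effect.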
[Test Case]
+ * Install haproxy and have it running, then SIGTERM it repeatedly.
+ With the fix, systemd restarts the service every time (until the
+ restart limit kicks in). In the bad case it just stays down and
+ doesn't even try to restart.
+ $ apt install haproxy
+ $ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
+ $ systemctl status haproxy
+
+ The above is a hacky way to trigger some A/B behavior on the fix.
+ It isn't perfect as systemd restart counters will kick in and you
+ essentially check a secondary symptom.
+ I'd recommend to in addition run the following:
+
+ $ apt install haproxy
+ $ for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001 ;
+ systemctl reset-failed haproxy.service ; done
+ $ systemctl status haproxy
+
+ You can do so with even smaller sleeps; that should keep the service
+ up and running (this behavior isn't changed by the fix, but it should
+ also hold with the new code).
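As a sanity check on the status itself: a process terminated by SIGTERM is reported by the shell as 128 + 15 = 143, which is exactly the status the haproxy master propagates from its last worker. A quick illustration:

```shell
# Start a throwaway process, SIGTERM it, and observe the exit status.
sleep 60 &
pid=$!
kill -TERM "$pid"
wait "$pid"
echo "exit status: $?"   # 143 (128 + SIGTERM's signal number 15)
```

This is why the systemd unit needs 143 on its list of accepted exit statuses: to systemd, a plain non-zero status otherwise means failure.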
[Regression Potential]
+ * This is effectively a conffile modification, so users who have made
+ their own modifications will get a prompt. But that isn't a
+ regression. I checked the code and can't think of another RC=143
+ that would, due to this change, no longer be detected as an error. I
+ really think that other than the update itself triggering a restart
+ (as usual for services) there is no further regression potential.
[Other Info]
+
+ * The fix has already been active in the IS hosted cloud for a while
+ without issues
+ * Reports (comment #5) also show that others use this in production
---
On a Bionic/Stein cloud, after a network partition, we saw several units
(glance, swift-proxy and cinder) fail to start haproxy, like so:
root@juju-df624b-6-lxd-4:~# systemctl status haproxy.service
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2019-10-20 00:23:18 UTC; 1h 35min ago
     Docs: man:haproxy(1)
           file:/usr/share/doc/haproxy/configuration.txt.gz
  Process: 2002655 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS (code=exited, status=143)
  Process: 2002649 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
 Main PID: 2002655 (code=exited, status=143)

Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Starting HAProxy Load Balancer...
Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Started HAProxy Load Balancer.
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopping HAProxy Load Balancer...
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : Exiting Master process...
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [ALERT] 292/001652 (2002655) : Current worker 2002661 exited with code 143
Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 (2002655) : All workers exited. Exiting... (143)
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Main process exited, code=exited, status=143/n/a
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Failed with result 'exit-code'.
Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopped HAProxy Load Balancer.
root@juju-df624b-6-lxd-4:~#
The Debian maintainer came up with the following patch for this:
https://www.mail-archive.com/[email protected]/msg30477.html
The patch was added to the 1.8.10-1 Debian upload and merged into upstream 1.8.13.
Unfortunately Bionic is on 1.8.8-1ubuntu0.4 and doesn't have this patch.
Please consider pulling this patch into an SRU for Bionic.
https://bugs.launchpad.net/bugs/1848902
Title:
haproxy in bionic can get stuck