Confirmed. Without the following patch, the wrong exit status would be reported.

  Fix reporting of script exit status
  https://github.com/acassen/keepalived/commit/46121a8b7e4af439c5ad9e4589fb80d414e0eefc

It is not a big issue, but it would be better to backport this patch as well.

How to Reproduce the Problem

In our environment, we use a misc checker that exits with exit code 3.
A script like the following should be enough to reproduce the problem.

/usr/local/sbin/our_misc_checker:
---
#!/bin/sh
exit 3
---

Then, in an otherwise standard keepalived.conf, point a MISC_CHECK at the
script above to get a keepalived environment that reproduces the issue.

---
real_server <IP ADDRESS> <PORT NUM> {
    MISC_CHECK {
        misc_path "/usr/local/sbin/our_misc_checker"
        misc_dynamic
    }
}
---

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1792298

Title:
  keepalived: MISC healthchecker's exit status is erroneously treated as
  a permanent error

Status in keepalived package in Ubuntu:
  Incomplete

Bug description:
  1) The release of Ubuntu we are using
  $ lsb_release -rd
  Description: Ubuntu 16.04.5 LTS
  Release: 16.04

  2) The version of the package we are using
  $ apt-cache policy keepalived
  keepalived:
    Installed: 1:1.2.24-1ubuntu0.16.04.1
    ...

  3) What we expected to happen
  MISC healthcheckers would be treated normally.

  4) What happened instead
  We are trying to use Ubuntu 16.04's keepalived with our own MISC
  healthchecker, which is implemented to exit with exit code 3, and we
  get the following log messages endlessly.

  --- Note: some IP fields are masked ---
  Sep 12 06:55:09 devsvr Keepalived[16705]: Healthcheck child process(34232) died: Respawning
  Sep 12 06:55:09 devsvr Keepalived[16705]: Starting Healthcheck child process, pid=34239
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Initializing ipvs
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Registering Kernel netlink reflector
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Registering Kernel netlink command channel
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Opening file '/etc/keepalived/keepalived.conf'.
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Using LinkWatch kernel netlink reflector...
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.18]:80
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.19]:80
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.18]:443
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.19]:443
  ...
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.52]:443
  Sep 12 06:55:09 devsvr Keepalived_healthcheckers[34239]: Activating healthchecker for service [XX.XX.XX.53]:443
  Sep 12 06:55:10 devsvr Keepalived_healthcheckers[34239]: pid 34257 exited with permanent error CONFIG. Terminating
  Sep 12 06:55:10 devsvr Keepalived_healthcheckers[34239]: Removing service [XX.XX.XX.24]:25 from VS [YY.YY.YY.YY]:0
  Sep 12 06:55:10 devsvr Keepalived_healthcheckers[34239]: Removing service [XX.XX.XX.25]:25 from VS [YY.YY.YY.YY]:0
  Sep 12 06:55:10 devsvr Keepalived_healthcheckers[34239]: Removing service [XX.XX.XX.21]:56667 from VS [ZZ.ZZ.ZZ.ZZ]:0
  Sep 12 06:55:10 devsvr Keepalived_healthcheckers[34239]: Removing service [XX.XX.XX.52]:443 from VS [WW.WW.WW.WW]:0
  Sep 12 06:55:10 devsvr Keepalived[16705]: Healthcheck child process(34239) died: Respawning
  Sep 12 06:55:10 devsvr Keepalived[16705]: Starting Healthcheck child process, pid=34260
  ...
  ---

  It looks like our MISC healthchecker's exit code 3, which should be a
  valid value according to the following description, is treated as a
  permanent error because it is equal to KEEPALIVED_EXIT_CONFIG, defined
  in keepalived's lib/scheduler.h:

  ---
  # MISC healthchecker, run a program
  MISC_CHECK {
      # External script or program
      ...
      # exit status 2-255: svc check success, weight
      #   changed to 2 less than exit status.
      #   (for example: exit status of 255 would set
      #   weight to 253)
      misc_dynamic
  }
  ---

  We think the problem started with this patch (we did not see the
  problem in Ubuntu 14.04):

    Stop respawning children repeatedly after permanent error
    - https://github.com/acassen/keepalived/commit/4ae9314af448eb8ea4f3d8ef39bcc469779b0fec

  The problem will be fixed by this patch (not included in Ubuntu
  16.04):

    Make report_child_status() check for vrrp and checker child processes
    - https://github.com/acassen/keepalived/commit/ca955a7c1a6af324428ff04e24be68a180be127f

  Please consider backporting it to Ubuntu 16.04's keepalived.
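
To make the expected behaviour concrete, here is a minimal sketch of the documented misc_dynamic rule quoted above (exit status 2-255 means success, with the weight set to the status minus 2). It is not keepalived's code. The function names dynamic_weight and checker_status are hypothetical, and statuses outside 2-255 are deliberately left unhandled because the quoted excerpt does not cover them.

```shell
#!/bin/sh
# Sketch only: mirrors the misc_dynamic rule from the quoted keepalived.conf
# excerpt. Function names are hypothetical, not part of keepalived.

# Print the weight implied by an exit status in the documented 2-255 range.
dynamic_weight() {
    status=$1
    if [ "$status" -ge 2 ] && [ "$status" -le 255 ]; then
        echo $((status - 2))
    else
        # Not covered by the excerpt quoted in this report.
        return 1
    fi
}

# Run a checker command, discard its output, and print its raw exit status.
checker_status() {
    "$@" >/dev/null 2>&1
    echo $?
}

# Stand-in for /usr/local/sbin/our_misc_checker, which simply exits with 3.
status=$(checker_status sh -c 'exit 3')
echo "exit status: $status, weight: $(dynamic_weight "$status")"
```

Under that rule, our checker's exit status 3 should simply mean "check succeeded, weight 1", which is why treating it as KEEPALIVED_EXIT_CONFIG looks wrong to us.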