Hi Karl.
I can confirm this issue as well. We encountered it this morning on a two-node
keepalived cluster consisting of two VMware Ubuntu 18.04.1 VMs. In our case, a
daily update task restarted udev, which in turn restarted systemd-networkd.
When that service restarted, the virtual IP was removed from the MASTER node's
NIC, but keepalived did not detect this and the IP was never restored on
either MASTER or BACKUP. This caused an outage of the services hosted on the
virtual IP.

When we investigated, we found that both the MASTER and BACKUP nodes had
only their own primary IP addresses, and neither node had the virtual IP.
The virtual IP was unreachable. No managed failover by keepalived had
occurred.

We restarted keepalived on both nodes, which caused the virtual IP to
reappear on the MASTER node's NIC. We can reproduce this on demand
right now by manually restarting systemd-networkd, which causes the
virtual IP to vanish. The only way to get it back is to then
manually restart keepalived.
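
For anyone who wants to check this on their own cluster, here is a rough
sketch of the reproduction as a shell script. The address 192.0.2.10 and
the interface name ens160 are placeholders, not our real values, and the
helper names (has_vip, report, reproduce) are only for illustration. The
sleep times are arbitrary and may need to be longer. It needs root for the
systemctl calls and does nothing unless called with "run".

```shell
#!/bin/sh
# Placeholders (assumptions): replace with your own virtual IP and interface.
VIP="${VIP:-192.0.2.10}"
IFACE="${IFACE:-ens160}"

# Reads `ip -o -4 addr show` output on stdin; succeeds if address $1 is listed.
has_vip() {
    grep -qF " inet $1/"
}

# Prints whether the VIP is currently configured on $IFACE.
report() {
    if ip -o -4 addr show dev "$IFACE" | has_vip "$VIP"; then
        echo "$1: VIP present"
    else
        echo "$1: VIP missing"
    fi
}

# Restart networkd, then keepalived, checking the VIP after each step.
reproduce() {
    report "before"
    systemctl restart systemd-networkd
    sleep 5                      # arbitrary settle time
    report "after networkd restart"
    systemctl restart keepalived
    sleep 5                      # arbitrary; VRRP election may take longer
    report "after keepalived restart"
}

# Only touch services when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    reproduce
fi
```

Going by the behaviour described above, an affected MASTER node should
report the VIP as present, then missing, then present again.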

Notably, when this problem occurs, keepalived logs nothing to syslog at
all, which suggests it is not recognising either the restart of networkd
or the loss of the virtual IP, and therefore is not announcing anything
to the BACKUP node.

There is a good discussion of this on the Ubuntu Forums, and someone
there has confirmed that downgrading the keepalived package to the
previous version resolves this behaviour, so it does look like the patch
in the latest package version may have introduced this.

Here is the thread for ref:
https://ubuntuforums.org/showthread.php?t=2406400&p=13819524#post13819524

I'm happy to test anything required on a VM if necessary. We haven't
taken any action to work around this yet.

https://bugs.launchpad.net/bugs/1810583

Title:
  Daily cron restarts network on unattended updates but
  keepalived.service is not restarted as a dependency

Status in keepalived package in Ubuntu:
  Confirmed

Bug description:
  Description:    Ubuntu 18.04.1 LTS
  Release:        18.04
  ii  keepalived  1:1.3.9-1ubuntu0.18.04.1  amd64  Failover and monitoring daemon for LVS clusters

  (From unanswered
  https://answers.launchpad.net/ubuntu/+source/keepalived/+question/676267)

  Two weeks ago we lost our keepalived VRRP address on one of our
  systems; closer inspection revealed that this was due to the daily
  cron job. Apparently something triggered a udev reload (and last week
  the same seemed to happen), which obviously triggers a network restart.

  Are we right in assuming that the patch below is the correct fix (and
  shouldn't this be in the default install of the keepalived systemd
  service)?

  /etc/systemd/system/multi-user.target.wants/keepalived.service:
  --- keepalived.service.orig 2018-11-20 09:17:06.973924706 +0100
  +++ keepalived.service 2018-11-20 09:05:55.984773226 +0100
  @@ -4,6 +4,7 @@
   Wants=network-online.target
   # Only start if there is a configuration file
   ConditionFileNotEmpty=/etc/keepalived/keepalived.conf
  +PartOf=systemd-networkd.service
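
The same PartOf= line could also be carried in a systemd drop-in rather
than by editing the unit file in place, so that a package upgrade does not
overwrite it. A minimal sketch, assuming an arbitrary drop-in file name
(networkd.conf) and a hypothetical helper called write_dropin. Whether
PartOf= is the right fix is still the open question in this bug.

```shell
#!/bin/sh
# UNIT_DIR can be overridden to try the script in a scratch directory first.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

# Writes a drop-in that adds the proposed PartOf= line to keepalived.service.
write_dropin() {
    dir="$UNIT_DIR/keepalived.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/networkd.conf" <<'EOF'
[Unit]
PartOf=systemd-networkd.service
EOF
}
```

To apply it for real, run write_dropin as root and then
"systemctl daemon-reload". PartOf= only propagates stop and restart of
systemd-networkd to keepalived, not start.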

  Accompanying syslog:
  Nov 20 06:34:33 ourmachine systemd[1]: Starting Daily apt upgrade and clean activities...
  Nov 20 06:34:42 ourmachine systemd[1]: Reloading.
  Nov 20 06:34:44 ourmachine systemd[1]: message repeated 2 times: [ Reloading.]
  Nov 20 06:34:44 ourmachine systemd[1]: Starting Daily apt download activities...
  Nov 20 06:34:44 ourmachine systemd[1]: Stopping udev Kernel Device Manager...
  Nov 20 06:34:44 ourmachine systemd[1]: Stopped udev Kernel Device Manager.
  Nov 20 06:34:44 ourmachine systemd[1]: Starting udev Kernel Device Manager...
  Nov 20 06:34:44 ourmachine systemd[1]: Started udev Kernel Device Manager.
  Nov 20 06:34:45 ourmachine systemd[1]: Reloading.
  Nov 20 06:34:45 ourmachine systemd[1]: Reloading.
  Nov 20 06:35:13 ourmachine systemd[1]: Reexecuting.
  Nov 20 06:35:13 ourmachine systemd[1]: Stopped Wait for Network to be Configured.
  Nov 20 06:35:13 ourmachine systemd[1]: Stopping Wait for Network to be Configured...
  Nov 20 06:35:13 ourmachine systemd[1]: Stopping Network Service..
