Balint, based on your input...

> thanks for the fixes in Eoan. Unfortunately we have a product based on
> disco and cannot move forward at this time. Being a networking shop,
> this issue has a serious effect on us and we would like to avoid moving
> to something like ifupdown2 within our stable branch.

So, Disco is EOL because it is not an LTS version; that is why it did
not get a fix (the fix would be very close to the one done in Eoan).
Since it's unsupported by the community, it's up to you to backport the
Eoan fixes to Disco if you'd like... you can even create a PPA for your
product and distribute it along.

> For our users the real impact of the bug is not that that the interface
> that we are currently reconfiguring is suffering a downtime, but the
> fact that _all_ interfaces have their aliases removed if networkd is
> restarted. The proposed KeepConfiguration solution kind of beats the
> purpose of reconfiguring the interfaces, as old addresses are kept and
> need to be handled manually. Also it interferes with how DHCP works. I
> believe this might be an issue for others as well.

We are following systemd-networkd upstream decisions here. The option
"dhcp" only exists for CERTAIN scenarios (when root disk depends on
that connection, for iSCSI and/or NFS/ROOT for example). It is
explicitly said in the documentation:

"""
Takes a boolean or one of "static", "dhcp-on-stop", "dhcp". When
"static", systemd-networkd will not drop static addresses and routes
on starting up process. When set to "dhcp-on-stop", systemd-networkd
will not drop addresses and routes on stopping the daemon. When
"dhcp", the addresses and routes provided by a DHCP server will never
be dropped even if the DHCP lease expires. This is contrary to the
DHCP specification, but may be the best choice if, e.g., the root
filesystem relies on this connection. The setting "dhcp" implies
"dhcp-on-stop", and "yes" implies "dhcp" and "static". Defaults to
"no".
"""

and it is a question of choice: to have a window of opportunity for
duplicate IPs - in cases where there is no dynamic IP mapping to that
MAC address - but possibly maintain the connection instead of causing
uninterruptible I/O when trying to shut down a machine, for example. I
don't particularly like this option, but it is not the default one and
was meant for a specific purpose.
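
To make the setting concrete, here is a minimal Python sketch that renders a
systemd-networkd drop-in carrying KeepConfiguration= for one interface. The
drop-in path is an assumption (it presumes netplan rendered the interface as
10-netplan-<name>.network), the helper name is hypothetical, and "static" is
used as the default only for illustration; which value suits a given HA
setup is not settled in this thread.

```python
from pathlib import PurePosixPath

# Assumption: netplan renders each interface to 10-netplan-<name>.network,
# so a drop-in directory with the same stem is applied on top of it.
DROPIN_ROOT = PurePosixPath("/etc/systemd/network")

# Values named in the documentation excerpt quoted above.
ALLOWED_MODES = {"yes", "no", "static", "dhcp-on-stop", "dhcp"}


def keep_configuration_dropin(iface: str, mode: str = "static") -> tuple[PurePosixPath, str]:
    """Return (path, contents) of a drop-in setting KeepConfiguration= for iface.

    Hypothetical helper: it only builds the text, it does not write any file.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported KeepConfiguration value: {mode!r}")
    path = DROPIN_ROOT / f"10-netplan-{iface}.network.d" / "keep-configuration.conf"
    contents = f"[Network]\nKeepConfiguration={mode}\n"
    return path, contents


if __name__ == "__main__":
    # Interfaces carrying keepalived VIPs in the original report.
    for iface in ("eth2", "eth3", "eth4"):
        path, contents = keep_configuration_dropin(iface)
        print(f"# {path}")
        print(contents)
```

The function returns text rather than writing files so the result can be
reviewed before anything under /etc is touched.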

>
> From our point of view the ideal solution would be a combination of the
> keepalived patch that detects VIP removal and systemd version 244 that
> already supports "networkctl reconfigure" and "networkctl reload".

networkctl reconfigure/reload is new functionality and won't be
added to previously released versions, as this is against SRU
guidelines. Systemd 244.2-1ubuntu1 is being included in 20.04, our
NEXT LTS version.
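
For anyone scripting around this, a small Python sketch of the version gate:
it reads the first line of `systemctl --version` output and reports whether
the installed systemd is at least 244, the version this thread names as
already supporting "networkctl reload" and "networkctl reconfigure". The
function names are hypothetical and the sample strings are illustrative
inputs, not captured output.

```python
import re

# Per the thread: systemd 244 already supports networkctl reload/reconfigure.
MIN_RELOAD_VERSION = 244


def systemd_major_version(version_output: str) -> int:
    """Extract the major version from the first line of `systemctl --version`."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.match(r"systemd\s+(\d+)", first_line)
    if not match:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return int(match.group(1))


def supports_networkctl_reload(version_output: str) -> bool:
    """True if the reported systemd is new enough for reload/reconfigure."""
    return systemd_major_version(version_output) >= MIN_RELOAD_VERSION


if __name__ == "__main__":
    # Illustrative inputs only.
    for sample in ("systemd 244 (244.2-1ubuntu1)", "systemd 237"):
        print(sample, "->", supports_networkctl_reload(sample))
```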

As I said before, you can try backporting systemd 244 to disco, or
bionic, if you are willing to support it on your own (Disco is already
EOL for community support). You should follow
https://packaging.ubuntu.com/html/backports.html if you would like to
do that.

For the keepalived patches, they could be backported to Eoan, and maybe
Bionic and Xenial, depending on the amount of work. But then I would
need a practical example of why the systemd-networkd fix is no good in
the most commonly used scenarios.

> Is there any chance that v244 is backported to bionic? It is already
> included in focal and debian stable backports, but unfortunately I am
> not familiar enough with systemd development to tell what the impact of
> this would be.

The problem with backports is that they are unsupported even on
supported releases. I wouldn't be able to guarantee functionality or
fix it on an ongoing basis. You can do it on your own and have it in a
PPA for your product, for example.

As systemd nowadays includes networkd, udev management, sysV runtime
generators, tmpfiles creation, socket creation, cgroups integration
for the process slices, etc., it is very risky to backport systemd
to get "just" those 2 functionalities.

>
> As for keepalived, in bug #1819074 there was an ongoing investigation on
> the patch, that implements the keepalived transition on removing the
> VIP. We have traced back this functionality to this patch:
>
> https://github.com/acassen/keepalived/commit/0b1528c76d3fe8d1c5765841df86c59570a036da
>
> It was born before v1.3.6 was released, so we hope that it is self-
> contained enough for a backport if v2.0 of keepalived is not included in
> bionic-backports.

Let me check the keepalived fix more closely and see what can be done
for the previous releases. As we are close to the freeze date for our
next LTS release, it is unlikely that I will get to it within the next
2 weeks (our focus is entirely on the development version, and I still
need to fix netplan to support the networkd KeepConfiguration
functionality).

Let's keep talking. I'll first patch netplan and then come back to the
other releases to check what can be done for them.

For now I would *strongly* recommend that, in previous releases,
whoever wants to use HA-related resource managers stick with the
ifupdown / resolvconf / net-tools / bridge-utils / vlan package
combination, as it works for what is trying to be accomplished here.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat,
  corosync, pacemaker (interface aliases are restarted)

Status in Keepalived Charm:
  New
Status in netplan:
  Confirmed
Status in heartbeat package in Ubuntu:
  Won't Fix
Status in keepalived package in Ubuntu:
  In Progress
Status in systemd package in Ubuntu:
  In Progress
Status in keepalived source package in Xenial:
  Confirmed
Status in systemd source package in Xenial:
  Confirmed
Status in keepalived source package in Bionic:
  Confirmed
Status in systemd source package in Bionic:
  Confirmed
Status in keepalived source package in Disco:
  Won't Fix
Status in systemd source package in Disco:
  Won't Fix
Status in keepalived source package in Eoan:
  In Progress
Status in systemd source package in Eoan:
  Fix Released

Bug description:
  [impact]

  - ALL related HA software has a small problem if interfaces are being
  managed by systemd-networkd: nic restarts/reconfigs are always going
  to wipe all interface aliases when the HA software is not expecting it
  (there is no coordination between them).

  - keepalived, smb ctdb, pacemaker, all suffer from this. Pacemaker is
  smarter in this case because it has a service monitor that will
  restart the virtual IP resource, on the affected node & nic, before
  considering it a real failure, but other HA services might consider it
  a real failure when it is not.

  [test case]

  - comment #14 is a full test case: set up a 3-node pacemaker cluster,
  as in that example, and cause a networkd service restart; it will
  trigger a failure for the virtual IP resource monitor.

  - another example is given in the original description for keepalived.
  Both suffer from the same issue (as does other HA software).

  [regression potential]

  - this backports KeepConfiguration parameter, which adds some
  significant complexity to networkd's configuration and behavior, which
  could lead to regressions in correctly configuring the network at
  networkd start, or incorrectly maintaining configuration at networkd
  restart, or losing network state at networkd stop.

  - Any regressions are most likely to occur during networkd start,
  restart, or stop, and most likely to involve missing or incorrect ip
  address(es).

  - the change is based on upstream patches adding the exact feature we
  needed to fix this issue, and it will be integrated with a netplan
  change to add the needed stanza to the systemd nic configuration file
  (KeepConfiguration=)

  [other info]

  original description:
  ---

  Configure netplan for interfaces, for example (a working config with
  IP addresses obfuscated)

  network:
      ethernets:
          eth0:
              addresses: [192.168.0.5/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth2:
              addresses:
                - 12.13.14.18/29
                - 12.13.14.19/29
              gateway4: 12.13.14.17
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth3:
              addresses: [10.22.11.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth4:
              addresses: [10.22.14.6/24]
              dhcp4: false
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
          eth7:
              addresses: [9.5.17.34/29]
              dhcp4: false
              optional: true
              nameservers:
                search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
                addresses: [10.22.11.1]
      version: 2

  Configure keepalived (again, a working config with IP addresses
  obfuscated)

  global_defs           # Block id
  {
  notification_email {
          sysadm...@blah.com
  }
          notification_email_from keepali...@system3.hq.blah.com
          smtp_server 10.22.11.7     # IP
          smtp_connect_timeout 30      # integer, seconds
          router_id system3          # string identifying the machine,
                                       # (doesn't have to be hostname).
          vrrp_mcast_group4 224.0.0.18 # optional, default 224.0.0.18
          vrrp_mcast_group6 ff02::12   # optional, default ff02::12
          enable_traps                 # enable SNMP traps
  }
  vrrp_sync_group collection {
          group {
                  wan
                  lan
                  phone
          }
  }
  vrrp_instance wan {
          state MASTER
          interface eth2
          virtual_router_id 77
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass BlahBlah
          }
          virtual_ipaddress {
          12.13.14.20
          }
  }
  vrrp_instance lan {
          state MASTER
          interface eth3
          virtual_router_id 78
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass MoreBlah
          }
          virtual_ipaddress {
                  10.22.11.13/24
          }
  }
  vrrp_instance phone {
          state MASTER
          interface eth4
          virtual_router_id 79
          priority 150
          advert_int 1
          smtp_alert
          authentication {
                  auth_type PASS
                  auth_pass MostBlah
          }
          virtual_ipaddress {
                  10.22.14.3/24
          }
  }

  At boot the affected interfaces have:
  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet 10.22.14.3/24 scope global secondary eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet 10.22.11.13/24 scope global secondary eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.13.14.20/32 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever

  Run 'netplan try' (didn't even make any changes to the configuration)
  and the keepalived addresses disappear, never to return; the affected
  interfaces have:
  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever
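
  The before/after comparison above can be automated. The following is a
  minimal Python sketch, not part of the original report: it parses two
  captures of `ip addr` text and lists the IPv4 addresses that disappeared
  per interface. The function names are hypothetical, and the sample data is
  the eth4 excerpt from the listings above.

```python
import re

IFACE_RE = re.compile(r"^\s*\d+:\s+([^:@\s]+)[:@]")  # e.g. "5: eth4: <...>"
INET_RE = re.compile(r"^\s*inet\s+(\S+)")            # IPv4 only, skips inet6

BEFORE = """\
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
    inet 10.22.14.3/24 scope global secondary eth4
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
"""

AFTER = """\
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
"""


def ipv4_by_interface(ip_addr_output: str) -> dict[str, set[str]]:
    """Map interface name to the set of IPv4 addresses (CIDR) listed for it."""
    result: dict[str, set[str]] = {}
    current = None
    for line in ip_addr_output.splitlines():
        iface = IFACE_RE.match(line)
        if iface:
            current = iface.group(1)
            result.setdefault(current, set())
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            result[current].add(inet.group(1))
    return result


def lost_addresses(before: str, after: str) -> dict[str, set[str]]:
    """Return addresses present in `before` but missing in `after`, per interface."""
    old, new = ipv4_by_interface(before), ipv4_by_interface(after)
    lost = {}
    for iface, addrs in old.items():
        missing = addrs - new.get(iface, set())
        if missing:
            lost[iface] = missing
    return lost


if __name__ == "__main__":
    for iface, addrs in lost_addresses(BEFORE, AFTER).items():
        print(f"{iface}: lost {sorted(addrs)}")
```

  Capturing `ip addr` before and after 'netplan try' and feeding both
  captures to lost_addresses() gives a quick pass/fail check for the test
  case.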

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-keepalived/+bug/1815101/+subscriptions
