[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-09-25 Thread Rafael David Tinoco
Alright,

This problem does not only affect keepalived: it affects any
cluster-like software that manages aliases on an existing interface,
whether or not that interface is managed by systemd. I therefore ran
the same test case on a pacemaker-based cluster with 3 nodes, with
1 virtual IP and a lighttpd instance running in the same resource group:



(k)inaddy@kcluster01:~$ crm config show
node 1: kcluster01
node 2: kcluster02
node 3: kcluster03
primitive fence_kcluster01 stonith:fence_virsh \
        params ipaddr=192.168.100.205 plug=kcluster01 action=off login=stonithmgr passwd= use_sudo=true delay=2 \
        op monitor interval=60s
primitive fence_kcluster02 stonith:fence_virsh \
        params ipaddr=192.168.100.205 plug=kcluster02 action=off login=stonithmgr passwd= use_sudo=true delay=4 \
        op monitor interval=60s
primitive fence_kcluster03 stonith:fence_virsh \
        params ipaddr=192.168.100.205 plug=kcluster03 action=off login=stonithmgr passwd= use_sudo=true delay=6 \
        op monitor interval=60s
primitive virtual_ip IPaddr2 \
        params ip=10.0.3.1 nic=eth3 \
        op monitor interval=10s
primitive webserver systemd:lighttpd \
        op monitor interval=10 timeout=60
group webserver_virtual_ip webserver virtual_ip
location l_fence_kcluster01 fence_kcluster01 -inf: kcluster01
location l_fence_kcluster02 fence_kcluster02 -inf: kcluster02
location l_fence_kcluster03 fence_kcluster03 -inf: kcluster03
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version=2.0.1-9e909a5bdd \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=true \
        stonith-action=off \
        no-quorum-policy=stop



(k)inaddy@kcluster01:~$ cat /etc/netplan/cluster.yaml 
network:
    version: 2
    renderer: networkd
    ethernets:
        eth1:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.1.2/24]
        eth2:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.2.2/24]
        eth3:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.3.2/24]
        eth4:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.4.2/24]
        eth5:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.5.2/24]



And the virtual IP failed right after netplan acted on the network
interface managed by systemd-networkd:

(k)inaddy@kcluster03:~$ sudo netplan apply
(k)inaddy@kcluster03:~$ ping 10.0.3.1
PING 10.0.3.1 (10.0.3.1) 56(84) bytes of data.
From 10.0.3.4 icmp_seq=1 Destination Host Unreachable
From 10.0.3.4 icmp_seq=2 Destination Host Unreachable
From 10.0.3.4 icmp_seq=3 Destination Host Unreachable
From 10.0.3.4 icmp_seq=4 Destination Host Unreachable
From 10.0.3.4 icmp_seq=5 Destination Host Unreachable
From 10.0.3.4 icmp_seq=6 Destination Host Unreachable
64 bytes from 10.0.3.1: icmp_seq=7 ttl=64 time=0.088 ms
64 bytes from 10.0.3.1: icmp_seq=8 ttl=64 time=0.076 ms

--- 10.0.3.1 ping statistics ---
8 packets transmitted, 2 received, +6 errors, 75% packet loss, time 7128ms
rtt min/avg/max/mdev = 0.076/0.082/0.088/0.006 ms, pipe 4
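
To tell a removed alias apart from plain unreachability, the address
list of the interface can be checked before and after "netplan apply".
The following is only a minimal sketch: vip_present is a hypothetical
helper (not part of any package), and eth3 / 10.0.3.1 are simply the
values from the test case above.

```shell
# Hypothetical helper: succeeds if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
vip_present() {
    vip="$1"
    # Field 4 of one-line `ip -o addr` output is ADDRESS/PREFIX;
    # compare only the address part against the requested VIP.
    awk -v vip="$vip" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Usage on a cluster node (values taken from the test case):
#   ip -o -4 addr show dev eth3 | vip_present 10.0.3.1 \
#       && echo "VIP present" || echo "VIP missing"
```

Running the usage line right after "netplan apply" should report the
VIP as missing until the pacemaker monitor operation re-adds it.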

The failure matches what is explained in this bug's description. The
virtual_ip monitor operation from pacemaker then noticed that the
virtual IP was gone and restarted it on the same node:



(k)inaddy@kcluster01:~$ crm status
Stack: corosync
Current DC: kcluster01 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Wed Sep 25 13:11:05 2019
Last change: Wed Sep 25 12:49:56 2019 by root via cibadmin on kcluster01

3 nodes configured
5 resources configured

Online: [ kcluster01 kcluster02 kcluster03 ]

Full list of resources:

 fence_kcluster01   (stonith:fence_virsh):  Started kcluster02
 fence_kcluster02   (stonith:fence_virsh):  Started kcluster01
 fence_kcluster03   (stonith:fence_virsh):  Started kcluster01
 Resource Group: webserver_virtual_ip
     webserver  (systemd:lighttpd): Started kcluster03
     virtual_ip (ocf::heartbeat:IPaddr2):   FAILED kcluster03

Failed Resource Actions:
* virtual_ip_monitor_1 on kcluster03 'not running' (7): call=100, status=complete, exitreason='',
    last-rc-change='Wed Sep 25 13:11:05 2019', queued=0ms, exec=0ms



(k)inaddy@kcluster01:~$ crm status
Stack: corosync
Current DC: kcluster01 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Wed Sep 25 13:11:07 2019
Last change: Wed Sep 25 12:49:56 2019 by root via cibadmin on kcluster01

3 nodes configured
5 resources configured

Online: [ kcluster01 kcluster02 kcluster03 ]

Full list of resources:

 fence_kcluster01   (stonith:fence_virsh):  Started kcluster02
 fence_kcluster02   (stonith:fence_virsh):  Started kcluster01
 fence_kcluster03   (stonith:fence_virsh):  Started kcluster01
 Resource Group: webserver_virtual_ip
     webserver  (systemd:lighttpd): Started kcluster03
     virtual_ip (ocf::heartbeat:IPaddr2):   Started kcluster03

Failed Resource Actions:
* virtual_ip_monitor_1 

[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-09-20 Thread Lucas Kanashiro
** Also affects: heartbeat (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: heartbeat (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: heartbeat (Ubuntu Bionic)
   Status: New => Triaged

** Changed in: heartbeat (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: heartbeat (Ubuntu Disco)
   Status: New => Triaged

** Changed in: heartbeat (Ubuntu Eoan)
   Importance: Undecided => Low

** Changed in: heartbeat (Ubuntu Eoan)
   Status: New => Triaged

** Changed in: heartbeat (Ubuntu Bionic)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: heartbeat (Ubuntu Disco)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: heartbeat (Ubuntu Eoan)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived clusters

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/1815101/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-09-13 Thread Rafael David Tinoco
Based on comment #12 and on comments from the duplicate cases, I'll
summarize here, in a consolidated way, how to reproduce the issue, how
to mitigate it using the dummy workaround, and how to fix it (with the
backports/merge requests). At the end I might provide a PPA and ask for
feedback.

** Changed in: netplan
   Status: Invalid => Confirmed

** Changed in: keepalived (Ubuntu)
   Status: Triaged => Confirmed

** Changed in: systemd (Ubuntu)
   Status: Triaged => Confirmed

** Also affects: keepalived (Ubuntu Eoan)
   Importance: Undecided
   Status: Confirmed

** Also affects: systemd (Ubuntu Eoan)
   Importance: Undecided
   Status: Confirmed

** Also affects: keepalived (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: keepalived (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Changed in: keepalived (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: keepalived (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: systemd (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: systemd (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: keepalived (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: keepalived (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: keepalived (Ubuntu Eoan)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Eoan)
   Importance: Undecided => Medium

** Changed in: keepalived (Ubuntu Bionic)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: keepalived (Ubuntu Disco)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: keepalived (Ubuntu Eoan)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: systemd (Ubuntu Bionic)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: systemd (Ubuntu Disco)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: systemd (Ubuntu Eoan)
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: netplan
 Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: systemd (Ubuntu Eoan)
   Status: Confirmed => In Progress

** Changed in: keepalived (Ubuntu Eoan)
   Status: Confirmed => In Progress


[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-09-13 Thread Rafael David Tinoco
The following 3 bugs:

https://bugs.launchpad.net/bugs/1815101
https://bugs.launchpad.net/bugs/1819074
https://bugs.launchpad.net/bugs/1810583

have the same root cause: systemd-networkd interferes with secondary IP
addresses on NICs managed by systemd.

I'm marking all other cases as duplicates of LP: #1815101.

The TODO here is the following:

- There are mainly 2 "fixes" for this issue:

1) keepalived (> 2.0.x) is able to recognize systemd-networkd changes
and change the cluster status in order to reconfigure the managed NICs.

2) systemd-networkd implements a new setting (KeepConfiguration=) in
its .network files, which fixes this behavior not only for keepalived
but for all HA-related software that manages secondary IPs and/or
aliases on NICs managed by systemd-networkd.

I think the most appropriate approach is to make sure both features
work, together, in Eoan, and then make sure the SRUs are done for Disco
and Bionic. One problem with item (2) is that netplan will also have to
support the new "KeepConfiguration=" setting, but fix (2) is more
appropriate for all the other HA-related software controlling virtual
IPs (CTDB, Pacemaker, and so on).
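
For reference, a minimal sketch of what item (2) could look like on a
node, assuming systemd 243 or later, where KeepConfiguration= is
accepted in the [Network] section of .network files. The unit name
10-netplan-eth3.network, the drop-in file name and the function name
are assumptions for illustration only, and this has not been tested
against the reproducer in this bug:

```shell
# Sketch: write a drop-in that sets KeepConfiguration=static for one
# interface. NETWORK_DIR and UNIT are assumptions; adjust them to the
# .network file that actually configures the NIC carrying the VIP.
NETWORK_DIR="${NETWORK_DIR:-/etc/systemd/network}"
UNIT="${UNIT:-10-netplan-eth3.network}"

write_keepconf_dropin() {
    # Drop-ins for FOO.network are read from FOO.network.d/*.conf
    dropin_dir="$NETWORK_DIR/$UNIT.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/keep-configuration.conf" <<'EOF'
[Network]
KeepConfiguration=static
EOF
}

# Usage (as root): write_keepconf_dropin
# systemd-networkd has to re-read its configuration afterwards.
```

A drop-in is used here because netplan regenerates its own .network
files, so editing them directly would not survive a "netplan apply".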


[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-08-26 Thread Bryce Harrington
The aforementioned link shows there's been work towards a fix in
systemd.  I can't say whether that suggests what can be done to improve
keepalived, but I've tagged this "server-next" to get it on the Ubuntu
Server Team's high priority list, as per Robie's earlier comment.

** Tags added: server-next


[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-08-21 Thread Jorge Niedbalski
For reference: https://github.com/systemd/systemd/pull/12511


[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-05-09 Thread Robie Basak
It looks like there is some clear and actionable work in keepalived here
(even if as a workaround and the real fix ends up being in systemd), so
I'm marking it as Triaged.

FTR, the Ubuntu Server Team is aware of this as a high level issue and
it is high up in our list of priorities to determine how to address it
properly.

** Changed in: keepalived (Ubuntu)
   Status: Incomplete => Triaged


[Bug 1815101] Re: [master] Restarting systemd-networkd breaks keepalived clusters

2019-05-07 Thread Leroy Tennison
If I understand the keepalived > 2.0.x behavior referred to by cdmiller
above (see the 2019-03-07 comment), that is not the appropriate response
to the problem.  Granted, it mitigates the consequences, but it doesn't
address the underlying issue. An issue originating in systemd should not
cause keepalived failover, since failover is designed to address system
or hardware failure, not the bad behavior of other system software.
systemd needs to be made to cooperate with other software rather than
assuming it is the only authority on the system.
