Bug#837759: network configuration stops working reliably
Control: severity -1 important
Control: tag -1 unreproducible

In accordance with the severity definitions [1] I downgrade this to
"important". It does not completely break systemd, we don't enable
networkd by default, and it does not affect every installation (it's
not reproducible on our side yet).

Don't worry, I'm still eager to find out what's happening here; I'll
look at your logs as soon as possible.

Martin

[1] https://www.debian.org/Bugs/Developer#severities

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
Processed: Re: Bug#837759: network configuration stops working reliably
Processing control commands:

> severity -1 important
Bug #837759 [systemd] network configuration stops working reliably
Severity set to 'important' from 'grave'

--
837759: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837759
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#837759: network configuration stops working reliably
Hello Martin,

On Monday, 19 September 2016 11:02:08 CEST Martin Pitt wrote:
> Hello Wolfgang,
>
> Wolfgang Walter [2016-09-14 23:34 +0200]:
> > > > I tested this with a script:
>
> FTR, I tried this as well, and I cannot reproduce the bug either.
>
> Wolfgang Walter [2016-09-14 17:56 +0200]:
> > Yes, systemd-networkd is active. But on most machines I only have *.link
> > entries, usually one to name the device:
>
> *.link entries are handled by udev, not networkd. So if you can
> reproduce this on a machine which only has files like
>
> > ==
> > [Match]
> > MACAddress=11:22:33:44:55:66
> >
> > [Link]
> > Name=net
> > WakeOnLan=off
> > ==
>
> then can you please "systemctl disable --now systemd-networkd" and
> check if the problem still happens? I suppose not, but if so, this
> tells us that this is being done through udev.

When I disable systemd-networkd the problem disappears.

The reason I think it is a race is that it depends on how many
interfaces you set up, whether you use systemd-networkd to set up some
of them, and the number of IP addresses and other things you configure
in /etc/network/interfaces.

For example, on the simple machines where I only have *.link files and
don't use systemd-networkd: sometimes (maybe 2 out of 10) it works, but
most of the time I lose some or all IP addresses.

Here is the log (without debugging). In this case the interface only
kept its IPv6 addresses and lost its IPv4 address, all set up in
/etc/network/interfaces.

Sep 19 11:33:25 maiskolben systemd[1]: Starting Raise network interfaces...
Sep 19 11:33:25 maiskolben systemd[1]: Starting Network Service...
Sep 19 11:33:26 maiskolben systemd-networkd[480]: Enumeration completed
Sep 19 11:33:26 maiskolben systemd-networkd[480]: net: Lost carrier
Sep 19 11:33:26 maiskolben systemd-networkd[480]: net: Gained carrier
Sep 19 11:33:26 maiskolben systemd[1]: Started Network Service.
Sep 19 11:33:27 maiskolben systemd-networkd[480]: net: Gained IPv6LL
Sep 19 11:33:27 maiskolben ifup[352]: Waiting for DAD... Done
Sep 19 11:33:27 maiskolben systemd[1]: Started Raise network interfaces.

But there is nothing special about IPv4 addresses. With more interfaces
one may lose some or all of the IPv6 addresses, too.

I think the crucial point is that systemd-networkd may declare the
interface "net" unmanaged AFTER "net: Lost carrier", so that all
addresses configured up to that point are removed. This "Lost carrier"
is always there on startup; I don't know whether it is caused by udev
when it detects the interface on startup.

Here is the log with systemd-networkd disabled:

Sep 19 11:37:20 maiskolben systemd[1]: Starting Raise network interfaces...
Sep 19 11:37:22 maiskolben ifup[400]: Waiting for DAD... Done
Sep 19 11:37:23 maiskolben systemd[1]: Started Raise network interfaces.

> If networkd itself is really the culprit, can you please try the
> following:
>
>  * Keep it disabled, run your test.sh to set up the dummy interface,
>    and run
>
>      SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd
>
>    (as root). Does this now cause the addresses to be removed? This
>    will run much later than test.sh, so this will tell us if this is a
>    principal logic error or a race condition, i. e. only happens if
>    networkd starts at the right time after test.sh.

No, I don't lose any addresses then. But as you can see, there is no
"net: Lost carrier" or "TEST: Lost carrier" and so on.

SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd
Found container virtualization none
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 error=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.3 object=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=AddMatch cookie=2 reply_cookie=0 error=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.3 object=n/a interface=n/a member=n/a cookie=3 reply_cookie=2 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=RequestName cookie=3 reply_cookie=0 error=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.3 object=n/a interface=n/a member=n/a cookie=5 reply_cookie=3 error=n/a
Failed to open configuration file '/etc/systemd/networkd.conf': No such file or directory
timestamp of '/etc/systemd/network' changed
timestamp of '/lib/systemd/network' changed
TEST: Flags change: +UP +LOWER_UP +RUNNING +BROADCAST +NOARP
Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/network1/link/_34 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4 reply_cookie=0 error=n/a
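Editorial sketch: the message above hinges on the order of networkd's "Lost carrier" / "Gained carrier" events relative to the "Raise network interfaces" (ifup) unit. One possible way to pull just those lines out of a journal is a small filter like the following. The function name carrier_timeline is hypothetical, and the pattern assumes the journal's default short output format as shown in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: read journal lines on stdin and keep only
#  - networkd carrier events ("<iface>: Lost carrier" / "Gained carrier")
#  - start/finish messages of the "Raise network interfaces" unit
# so that their relative order is easy to see.
carrier_timeline() {
    grep -E 'systemd-networkd\[[0-9]+\]: [^ ]+: (Lost|Gained) carrier|Raise network interfaces'
}

# Example use on a live system:
#   journalctl -b | carrier_timeline
```

If the carrier events fall between "Starting Raise network interfaces..." and "Started Raise network interfaces.", that would be consistent with the race suspected in this message, though it would not prove it.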
Bug#837759: network configuration stops working reliably
Hello Wolfgang,

Wolfgang Walter [2016-09-14 23:34 +0200]:
> > > I tested this with a script:

FTR, I tried this as well, and I cannot reproduce the bug either.

Wolfgang Walter [2016-09-14 17:56 +0200]:
> Yes, systemd-networkd is active. But on most machines I only have *.link
> entries, usually one to name the device:

*.link entries are handled by udev, not networkd. So if you can
reproduce this on a machine which only has files like

> ==
> [Match]
> MACAddress=11:22:33:44:55:66
>
> [Link]
> Name=net
> WakeOnLan=off
> ==

then can you please "systemctl disable --now systemd-networkd" and
check if the problem still happens? I suppose not, but if so, this
tells us that this is being done through udev.

If networkd itself is really the culprit, can you please try the
following:

 * Keep it disabled, run your test.sh to set up the dummy interface,
   and run

     SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd

   (as root). Does this now cause the addresses to be removed? This
   will run much later than test.sh, so this will tell us if this is a
   principal logic error or a race condition, i. e. only happens if
   networkd starts at the right time after test.sh.

 * Enable networkd again, and boot with "debug" in the kernel command
   line. Does this still reproduce the bug? If so, please attach the
   output of "journalctl -b". If not, just enable debugging for
   networkd with

     mkdir -p /etc/systemd/system/systemd-networkd.service.d/
     printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug' > /etc/systemd/system/systemd-networkd.service.d/debug.conf

   and reboot. If you catch the bug, please attach "journalctl -b".

Thanks,

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
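Editorial sketch: the drop-in described in the message above can also be written by a small helper, which makes it easy to try against a scratch directory first. This is only an illustration. The function name write_debug_dropin and its optional second argument (the directory to write under) are assumptions made here, not something from the thread. Unlike the printf in the message, this variant ends the file with a newline.

```shell
#!/bin/sh
# Hypothetical helper: create <root>/<unit>.d/debug.conf containing a
# [Service] section that sets SYSTEMD_LOG_LEVEL=debug.
#   $1: unit name, e.g. systemd-networkd.service
#   $2: directory to write under (assumed default: /etc/systemd/system)
write_debug_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/debug.conf"
}

# On a real system (as root), followed by a reboot as described in the message:
#   write_debug_dropin systemd-networkd.service
```

Removing the debug.conf file again and rebooting should return networkd to its normal log level.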
Bug#837759: network configuration stops working reliably
On 15 September 2016 at 12:36, Felipe Sateler wrote:
> On 14 September 2016 at 18:34, Wolfgang Walter wrote:
>> On Wednesday, 14 September 2016 10:00:28 CEST Felipe Sateler wrote:
>>> Control: tags -1 moreinfo
>>>
>>> On 14 September 2016 at 06:59, Wolfgang Walter wrote:
>>> > Package: systemd
>>> > Version: 231-6
>>> > Severity: grave
>>> >
>>> > Starting with version 231-6 the configuration of network interfaces stops
>>> > working reliably when rebooting a system. Downgrading to 231-5 fixes the
>>> > problem.
>>> >
>>> > Symptoms: If a network interface is configured using
>>> > /etc/network/interfaces it seems that systemd now sometimes removes the
>>> > configured IPv4 and/or IPv6 addresses in the boot process. It also seems
>>> > to remove routes of network interfaces configured manually or with
>>> > /etc/network/interfaces if the link state changes.
>>> >
>>> > This seems to be the case not only with interfaces configured via
>>> > /etc/network/interfaces but with any interface one creates and assigns
>>> > IP addresses to manually.
>>> >
>>> > I tested this with a script:
>>> >
>>> > #!/bin/sh
>>> > if [ "$1" = start ]; then
>>> >     ip link del TEST >/dev/null 2>&1 || true
>>> >     ip link add name TEST type dummy
>>> >     ip -b - <<"EOF"
>>> > link set TEST up
>>> > addr add 10.10.10.10/32 dev TEST nodad
>>> > addr add 2a01:1:1:1::1/128 dev TEST nodad
>>> > addr add 2a01:1:1:1::2/128 dev TEST nodad
>>> > addr add 2a01:1:1:1::3/128 dev TEST nodad
>>> > addr add 2a01:1:1:1::4/128 dev TEST nodad
>>> > addr add 2a01:1:1:1::5/128 dev TEST nodad
>>> > EOF
>>> >     ip addr ls TEST
>>> >     sleep 2
>>> > elif [ "$1" = stop ]; then
>>> >     ip addr flush dev TEST
>>> >     ip link del TEST
>>> > fi
>>> >
>>> > which I start as a systemd oneshot service with
>>> >
>>> > Before=systemd-networkd.service
>>> >
>>> > I can see in the journal that TEST has all addresses assigned but with
>>> > 231-6 it loses them again (probably when systemd-networkd.service
>>> > starts). With 231-5 or earlier this is not the case.
>>>
>>> It appears you are using systemd-networkd. Could you please attach
>>> your networkd configuration?
>>>
>>> Version 231-6 is built with iptables support, so that may be causing
>>> an interaction that was not visible before.
>>
>> I think this is the problem:
>>
>> https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=debian/231-6=79e10aaee1cdd412bd42f13f26e558ba1cd2196b
>>
>> I suppose that the check for LINK_STATE_UNMANAGED may be racy.
>> The interface may go down and up before LINK_STATE_UNMANAGED is set.
>> Maybe the state is LINK_STATE_PENDING?
>
> Interesting. Did you test with that patch disabled? (sorry, I have not
> had time to test).

BTW, I have tested manually on my system during runtime and cannot
reproduce it. If this is a race, maybe my laptop, while idle, managed to
configure the interface faster than networkd managed to react.

--
Saludos,
Felipe Sateler
Bug#837759: network configuration stops working reliably
On 14 September 2016 at 18:34, Wolfgang Walter wrote:
> On Wednesday, 14 September 2016 10:00:28 CEST Felipe Sateler wrote:
>> Control: tags -1 moreinfo
>>
>> On 14 September 2016 at 06:59, Wolfgang Walter wrote:
>> > Package: systemd
>> > Version: 231-6
>> > Severity: grave
>> >
>> > Starting with version 231-6 the configuration of network interfaces stops
>> > working reliably when rebooting a system. Downgrading to 231-5 fixes the
>> > problem.
>> >
>> > Symptoms: If a network interface is configured using
>> > /etc/network/interfaces it seems that systemd now sometimes removes the
>> > configured IPv4 and/or IPv6 addresses in the boot process. It also seems
>> > to remove routes of network interfaces configured manually or with
>> > /etc/network/interfaces if the link state changes.
>> >
>> > This seems to be the case not only with interfaces configured via
>> > /etc/network/interfaces but with any interface one creates and assigns
>> > IP addresses to manually.
>> >
>> > I tested this with a script:
>> >
>> > #!/bin/sh
>> > if [ "$1" = start ]; then
>> >     ip link del TEST >/dev/null 2>&1 || true
>> >     ip link add name TEST type dummy
>> >     ip -b - <<"EOF"
>> > link set TEST up
>> > addr add 10.10.10.10/32 dev TEST nodad
>> > addr add 2a01:1:1:1::1/128 dev TEST nodad
>> > addr add 2a01:1:1:1::2/128 dev TEST nodad
>> > addr add 2a01:1:1:1::3/128 dev TEST nodad
>> > addr add 2a01:1:1:1::4/128 dev TEST nodad
>> > addr add 2a01:1:1:1::5/128 dev TEST nodad
>> > EOF
>> >     ip addr ls TEST
>> >     sleep 2
>> > elif [ "$1" = stop ]; then
>> >     ip addr flush dev TEST
>> >     ip link del TEST
>> > fi
>> >
>> > which I start as a systemd oneshot service with
>> >
>> > Before=systemd-networkd.service
>> >
>> > I can see in the journal that TEST has all addresses assigned but with
>> > 231-6 it loses them again (probably when systemd-networkd.service
>> > starts). With 231-5 or earlier this is not the case.
>>
>> It appears you are using systemd-networkd. Could you please attach
>> your networkd configuration?
>>
>> Version 231-6 is built with iptables support, so that may be causing
>> an interaction that was not visible before.
>
> I think this is the problem:
>
> https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=debian/231-6=79e10aaee1cdd412bd42f13f26e558ba1cd2196b
>
> I suppose that the check for LINK_STATE_UNMANAGED may be racy.
> The interface may go down and up before LINK_STATE_UNMANAGED is set.
> Maybe the state is LINK_STATE_PENDING?

Interesting. Did you test with that patch disabled? (sorry, I have not
had time to test).

--
Saludos,
Felipe Sateler
Bug#837759: network configuration stops working reliably
On Wednesday, 14 September 2016 10:00:28 CEST Felipe Sateler wrote:
> Control: tags -1 moreinfo
>
> On 14 September 2016 at 06:59, Wolfgang Walter wrote:
> > Package: systemd
> > Version: 231-6
> > Severity: grave
> >
> > Starting with version 231-6 the configuration of network interfaces stops
> > working reliably when rebooting a system. Downgrading to 231-5 fixes the
> > problem.
> >
> > Symptoms: If a network interface is configured using
> > /etc/network/interfaces it seems that systemd now sometimes removes the
> > configured IPv4 and/or IPv6 addresses in the boot process. It also seems
> > to remove routes of network interfaces configured manually or with
> > /etc/network/interfaces if the link state changes.
> >
> > This seems to be the case not only with interfaces configured via
> > /etc/network/interfaces but with any interface one creates and assigns
> > IP addresses to manually.
> >
> > I tested this with a script:
> >
> > #!/bin/sh
> > if [ "$1" = start ]; then
> >     ip link del TEST >/dev/null 2>&1 || true
> >     ip link add name TEST type dummy
> >     ip -b - <<"EOF"
> > link set TEST up
> > addr add 10.10.10.10/32 dev TEST nodad
> > addr add 2a01:1:1:1::1/128 dev TEST nodad
> > addr add 2a01:1:1:1::2/128 dev TEST nodad
> > addr add 2a01:1:1:1::3/128 dev TEST nodad
> > addr add 2a01:1:1:1::4/128 dev TEST nodad
> > addr add 2a01:1:1:1::5/128 dev TEST nodad
> > EOF
> >     ip addr ls TEST
> >     sleep 2
> > elif [ "$1" = stop ]; then
> >     ip addr flush dev TEST
> >     ip link del TEST
> > fi
> >
> > which I start as a systemd oneshot service with
> >
> > Before=systemd-networkd.service
> >
> > I can see in the journal that TEST has all addresses assigned but with
> > 231-6 it loses them again (probably when systemd-networkd.service
> > starts). With 231-5 or earlier this is not the case.
>
> It appears you are using systemd-networkd. Could you please attach
> your networkd configuration?
>
> Version 231-6 is built with iptables support, so that may be causing
> an interaction that was not visible before.

I think this is the problem:

https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=debian/231-6=79e10aaee1cdd412bd42f13f26e558ba1cd2196b

I suppose that the check for LINK_STATE_UNMANAGED may be racy.
The interface may go down and up before LINK_STATE_UNMANAGED is set.
Maybe the state is LINK_STATE_PENDING?

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Bug#837759: network configuration stops working reliably
On Wednesday, 14 September 2016, 10:00:28, you wrote:
> Control: tags -1 moreinfo
>
> On 14 September 2016 at 06:59, Wolfgang Walter wrote:
> > Package: systemd
> > Version: 231-6
> > Severity: grave
> >
> > Starting with version 231-6 the configuration of network interfaces stops
> > working reliably when rebooting a system. Downgrading to 231-5 fixes the
> > problem.
> >
> > Symptoms: If a network interface is configured using
> > /etc/network/interfaces it seems that systemd now sometimes removes the
> > configured IPv4 and/or IPv6 addresses in the boot process. It also seems
> > to remove routes of network interfaces configured manually or with
> > /etc/network/interfaces if the link state changes.
> >
> > This seems to be the case not only with interfaces configured via
> > /etc/network/interfaces but with any interface one creates and assigns
> > IP addresses to manually.
> >
> > I tested this with a script:
> >
> > #!/bin/sh
> > if [ "$1" = start ]; then
> >     ip link del TEST >/dev/null 2>&1 || true
> >     ip link add name TEST type dummy
> >     ip -b - <<"EOF"
> > link set TEST up
> > addr add 10.10.10.10/32 dev TEST nodad
> > addr add 2a01:1:1:1::1/128 dev TEST nodad
> > addr add 2a01:1:1:1::2/128 dev TEST nodad
> > addr add 2a01:1:1:1::3/128 dev TEST nodad
> > addr add 2a01:1:1:1::4/128 dev TEST nodad
> > addr add 2a01:1:1:1::5/128 dev TEST nodad
> > EOF
> >     ip addr ls TEST
> >     sleep 2
> > elif [ "$1" = stop ]; then
> >     ip addr flush dev TEST
> >     ip link del TEST
> > fi
> >
> > which I start as a systemd oneshot service with
> >
> > Before=systemd-networkd.service
> >
> > I can see in the journal that TEST has all addresses assigned but with
> > 231-6 it loses them again (probably when systemd-networkd.service
> > starts). With 231-5 or earlier this is not the case.
>
> It appears you are using systemd-networkd. Could you please attach
> your networkd configuration?

Yes, systemd-networkd is active. But on most machines I only have *.link
entries, usually one to name the device:

==
[Match]
MACAddress=11:22:33:44:55:66

[Link]
Name=net
WakeOnLan=off
==

Most of them are virtual machines.

On those machines where I also have *.netdev and *.network entries this
also happens. The simplest one has only one *.network file:

==
[Match]
Name=net

[Network]
LinkLocalAddressing=ipv6
IPv6AcceptRouterAdvertisements=no
DHCP=no
Address=10.11.12.13/24
Gateway=10.11.12.1
Address=2001:1234:1::abc1
Address=2001:1234:1::abc2
Address=2001:1234:1::abc3
Address=2001:1234:1::abc4
NTP=2001:1234:1::123

[Route]
Gateway=fe80::1
PreferredSource=2001:1234:1::abc1
==

This interface works fine. But other interfaces configured by
/etc/network/interfaces, or the manually created interface TEST, lose
their IPv4 and IPv6 addresses.

Please note that I did not create a *.link entry for TEST on any of the
machines.

If I later restart these interfaces (with ifdown + ifup for
/etc/network/interfaces, systemctl restart test-network-device.service
for TEST) they keep their addresses.

> Version 231-6 is built with iptables support, so that may be causing
> an interaction that was not visible before.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Processed: Re: Bug#837759: network configuration stops working reliably
Processing control commands:

> tags -1 moreinfo
Bug #837759 [systemd] network configuration stops working reliably
Added tag(s) moreinfo.

--
837759: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837759
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#837759: network configuration stops working reliably
Control: tags -1 moreinfo

On 14 September 2016 at 06:59, Wolfgang Walter wrote:
> Package: systemd
> Version: 231-6
> Severity: grave
>
> Starting with version 231-6 the configuration of network interfaces stops
> working reliably when rebooting a system. Downgrading to 231-5 fixes the
> problem.
>
> Symptoms: If a network interface is configured using /etc/network/interfaces
> it seems that systemd now sometimes removes the configured IPv4 and/or IPv6
> addresses in the boot process. It also seems to remove routes of network
> interfaces configured manually or with /etc/network/interfaces if the link
> state changes.
>
> This seems to be the case not only with interfaces configured via
> /etc/network/interfaces but with any interface one creates and assigns
> IP addresses to manually.
>
> I tested this with a script:
>
> #!/bin/sh
> if [ "$1" = start ]; then
>     ip link del TEST >/dev/null 2>&1 || true
>     ip link add name TEST type dummy
>     ip -b - <<"EOF"
> link set TEST up
> addr add 10.10.10.10/32 dev TEST nodad
> addr add 2a01:1:1:1::1/128 dev TEST nodad
> addr add 2a01:1:1:1::2/128 dev TEST nodad
> addr add 2a01:1:1:1::3/128 dev TEST nodad
> addr add 2a01:1:1:1::4/128 dev TEST nodad
> addr add 2a01:1:1:1::5/128 dev TEST nodad
> EOF
>     ip addr ls TEST
>     sleep 2
> elif [ "$1" = stop ]; then
>     ip addr flush dev TEST
>     ip link del TEST
> fi
>
> which I start as a systemd oneshot service with
> Before=systemd-networkd.service
>
> I can see in the journal that TEST has all addresses assigned but with 231-6 it
> loses them again (probably when systemd-networkd.service starts). With 231-5
> or earlier this is not the case.

It appears you are using systemd-networkd. Could you please attach
your networkd configuration?

Version 231-6 is built with iptables support, so that may be causing
an interaction that was not visible before.

--
Saludos,
Felipe Sateler
Bug#837759: network configuration stops working reliably
Package: systemd
Version: 231-6
Severity: grave

Starting with version 231-6 the configuration of network interfaces stops
working reliably when rebooting a system. Downgrading to 231-5 fixes the
problem.

Symptoms: If a network interface is configured using /etc/network/interfaces
it seems that systemd now sometimes removes the configured IPv4 and/or IPv6
addresses in the boot process. It also seems to remove routes of network
interfaces configured manually or with /etc/network/interfaces if the link
state changes.

This seems to be the case not only with interfaces configured via
/etc/network/interfaces but with any interface one creates and assigns
IP addresses to manually.

I tested this with a script:

#!/bin/sh
if [ "$1" = start ]; then
    ip link del TEST >/dev/null 2>&1 || true
    ip link add name TEST type dummy
    ip -b - <<"EOF"
link set TEST up
addr add 10.10.10.10/32 dev TEST nodad
addr add 2a01:1:1:1::1/128 dev TEST nodad
addr add 2a01:1:1:1::2/128 dev TEST nodad
addr add 2a01:1:1:1::3/128 dev TEST nodad
addr add 2a01:1:1:1::4/128 dev TEST nodad
addr add 2a01:1:1:1::5/128 dev TEST nodad
EOF
    ip addr ls TEST
    sleep 2
elif [ "$1" = stop ]; then
    ip addr flush dev TEST
    ip link del TEST
fi

which I start as a systemd oneshot service with
Before=systemd-networkd.service

I can see in the journal that TEST has all addresses assigned but with 231-6 it
loses them again (probably when systemd-networkd.service starts). With 231-5
or earlier this is not the case.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
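Editorial sketch: the report above checks by eye (via "ip addr ls TEST" and the journal) whether TEST still has its addresses. A possible way to automate that check after boot is to compare the expected addresses against the one-line output of "ip -o addr show". The helper name missing_addrs is hypothetical and not part of the original report. It assumes the usual "ip -o" layout in which the fourth whitespace-separated field is the address/prefix.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o addr show` output on stdin and print
# every expected address (given as arguments in address/prefix form)
# that is NOT present. No output means nothing was lost.
missing_addrs() {
    input=$(cat)
    for addr in "$@"; do
        # Field 4 of each `ip -o addr show` line is assumed to be address/prefix.
        if ! printf '%s\n' "$input" |
            awk -v a="$addr" '$4 == a { found = 1 } END { exit !found }'; then
            printf '%s\n' "$addr"
        fi
    done
}

# Example use on a live system (requires the TEST interface to exist):
#   ip -o addr show dev TEST | missing_addrs 10.10.10.10/32 2a01:1:1:1::1/128
```

Run once right after the test script and again after systemd-networkd.service has started, a difference between the two outputs would show which addresses were removed in between.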