[systemd-devel] systemd-networkd deletes local IPv6 routes for /128 addresses in VRF tables for VRF enslaved interfaces

2020-05-17 Thread Marcel Menzel
Hello list,

I am using a VRF with multiple WireGuard interfaces in it. It also
contains one dummy interface with a /128 IPv6 address and a /32 IPv4
address, all managed by systemd-networkd.
This works until I restart systemd-networkd (systemctl restart
systemd-networkd); afterwards I can no longer ping the /128 IPv6 address.

The test setup to reproduce the behavior:

01-test-vrf.netdev:
    [NetDev]
    Name=test-vrf
    Kind=vrf

    [VRF]
    TableId=10

01-test-vrf.network:
    [Match]
    Name=test-vrf

    [Route]
    Destination=0.0.0.0/0
    Table=10
    Type=unreachable
    Metric=4278198272

    [Route]
    Destination=::/0
    Table=10
    Type=unreachable
    Metric=4278198272

02-test-dummy.netdev:
    [NetDev]
    Name=test-dummy
    Kind=dummy

02-test-dummy.network:
    [Match]
    Name=test-dummy

    [Network]
    VRF=test-vrf
    Address=fdde:11:22::1/128
    Address=fdde:33:44::1/64
    Address=10.20.30.1/32
    Address=10.20.40.1/24

Upon boot, everything works normally. I am able to ping all IPs on the
dummy interface:
    # ip vrf exec test-vrf ping fdde:11:22::1
    PING fdde:11:22::1(fdde:11:22::1) 56 data bytes
    64 bytes from fdde:11:22::1: icmp_seq=1 ttl=64 time=0.042 ms

    # ip vrf exec test-vrf ping fdde:33:44::1
    PING fdde:33:44::1(fdde:33:44::1) 56 data bytes
    64 bytes from fdde:33:44::1: icmp_seq=1 ttl=64 time=0.042 ms

    # ip vrf exec test-vrf ping 10.20.30.1
    PING 10.20.30.1 (10.20.30.1) 56(84) bytes of data.
    64 bytes from 10.20.30.1: icmp_seq=1 ttl=64 time=0.033 ms

    # ip vrf exec test-vrf ping 10.20.40.1
    PING 10.20.40.1 (10.20.40.1) 56(84) bytes of data.
    64 bytes from 10.20.40.1: icmp_seq=1 ttl=64 time=0.023 ms

The local routes have also been moved to the VRF table:
    # ip -6 r sh t 10 | grep local
    local fdde:11:22::1 dev test-dummy proto kernel metric 0 pref medium
    local fdde:33:44::1 dev test-dummy proto kernel metric 0 pref medium

    # ip r sh t 10 | grep local
    local 10.20.30.1 dev test-dummy proto kernel scope host src 10.20.30.1
    local 10.20.40.1 dev test-dummy proto kernel scope host src 10.20.40.1

But when I restart systemd-networkd (systemctl restart
systemd-networkd), the local route for the /128 IPv6 address on the
dummy interface vanishes from the VRF table:
    # ip r sh t 10 | grep local
    local 10.20.30.1 dev test-dummy proto kernel scope host src 10.20.30.1
    local 10.20.40.1 dev test-dummy proto kernel scope host src 10.20.40.1

    # ip -6 r sh t 10 | grep local
    local fdde:33:44::1 dev test-dummy proto kernel metric 0 pref medium

I am able to ping all addresses on the dummy interface except the /128
IPv6 one:
    # ip vrf exec test-vrf ping fdde:11:22::1
    PING fdde:11:22::1(fdde:11:22::1) 56 data bytes
    ^C
    --- fdde:11:22::1 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1000ms

To fix this, I either have to delete the test-dummy interface and then
restart systemd-networkd (ip l del test-dummy && systemctl restart
systemd-networkd), or add the local route again by hand: ip -6 r add
local fdde:11:22::1 dev test-dummy proto kernel metric 0 pref medium
table 10

With the second approach the metric of the local route ends up as 1024,
though, so I also have to delete the non-local route for this address
before I can ping it again. I therefore prefer the first approach:
    # ip vrf exec test-vrf ping fdde:11:22::1
    PING fdde:11:22::1(fdde:11:22::1) 56 data bytes
    ^C
    --- fdde:11:22::1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

    # ip -6 r sh t 10
    fdde:11:22::1 dev test-dummy proto kernel metric 256 pref medium
    local fdde:11:22::1 dev test-dummy proto kernel metric 1024 pref medium

    # ip -6 r del fdde:11:22::1 dev test-dummy table 10

    # ip vrf exec test-vrf ping fdde:11:22::1
    PING fdde:11:22::1(fdde:11:22::1) 56 data bytes
    64 bytes from fdde:11:22::1: icmp_seq=1 ttl=64 time=0.038 ms
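
A minimal sketch of how the symptom above could be detected automatically. It assumes the output of "ip -6 r sh t 10" has already been captured as text lines; the function name missing_local_routes and the sample data are illustrative only and are not part of systemd or iproute2.

```python
import ipaddress


def missing_local_routes(addresses, route_lines):
    """Return configured addresses that have no 'local' route.

    addresses   -- address strings as written in the .network file
    route_lines -- lines as printed by 'ip -6 r sh t <table>'
    """
    local = set()
    for line in route_lines:
        fields = line.split()
        # Local routes are printed as: local <address> dev <ifname> ...
        if len(fields) >= 2 and fields[0] == "local":
            local.add(ipaddress.ip_address(fields[1]))

    missing = []
    for addr in addresses:
        ip = ipaddress.ip_interface(addr).ip
        if ip not in local:
            missing.append(str(ip))
    return missing


# Route table as shown above after restarting systemd-networkd.
after_restart = [
    "local fdde:33:44::1 dev test-dummy proto kernel metric 0 pref medium",
]
configured = ["fdde:11:22::1/128", "fdde:33:44::1/64"]

print(missing_local_routes(configured, after_restart))
```

With the table contents from the failing case, only the /128 address is reported as missing.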

I was able to reproduce this behavior with a WireGuard interface as well,
so I think it applies to all netdev types. It is also worth mentioning
that "networkctl reload" does not trigger this behavior.

The kernel docs say:

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with VRF device. Any additional routes depending on
   the enslaved device are dropped and will need to be reinserted to the VRF
   FIB table following the enslavement.

When I comment out "VRF=test-vrf" in the dummy's .network file and
enslave it by hand (ip l set dev test-dummy master test-vrf), this
also works as expected until I restart systemd-networkd; then I have to
enslave it again and repeat the steps above.

Am I missing something, or did I hit a bug here? The version is systemd
245 (245.5-2-arch) on kernel 5.6.13-arch1-1 (Arch Linux).

Kind regards,

Marcel Menzel
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org

Re: [systemd-devel] IPv6 dhcp-acquired prefix delegation?

2020-05-17 Thread Kevin P. Fleming
You have to accept RAs from upstream, unless you have a static route
to the ISP's gateway. DHCPv6 does not provide routing information,
only end-node addressing information, so with RAs ignored you won't
have a default route out of your network.

This is probably the largest, and most surprising, difference between
DHCPv4 and DHCPv6.

On Sat, May 16, 2020 at 8:40 PM John Ioannidis wrote:
>
> I am running systemd v241, the one that comes with debian-10.
>
> Is the following scenario possible natively (that is, without using a 
> standalone dhcpv6 client)?
>
> My residential ISP will normally hand me a /64, but will give me a /56 if I 
> ask for it. While technically not a statically-allocated prefix, it changes 
> very rarely: I had had the same prefix for over three years until I started 
> experimenting with systemd-networkd and killed my old lease files.
> I have several local vlans, and I want to give a different /64 to each one of 
> them, but if the prefix I get from upstream changes, I want them to 
> automatically renumber.
> I'm perfectly happy relying on SLAAC for the local vlans, but I'm not against 
> having to also run a dhcpv6 server to hand out addresses and things.
>
> My previous solution, using ifupdown instead of systemd was this:
>
> - dhcpv6 client with -P --prefix-len-hint 56, which gets the /56 and the
>   default route.
> - A script in /etc/dhcp/dhclient-exit-hooks.d/ that builds an /etc/radvd.conf
>   file with bits 56-63 appropriately numbered, and then restarts radvd.
>
> I have been totally unable to reproduce this behavior with pure networkd; in 
> fact, I cannot even get a dhcpv6 session going. I do not want to rely on RAs 
> from upstream, just DHCPv6. Here is a minimal .network file for the external 
> interface ("ethwan")
>
> [Match]
> Name=ethwan
>
> [Network]
> DHCP=yes
> IPForward=yes
> IPMasquerade=no
> IPv6PrivacyExtensions=no
> IPv6AcceptRA=no
>
> [DHCP]
> UseDNS=no
> UseNTP=yes
> UseRoutes=yes
>
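
The per-VLAN numbering described in the quoted message (one /64 per VLAN, taken from bits 56-63 of the delegated /56) can be sketched roughly as follows. This is only an illustration of the arithmetic, not of how networkd or the original hook script does it; the function name vlan_prefixes, the VLAN names, and the documentation prefix 2001:db8:1200::/56 are all assumptions.

```python
import ipaddress


def vlan_prefixes(delegated, subnet_ids):
    """Derive one /64 per VLAN from a delegated prefix.

    delegated  -- delegated prefix as a string, /64 or shorter
    subnet_ids -- mapping of VLAN name to subnet number
    """
    net = ipaddress.IPv6Network(delegated)
    if net.prefixlen > 64:
        raise ValueError("delegated prefix must be /64 or shorter")

    # Number of /64 networks that fit into the delegated prefix.
    count = 1 << (64 - net.prefixlen)

    result = {}
    for name, idx in subnet_ids.items():
        if not 0 <= idx < count:
            raise ValueError(f"subnet id {idx} does not fit in {net}")
        # The subnet number sits directly above the 64-bit interface ID.
        base = int(net.network_address) + (idx << 64)
        result[name] = ipaddress.IPv6Network((base, 64))
    return result


# Hypothetical VLAN names and subnet numbers.
vlans = {"vlan10": 0x10, "vlan20": 0x20}

for name, prefix in vlan_prefixes("2001:db8:1200::/56", vlans).items():
    print(name, prefix)
```

If the upstream prefix changes, calling the function again with the new /56 yields the renumbered /64s with the same subnet numbers.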


Re: [systemd-devel] systemd update "forgets" ordering for shutdown

2020-05-17 Thread Michael Chapman
On Sun, 17 May 2020, Andrei Borzenkov wrote:
> 17.05.2020 03:32, Michael Chapman wrote:
> > On Fri, 15 May 2020, Frank Steiner wrote:
> >> Hi,
> >>
> >> I need to run a script on shutdown before any other service is stopped.
> >> Following advice Lennart gave a while ago, I'm using this service file
> >> (with multi-user.target being our default runlevel target):
> >>
> >> [Unit]
> >> After=multi-user.target
> >>
> >> [Service]
> >> Type=oneshot
> >> ExecStart=/bin/true
> >> ExecStop=/usr/lib/systemd/scripts/halt.local.bio
> >> TimeoutSec=120
> >> RemainAfterExit=yes
> > 
> > This seems inherently fragile.
> > 
> > If `multi-user.target` were to be stopped for whatever reason (and this 
> > is generally possible), the ordering dependencies between services 
> > Before=multi-user.target and services After=multi-user.target are broken. 
> 
> This is universally true. Do you have a suggestion for how it could be done
> differently?

Perhaps ordering dependencies should propagate through units even when 
they are not part of the transaction?

If we have A.service before B.service before C.service, we probably still 
want A.service before C.service even if B.service is not part of the 
transaction.
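
A minimal sketch of what such propagation could mean. It is purely illustrative and says nothing about how systemd's transaction code is actually structured; the function names direct_order and propagated_order are hypothetical.

```python
def direct_order(before, transaction):
    """Ordering pairs when only direct dependencies between units
    that are both in the transaction are honoured."""
    return {
        (a, b)
        for a in transaction
        for b in before.get(a, ())
        if b in transaction
    }


def propagated_order(before, transaction):
    """Ordering pairs when Before= chains are followed through units
    that are not part of the transaction."""
    def reachable(start):
        seen = set()
        stack = list(before.get(start, ()))
        while stack:
            unit = stack.pop()
            if unit in seen:
                continue
            seen.add(unit)
            stack.extend(before.get(unit, ()))
        return seen

    return {
        (a, b)
        for a in transaction
        for b in reachable(a)
        if b in transaction and b != a
    }


# A.service before B.service before C.service; B is not in the transaction.
before = {"A.service": {"B.service"}, "B.service": {"C.service"}}
transaction = {"A.service", "C.service"}

print(direct_order(before, transaction))
print(propagated_order(before, transaction))
```

With only direct dependencies, A.service and C.service end up unordered; with propagation, A.service is still ordered before C.service.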

I must admit I haven't figured out the repercussions of this... but I'm 
pretty sure it's closer to how people *think* systemd's ordering 
dependencies work.

> Even if multi-user.target is manually stopped, there is no way a normal
> service can be stopped concurrently with local filesystems, simply
> because a normal service is always ordered After=basic.target. Unless of
> course we manually stop basic.target and sysinit.target as well :)

Well, that's actually my concern... there's really nothing preventing 
those targets from being "accidentally" stopped. That could happen at some 
point during the system's uptime, and utter chaos would ensue the next 
time the system was shut down.
 
> I cannot reproduce it using a trivial service definition on openSUSE Leap
> 15.1, which should have the same systemd as SLE 15 SP1. So the possibilities are:
> 
> 1. Something in the unit definition triggers a bug in systemd. In this
> case the exact, full unit definition is needed. A shutdown log at debug
> log level would certainly be useful as well.
> 
> 2. The ExecStop command is not synchronous: it forks and continues in the
> background. In this case systemd probably assumes the unit is stopped and
> continues. Again, a log at debug log level would confirm it.

I agree, we do need to find out exactly what happened with Frank's system. 
There's clearly something more to it if his service is surviving 
local-fs.target being stopped.


Re: [systemd-devel] systemd update "forgets" ordering for shutdown

2020-05-17 Thread Andrei Borzenkov
17.05.2020 03:32, Michael Chapman wrote:
> On Fri, 15 May 2020, Frank Steiner wrote:
>> Hi,
>>
>> I need to run a script on shutdown before any other service is stopped.
>> Following advice Lennart gave a while ago, I'm using this service file
>> (with multi-user.target being our default runlevel target):
>>
>> [Unit]
>> After=multi-user.target
>>
>> [Service]
>> Type=oneshot
>> ExecStart=/bin/true
>> ExecStop=/usr/lib/systemd/scripts/halt.local.bio
>> TimeoutSec=120
>> RemainAfterExit=yes
> 
> This seems inherently fragile.
> 
> If `multi-user.target` were to be stopped for whatever reason (and this 
> is generally possible), the ordering dependencies between services 
> Before=multi-user.target and services After=multi-user.target are broken. 

This is universally true. Do you have a suggestion for how it could be done
differently?

Even if multi-user.target is manually stopped, there is no way a normal
service can be stopped concurrently with local filesystems, simply
because a normal service is always ordered After=basic.target. Unless of
course we manually stop basic.target and sysinit.target as well :)

I cannot reproduce it using a trivial service definition on openSUSE Leap
15.1, which should have the same systemd as SLE 15 SP1. So the possibilities are:

1. Something in the unit definition triggers a bug in systemd. In this
case the exact, full unit definition is needed. A shutdown log at debug
log level would certainly be useful as well.

2. The ExecStop command is not synchronous: it forks and continues in the
background. In this case systemd probably assumes the unit is stopped and
continues. Again, a log at debug log level would confirm it.