Re: [systemd-devel] systemd-container: Trying to use a bookworm chroot with a buster host fails / Failed to create /init.scope control group

2022-12-05 Thread Bernhard Übelacker




On 03.12.22 at 23:38, Bernhard Übelacker wrote:

> I thought that if strace can observe the process in question, gdb should
> be able to as well. Starting nspawn under gdbserver with
> 'set follow-fork-mode child', and running gdb from inside the container
> via a plain chroot, seems to work well.
>
> So it looks like the failing "syscall_0x1b7" from strace is "faccessat2" [2].
>
> And it seems "faccessat2" was only added in kernel 5.8 [3],
> therefore it might fail with kernel 4.19.
> So I fear this needs a newer kernel, and/or this is more of a glibc issue then?




Hello,
just a few short additions.
I was looking further into this issue, and found disabling apparmor
by booting the host with "apparmor=0" did not improve the situation.


Then I found the following entry in the systemd Debian package changelog [1][2]:

  * seccomp: allow turning off of seccomp filtering via env var.
    Since glibc 2.33 faccessat() is implemented via faccessat2(), which
    is breaking running containers that use such a version of glibc under
    systemd-nspawn in Buster.
    Turning off seccomp filtering via the SYSTEMD_SECCOMP env var makes it
    possible to run such new containers. (Closes: #984573)


This fits the situation perfectly, and the container starts
successfully with this workaround:

  # SYSTEMD_SECCOMP=0 systemd-nspawn --directory=/var/lib/machines/test-bookworm --boot
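
For scripted use, the same workaround could be wrapped roughly as in the sketch below. The helper name `nspawn_invocation` and its parameters are made up for illustration; only the `SYSTEMD_SECCOMP=0` variable and the command line come from this thread. It only builds the environment and argument list; actually starting the container is left commented out.

```python
import os
import subprocess


def nspawn_invocation(directory, disable_seccomp=True, base_env=None):
    """Return (env, argv) for booting a container with systemd-nspawn.

    Hypothetical helper: it only mirrors the command line shown above.
    """
    # Copy the environment so the caller's (or our own) environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    if disable_seccomp:
        # The switch named in the Debian changelog entry quoted above.
        env["SYSTEMD_SECCOMP"] = "0"
    argv = ["systemd-nspawn", f"--directory={directory}", "--boot"]
    return env, argv


if __name__ == "__main__":
    env, argv = nspawn_invocation("/var/lib/machines/test-bookworm")
    print("SYSTEMD_SECCOMP=" + env["SYSTEMD_SECCOMP"], " ".join(argv))
    # To really start the container (needs root on the host):
    # subprocess.run(argv, env=env, check=True)
```

Keeping `disable_seccomp` as an explicit parameter makes it easy to switch the workaround off again once the host no longer needs it.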


Kind regards,
Bernhard


[1] https://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_241-7~deb10u8_changelog
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984573



Re: [systemd-devel] systemd-container: Trying to use a bookworm chroot with a buster host fails / Failed to create /init.scope control group

2022-12-03 Thread Bernhard Übelacker

(Resent after subscription, as non-subscribers get rejected.)



Hello,
I opened the initial Debian bug report, but did not take the time to
ask at systemd-devel myself. I then found that the question had already
been asked in this thread, so I am trying to provide further information.




> Do you have any MACs in effect?
No SELinux or Apparmor active


As far as I see in my test VM with minimal Debian Buster there is no SELinux.
"aa-status" returns "apparmor module is loaded.", but I did not intentionally
configure anything to it.




> Does the host use cgroupsv1 or cgroupsv2 or hybrid?
The host system uses systemd v241, compiled with default-hierarchy=hybrid

> Was the container configured to use either?
The container uses systemd v251 with default-hierarchy=unified


At the host:
   # systemd --version
   systemd 241 (241)
   +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS \
   +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

In the container:
   # systemd --version
   systemd 252 (252.2-1)
   +PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID \
   +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY \
   -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP \
   +SYSVINIT default-hierarchy=unified




> What is mounted to /sys/fs/cgroup and below?


At the host:
   # mount | grep /sys/fs/cgroup
   tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
   cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
   cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
   cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
   cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
   cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
   cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
   cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
   cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
   cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
   cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
   cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
   cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
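
A mount listing like the one above can be turned into a "legacy / hybrid / unified" answer with a few lines of Python. This is only a simplified sketch of the usual rule of thumb (cgroup2 directly on /sys/fs/cgroup means unified, cgroup2 on /sys/fs/cgroup/unified next to v1 controllers means hybrid); the function name `cgroup_layout` is made up here, and it is not how systemd itself performs the check.

```python
def cgroup_layout(mount_lines):
    """Classify a cgroup setup from `mount`-style output lines.

    Each line is expected to look like: SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)
    """
    fstype_at = {}
    for line in mount_lines:
        parts = line.split()
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            fstype_at[parts[2]] = parts[4]

    if fstype_at.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"   # cgroup v2 only
    if fstype_at.get("/sys/fs/cgroup/unified") == "cgroup2":
        return "hybrid"    # v1 controllers plus a v2 tree for systemd
    if "cgroup" in fstype_at.values():
        return "legacy"    # cgroup v1 only
    return "unknown"


if __name__ == "__main__":
    host = [
        "tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)",
        "cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)",
        "cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)",
    ]
    print(cgroup_layout(host))
```

Fed with the host lines above, this reports the hybrid layout, which matches the default-hierarchy=hybrid shown by "systemd --version" on the host.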




> This is new payload on old host?


Yes, it is a test to run a quite recent Debian Bookworm/testing system
on an older Debian Buster host with kernel 4.19.260-1.




> if you force container into cgroupsv1 mode as the host (by adding
> systemd.unified_cgroup_hierarchy=no to the nspawn cmdline), does that
> work?


I am not sure if I am using it right, but as far as I see
"systemd.unified_cgroup_hierarchy=no" does not help.
I added "debug" too, see below in [1].





> Also, please provide the relevant output from "strace -f -s 500 -y -o
> /tmp/log.strace" (put on some pastebin)


The following pastebin contains the last quarter of the log.strace
file recorded by the command in [1]:

  https://paste.debian.net/1262752/




I thought that if strace can observe the process in question, gdb should
be able to as well. Starting nspawn under gdbserver with
'set follow-fork-mode child', and running gdb from inside the container
via a plain chroot, seems to work well.

So it looks like the failing "syscall_0x1b7" from strace is "faccessat2" [2].
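
The mapping from strace's placeholder to a name can be double-checked with a small lookup. The sketch below assumes x86-64 syscall numbering and contains only the two entries relevant here; the table and the function name `decode_strace_unknown` are my own illustration, not something taken from the strace log.

```python
# Assumption: x86-64 syscall numbers, reduced to the two entries of interest.
X86_64_SYSCALLS = {
    269: "faccessat",
    439: "faccessat2",
}


def decode_strace_unknown(token):
    """Turn a placeholder such as 'syscall_0x1b7' into (number, name or None)."""
    prefix = "syscall_"
    if not token.startswith(prefix):
        raise ValueError(f"not an unknown-syscall token: {token!r}")
    # int() accepts the leading "0x" when base 16 is given explicitly.
    number = int(token[len(prefix):], 16)
    return number, X86_64_SYSCALLS.get(number)


if __name__ == "__main__":
    print(decode_strace_unknown("syscall_0x1b7"))
```

An strace that is older than the syscall it observes has no name for it, which would explain why the host's strace only prints the raw number.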

And it seems "faccessat2" was only added in kernel 5.8 [3],
therefore it might fail with kernel 4.19.
So I fear this needs a newer kernel, and/or this is more of a glibc issue then?
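
The version comparison behind that suspicion can be written down as a quick check. This is just a sketch: the 5.8 threshold is the one mentioned above, the helper names are made up, and a distribution kernel with backported syscalls would not be detected correctly by looking at the release string alone.

```python
import os
import re

# From the paragraph above: faccessat2 appeared in kernel 5.8.
FACCESSAT2_MIN_KERNEL = (5, 8)


def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.19.0-22-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_has_faccessat2(release):
    """True if the release is at least the version that introduced faccessat2."""
    return kernel_version(release) >= FACCESSAT2_MIN_KERNEL


if __name__ == "__main__":
    release = os.uname().release
    print(release, kernel_has_faccessat2(release))
```

For the Buster host, kernel_has_faccessat2("4.19.0-22-amd64") is False, so a container userland calling faccessat2 cannot expect the host kernel to implement it.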


Kind regards,
Bernhard






[1] # strace -f -s 500 -y -o /tmp/log.strace systemd-nspawn --directory=/var/lib/machines/test-bookworm --boot systemd.unified_cgroup_hierarchy=no debug
Spawning container test-bookworm on /var/lib/machines/test-bookworm.
Press ^] three times within 1s to kill container.
systemd 252.2-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Detected initialized system, this is not the first boot.
Kernel version 4.19.0-22-amd64, our baseline is 4.15

Welcome to Debian GNU/Linux bookworm/sid!

Hostname set to .
sd-netlink: Failed to enable NETLINK_GET_STRICT_CHK option, ignoring: Protocol not available
Failed to add address 127.0.0.1 to loopback interface: Operation not permitted