Hello,
I opened the initial Debian bug report but did not take the time to
ask on systemd-devel first. I have now found that this question was already
raised in this thread, so I am trying to provide further information.
> Do you have any MACs in effect?
No SELinux or AppArmor is active, as far as I can tell.
In my test VM with a minimal Debian Buster there is no SELinux.
"aa-status" returns "apparmor module is loaded.", but I did not intentionally
configure anything for it.
> Does the host use cgroupsv1 or cgroupsv2 or hybrid?
The host system uses systemd v241, compiled with default-hierarchy=hybrid
> Was the container configured to use either?
The container uses systemd v252 with default-hierarchy=unified
At the host:
# systemd --version
systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS \
+ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
In the container:
# systemd --version
systemd 252 (252.2-1)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID \
+CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY \
-P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP \
+SYSVINIT default-hierarchy=unified
> What is mounted to /sys/fs/cgroup and below?
At the host:
# mount | grep /sys/fs/cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
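As a cross-check of the mount table, the hierarchy mode can also be derived from the filesystem types alone. The following is only a rough sketch, assuming GNU stat is available. The function names classify_cgroup_fs and cgroup_mode are made up for illustration and are not part of systemd:

```shell
#!/bin/sh
# Hypothetical helper: map filesystem type names to a hierarchy mode.
#   $1 = type of /sys/fs/cgroup, $2 = type of /sys/fs/cgroup/unified
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo unified ;;
        tmpfs)
            if [ "$2" = cgroup2fs ]; then
                echo hybrid
            else
                echo legacy
            fi
            ;;
        *) echo unknown ;;
    esac
}

# Hypothetical wrapper: query the types with "stat -f" (GNU coreutils).
# An unreadable or missing path yields an empty type, i.e. "unknown".
cgroup_mode() {
    root=${1:-/sys/fs/cgroup}
    classify_cgroup_fs \
        "$(stat -fc %T "$root" 2>/dev/null)" \
        "$(stat -fc %T "$root/unified" 2>/dev/null)"
}

cgroup_mode
```

With tmpfs on /sys/fs/cgroup and cgroup2 on /sys/fs/cgroup/unified, as in the host's mount table, this should report "hybrid".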
> This is new payload on old host?
Yes, it is a test to run a quite recent Debian Bookworm/testing system
on an older Debian Buster host with kernel 4.19.260-1.
> if you force container into cgroupsv1 mode as the host (by adding
> systemd.unified_cgroup_hierarchy=no to the nspawn cmdline, does that
> work?
I am not sure if I am using it right, but as far as I see
"systemd.unified_cgroup_hierarchy=no" does not help.
I added "debug" too, see below in [1].
> Also, please provide the relevant output from "strace -f -s 500 -y -o
> /tmp/log.strace" (put on some pastebin)
The following pastebin contains the last quarter of the log.strace
file recorded by the command in [1]:
https://paste.debian.net/1262752/
Since strace can observe the process in question, I thought gdb should be
able to as well. Starting nspawn under gdbserver, with 'set follow-fork-mode child'
and gdb run from inside the container via a plain chroot, seems to work well.
So it looks like the failing "syscall_0x1b7" from strace is "faccessat2" [2].
It seems "faccessat2" was only added in kernel 5.8 [3],
so it can be expected to fail with kernel 4.19.
So I fear this needs a newer kernel, or is this rather a glibc issue?
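To double-check this reasoning, the hexadecimal number from strace can be converted to decimal and the kernel release compared against 5.8. This is a minimal sketch only. The function names syscall_number and kernel_at_least are hypothetical, and a plain version comparison is just a heuristic, since it knows nothing about backports or syscall filters:

```shell
#!/bin/sh
# Hypothetical helper: turn strace's "syscall_0x1b7" into a decimal number.
syscall_number() {
    echo "$(( ${1#syscall_} ))"
}

# Hypothetical helper: succeed if kernel release $1 (e.g. "4.19.0-22-amd64")
# is at least version $2.$3. Only major and minor are compared.
kernel_at_least() {
    ver=${1%%-*}        # drop a "-22-amd64" style suffix
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

syscall_number syscall_0x1b7
if kernel_at_least "$(uname -r)" 5 8; then
    echo "running kernel should know faccessat2"
else
    echo "running kernel predates faccessat2"
fi
```

For the host release 4.19.0-22-amd64 from the log in [1], the comparison against 5.8 fails, which is consistent with [3].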
Kind regards,
Bernhard
[1] # strace -f -s 500 -y -o /tmp/log.strace systemd-nspawn \
--directory=/var/lib/machines/test-bookworm --boot \
systemd.unified_cgroup_hierarchy=no debug
Spawning container test-bookworm on /var/lib/machines/test-bookworm.
Press ^] three times within 1s to kill container.
systemd 252.2-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Detected initialized system, this is not the first boot.
Kernel version 4.19.0-22-amd64, our baseline is 4.15
Welcome to Debian GNU/Linux bookworm/sid!
Hostname set to <debian>.
sd-netlink: Failed to enable NETLINK_GET_STRICT_CHK option, ignoring: Protocol not available
Failed to add address 127.0.0.1 to loopback interface: Operation not permitted
Failed to add address ::1 to loopback interface: Operation not permitted
Failed to bring loopback interface up: Operation not permitted
Setting '/proc/sys/fs/file-max' to '9223372036854775807
'
No credentials passed via fw_cfg.
Failed to open '/sys/firmware/dmi/entries/11-0/raw', ignoring: No such file or directory
Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy
Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
Failed to create /init.scope control group: Operation not permitted
Failed to allocate manager object: Operation not permitted
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
Container test-bookworm failed with error code 255.
[2]
(gdb) stepi
0x00007ffff79c93ec 29 int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
1: x/i $pc
=> 0x7ffff79c93ec <__faccessat+44>: syscall
(gdb) bt
#0 0x00007ffff79c93ec in __faccessat (fd=fd@entry=-100, file=file@entry=0x7fffffffe3c0 "/sys/fs/cgroup/systemd", mode=mode@entry=0, flag=flag@entry=256) at ../sysdeps/unix/sysv/linux/faccessat.c:29
#1 0x00007ffff7c11380 in controller_is_v1_accessible (root=root@entry=0x0, controller=controller@entry=0x7ffff7f061ee "_systemd") at ../src/basic/cgroup-util.c:590
#2 0x00007ffff7c12432 in cg_get_path_and_check (controller=0x7ffff7f061ee "_systemd", path=0x7fffffffe4e0 "/init.scope", suffix=0x0, fs=0x7fffffffe480) at ../src/basic/cgroup-util.c:612
#3 0x00007ffff7b50eb0 in cg_create (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope") at ../src/shared/cgroup-setup.c:292
#4 0x00007ffff7b511db in cg_create_and_attach (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope", pid=pid@entry=0) at ../src/shared/cgroup-setup.c:324
#5 0x00007ffff7e3faa4 in manager_setup_cgroup (m=0x55555556edb0) at ../src/core/cgroup.c:3468
#6 0x00007ffff7ea463b in manager_new (scope=<optimized out>, test_run_flags=MANAGER_TEST_NORMAL, _m=_m@entry=0x7fffffffe600) at ../src/core/manager.c:939
#7 0x000055555555bf5c in main (argc=3, argv=0x7fffffffecd8) at ../src/core/main.c:2928
(gdb) print/x $eax
$1 = 0x1b7
(gdb) stepi
29 int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
1: x/i $pc
=> 0x7ffff79c93ee <__faccessat+46>: cmp $0xfffffffffffff000,%rax
(gdb) print/x $eax
$2 = 0xffffffff
(gdb) list faccessat.c:29
24
25
26 int
27 __faccessat (int fd, const char *file, int mode, int flag)
28 {
29 int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
30 #if __ASSUME_FACCESSAT2
31 return ret;
32 #else
33 if (ret == 0 || errno != ENOSYS)
(gdb) list cgroup-util.c:590
585 /* If root if specified, we check that:
586 * - possible subcgroup is created at root,
587 * - we can modify the hierarchy. */
588
589 cpath = strjoina("/sys/fs/cgroup/", dn, root, root ? "/cgroup.procs" : NULL);
590 return laccess(cpath, root ? W_OK : F_OK);
591 }
592
593 int cg_get_path_and_check(const char *controller, const char *path, const char *suffix, char **fs) {
594 int r;
(gdb) list cgroup-util.c:612
607 * except for the named hierarchies */
608 if (startswith(controller, "name="))
609 return -EOPNOTSUPP;
610 } else {
611 /* Check if the specified controller is actually accessible */
612 r = controller_is_v1_accessible(NULL, controller);
613 if (r < 0)
614 return r;
615 }
616
[3]
https://bugs.archlinux.org/task/69563
https://man.archlinux.org/man/faccessat2.2.en
"faccessat2() was added to Linux in version 5.8."