(Resent after subscription, as non-subscribers get rejected.)



Hello,
I opened the initial Debian bug report, but did not take the time to
ask at systemd-devel first. I then found that this question had already
been raised here, so I am trying to provide further information.



> Do you have any MACs in effect?
No SELinux or Apparmor active

As far as I see in my test VM with minimal Debian Buster there is no SELinux.
"aa-status" returns "apparmor module is loaded.", but I did not intentionally
configure anything to it.



> Does the host use cgroupsv1 or cgroupsv2 or hybrid?
The host system uses systemd v241, compiled with default-hierarchy=hybrid

> Was the container configured to use either?
The container uses systemd v251 with default-hierarchy=unified

At the host:
   # systemd --version
   systemd 241 (241)
   +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS \
   +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

In the container:
   # systemd --version
   systemd 252 (252.2-1)
   +PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID \
   +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY \
   -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP \
   +SYSVINIT default-hierarchy=unified



> What is mounted to /sys/fs/cgroup and below?

At the host:
   # mount | grep /sys/fs/cgroup
   tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
   cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
   cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
   cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
   cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
   cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
   cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
   cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
   cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
   cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
   cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
   cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
   cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
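
The mount layout above (tmpfs at /sys/fs/cgroup, cgroup2 at
/sys/fs/cgroup/unified, and v1 controllers next to it) is what I would
expect for default-hierarchy=hybrid. As a rough cross-check, here is a
minimal sketch that classifies such "mount" output. The function name
classify_cgroup_layout is made up for illustration and this is not how
systemd itself detects the hierarchy; it only looks at mount points and
filesystem types.

```python
def classify_cgroup_layout(mount_output: str) -> str:
    """Guess the cgroup layout from `mount` output lines.

    Returns 'unified', 'hybrid', 'legacy' or 'unknown'.
    """
    fstype_at = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <source> on <mountpoint> type <fstype> (<options>)
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            fstype_at[parts[2]] = parts[4]

    root = fstype_at.get("/sys/fs/cgroup")
    if root == "cgroup2":
        # cgroup2 mounted directly at the root: pure v2
        return "unified"
    if root == "tmpfs":
        # tmpfs root with a cgroup2 mount below it: v1 controllers plus v2 tree
        if fstype_at.get("/sys/fs/cgroup/unified") == "cgroup2":
            return "hybrid"
        # tmpfs root with only v1 controller mounts
        if any(t == "cgroup" for t in fstype_at.values()):
            return "legacy"
    return "unknown"


# Shortened copy of the host output shown above.
host_mounts = """\
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
"""

print(classify_cgroup_layout(host_mounts))  # hybrid
```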



> This is new payload on old host?

Yes, it is a test to run a quite recent Debian Bookworm/testing system
on an older Debian Buster host with kernel 4.19.260-1.



> if you force container into cgroupsv1 mode as the host (by adding
> systemd.unified_cgroup_hierarchy=no to the nspawn cmdline, does that
> work?

I am not sure if I am using it right, but as far as I see
"systemd.unified_cgroup_hierarchy=no" does not help.
I added "debug" too, see below in [1].




> Also, please provide the relevant output from "strace -f -s 500 -y -o
> /tmp/log.strace" (put on some pastebin)

The following pastebin contains the last quarter of the log.strace
file recorded by the command in [1]:

  https://paste.debian.net/1262752/




I wondered whether, if strace can observe the process in question, gdb
could as well. I found that starting nspawn under gdbserver, with
'set follow-fork-mode child' and gdb run from inside the container via
a plain chroot, seems to work well.

So it looks like the failing "syscall_0x1b7" from strace is "faccessat2" [2].
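
The two register values from the gdb session in [2] can be decoded with
a few lines of Python. This is only a sketch: the helper name
decode_return is made up, and it assumes that the 32 bits printed via
$eax are representative of the full value in rax. The range -4095..-1
is the usual kernel error range, which is what the
"cmp $0xfffffffffffff000,%rax" after the syscall tests for.

```python
import errno

SYSCALL_NR = 0x1b7  # value of $eax right before the syscall instruction


def decode_return(raw: int, bits: int = 32):
    """Interpret a raw register value as a signed syscall return value.

    Returns (signed_value, errno_name), where errno_name is None
    if the value is not in the kernel error range.
    """
    if raw >= 1 << (bits - 1):
        raw -= 1 << bits  # two's complement to signed
    if -4095 <= raw <= -1:
        return raw, errno.errorcode.get(-raw, "unknown")
    return raw, None


print(SYSCALL_NR)                 # 439
print(decode_return(0xffffffff))  # (-1, 'EPERM')
```

Read this way, the value after the syscall would be EPERM rather than
ENOSYS, which would match the "Operation not permitted" in the log [1].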

And it seems "faccessat2" was only added in kernel 5.8 [3],
so it might fail with kernel 4.19.
So I fear this needs a newer kernel, and/or is this more of a glibc issue?
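
To see what a given environment answers for this syscall, one could
issue it directly with the arguments from the backtrace in [2]
(fd=-100, mode=0, flag=256). A minimal sketch, assuming x86-64 syscall
numbering (439 = 0x1b7) and a Linux libc reachable through ctypes; the
function name probe_faccessat2 and the default path "/" are made up for
illustration.

```python
import ctypes
import errno
import sys

NR_FACCESSAT2 = 439          # 0x1b7, assumed x86-64 numbering
AT_FDCWD = -100              # fd seen in the backtrace
AT_SYMLINK_NOFOLLOW = 0x100  # flag=256 in the backtrace
F_OK = 0                     # mode=0 in the backtrace


def probe_faccessat2(path: bytes = b"/"):
    """Call faccessat2 directly; return 'OK', an errno name, or None off Linux."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    ret = libc.syscall(
        ctypes.c_long(NR_FACCESSAT2),
        ctypes.c_int(AT_FDCWD),
        ctypes.c_char_p(path),
        ctypes.c_int(F_OK),
        ctypes.c_int(AT_SYMLINK_NOFOLLOW),
    )
    if ret == 0:
        return "OK"
    return errno.errorcode.get(ctypes.get_errno(), "unknown")


print(probe_faccessat2())
```

ENOSYS would mean the kernel does not know the syscall, in which case
the glibc code listed in [2] falls back to the older path. Any other
error, such as EPERM, is returned to the caller as-is and would suggest
that something other than a plainly missing syscall is involved.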


Kind regards,
Bernhard






[1]    # strace -f -s 500 -y -o /tmp/log.strace systemd-nspawn --directory=/var/lib/machines/test-bookworm --boot systemd.unified_cgroup_hierarchy=no debug
    Spawning container test-bookworm on /var/lib/machines/test-bookworm.
    Press ^] three times within 1s to kill container.
    systemd 252.2-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    Detected virtualization systemd-nspawn.
    Detected architecture x86-64.
    Detected initialized system, this is not the first boot.
    Kernel version 4.19.0-22-amd64, our baseline is 4.15

    Welcome to Debian GNU/Linux bookworm/sid!

    Hostname set to <debian>.
    sd-netlink: Failed to enable NETLINK_GET_STRICT_CHK option, ignoring: Protocol not available
    Failed to add address 127.0.0.1 to loopback interface: Operation not permitted
    Failed to add address ::1 to loopback interface: Operation not permitted
    Failed to bring loopback interface up: Operation not permitted
    Setting '/proc/sys/fs/file-max' to '9223372036854775807
    '
    No credentials passed via fw_cfg.
    Failed to open '/sys/firmware/dmi/entries/11-0/raw', ignoring: No such file or directory
    Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy
    Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
    Failed to create /init.scope control group: Operation not permitted
    Failed to allocate manager object: Operation not permitted
    [!!!!!!] Failed to allocate manager object.
    Exiting PID 1...
    Container test-bookworm failed with error code 255.





[2]
    (gdb) stepi
    0x00007ffff79c93ec      29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
    1: x/i $pc
    => 0x7ffff79c93ec <__faccessat+44>:     syscall
    (gdb) bt
    #0  0x00007ffff79c93ec in __faccessat (fd=fd@entry=-100, file=file@entry=0x7fffffffe3c0 "/sys/fs/cgroup/systemd", mode=mode@entry=0, flag=flag@entry=256) at ../sysdeps/unix/sysv/linux/faccessat.c:29
    #1  0x00007ffff7c11380 in controller_is_v1_accessible (root=root@entry=0x0, controller=controller@entry=0x7ffff7f061ee "_systemd") at ../src/basic/cgroup-util.c:590
    #2  0x00007ffff7c12432 in cg_get_path_and_check (controller=0x7ffff7f061ee "_systemd", path=0x7fffffffe4e0 "/init.scope", suffix=0x0, fs=0x7fffffffe480) at ../src/basic/cgroup-util.c:612
    #3  0x00007ffff7b50eb0 in cg_create (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope") at ../src/shared/cgroup-setup.c:292
    #4  0x00007ffff7b511db in cg_create_and_attach (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope", pid=pid@entry=0) at ../src/shared/cgroup-setup.c:324
    #5  0x00007ffff7e3faa4 in manager_setup_cgroup (m=0x55555556edb0) at ../src/core/cgroup.c:3468
    #6  0x00007ffff7ea463b in manager_new (scope=<optimized out>, test_run_flags=MANAGER_TEST_NORMAL, _m=_m@entry=0x7fffffffe600) at ../src/core/manager.c:939
    #7  0x000055555555bf5c in main (argc=3, argv=0x7fffffffecd8) at ../src/core/main.c:2928
    (gdb) print/x $eax
    $1 = 0x1b7
    (gdb) stepi
    29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
    1: x/i $pc
    => 0x7ffff79c93ee <__faccessat+46>:     cmp    $0xfffffffffffff000,%rax
    (gdb) print/x $eax
    $2 = 0xffffffff

    (gdb) list faccessat.c:29
    24
    25
    26      int
    27      __faccessat (int fd, const char *file, int mode, int flag)
    28      {
    29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
    30      #if __ASSUME_FACCESSAT2
    31        return ret;
    32      #else
    33        if (ret == 0 || errno != ENOSYS)

    (gdb) list cgroup-util.c:590
    585             /* If root if specified, we check that:
    586              * - possible subcgroup is created at root,
    587              * - we can modify the hierarchy. */
    588
    589             cpath = strjoina("/sys/fs/cgroup/", dn, root, root ? "/cgroup.procs" : NULL);
    590             return laccess(cpath, root ? W_OK : F_OK);
    591     }
    592
    593     int cg_get_path_and_check(const char *controller, const char *path, const char *suffix, char **fs) {
    594             int r;

    (gdb) list cgroup-util.c:612
    607                      * except for the named hierarchies */
    608                     if (startswith(controller, "name="))
    609                             return -EOPNOTSUPP;
    610             } else {
    611                     /* Check if the specified controller is actually accessible */
    612                     r = controller_is_v1_accessible(NULL, controller);
    613                     if (r < 0)
    614                             return r;
    615             }
    616



[3]
    https://bugs.archlinux.org/task/69563
    https://man.archlinux.org/man/faccessat2.2.en
      "faccessat2() was added to Linux in version 5.8."
