Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sun, 16 Aug 2020 at 16:32, Steve Dodd wrote:

> Ah, looks like we need to seccomp_attr_set(, SCMP_FLTATR_CTL_LOG, ..)
> somewhere for this to work. Not sure if that should be done
> unconditionally...

https://github.com/systemd/systemd/pull/16752 makes it conditional on an environment variable, "SYSTEMD_LOG_SECCOMP", which seems neat enough.

I've tried to open a discussion about the ENOSYS handling in libseccomp at https://github.com/seccomp/libseccomp/issues/286, but I'm probably not being very coherent..

S.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sun, 16 Aug 2020 at 16:05, Steve Dodd wrote:

> That's interesting .. it's possible things don't work quite the way I think
> they do, but I will try to find previous examples - I remember borgbackup
> was affected on armhf fairly recently, for example.

Ah, the borgbackup thing was different - sync_file_range2 was missing from systemd's filter set. Here's the last "new syscall" issue though:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1883447

> > Hmm, this would make a ton of sense. We currently have a "log" seccomp
> > action, but it will just log and allow anyway. We'd need another
> > action that would log and refuse. Please file an RFE, or even better
> > prep a PR for this!
>
> Looking at the kernel seccomp doc, I'm not actually sure it's possible,
> from code at least:
>
> https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html
>
> But there is /proc/sys/kernel/seccomp/actions_logged which might do the
> trick!

Ah, looks like we need to seccomp_attr_set(, SCMP_FLTATR_CTL_LOG, ..) somewhere for this to work. Not sure if that should be done unconditionally...

S.
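For anyone wanting to check the sysctl mentioned above, here is a minimal sketch that reads /proc/sys/kernel/seccomp/actions_logged and tests whether the "errno" action (the one a filter uses to refuse a call with EPERM or ENOSYS) is eligible for logging. The helper names are made up for illustration. As far as I understand the kernel documentation, the sysctl only limits what may be logged; the filter itself still has to be loaded with the log flag, which is what SCMP_FLTATR_CTL_LOG requests through libseccomp.

```python
ACTIONS_LOGGED_PATH = "/proc/sys/kernel/seccomp/actions_logged"


def parse_actions_logged(text):
    """Split the whitespace-separated action list into a Python list."""
    return text.split()


def is_action_logged(action, path=ACTIONS_LOGGED_PATH):
    """Return True if `action` (e.g. "errno") appears in the sysctl file.

    Linux only; raises OSError if the file is missing or unreadable.
    """
    with open(path) as f:
        return action in parse_actions_logged(f.read())


if __name__ == "__main__":
    try:
        print("errno actions may be logged:", is_action_logged("errno"))
    except OSError as exc:
        print("could not read", ACTIONS_LOGGED_PATH, "-", exc)
```

If "errno" is listed and the filter was loaded with logging enabled, denied calls should show up in the audit log or kernel log, depending on how the host is set up.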
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sun, 16 Aug 2020 at 15:47, Lennart Poettering wrote:

> I think it would be wise to do fallback logic for EPERM too. It's
> the error that nspawn uses since day #1 basically. I am a bit puzzled
> no one noticed this before, afaik glibc test cases at least on Fedora
> (where most glibc upstream devs work on) run in nspawn, so how did
> no one notice?

That's interesting .. it's possible things don't work quite the way I think they do, but I will try to find previous examples - I remember borgbackup was affected on armhf fairly recently, for example.

I suspect trying to convince glibc maintainers to check for EPERM could turn into a holy war quite quickly :)

> > A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
> > know about - is it possible to look things up that way around?
>
> libseccomp doesn't allow us to install filters for syscalls it doesn't
> know anyway iirc...
>
> Not sure I follow though? Why would that help?

Well, my logic was if seccomp didn't know about a syscall when it was built then that syscall is "new", and userland can probably live without it. If we're going to block it anyway (because seccomp doesn't know about it, it won't end up in the whitelist, even if systemd/nspawn is more up-to-date), we might as well return ENOSYS and let userland try a fallback (e.g. openat instead of openat2.) We can still return EPERM for well-known-but-blocked syscalls which hopefully indicates to sufficiently caffeinated users that there's a security filter in place :)

> > Another useful thing might be to allow whitelisting by syscall number -
> > again don't know if seccomp allows this. Would allow easier workarounds in
> > cases like this without having to go off and backport libseccomp...
>
> syscall numbers are highly arch dep, we currently don't support that
> because you cannot reasonably express this in unit files, as they'd
> become very much arch dependent then.
>
> That said, I'd be happy to review/merge a patch that adds a syntax
> where you could spell out SystemCallFilter=x86-64:345 for example,
> i.e. specify arch plus syscall nr. But it's still ugly, since it would
> result in different filters on different archs.

Yeah, I'm not suggesting anyone should deploy that in a published unit file. But for individual admins/users to "bodge" a system in an override file it might be handy. It's fractionally less messy to my mind than manually backporting system libraries!

> > Third thing on my wishlist might be a log entry for denied syscalls
> > somewhere ..
>
> Hmm, this would make a ton of sense. We currently have a "log" seccomp
> action, but it will just log and allow anyway. We'd need another
> action that would log and refuse. Please file an RFE, or even better
> prep a PR for this!

Looking at the kernel seccomp doc, I'm not actually sure it's possible, from code at least:

https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html

But there is /proc/sys/kernel/seccomp/actions_logged which might do the trick!

S.
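To make the rule of thumb from this message concrete, here is a small sketch of the proposed policy only; it is not how systemd-nspawn or libseccomp behave today. The function name and the two sets are hypothetical stand-ins for "syscalls the installed libseccomp knows" and "syscalls on the allow list".

```python
import errno


def errno_for_syscall(name, known_syscalls, allowed_syscalls):
    """Return None if the call is allowed, else the errno the filter would use.

    Proposed policy (an assumption, not current behaviour):
      - unknown to libseccomp  -> ENOSYS, so userland can try an older call
      - known but not allowed  -> EPERM, signalling a security filter
    """
    if name not in known_syscalls:
        # The library cannot name this syscall, so it can never be on the
        # allow list; report it as "not implemented".
        return errno.ENOSYS
    if name in allowed_syscalls:
        return None
    return errno.EPERM


if __name__ == "__main__":
    known = {"openat", "mount", "read"}   # hypothetical: an old libseccomp
    allowed = {"openat", "read"}          # hypothetical allow list
    for call in ("openat", "openat2", "mount"):
        result = errno_for_syscall(call, known, allowed)
        label = "allowed" if result is None else errno.errorcode[result]
        print(call, "->", label)
```

With these example sets, openat2 maps to ENOSYS because the library predates it, while mount maps to EPERM because it is known and deliberately blocked.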
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sun, 16.08.20 15:01, Steve Dodd (steved...@gmail.com) wrote:

> On Sun, 16 Aug 2020 at 14:54, Lennart Poettering wrote:
>
> > > I've just been bitten by this - last time I looked into a similar problem,
> > > it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> > > Could we distinguish between these two cases and generate the right error
> > > code? It would save a lot of aggro when working with containers..
> >
> > Which error to return is a bit of a bikeshedding thing.
> >
> > We return EPERM because this is about sandboxing for us, i.e. access
> > control. And we want to communicate that correctly to payloads, so we
> > say so.
> >
> > ENOSYS would be something we'd return if we'd pretend that something
> > isn't available even though it is.
>
> I'm assuming we don't actually check what's available on the host kernel..
> All the problems I've hit around this have been new syscalls which libc
> tests for by checking for ENOSYS - if it gets that, it falls back to a
> different implementation. If it gets EPERM, however, it just assumes the
> operation failed and returns to caller, which leaves poor users like me and
> the OP scratching their heads :)

Hmm, well, no one knows what seccomp filters people install with the myriad of seccomp-using tools we have these days.

I think it would be wise to do fallback logic for EPERM too. It's the error that nspawn uses since day #1 basically. I am a bit puzzled no one noticed this before, afaik glibc test cases at least on Fedora (where most glibc upstream devs work on) run in nspawn, so how did no one notice?

I also think glibc should probably continue to use the old syscalls if possible and only use the new syscalls when the old ones won't do... After all, needlessly using new syscalls won't just trip up things here, but all across the board where people decode/track syscalls, even in strace or so...

> A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
> know about - is it possible to look things up that way around?

libseccomp doesn't allow us to install filters for syscalls it doesn't know anyway iirc...

Not sure I follow though? Why would that help?

> Another useful thing might be to allow whitelisting by syscall number -
> again don't know if seccomp allows this. Would allow easier workarounds in
> cases like this without having to go off and backport libseccomp...

syscall numbers are highly arch dep, we currently don't support that because you cannot reasonably express this in unit files, as they'd become very much arch dependent then.

That said, I'd be happy to review/merge a patch that adds a syntax where you could spell out SystemCallFilter=x86-64:345 for example, i.e. specify arch plus syscall nr. But it's still ugly, since it would result in different filters on different archs.

> Third thing on my wishlist might be a log entry for denied syscalls
> somewhere ..

Hmm, this would make a ton of sense. We currently have a "log" seccomp action, but it will just log and allow anyway. We'd need another action that would log and refuse. Please file an RFE, or even better prep a PR for this!

Lennart

--
Lennart Poettering, Berlin
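The fallback behaviour discussed in this message could look roughly like the sketch below. It is an illustration only, not glibc's actual logic: the helper name is invented, and the two "syscalls" are simulated with plain Python functions so the example runs anywhere. Treating EPERM like ENOSYS is the design choice being argued for here; a caller that must tell "filtered" from "genuinely forbidden" would want a narrower set.

```python
import errno

# Errors that are taken to mean "this syscall is not usable here".
FALLBACK_ERRNOS = (errno.ENOSYS, errno.EPERM)


def call_with_fallback(new_call, old_call, fallback_errnos=FALLBACK_ERRNOS):
    """Try new_call(); on a 'not usable' error, use old_call() instead.

    Any other OSError is re-raised unchanged.
    """
    try:
        return new_call()
    except OSError as exc:
        if exc.errno in fallback_errnos:
            return old_call()
        raise


def filtered_openat2():
    """Simulates a new syscall that a seccomp allow list refuses with EPERM."""
    raise OSError(errno.EPERM, "openat2 blocked by filter (simulated)")


def plain_openat():
    """Simulates the older syscall that is on the allow list."""
    return "opened via openat"


if __name__ == "__main__":
    print(call_with_fallback(filtered_openat2, plain_openat))
```

The cost of this approach is that a real permission problem on the new call is retried once through the old call before being reported, which is usually harmless since the old call fails the same way.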
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sun, 16 Aug 2020 at 14:54, Lennart Poettering wrote:

> > I've just been bitten by this - last time I looked into a similar problem,
> > it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> > Could we distinguish between these two cases and generate the right error
> > code? It would save a lot of aggro when working with containers..
>
> Which error to return is a bit of a bikeshedding thing.
>
> We return EPERM because this is about sandboxing for us, i.e. access
> control. And we want to communicate that correctly to payloads, so we
> say so.
>
> ENOSYS would be something we'd return if we'd pretend that something
> isn't available even though it is.

I'm assuming we don't actually check what's available on the host kernel.. All the problems I've hit around this have been new syscalls which libc tests for by checking for ENOSYS - if it gets that, it falls back to a different implementation. If it gets EPERM, however, it just assumes the operation failed and returns to caller, which leaves poor users like me and the OP scratching their heads :)

A rule of thumb might be to return ENOSYS for anything libseccomp doesn't know about - is it possible to look things up that way around?

Another useful thing might be to allow whitelisting by syscall number - again don't know if seccomp allows this. Would allow easier workarounds in cases like this without having to go off and backport libseccomp...

Third thing on my wishlist might be a log entry for denied syscalls somewhere ..

S.
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Sat, 15.08.20 13:33, Steve Dodd (steved...@gmail.com) wrote:

> On Fri, 26 Jun 2020 at 16:53, Lennart Poettering wrote:
>
> > > > We implement a system call allow list, i.e. everything that isn't
> > > > explicitly allowed is denied. You can use --system-call-filter=openat2
> > > > to allow a specific syscall on top of our defaults, i.e. extend the
> > > > allow list, or remove entries from it.
> > [..]
> > You might need a newer libseccomp so that the syscall is actually
> > known by it. openat2 is a very recent syscall addition, and you need
> > to update libseccomp in lockstep if you want it to grok it.
>
> I've just been bitten by this - last time I looked into a similar problem,
> it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> Could we distinguish between these two cases and generate the right error
> code? It would save a lot of aggro when working with containers..

Which error to return is a bit of a bikeshedding thing.

We return EPERM because this is about sandboxing for us, i.e. access control. And we want to communicate that correctly to payloads, so we say so.

ENOSYS would be something we'd return if we'd pretend that something isn't available even though it is.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Fri, 26 Jun 2020 at 16:53, Lennart Poettering wrote:

> > > We implement a system call allow list, i.e. everything that isn't
> > > explicitly allowed is denied. You can use --system-call-filter=openat2
> > > to allow a specific syscall on top of our defaults, i.e. extend the
> > > allow list, or remove entries from it.
> [..]
> You might need a newer libseccomp so that the syscall is actually
> known by it. openat2 is a very recent syscall addition, and you need
> to update libseccomp in lockstep if you want it to grok it.

I've just been bitten by this - last time I looked into a similar problem, it seemed the calling code was confused by getting EPERM instead of ENOSYS. Could we distinguish between these two cases and generate the right error code? It would save a lot of aggro when working with containers..

S.
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Fri, 26.06.20 21:43, Mohan R (mohan...@gmail.com) wrote:

> Hi
>
> On Fri, Jun 26, 2020 at 9:23 PM Lennart Poettering wrote:
> > You might need a newer libseccomp so that the syscall is actually
> > known by it. openat2 is a very recent syscall addition, and you need
> > to update libseccomp in lockstep if you want it to grok it.
>
> Thanks for the details, I'll look into it. Anyway, is there any
> specific reason for not providing an option to disable seccomp (or
> make seccomp opt-in instead of default)?

No one asked for this, and it's a bit hacky to do this.

That said, I'd merge a patch that would make it optional, depending on some env var being set. (Env vars are how we make configurable the stuff in nspawn that we don't really want people to use.)

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
Hi

On Fri, Jun 26, 2020 at 9:23 PM Lennart Poettering wrote:
> You might need a newer libseccomp so that the syscall is actually
> known by it. openat2 is a very recent syscall addition, and you need
> to update libseccomp in lockstep if you want it to grok it.

Thanks for the details, I'll look into it. Anyway, is there any specific reason for not providing an option to disable seccomp (or make seccomp opt-in instead of default)?

Thanks,
Mohan R
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Thu, 25.06.20 20:19, Mohan R (mohan...@gmail.com) wrote:

> Hi
>
> On Thu, Jun 25, 2020 at 2:17 PM Lennart Poettering wrote:
> > You can't disable seccomp right now.
>
> Any future plan to include a flag or some other way?
>
> > We implement a system call allow list, i.e. everything that isn't
> > explicitly allowed is denied. You can use --system-call-filter=openat2
> > to allow a specific syscall on top of our defaults, i.e. extend the
> > allow list, or remove entries from it.
>
> This '--system-call-filter' isn't working,
> https://gist.github.com/mohan43u/6ed44eff564f10cc04c709772b02c323
>
> Is this a bug in systemd-nspawn?

You might need a newer libseccomp so that the syscall is actually known by it. openat2 is a very recent syscall addition, and you need to update libseccomp in lockstep if you want it to grok it.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
Hi

On Thu, Jun 25, 2020 at 2:17 PM Lennart Poettering wrote:
> You can't disable seccomp right now.

Any future plan to include a flag or some other way?

> We implement a system call allow list, i.e. everything that isn't
> explicitly allowed is denied. You can use --system-call-filter=openat2
> to allow a specific syscall on top of our defaults, i.e. extend the
> allow list, or remove entries from it.

This '--system-call-filter' isn't working,
https://gist.github.com/mohan43u/6ed44eff564f10cc04c709772b02c323

Is this a bug in systemd-nspawn?

Thanks,
Mohan R
Re: [systemd-devel] How to disable seccomp in systemd-nspawn?
On Wed, 24.06.20 23:13, Mohan R (mohan...@gmail.com) wrote:

> Hi,
>
> How to disable seccomp in systemd-nspawn? I'm facing an issue while
> running fuse-overlayfs and I reported it

You can't disable seccomp right now.

> https://github.com/containers/fuse-overlayfs/issues/220#issuecomment-648865831
>
> Developer asked me to check if the container is seccomp filtered; as
> suspected, systemd-nspawn put the container in seccomp filter mode
> (Seccomp: 2). But I'm not able to get the list of filtered syscalls, and
> I'm not able to find out why 'openat2()' is returning EPERM inside the
> systemd-nspawn container.

We implement a system call allow list, i.e. everything that isn't explicitly allowed is denied. You can use --system-call-filter=openat2 to allow a specific syscall on top of our defaults, i.e. extend the allow list, or remove entries from it.

Generic application code should have fallbacks in place when it comes to new system calls such as openat2(), if they are supposed to work on kernels that aren't the very newest or in containerized environments, since pretty much all of them employ a syscall filter allow list these days.

Lennart

--
Lennart Poettering, Berlin
[systemd-devel] How to disable seccomp in systemd-nspawn?
Hi,

How to disable seccomp in systemd-nspawn? I'm facing an issue while running fuse-overlayfs and I reported it:

https://github.com/containers/fuse-overlayfs/issues/220#issuecomment-648865831

Developer asked me to check if the container is seccomp filtered; as suspected, systemd-nspawn put the container in seccomp filter mode (Seccomp: 2). But I'm not able to get the list of filtered syscalls, and I'm not able to find out why 'openat2()' is returning EPERM inside the systemd-nspawn container.

Thanks,
Mohan R
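For readers landing here with the same question, the "Seccomp: 2" value mentioned above comes from the "Seccomp:" field of /proc/<pid>/status. A minimal sketch for reading it from inside the container follows; the function names are made up for illustration, and the mapping (0 disabled, 1 strict, 2 filter) is the one documented for that field. Note that it only reports the mode, not which syscalls the filter refuses.

```python
# Seccomp modes as reported in the "Seccomp:" field of /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode name found in status text, or None if absent."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" on newer kernels does not match this prefix.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return None


def current_seccomp_mode(path="/proc/self/status"):
    """Read the status file of the current process (Linux only)."""
    with open(path) as f:
        return seccomp_mode(f.read())


if __name__ == "__main__":
    try:
        print("seccomp mode:", current_seccomp_mode())
    except OSError as exc:
        print("could not read /proc/self/status -", exc)
```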