Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Steve Dodd
On Sun, 16 Aug 2020 at 16:32, Steve Dodd  wrote:

> Ah, looks like we need to seccomp_attr_set(, SCMP_FLTATR_CTL_LOG, ..)
> somewhere for this to work. Not sure if that should be done
> unconditionally...
>

https://github.com/systemd/systemd/pull/16752 makes it conditional on an
environment variable, "SYSTEMD_LOG_SECCOMP", which seems neat enough.

I've tried to open a discussion about the ENOSYS handling in libseccomp at
https://github.com/seccomp/libseccomp/issues/286, but I'm probably not
being very coherent..

S.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Steve Dodd
On Sun, 16 Aug 2020 at 16:05, Steve Dodd  wrote:

> That's interesting .. it's possible things don't work quite the way I think
> they do, but I will try to find previous examples - I remember borgbackup
> was affected on armhf fairly recently, for example.
>

Ah, the borgbackup thing was different - sync_file_range2 was missing from
systemd's filter set. Here's the last "new syscall" issue though:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1883447

>> Hmm, this would make a ton of sense. We currently have a "log" seccomp
>> action, but it will just log and allow anyway. we'd need another
>> action that would log and refuse. Please file an RFE, or even better
>> prep a PR for this!
>>
>
> Looking at the kernel seccomp doc, I'm not actually sure it's possible,
> from code at least:
>
> https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html
>
> But there is  /proc/sys/kernel/seccomp/actions_logged which might do the
> trick!
>

Ah, looks like we need to seccomp_attr_set(, SCMP_FLTATR_CTL_LOG, ..)
somewhere for this to work. Not sure if that should be done
unconditionally...

S.



Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Steve Dodd
On Sun, 16 Aug 2020 at 15:47, Lennart Poettering 
wrote:

> I think it would be wise to do fallback logic for EPERM too. It's
> the error that nspawn has used since day #1 basically. I am a bit puzzled
> no one noticed this before, afaik glibc test cases at least on Fedora
> (which most glibc upstream devs work on) run in nspawn, so how did
> no one notice?
>

That's interesting .. it's possible things don't work quite the way I think
they do, but I will try to find previous examples - I remember borgbackup
was affected on armhf fairly recently, for example.

I suspect trying to convince glibc maintainers to check for EPERM could
turn into a holy war quite quickly :)


> > A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
> > know about - is it possible to look things up that way around?
>
> libseccomp doesn't allow us to install filters for syscalls it doesn't
> know anyway iirc...
>
> Not sure I follow though? Why would that help?
>

Well, my logic was if seccomp didn't know about a syscall when it was built
then that syscall is "new", and userland can probably live without it. If
we're going to block it anyway (because seccomp doesn't know about it, it
won't end up in the whitelist, even if systemd/nspawn is more up-to-date),
we might as well return ENOSYS and let userland try a fallback (e.g. openat
instead of openat2.) We can still return EPERM for well-known-but-blocked
syscalls which hopefully indicates to sufficiently caffeinated users that
there's a security filter in place :)
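
To make that rule of thumb concrete, here is a rough sketch in Python. It is
only an illustration of the policy, not how nspawn or libseccomp actually
build their filters; the two sets and the function name errno_for are made up
for the example.

```python
import errno

# Hypothetical: syscall names the installed libseccomp was built knowing about.
KNOWN_TO_LIBSECCOMP = {"open", "openat", "read", "write", "keyctl"}
# Hypothetical: the allow list the container manager installs.
ALLOW_LIST = {"open", "openat", "read", "write"}


def errno_for(syscall_name):
    """Return 0 if the call is allowed, else the errno the filter would report."""
    if syscall_name in ALLOW_LIST:
        return 0
    if syscall_name not in KNOWN_TO_LIBSECCOMP:
        # Too new for libseccomp: report "no such syscall" so callers fall back.
        return errno.ENOSYS
    # Well known but deliberately blocked: report an access-control decision.
    return errno.EPERM
```

With these made-up sets, openat is allowed, openat2 gets ENOSYS because
libseccomp has never heard of it, and keyctl gets EPERM because it is known
but not on the allow list.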


> > Another useful thing might be to allow whitelisting by syscall number -
> > again don't know if seccomp allows this. Would allow easier workarounds in
> > cases like this without having to go off and backport libseccomp...
>
> syscall numbers are highly arch dep, we currently don't support that
> because you cannot reasonably express this in unit files, as they'd
> become very much arch dependent then.
>
> That said, I'd be happy to review/merge a patch that adds a syntax
> where you could spell out SystemCallFilter=x86-64:345 for example,
> i.e. specify arch plus syscall nr. But it's still ugly, since it would
> result in different filters on different archs.
>

Yeah, I'm not suggesting anyone should deploy that in a published unit
file. But for individual admins/users to "bodge" a system in an override
file it might be handy. It's fractionally less messy to my mind than
manually backporting system libraries!

> > Third thing on my wishlist might be a log entry for denied syscalls
> > somewhere ..
>
> Hmm, this would make a ton of sense. We currently have a "log" seccomp
> action, but it will just log and allow anyway. we'd need another
> action that would log and refuse. Please file an RFE, or even better
> prep a PR for this!
>

Looking at the kernel seccomp doc, I'm not actually sure it's possible,
from code at least:

https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html

But there is  /proc/sys/kernel/seccomp/actions_logged which might do the
trick!
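
For anyone who wants to check that sysctl on their own box, a small Python
sketch follows. The helper names are made up for the example, and it only
reads the file; whether a given denial actually ends up in the audit log also
depends on how the filter was loaded, which this does not check.

```python
from pathlib import Path

ACTIONS_LOGGED = Path("/proc/sys/kernel/seccomp/actions_logged")


def parse_actions(text):
    """Split the space-separated action names found in the sysctl file."""
    return text.split()


def logged_actions(path=ACTIONS_LOGGED):
    """Return the list of logged action names, or None if the file is unreadable."""
    try:
        return parse_actions(path.read_text())
    except OSError:
        # Kernel without this sysctl, or no permission to read it.
        return None


if __name__ == "__main__":
    actions = logged_actions()
    if actions is None:
        print("actions_logged is not available here")
    else:
        print("errno action listed:", "errno" in actions)
```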

S.


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Lennart Poettering
On So, 16.08.20 15:01, Steve Dodd (steved...@gmail.com) wrote:

> On Sun, 16 Aug 2020 at 14:54, Lennart Poettering 
> wrote:
>
>
> > > I've just been bitten by this - last time I looked into a similar problem,
> > > it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> > > Could we distinguish between these two cases and generate the right error
> > > code? It would save a lot of aggro when working with containers..
> >
>
>
> > Which error to return is a bit of a bikeshedding thing.
> >
> > We return EPERM because this is about sandboxing for us, i.e. access
> > control. And we want to communicate that correctly to payloads, so we
> > say so.
> >
> > ENOSYS would be something we'd return if we'd pretend that something
> > isn't available even though it is.
> >
>
> I'm assuming we don't actually check what's available on the host kernel..
> All the problems I've hit around this have been new syscalls which libc
> tests for by checking for ENOSYS - if it gets that, it falls back to a
> different implementation. If it gets EPERM, however, it just assumes the
> operation failed and returns to caller, which leaves poor users like me and
> the OP scratching their heads :)

Hmm, well, no one knows what seccomp filters people install with the
myriad of seccomp-using tools we have these days.

I think it would be wise to do fallback logic for EPERM too. It's
the error that nspawn has used since day #1 basically. I am a bit puzzled
no one noticed this before, afaik glibc test cases at least on Fedora
(which most glibc upstream devs work on) run in nspawn, so how did
no one notice?

I also think glibc should probably continue to use the old syscalls if
possible and only use the new syscalls when the old ones won't
do... After all, needlessly using new syscalls won't just trip up
things here, but all across the board where people decode/track
syscalls, even in strace or so...

> A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
> know about - is it possible to look things up that way around?

libseccomp doesn't allow us to install filters for syscalls it doesn't
know anyway iirc...

Not sure I follow though? Why would that help?

> Another useful thing might be to allow whitelisting by syscall number -
> again don't know if seccomp allows this. Would allow easier workarounds in
> cases like this without having to go off and backport libseccomp...

syscall numbers are highly arch dep, we currently don't support that
because you cannot reasonably express this in unit files, as they'd
become very much arch dependent then.

That said, I'd be happy to review/merge a patch that adds a syntax
where you could spell out SystemCallFilter=x86-64:345 for example,
i.e. specify arch plus syscall nr. But it's still ugly, since it would
result in different filters on different archs.

> Third thing on my wishlist might be a log entry for denied syscalls
> somewhere ..

Hmm, this would make a ton of sense. We currently have a "log" seccomp
action, but it will just log and allow anyway. we'd need another
action that would log and refuse. Please file an RFE, or even better
prep a PR for this!

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Steve Dodd
On Sun, 16 Aug 2020 at 14:54, Lennart Poettering 
wrote:


> > I've just been bitten by this - last time I looked into a similar problem,
> > it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> > Could we distinguish between these two cases and generate the right error
> > code? It would save a lot of aggro when working with containers..
>


> Which error to return is a bit of a bikeshedding thing.
>
> We return EPERM because this is about sandboxing for us, i.e. access
> control. And we want to communicate that correctly to payloads, so we
> say so.
>
> ENOSYS would be something we'd return if we'd pretend that something
> isn't available even though it is.
>

I'm assuming we don't actually check what's available on the host kernel..
All the problems I've hit around this have been new syscalls which libc
tests for by checking for ENOSYS - if it gets that, it falls back to a
different implementation. If it gets EPERM, however, it just assumes the
operation failed and returns to caller, which leaves poor users like me and
the OP scratching their heads :)

A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
know about - is it possible to look things up that way around?

Another useful thing might be to allow whitelisting by syscall number -
again don't know if seccomp allows this. Would allow easier workarounds in
cases like this without having to go off and backport libseccomp...

Third thing on my wishlist might be a log entry for denied syscalls
somewhere ..

S.


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-16 Thread Lennart Poettering
On Sa, 15.08.20 13:33, Steve Dodd (steved...@gmail.com) wrote:

> On Fri, 26 Jun 2020 at 16:53, Lennart Poettering 
> wrote:
>
> > > > We implement a system call allow list, i.e. everything that isn't
> > > > explicitly allowed is denied. You can use --system-call-filter=openat2
> > > > to allow a specific syscall on top of our defaults, i.e. extend the
> > > > allow list, or remove entries from it.
> >
> [..]
>
> > You might need a newer libseccomp so that the syscall is actually
> > known by it. openat2 is a very recent syscall addition, and you need
> > to update libseccomp in lockstep if you want it to grok it.
> >
>
> I've just been bitten by this - last time I looked into a similar problem,
> it seemed the calling code was confused by getting EPERM instead of ENOSYS.
> Could we distinguish between these two cases and generate the right error
> code? It would save a lot of aggro when working with containers..

Which error to return is a bit of a bikeshedding thing.

We return EPERM because this is about sandboxing for us, i.e. access
control. And we want to communicate that correctly to payloads, so we
say so.

ENOSYS would be something we'd return if we'd pretend that something
isn't available even though it is.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-08-15 Thread Steve Dodd
On Fri, 26 Jun 2020 at 16:53, Lennart Poettering 
wrote:

> > > We implement a system call allow list, i.e. everything that isn't
> > > explicitly allowed is denied. You can use --system-call-filter=openat2
> > > to allow a specific syscall on top of our defaults, i.e. extend the
> > > allow list, or remove entries from it.
>
[..]

> You might need a newer libseccomp so that the syscall is actually
> known by it. openat2 is a very recent syscall addition, and you need
> to update libseccomp in lockstep if you want it to grok it.
>

I've just been bitten by this - last time I looked into a similar problem,
it seemed the calling code was confused by getting EPERM instead of ENOSYS.
Could we distinguish between these two cases and generate the right error
code? It would save a lot of aggro when working with containers..

S.


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-26 Thread Lennart Poettering
On Fr, 26.06.20 21:43, Mohan R (mohan...@gmail.com) wrote:

> Hi
>
> On Fri, Jun 26, 2020 at 9:23 PM Lennart Poettering
>  wrote:
> > You might need a newer libseccomp so that the syscall is actually
> > known by it. openat2 is a very recent syscall addition, and you need
> > to update libseccomp in lockstep if you want it to grok it.
>
> Thanks for the details, I'll look into it. Anyway, is there any
> specific reason for not providing an option to disable seccomp (or
> make seccomp opt-in instead of default)?

No one asked for this, and it's a bit hacky to do this.

That said, I'd merge a patch that would make it optional, depending on
some env var being set. (Env vars are how we make configurable the
stuff in nspawn that we don't really want people to use.)

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-26 Thread Mohan R
Hi

On Fri, Jun 26, 2020 at 9:23 PM Lennart Poettering
 wrote:
> You might need a newer libseccomp so that the syscall is actually
> known by it. openat2 is a very recent syscall addition, and you need
> to update libseccomp in lockstep if you want it to grok it.

Thanks for the details, I'll look into it. Anyway, is there any
specific reason for not providing an option to disable seccomp (or
make seccomp opt-in instead of default)?

Thanks,
Mohan R


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-26 Thread Lennart Poettering
On Do, 25.06.20 20:19, Mohan R (mohan...@gmail.com) wrote:

> Hi
>
> On Thu, Jun 25, 2020 at 2:17 PM Lennart Poettering
>  wrote:
> > You can't disable seccomp right now.
>
> Any future plan to include a flag or some other way?
>
> > We implement a system call allow list, i.e. everything that isn't
> > explicitly allowed is denied. You can use --system-call-filter=openat2
> > to allow a specific syscall on top of our defaults, i.e. extend the
> > allow list, or remove entries from it.
>
> This '--system-call-filter' isn't working,
> https://gist.github.com/mohan43u/6ed44eff564f10cc04c709772b02c323
>
> Is this a bug in systemd-nspawn?

You might need a newer libseccomp so that the syscall is actually
known by it. openat2 is a very recent syscall addition, and you need
to update libseccomp in lockstep if you want it to grok it.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-25 Thread Mohan R
Hi

On Thu, Jun 25, 2020 at 2:17 PM Lennart Poettering
 wrote:
> You can't disable seccomp right now.

Any future plan to include a flag or some other way?

> We implement a system call allow list, i.e. everything that isn't
> explicitly allowed is denied. You can use --system-call-filter=openat2
> to allow a specific syscall on top of our defaults, i.e. extend the
> allow list, or remove entries from it.

This '--system-call-filter' isn't working,
https://gist.github.com/mohan43u/6ed44eff564f10cc04c709772b02c323

Is this a bug in systemd-nspawn?

Thanks,
Mohan R


Re: [systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-25 Thread Lennart Poettering
On Mi, 24.06.20 23:13, Mohan R (mohan...@gmail.com) wrote:

> Hi,
>
> How can I disable seccomp in systemd-nspawn? I'm facing an issue while
> running fuse-overlayfs, which I reported here:

You can't disable seccomp right now.

> https://github.com/containers/fuse-overlayfs/issues/220#issuecomment-648865831
>
> The developer asked me to check whether the container is seccomp filtered. As
> suspected, systemd-nspawn puts the container under a seccomp filter
> (Seccomp: 2). But I can't get the list of filtered syscalls, and I can't
> find out why 'openat2()' is returning EPERM inside the systemd-nspawn
> container.

We implement a system call allow list, i.e. everything that isn't
explicitly allowed is denied. You can use --system-call-filter=openat2
to allow a specific syscall on top of our defaults, i.e. extend the
allow list, or remove entries from it.

Generic application code should have fallbacks in place when it comes
to new system calls such as openat2(), if they are supposed to work on
kernels that aren't the very newest or in containerized environments,
since pretty much all of them employ a syscall filter allow list these
days.
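
As a rough illustration of such a fallback, here is a Python sketch. The
function open_with_fallback and the two stand-in callables are hypothetical;
real code would wrap the actual openat2()/openat() calls. Treating EPERM as
"not available" is a judgment call, since it can also be a genuine permission
error, in which case the older call is expected to fail the same way.

```python
import errno


def open_with_fallback(path, new_call, old_call, treat_eperm_as_missing=True):
    """Try new_call(path); use old_call(path) if the new call looks unavailable."""
    unavailable = {errno.ENOSYS}
    if treat_eperm_as_missing:
        # Syscall filters commonly report EPERM for calls they do not allow.
        unavailable.add(errno.EPERM)
    try:
        return new_call(path)
    except OSError as exc:
        if exc.errno not in unavailable:
            raise
        return old_call(path)


def blocked_openat2(path):
    """Stand-in for a new syscall that a filter rejects with EPERM."""
    raise OSError(errno.EPERM, "Operation not permitted")


def plain_openat(path):
    """Stand-in for the older syscall that the filter allows."""
    return "opened " + path + " via openat"
```

Calling open_with_fallback("/tmp/x", blocked_openat2, plain_openat) takes the
older path, while passing treat_eperm_as_missing=False lets the EPERM
propagate to the caller.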

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] How to disable seccomp in systemd-nspawn?

2020-06-24 Thread Mohan R
Hi,

How can I disable seccomp in systemd-nspawn? I'm facing an issue while
running fuse-overlayfs, which I reported here:

https://github.com/containers/fuse-overlayfs/issues/220#issuecomment-648865831

The developer asked me to check whether the container is seccomp filtered. As
suspected, systemd-nspawn puts the container under a seccomp filter
(Seccomp: 2). But I can't get the list of filtered syscalls, and I can't
find out why 'openat2()' is returning EPERM inside the systemd-nspawn
container.
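
For reference, the "Seccomp: 2" value comes from /proc/<pid>/status. A small
Python sketch of the check follows; the helper name seccomp_mode is made up
for the example, and the mode names are those documented for that field
(0 disabled, 1 strict, 2 filter).

```python
MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the integer from the 'Seccomp:' line of a status file, or None."""
    for line in status_text.splitlines():
        # The trailing colon keeps this from matching 'Seccomp_filters:'.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None


if __name__ == "__main__":
    with open("/proc/self/status") as handle:
        mode = seccomp_mode(handle.read())
    print("seccomp mode:", MODES.get(mode, "unknown"))
```

This only tells you that a filter is installed, not which syscalls it allows.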

Thanks,
Mohan R
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel