Re: FWIW: sysrestrict

2016-08-01 Thread Thor Lancelot Simon
On Mon, Aug 01, 2016 at 12:31:01PM +0930, LYMN wrote:
> On Thu, Jul 28, 2016 at 08:42:49PM +0200, Joerg Sonnenberger wrote:
> > 
> > The difference is that correctly configured veriexec is a system-wide
> > property. It doesn't matter if you can exec something, you don't get to
> > execute binaries that weren't signed. 
> > 
> 
> Technically,  veriexec only runs files that have a valid fingerprint.
> We don't, currently, have signing but that would be useful and probably
> could be done now.  One thing that does seem to get overlooked a lot is

That would require an RSA implementation in the kernel, plus some PKCS bits.

I have code around here somewhere...

Thor


Re: FWIW: sysrestrict

2016-08-01 Thread LYMN
On Thu, Jul 28, 2016 at 08:42:49PM +0200, Joerg Sonnenberger wrote:
> 
> The difference is that correctly configured veriexec is a system-wide
> property. It doesn't matter if you can exec something, you don't get to
> execute binaries that weren't signed. 
> 

Technically, veriexec only runs files that have a valid fingerprint.
We don't currently have signing, but that would be useful and probably
could be done now.  One thing that does seem to get overlooked a lot is
that you can mark a binary as being "indirect", which means that it is
allowed to be an interpreter for a shell script but cannot be invoked
directly on the command line.  So, if you marked /bin/sh as indirect then
all properly fingerprinted shell scripts would continue to function but
anyone trying to exec /bin/sh would be prevented from doing so.  This
would provide a bit of a speed hump for some script kiddies, but the feature
is more intended to provide a way of permitting powerful scripting
languages (think perl and the like) without leaving the system wide open

-- 
Brett Lymn



Re: FWIW: sysrestrict

2016-07-29 Thread Maxime Villard

On 28/07/2016 at 20:42, Joerg Sonnenberger wrote:
> On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
> > It is not trying to "prevent" an attack, it is supposed to restrict what
> > the attacker can do. Veriexec too is useful only if there is already an
> > intruder. And mapping .rodata as R reduces the possibility of ROP attacks
> > only if there is already a vulnerability that could allow an attacker to
> > jump to a chosen address.
>
> The difference is that correctly configured veriexec is a system-wide
> property. It doesn't matter if you can exec something, you don't get to
> execute binaries that weren't signed. Separate PT_LOAD for .rodata only
> has some memory use and legacy compatibility issues, IMO it doesn't even
> qualify as a "security" mechanism but just as actually enforcing the
> constraints that exist.
>
> > If a vulnerability exists in a software (root or not) that allows control
> > of the execution flow, the attacker will often have a small payload of
> > shellcode, and this shellcode will try to load a bigger shellcode. Doing
> > this involves using special syscalls. If these syscalls are not available
> > the attack fails. Of course, there are many other things the attacker can
> > do with the small shellcode, but at least it restricts the attack surface.
>
> The shell-payload model is only that popular because it is trivial to
> adopt. Consider it a small portable VM for exploits. As I said earlier,
> if your capability system allows more capabilities after an exec than
> before without very specific introductions to do so, it is plainly
> broken. Arguably, suid is exactly that, but it is a well understood
> issue.
>
> So let's look at a RCE exploit in an FTP server. I must be able to do
> pretty much arbitrary network IO as FTP server, so the exploited FTP
> server process must be able to do the same. I can access all files the
> FTP user has access to and possibly also write to given locations. In
> short, I can pretty much do whatever I want with the box and the
> operational constraints are likely to be irrelevant. Yes, it is harder
> to exploit if you can't actually exec something. For a typical
> environment, it won't stop writing a DSO and dlopen'ing it for example.
> Or just creating it in memory with all the system call stubs needed.
>
> So let's try again, except protection from script kiddies, what's the
> real point?



Let's try again: I have already said it is not the perfect feature, I have
already said that it does not offer the exhaustive isolation people like
you may expect, I have already said that it just restricts syscalls without
totally preventing attacks, I have already said that I don't intend to
commit it anyway, and finally, I have already said that it is just a wild
idea with some code that I have never tested.

You (and Thor) believe the thin layer of security it offers is not that
interesting after all - and at some point, I agree with you. If people
believe that with a few adaptations it could be made better, they are
obviously free to take the code and do whatever they want with it.

But your point more or less comes down to arguing that if a house does not
have a bullet-proof door, then we should just let that door wide open since
someone will still be able to break it with a bazooka anyway. The truth of
the matter is, only few people have bazookas - and only few exploits (as
far as I know) play with dlopen to exploit simple bugs. Most intruders only
have knives and hammers, and it does not mean they are amateurs - most
exploits rely on syscalls, and it does not mean they are written by script
kiddies.

I am not going to insist more on this, I'm not trying to sell anything.


Re: FWIW: sysrestrict

2016-07-28 Thread Joerg Sonnenberger
On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
> It is not trying to "prevent" an attack, it is supposed to restrict what
> the attacker can do. Veriexec too is useful only if there is already an
> intruder. And mapping .rodata as R reduces the possibility of ROP attacks
> only if there is already a vulnerability that could allow an attacker to
> jump to a chosen address.

The difference is that correctly configured veriexec is a system-wide
property. It doesn't matter if you can exec something, you don't get to
execute binaries that weren't signed. Separate PT_LOAD for .rodata only
has some memory use and legacy compatibility issues, IMO it doesn't even
qualify as a "security" mechanism but just as actually enforcing the
constraints that exist.

> If a vulnerability exists in a software (root or not) that allows control
> of the execution flow, the attacker will often have a small payload of
> shellcode, and this shellcode will try to load a bigger shellcode. Doing
> this involves using special syscalls. If these syscalls are not available
> the attack fails. Of course, there are many other things the attacker can
> do with the small shellcode, but at least it restricts the attack surface.

The shell-payload model is only that popular because it is trivial to
adopt. Consider it a small portable VM for exploits. As I said earlier,
if your capability system allows more capabilities after an exec than
before without very specific introductions to do so, it is plainly
broken. Arguably, suid is exactly that, but it is a well understood
issue.

So let's look at a RCE exploit in an FTP server. I must be able to do
pretty much arbitrary network IO as FTP server, so the exploited FTP
server process must be able to do the same. I can access all files the
FTP user has access to and possibly also write to given locations. In
short, I can pretty much do whatever I want with the box and the
operational constraints are likely to be irrelevant. Yes, it is harder
to exploit if you can't actually exec something. For a typical
environment, it won't stop writing a DSO and dlopen'ing it for example.
Or just creating it in memory with all the system call stubs needed.

So let's try again, except protection from script kiddies, what's the
real point?

Joerg


Re: FWIW: sysrestrict

2016-07-27 Thread Thor Lancelot Simon
On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
> 
> For example, if a vulnerability in ftpd could allow a RCE, it is highly
> likely that the shellcode will only consist in execve'ing a downloaded
> executable.

This kind of mitigation is not without value, but I think its value is
quite limited.  Attackers adapt to this sort of thing faster than we tend
to expect.  And they tend to see it as a fun challenge, so even though one
might wonder why they'd bother adapting their exploits specifically to work
under sysrestrict on NetBSD, in my experience they in fact will do so more
often than you think.

> There are also many other examples in which restricting
> syscalls would actually entirely prevent the exploitation of
> vulnerabilities.

I would suggest, rather, that some appropriate adjustments to what you've
already done (specifically, to make it possible to restrict how new file
descriptors can be obtained) would actually make it possible to prove
useful statements about what the attacker can do _even if_ she adapts
her shellcode.  Examples in which just prohibiting syscalls entirely prevents
the exploitation of vulnerabilities generally boil down to examples of bugs
in those syscalls.

I've done something, as I said, extremely similar to what you're doing here,
as a prototype in a commercial product that ran NetBSD.  It did not quite
give us the benefit we expected, but with some adjustments (basically the
ones I outlined in my first message), it was pretty cool.

Thor


Re: FWIW: sysrestrict

2016-07-27 Thread Maxime Villard

On 26/07/2016 at 11:56, Joerg Sonnenberger wrote:
> > It's just obvious: we don't want ftpd to call modctl, or execve (even if it
> > currently does), or mount, or reboot, or swapctl, etc. And it gets solved
> > by restricting those syscalls.
>
> You haven't answered my question. "I don't want to allow calls to foo"
> is not a problem. Let's ignore for a moment that the majority of your list
> is restricted to root and you have lost already in the UNIX security
> model if your code is running as root. What's the purpose of not
> allowing execve? In a sensible capability system (which pledge is not
> for exactly this reason), switching to a different binary is just
> another form of running arbitrary code. If you can do the latter
> already, the former doesn't gain you anything. But this is still a
> detail of the mechanism. It doesn't answer the fundamental question of
> what problem you are trying to solve. What attack is this mechanism
> supposed to prevent?


It is not trying to "prevent" an attack, it is supposed to restrict what
the attacker can do. Veriexec too is useful only if there is already an
intruder. And mapping .rodata as R reduces the possibility of ROP attacks
only if there is already a vulnerability that could allow an attacker to
jump to a chosen address.

If a vulnerability exists in a software (root or not) that allows control
of the execution flow, the attacker will often have a small payload of
shellcode, and this shellcode will try to load a bigger shellcode. Doing
this involves using special syscalls. If these syscalls are not available
the attack fails. Of course, there are many other things the attacker can
do with the small shellcode, but at least it restricts the attack surface.

For example, if a vulnerability in ftpd could allow a RCE, it is highly
likely that the shellcode will only consist in execve'ing a downloaded
executable. There are also many other examples in which restricting
syscalls would actually entirely prevent the exploitation of
vulnerabilities.

And beyond the security aspect, a feature like sysrestrict could be useful
for general consistency; by restricting syscalls in the base binaries, we
could make sure no change (in libc, for example) would make them execute an
unusual syscall.


Re: FWIW: sysrestrict

2016-07-26 Thread Joerg Sonnenberger
On Mon, Jul 25, 2016 at 02:23:00PM +0200, Maxime Villard wrote:
> On 24/07/2016 at 22:57, Joerg Sonnenberger wrote:
> > On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote:
> > > The goal of sysrestrict (and pledge, and whatever else) is not to provide
> > > the perfect feature that will control absolutely everything. The goal is
> > > just to provide an additional, simple layer of restriction. It is a
> > > combination of such features that can mostly reach the granularity you
> > > want. Sysrestrict for syscalls, UNIX file permissions for VFS, kauth for
> > > kernel permissions, Veriexec for binary permissions, etc.
> > 
> > Frankly, I haven't seen many use cases for pledge so far that actually
> > make sense. While I do see a certain sense in allowing a fully sandboxed
> > process hierarchy, that can already be obtained to a degree with ptrace.
> > If you want to actually get something like this into the tree, you should
> > start at the beginning. What problem is it trying to solve, why is that
> > problem relevant and how does it get solved?
> > 
> 
> It's just obvious: we don't want ftpd to call modctl, or execve (even if it
> currently does), or mount, or reboot, or swapctl, etc. And it gets solved
> by restricting those syscalls.

You haven't answered my question. "I don't want to allow calls to foo"
is not a problem. Let's ignore for a moment that the majority of your list
is restricted to root and you have lost already in the UNIX security
model if your code is running as root. What's the purpose of not
allowing execve? In a sensible capability system (which pledge is not
for exactly this reason), switching to a different binary is just
another form of running arbitrary code. If you can do the latter
already, the former doesn't gain you anything. But this is still a
detail of the mechanism. It doesn't answer the fundamental question of
what problem you are trying to solve. What attack is this mechanism
supposed to prevent?

Joerg


Re: FWIW: sysrestrict

2016-07-26 Thread Kamil Rytarowski


On 23.07.2016 10:36, Maxime Villard wrote:
> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
> 
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
> 
> In detail:
>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>    number of syscalls. 0 means allowed, 1 means restricted.
>  - in the proc structure, 64 bytes are present, just a copy of the
>    ELF section.
>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>    with the proc structure and the syscall number, and gives a look
>    at the bitmap to make sure it is allowed. If it isn't, the process
>    is killed.
>  - a new syscall is added, sysrestrict, so that programs can restrict
>    a syscall at runtime. This might be useful, particularly if a
>    program calls a syscall once and wants to make sure it is not
>    allowed any longer.
>  - a userland tool (that I didn't write) can add and update such an ELF
>    section in the binary.
> 
> This interface has the following advantages over most already-existing
> implementations:
>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>  - it is syscall-independent, we don't need to patch each syscall.
>  - it does not require binaries to be recompiled.
>  - the performance cost is low, if not non-existent.
> 
> I've never tested this code. But in case it inspires or motivates someone.
> 
> [1] http://m00nbsd.net/garbage/sysrestrict/

I like that this approach neither ships an external toolchain for a new
ABI (CloudABI) nor requires patching and rebuilding software (pledge).

About the restrictions with paths (like prohibiting/permitting $HOME or
/etc access), how about making it a separate interface? It's currently
built into the pledge() interface:

"int pledge(const char *promises, const char *paths[]);"

That way people can use one or the other mechanism, or both. I think it
could also make sense to have compatibility support with the pledge()
interface, via an external libpledge library. To achieve this, an
executable would need a way to drop access to previously allowed
syscalls.





Re: FWIW: sysrestrict

2016-07-25 Thread Thor Lancelot Simon
On Sat, Jul 23, 2016 at 03:52:03PM -0700, Alistair Crooks wrote:
> 
> My main problem is that simply outlawing system calls is a very
> coarse-grained hammer. I may want a binary to be able to open files
> for writing in /tmp, but not open any files in /etc for writing. Or
> reading files in my home directory, except for anything in ~/.ssh or
> ~/.gnupg. How does sysrestrict cope with this?

Having been down this path before, I agree.  In particular, though you
might think you could get somewhere by forbidding the system calls that
return new file descriptors, this turns out to break most programs that
weren't specially written to work with sysrestrict (which we called
something else, but it was the same thing).

When I did this before, we ended up with something roughly like sysrestrict
plus a set of restrictions on each system call that could return a new fd.
The ones to do with networking had to be a little more subtle.  It is also
probably necessary to restrict mmap() but that is even harder.  However,
with all that done, I think a system like this could even allow you to prove
many useful properties about the impact of a particular program on system
security under realistic assumptions.

We did use kauth; I considered using a bitmap like yours to short-circuit
the logic for calls with no restrictions, but for my application kauth was
fast enough.
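
The short-circuit idea above can be pictured with a small userland model. This is only a sketch under stated assumptions: the names `sr_policy`, `sr_args`, `sr_register`, `sr_permit` and `sr_check_open` are hypothetical, the checker works on an already-resolved path string, and the "/tmp/" prefix test is deliberately naive (it ignores symlinks and ".." components). It is not the kauth-based code mentioned in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SR_NSYSCALLS 512	/* bitmap width used in the proposal */

/* Hypothetical view of the arguments a checker would look at. */
struct sr_args {
	const char *path;	/* NULL for calls that take no path */
	bool for_write;
};

typedef bool (*sr_check_fn)(const struct sr_args *);

struct sr_policy {
	uint8_t has_check[SR_NSYSCALLS / 8];	/* bit set = checker registered */
	sr_check_fn check[SR_NSYSCALLS];
};

/* Example checker: reads are fine, writes only below /tmp/ (naive test). */
static bool
sr_check_open(const struct sr_args *a)
{
	if (a->path == NULL)
		return false;
	if (!a->for_write)
		return true;
	return strncmp(a->path, "/tmp/", 5) == 0;
}

/* Attach a checker to one syscall number and mark it in the bitmap. */
static void
sr_register(struct sr_policy *pol, int code, sr_check_fn fn)
{
	assert(pol != NULL && fn != NULL);
	assert(code >= 0 && code < SR_NSYSCALLS);
	pol->check[code] = fn;
	pol->has_check[code / 8] |= (uint8_t)(1u << (code % 8));
}

/* Returns true if the call may proceed. */
static bool
sr_permit(const struct sr_policy *pol, int code, const struct sr_args *a)
{
	assert(pol != NULL && a != NULL);
	if (code < 0 || code >= SR_NSYSCALLS)
		return false;	/* assumption: unknown numbers are refused */
	if ((pol->has_check[code / 8] & (1u << (code % 8))) == 0)
		return true;	/* fast path: nothing registered for this call */
	return pol->check[code](a);
}
```

The bitmap test costs one load and one mask for the common case of an unrestricted call; only calls that can hand out a new fd would need a checker at all.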

I do like baking the restrictions into the binary.

Thor


Re: FWIW: sysrestrict

2016-07-25 Thread Maxime Villard

On 24/07/2016 at 22:57, Joerg Sonnenberger wrote:
> On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote:
> > The goal of sysrestrict (and pledge, and whatever else) is not to provide
> > the perfect feature that will control absolutely everything. The goal is
> > just to provide an additional, simple layer of restriction. It is a
> > combination of such features that can mostly reach the granularity you
> > want. Sysrestrict for syscalls, UNIX file permissions for VFS, kauth for
> > kernel permissions, Veriexec for binary permissions, etc.
>
> Frankly, I haven't seen many use cases for pledge so far that actually
> make sense. While I do see a certain sense in allowing a fully sandboxed
> process hierarchy, that can already be obtained to a degree with ptrace.
> If you want to actually get something like this into the tree, you should
> start at the beginning. What problem is it trying to solve, why is that
> problem relevant and how does it get solved?



It's just obvious: we don't want ftpd to call modctl, or execve (even if it
currently does), or mount, or reboot, or swapctl, etc. And it gets solved
by restricting those syscalls.

I didn't start this thread with the intention of getting anything into the
tree. As I said, it is just an idea.


Re: FWIW: sysrestrict

2016-07-24 Thread Joerg Sonnenberger
On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote:
> The goal of sysrestrict (and pledge, and whatever else) is not to provide the
> perfect feature that will control absolutely everything. The goal is just to
> provide an additional, simple layer of restriction. It is a combination of
> such features that can mostly reach the granularity you want. Sysrestrict for
> syscalls, UNIX file permissions for VFS, kauth for kernel permissions,
> Veriexec for binary permissions, etc.

Frankly, I haven't seen many use cases for pledge so far that actually
make sense. While I do see a certain sense in allowing a fully sandboxed
process hierarchy, that can already be obtained to a degree with ptrace.
If you want to actually get something like this into the tree, you should
start at the beginning. What problem is it trying to solve, why is that
problem relevant and how does it get solved?

Joerg


Re: FWIW: sysrestrict

2016-07-24 Thread Maxime Villard

On 24/07/2016 at 00:52, Alistair Crooks wrote:
> ISTM that your sysrestrict suffers from one of the same drawbacks as
> pledge/tame/name-du-jour - the restrictions are being burned into the
> binary at compile/link time.


No. As I said, the userland tool could add or modify the bitmap in the ELF
section. Sysrestrict does not require any modification at compile or link
time.

You could just take the default firefox binary provided on the project
servers, and sysrestrictctl would add the section.


> That might be fine for system binaries
> (but some people download distributions from the project servers) that
> are built locally - what about anything more than the basics, like an
> apache with loadable modules? How do you specify the modular
> restrictions? How do we make it so that an apache binary can
> successfully have its restriction set "expanded" to allow modules to
> do their job, when that is what sysrestrict is trying to prevent?
>
> I'd be much happier with a variant of seccomp-bpf, or even using lua
> to do the same job (if it was performant, JIT-enabled and safe to do
> such a thing, I expect not :().
>
> My main problem is that simply outlawing system calls is a very
> coarse-grained hammer. I may want a binary to be able to open files
> for writing in /tmp, but not open any files in /etc for writing. Or
> reading files in my home directory, except for anything in ~/.ssh or
> ~/.gnupg. How does sysrestrict cope with this?


It is just impossible to reach the perfect granularity. Even with a JIT engine,
we could still demonstrate that we cannot handle the pointer that comes from
a copyin, which comes from a copyin, which comes from another copyin. And even
if we were trying to implement such a feature, we would end up virtualizing the
whole userland->kernel path, and the performance, security and stability impact
would be high.

The goal of sysrestrict (and pledge, and whatever else) is not to provide the
perfect feature that will control absolutely everything. The goal is just to
provide an additional, simple layer of restriction. It is a combination of
such features that can mostly reach the granularity you want. Sysrestrict for
syscalls, UNIX file permissions for VFS, kauth for kernel permissions, Veriexec
for binary permissions, etc.


Re: FWIW: sysrestrict

2016-07-24 Thread Maxime Villard

On 23/07/2016 at 21:36, Matt Thomas wrote:
> > On Jul 23, 2016, at 1:36 AM, Maxime Villard wrote:
> >
> > Eight months ago, I shared with a few developers the code for a kernel
> > interface [1] that can disable syscalls in user processes.
> >
> > The idea is the following: a syscall bitmap is embedded into the ELF binary
> > itself (in a note section, like PaX), and each time the binary performs a
> > syscall, the kernel checks whether the syscall in question is allowed in
> > the bitmap.
> >
> > In detail:
> > - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
> >   number of syscalls. 0 means allowed, 1 means restricted.
>
> Seems you only need the number of bytes needed to encode the highest
> restricted syscall.


I don't understand what you mean.


> However, I think I'd prefer a level of indirection.  Have
> a name of a bitmap embedded which refers to a bitmap already loaded.
> These would be visible via kern.restriction_sets. which would contain the
> bitmap.  There would also be a sysctl controlling what happens if you try
> to run a program with an unknown bitmap set which only takes effect where
> securelevel is non-zero.


My idea was to do it rather in userland: for example, a conf file in /etc/ that
associates aliases to several syscalls. Then, the userland tool reads this file
and creates the bitmap as expected if an alias is given in argv.

Like: /etc/sysrestrict.cfg has the following entry

SYSCALL_VFS = SYS_read, SYS_write, SYS_seek

And then, you could just do:

$ sysrestrictctl restrict SYSCALL_VFS [binary]

We would then just add rules for different types of syscalls.
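
As a rough illustration of what the userland tool would do with such an alias, the sketch below expands an alias name into the 64-byte bitmap. Everything in it is an assumption made for the example: the names `sr_alias` and `sr_alias_to_bitmap`, the in-memory table standing in for the parsed conf file, and the syscall numbers, which are placeholders and not taken from any real syscall table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SR_BYTES 64	/* 512 bits, as in the proposal */

/* Hypothetical parsed form of one line of the conf file. */
struct sr_alias {
	const char *name;
	const int *codes;
	size_t ncodes;
};

/* Placeholder numbers for illustration only. */
static const int vfs_codes[] = { 3, 4, 199 };

static const struct sr_alias sr_aliases[] = {
	{ "SYSCALL_VFS", vfs_codes, sizeof(vfs_codes) / sizeof(vfs_codes[0]) },
};

/*
 * Set the bit of every syscall behind the alias (1 = restricted).
 * Returns 0 on success, -1 for an unknown alias or an out-of-range number.
 */
static int
sr_alias_to_bitmap(const char *name, uint8_t bitmap[SR_BYTES])
{
	assert(name != NULL && bitmap != NULL);
	for (size_t i = 0; i < sizeof(sr_aliases) / sizeof(sr_aliases[0]); i++) {
		if (strcmp(sr_aliases[i].name, name) != 0)
			continue;
		for (size_t j = 0; j < sr_aliases[i].ncodes; j++) {
			int c = sr_aliases[i].codes[j];
			if (c < 0 || c >= SR_BYTES * 8)
				return -1;
			bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
		}
		return 0;
	}
	return -1;	/* unknown alias */
}
```

The resulting buffer is what the tool would then write into the ELF note; that step is left out here.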




> > - in the proc structure, 64 bytes are present, just a copy of the
> >   ELF section.
> > - when a syscall is performed, the kernel calls sysrestrict_enforce
> >   with the proc structure and the syscall number, and gives a look
> >   at the bitmap to make sure it is allowed. If it isn't, the process
> >   is killed.
>
> What happens when we get more than 512 syscalls?  Is this for NetBSD
> binaries only?

> > - a new syscall is added, sysrestrict, so that programs can restrict
> >   a syscall at runtime. This might be useful, particularly if a
> >   program calls a syscall once and wants to make sure it is not
> >   allowed any longer.
>
> I assume it can't unrestrict.  Do you pass the size of the array(s)?


Yes, it can't unrestrict. I don't know which array you are talking about,
but in the syscall, struct sysrestrict_list contains the number of entries
and an int array, and they are copied in.




> > - a userland tool (that I didn't write) can add and update such an ELF
> >   section in the binary.
> >
> > This interface has the following advantages over most already-existing
> > implementations:
> > - it is system-independent, it could almost be copied as-is in FreeBSD.
> > - it is syscall-independent, we don't need to patch each syscall.
> > - it does not require binaries to be recompiled.
> > - the performance cost is low, if not non-existent.
>
> If a syscall is restricted, what error is returned?  EPERM?  ENOSYS?


I said the process is killed.
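
The check and the one-way runtime restriction discussed in this message can be modelled in a few lines of userland C. This is a minimal sketch, not the code at [1]: `sr_proc`, `sr_list`, `sr_allowed` and `sr_restrict` are hypothetical names, the list layout is guessed from "the number of entries and an int array", and refusing out-of-range syscall numbers is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SR_NSYSCALLS 512			/* bitmap width from the proposal */
#define SR_BYTES     (SR_NSYSCALLS / 8)		/* 64 bytes */

/* Hypothetical stand-in for the per-process copy of the ELF section. */
struct sr_proc {
	uint8_t bitmap[SR_BYTES];	/* 0 = allowed, 1 = restricted */
};

/* Hypothetical layout of the list passed to the runtime syscall. */
struct sr_list {
	size_t nentries;
	const int *entries;
};

/* Returns true if the call may proceed; false means "kill the process". */
static bool
sr_allowed(const struct sr_proc *p, int code)
{
	assert(p != NULL);
	if (code < 0 || code >= SR_NSYSCALLS)
		return false;	/* assumption: out-of-range numbers are refused */
	return (p->bitmap[code / 8] & (1u << (code % 8))) == 0;
}

/*
 * Runtime restriction: bits are only ever set, never cleared, so a
 * process cannot unrestrict itself. Returns 0, or -1 on a bad entry
 * (in which case nothing is changed).
 */
static int
sr_restrict(struct sr_proc *p, const struct sr_list *l)
{
	assert(p != NULL && l != NULL);
	for (size_t i = 0; i < l->nentries; i++) {
		if (l->entries[i] < 0 || l->entries[i] >= SR_NSYSCALLS)
			return -1;
	}
	for (size_t i = 0; i < l->nentries; i++) {
		int c = l->entries[i];
		p->bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
	}
	return 0;
}
```

Validating the whole list before touching the bitmap keeps a rejected request from leaving the process half-restricted.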


Re: FWIW: sysrestrict

2016-07-24 Thread Maxime Villard

On 23/07/2016 at 23:50, Paul Goyette wrote:
> I would assume that the checking of syscall restrictions would be done
> within the kauth(9) framework?



As I wrote it, it is not. It wouldn't be hard to switch to kauth, but I fear
the performance cost would be higher.


Re: FWIW: sysrestrict

2016-07-23 Thread Alistair Crooks
ISTM that your sysrestrict suffers from one of the same drawbacks as
pledge/tame/name-du-jour - the restrictions are being burned into the
binary at compile/link time. That might be fine for system binaries
(but some people download distributions from the project servers) that
are built locally - what about anything more than the basics, like an
apache with loadable modules? How do you specify the modular
restrictions? How do we make it so that an apache binary can
successfully have its restriction set "expanded" to allow modules to
do their job, when that is what sysrestrict is trying to prevent?

I'd be much happier with a variant of seccomp-bpf, or even using lua
to do the same job (if it was performant, JIT-enabled and safe to do
such a thing, I expect not :().

My main problem is that simply outlawing system calls is a very
coarse-grained hammer. I may want a binary to be able to open files
for writing in /tmp, but not open any files in /etc for writing. Or
reading files in my home directory, except for anything in ~/.ssh or
~/.gnupg. How does sysrestrict cope with this?

Thanks,
Alistair

On 23 July 2016 at 14:50, Paul Goyette  wrote:
> I would assume that the checking of syscall restrictions would be done
> within the kauth(9) framework?
>
>
> On Sat, 23 Jul 2016, Maxime Villard wrote:
>
>> Eight months ago, I shared with a few developers the code for a kernel
>> interface [1] that can disable syscalls in user processes.
>>
>> The idea is the following: a syscall bitmap is embedded into the ELF
>> binary
>> itself (in a note section, like PaX), and each time the binary performs a
>> syscall, the kernel checks whether the syscall in question is allowed in
>> the bitmap.
>>
>> In detail:
>> - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>>   number of syscalls. 0 means allowed, 1 means restricted.
>> - in the proc structure, 64 bytes are present, just a copy of the
>>   ELF section.
>> - when a syscall is performed, the kernel calls sysrestrict_enforce
>>   with the proc structure and the syscall number, and gives a look
>>   at the bitmap to make sure it is allowed. If it isn't, the process
>>   is killed.
>> - a new syscall is added, sysrestrict, so that programs can restrict
>>   a syscall at runtime. This might be useful, particularly if a
>>   program calls a syscall once and wants to make sure it is not
>>   allowed any longer.
>> - a userland tool (that I didn't write) can add and update such an ELF
>>   section in the binary.
>>
>> This interface has the following advantages over most already-existing
>> implementations:
>> - it is system-independent, it could almost be copied as-is in FreeBSD.
>> - it is syscall-independent, we don't need to patch each syscall.
>> - it does not require binaries to be recompiled.
>> - the performance cost is low, if not non-existent.
>>
>> I've never tested this code. But in case it inspires or motivates someone.
>>
>> [1] http://m00nbsd.net/garbage/sysrestrict/
>>
>
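> +------------------+--------------------------+------------------------+
> | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
> | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> +------------------+--------------------------+------------------------+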
>


Re: FWIW: sysrestrict

2016-07-23 Thread Paul Goyette
I would assume that the checking of syscall restrictions would be done 
within the kauth(9) framework?


On Sat, 23 Jul 2016, Maxime Villard wrote:


> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
>
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
>
> In detail:
> - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>   number of syscalls. 0 means allowed, 1 means restricted.
> - in the proc structure, 64 bytes are present, just a copy of the
>   ELF section.
> - when a syscall is performed, the kernel calls sysrestrict_enforce
>   with the proc structure and the syscall number, and gives a look
>   at the bitmap to make sure it is allowed. If it isn't, the process
>   is killed.
> - a new syscall is added, sysrestrict, so that programs can restrict
>   a syscall at runtime. This might be useful, particularly if a
>   program calls a syscall once and wants to make sure it is not
>   allowed any longer.
> - a userland tool (that I didn't write) can add and update such an ELF
>   section in the binary.
>
> This interface has the following advantages over most already-existing
> implementations:
> - it is system-independent, it could almost be copied as-is in FreeBSD.
> - it is syscall-independent, we don't need to patch each syscall.
> - it does not require binaries to be recompiled.
> - the performance cost is low, if not non-existent.
>
> I've never tested this code. But in case it inspires or motivates someone.
>
> [1] http://m00nbsd.net/garbage/sysrestrict/

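+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+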


Re: FWIW: sysrestrict

2016-07-23 Thread Matt Thomas

> On Jul 23, 2016, at 1:36 AM, Maxime Villard  wrote:
> 
> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
> 
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
> 
> In detail:
> - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>   number of syscalls. 0 means allowed, 1 means restricted.

Seems you only need the number of bytes needed to encode the highest
restricted syscall.  However, I think I'd prefer a level of indirection.  Have
a name of a bitmap embedded which refers to a bitmap already loaded.
These would be visible via kern.restriction_sets. which would contain the
bitmap.  There would also be a sysctl controlling what happens if you try
to run a program with an unknown bitmap set which only takes effect where
securelevel is non-zero.
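
To make the indirection concrete, here is a minimal userland sketch of resolving an embedded set name against preloaded bitmaps. All names (`sr_set`, `sr_unknown`, `sr_resolve`) and the two policy values are hypothetical; the "securelevel is non-zero" condition is taken literally from the paragraph above, and how the sets get loaded or exported through sysctl is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SR_BYTES 64

/* Hypothetical kernel-resident entry: a named, preloaded bitmap. */
struct sr_set {
	const char *name;
	uint8_t bitmap[SR_BYTES];
};

/* Hypothetical values for the sysctl that governs unknown set names. */
enum sr_unknown { SR_UNKNOWN_RUN_UNRESTRICTED, SR_UNKNOWN_REFUSE_EXEC };

/*
 * Resolve the set name embedded in a binary.
 * Returns 0 and stores the bitmap in *out (NULL meaning "no restrictions"),
 * or returns -1 if the exec should be refused.
 */
static int
sr_resolve(const struct sr_set *sets, size_t nsets, const char *name,
    int securelevel, enum sr_unknown policy, const uint8_t **out)
{
	assert(name != NULL && out != NULL);
	for (size_t i = 0; i < nsets; i++) {
		if (strcmp(sets[i].name, name) == 0) {
			*out = sets[i].bitmap;
			return 0;
		}
	}
	/* Unknown name: the policy only applies when securelevel is non-zero. */
	if (securelevel != 0 && policy == SR_UNKNOWN_REFUSE_EXEC)
		return -1;
	*out = NULL;
	return 0;
}
```

With this shape the binary carries only a short name, and the administrator can tighten or loosen a set without touching the binaries that reference it.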

> - in the proc structure, 64 bytes are present, just a copy of the
>   ELF section.
> - when a syscall is performed, the kernel calls sysrestrict_enforce
>   with the proc structure and the syscall number, and gives a look
>   at the bitmap to make sure it is allowed. If it isn't, the process
>   is killed.

What happens when we get more than 512 syscalls?  Is this for NetBSD
binaries only?  

> - a new syscall is added, sysrestrict, so that programs can restrict
>   a syscall at runtime. This might be useful, particularly if a
>   program calls a syscall once and wants to make sure it is not
>   allowed any longer.

I assume it can't unrestrict.  do you pass the size of the array(s)?

> - a userland tool (that I didn't write) can add and update such an ELF
>   section in the binary.
> 
> This interface has the following advantages over most already-existing
> implementations:
> - it is system-independent, it could almost be copied as-is in FreeBSD.
> - it is syscall-independent, we don't need to patch each syscall.
> - it does not require binaries to be recompiled.
> - the performance cost is low, if not non-existent.

If a syscall is restricted, what error is returned?  EPERM?  ENOSYS?

> I've never tested this code. But in case it inspires or motivates someone.
> 
> [1] http://m00nbsd.net/garbage/sysrestrict/