Re: FWIW: sysrestrict
On Mon, Aug 01, 2016 at 12:31:01PM +0930, LYMN wrote:
> On Thu, Jul 28, 2016 at 08:42:49PM +0200, Joerg Sonnenberger wrote:
> >
> > The difference is that correctly configured veriexec is a system-wide
> > property. It doesn't matter if you can exec something, you don't get to
> > execute binaries that weren't signed.
> >
>
> Technically, veriexec only runs files that have a valid fingerprint.
> We don't, currently, have signing but that would be useful and probably
> could be done now. One thing that does seem to get overlooked a lot is

That would require an RSA implementation in the kernel, plus some PKCS
bits. I have code around here somewhere...

Thor
Re: FWIW: sysrestrict
On Thu, Jul 28, 2016 at 08:42:49PM +0200, Joerg Sonnenberger wrote:
>
> The difference is that correctly configured veriexec is a system-wide
> property. It doesn't matter if you can exec something, you don't get to
> execute binaries that weren't signed.
>

Technically, veriexec only runs files that have a valid fingerprint. We
don't currently have signing, but that would be useful and could probably
be done now.

One thing that does seem to get overlooked a lot is that you can mark a
binary as "indirect", which means that it is allowed to be an interpreter
for a shell script but cannot be invoked directly on the command line. So,
if you marked /bin/sh as indirect, then all properly fingerprinted shell
scripts would continue to function, but anyone trying to exec /bin/sh
would be prevented from doing so. This would provide a bit of a speed hump
for some script kiddies, but the feature is more intended to provide a way
of permitting powerful scripting languages (think perl and the like)
without leaving the system wide open.

--
Brett Lymn
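The "indirect" rule described in this message can be pictured with a small C sketch. It is only an illustration of the decision as stated above, not veriexec's actual code: the flag names, the `exec_kind` enum and the `exec_permitted` function are all hypothetical, and the fingerprint check is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical entry flags; these are NOT veriexec's real definitions. */
enum entry_flags {
    ENTRY_DIRECT   = 1 << 0,  /* may be executed directly */
    ENTRY_INDIRECT = 1 << 1   /* may be used as a script interpreter */
};

/* How the binary is being started (hypothetical). */
enum exec_kind {
    EXEC_DIRECT,       /* invoked by name, e.g. from the command line */
    EXEC_INTERPRETER   /* invoked as the interpreter of a script */
};

/*
 * Decide whether an exec is permitted: the fingerprint must be valid,
 * and the entry must carry the flag matching the way it is started.
 */
static bool exec_permitted(bool fingerprint_ok, unsigned int flags,
    enum exec_kind kind)
{
    if (!fingerprint_ok)
        return false;

    switch (kind) {
    case EXEC_DIRECT:
        return (flags & ENTRY_DIRECT) != 0;
    case EXEC_INTERPRETER:
        return (flags & ENTRY_INDIRECT) != 0;
    }

    /* Unknown kind: complain in debug builds, deny otherwise. */
    assert(0 && "unknown exec_kind");
    return false;
}
```

With /bin/sh carrying only the hypothetical ENTRY_INDIRECT flag, a fingerprinted script that names it as interpreter would run, while a direct exec of /bin/sh would be refused.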
Re: FWIW: sysrestrict
On 28/07/2016 at 20:42, Joerg Sonnenberger wrote:
> On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
>> It is not trying to "prevent" an attack, it is supposed to restrict what
>> the attacker can do. Veriexec too is useful only if there is already an
>> intruder. And mapping .rodata as R reduces the possibility of ROP attacks
>> only if there is already a vulnerability that could allow an attacker to
>> jump to a chosen address.
>
> The difference is that correctly configured veriexec is a system-wide
> property. It doesn't matter if you can exec something, you don't get to
> execute binaries that weren't signed. Separate PT_LOAD for .rodata only
> has some memory use and legacy compatibility issues, IMO it doesn't even
> qualify as a "security" mechanism but just as actually enforcing the
> constraints that exist.
>
>> If a vulnerability exists in a software (root or not) that allows control
>> of the execution flow, the attacker will often have a small payload of
>> shellcode, and this shellcode will try to load a bigger shellcode. Doing
>> this involves using special syscalls. If these syscalls are not available
>> the attack fails. Of course, there are many other things the attacker can
>> do with the small shellcode, but at least it restricts the attack surface.
>
> The shell-payload model is only that popular because it is trivial to
> adopt. Consider it a small portable VM for exploits. As I said earlier,
> if your capability system allows more capabilities after an exec than
> before without very specific introductions to do so, it is plainly
> broken. Arguably, suid is exactly that, but it is a well understood
> issue.
>
> So let's look at a RCE exploit in an FTP server. I must be able to do
> pretty much arbitrary network IO as FTP server, so the exploited FTP
> server process must be able to do the same. I can access all files the
> FTP user has access to and possibly also write to given locations. In
> short, I can pretty much do whatever I want with the box and the
> operational constraints are likely to be irrelevant.
>
> Yes, it is harder to exploit if you can't actually exec something. For a
> typical environment, it won't stop writing a DSO and dlopen'ing it for
> example. Or just creating it in memory with all the system call stubs
> needed.
>
> So let's try again, except protection from script kiddies, what's the
> real point?

Let's try again: I have already said it is not the perfect feature, I have
already said that it does not offer the exhaustive isolation people like
you may expect, I have already said that it just restricts syscalls
without totally preventing attacks, I have already said that I don't
intend to commit it anyway, and finally, I have already said that it is
just a wild idea with some code that I have never tested.

You (and Thor) believe the thin layer of security it offers is not that
interesting after all - and up to a point, I agree with you. If people
believe that with a few adaptations it could be made better, they are
obviously free to take the code and do whatever they want with it.

But your point more or less comes down to arguing that if a house does not
have a bullet-proof door, then we should just leave that door wide open
since someone will still be able to break it with a bazooka anyway. The
truth of the matter is, only a few people have bazookas - and only a few
exploits (as far as I know) play with dlopen to exploit simple bugs. Most
intruders only have knives and hammers, and it does not mean they are
amateurs - most exploits rely on syscalls, and it does not mean they are
written by script kiddies.

I am not going to insist more on this, I'm not trying to sell anything.
Re: FWIW: sysrestrict
On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
> It is not trying to "prevent" an attack, it is supposed to restrict what
> the attacker can do. Veriexec too is useful only if there is already an
> intruder. And mapping .rodata as R reduces the possibility of ROP attacks
> only if there is already a vulnerability that could allow an attacker to
> jump to a chosen address.

The difference is that correctly configured veriexec is a system-wide
property. It doesn't matter if you can exec something, you don't get to
execute binaries that weren't signed. Separate PT_LOAD for .rodata only
has some memory use and legacy compatibility issues, IMO it doesn't even
qualify as a "security" mechanism but just as actually enforcing the
constraints that exist.

> If a vulnerability exists in a software (root or not) that allows control
> of the execution flow, the attacker will often have a small payload of
> shellcode, and this shellcode will try to load a bigger shellcode. Doing
> this involves using special syscalls. If these syscalls are not available
> the attack fails. Of course, there are many other things the attacker can
> do with the small shellcode, but at least it restricts the attack surface.

The shell-payload model is only that popular because it is trivial to
adopt. Consider it a small portable VM for exploits. As I said earlier, if
your capability system allows more capabilities after an exec than before
without very specific introductions to do so, it is plainly broken.
Arguably, suid is exactly that, but it is a well understood issue.

So let's look at a RCE exploit in an FTP server. I must be able to do
pretty much arbitrary network IO as FTP server, so the exploited FTP
server process must be able to do the same. I can access all files the FTP
user has access to and possibly also write to given locations. In short, I
can pretty much do whatever I want with the box and the operational
constraints are likely to be irrelevant.

Yes, it is harder to exploit if you can't actually exec something. For a
typical environment, it won't stop writing a DSO and dlopen'ing it for
example. Or just creating it in memory with all the system call stubs
needed.

So let's try again, except protection from script kiddies, what's the real
point?

Joerg
Re: FWIW: sysrestrict
On Wed, Jul 27, 2016 at 02:48:44PM +0200, Maxime Villard wrote:
>
> For example, if a vulnerability in ftpd could allow a RCE, it is highly
> likely that the shellcode will only consist in execve'ing a downloaded
> executable.

This kind of mitigation is not without value, but I think its value is
quite limited. Attackers adapt to this sort of thing faster than we tend
to expect. And they tend to see it as a fun challenge, so even though one
might wonder why they'd bother adapting their exploits specifically to
work under sysrestrict on NetBSD, in my experience they in fact will do so
more often than you think.

> There are also many other examples in which restricting
> syscalls would actually entirely prevent the exploitation of
> vulnerabilities.

I would suggest, rather, that some appropriate adjustments to what you've
already done (specifically, to make it possible to restrict how new file
descriptors can be obtained) would actually make it possible to prove
useful statements about what the attacker can do _even if_ she adapts her
shellcode.

Examples in which just prohibiting syscalls entirely prevents the
exploitation of vulnerabilities generally boil down to examples of bugs in
those syscalls.

I've done something, as I said, extremely similar to what you're doing
here, as a prototype in a commercial product that ran NetBSD. It did not
quite give us the benefit we expected, but with some adjustments
(basically the ones I outlined in my first message), it was pretty cool.

Thor
Re: FWIW: sysrestrict
On 26/07/2016 at 11:56, Joerg Sonnenberger wrote:
>> It's just obvious: we don't want ftpd to call modctl, or execve (even if
>> it currently does), or mount, or reboot, or swapctl, etc. And it gets
>> solved by restricting those syscalls.
>
> You haven't answered my question. "I don't want to allow calls to foo" is
> not a problem. Let's ignore for a moment that the majority of your list
> is restricted to root and you have lost already in the UNIX security
> model if your code is running as root. What's the purpose of not allowing
> execve? In a sensible capability system (which pledge is not for exactly
> this reason), switching to a different binary is just another form of
> running arbitrary code. If you can do the latter already, the former
> doesn't gain you anything.
>
> But this is still a detail of the mechanism. It doesn't answer the
> fundamental question of what problem you are trying to solve. What attack
> is this mechanism supposed to prevent?

It is not trying to "prevent" an attack, it is supposed to restrict what
the attacker can do. Veriexec too is useful only if there is already an
intruder. And mapping .rodata as R reduces the possibility of ROP attacks
only if there is already a vulnerability that could allow an attacker to
jump to a chosen address.

If a vulnerability exists in a software (root or not) that allows control
of the execution flow, the attacker will often have a small payload of
shellcode, and this shellcode will try to load a bigger shellcode. Doing
this involves using special syscalls. If these syscalls are not available
the attack fails. Of course, there are many other things the attacker can
do with the small shellcode, but at least it restricts the attack surface.

For example, if a vulnerability in ftpd could allow a RCE, it is highly
likely that the shellcode will consist only of execve'ing a downloaded
executable. There are also many other examples in which restricting
syscalls would actually entirely prevent the exploitation of
vulnerabilities.

And beyond the security aspect, a feature like sysrestrict could be useful
for general consistency; by restricting syscalls in the base binaries, we
could make sure no change (in libc, for example) would make them execute
an unusual syscall.
Re: FWIW: sysrestrict
On Mon, Jul 25, 2016 at 02:23:00PM +0200, Maxime Villard wrote: > Le 24/07/2016 à 22:57, Joerg Sonnenberger a écrit : > > On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote: > > > The goal of sysrestrict (and pledge, and whatever else) is not to provide > > > the > > > perfect feature that will control absolutely everything. The goal is just > > > to > > > provide an additionnal, simple layer of restriction. It is a combination > > > of > > > such features that can mostly reach the granularity you want. Sysrestrict > > > for > > > syscalls, UNIX file permissions for VFS, kauth for kernel permissions, > > > Veriexec > > > for binary permissions, etc. > > > > Frankly, I haven't seen many use cases for pledge so far that actually > > make sense. While I do see a certain sense in allowing a fully sandboxed > > process hierachy, that can already be obtained to a degree with ptrace. > > If you want to actually get something like this into the tree, you should > > start at the beginning. What problem is it trying to solve, why is that > > problem relevant and how does is it gotten solved? > > > > It's just obvious: we don't want ftpd to call modctl, or execve (even if it > currently does), or mount, or reboot, or swapctl, etc. And it gets solved > by restricting those syscalls. You haven't answered my question. "I don't want to allow calls to foo" is not a problem. Let's ignore for a moment that the majority of your list is restricted to root and you have lost already in the UNIX security model if your code is running as root. What's the purpose of not allowing execve? In a sensible capability system (which pledge is not for exactly this reason), switching to a different binary is just another form of running arbitrary code. If you can do the latter already, the former doesn't gain you anything. But this is still a detail of the mechanism. It doesn't answer the fundamental question of what problem you are trying to solve. 
What attack is this mechanism supposed to prevent? Joerg
Re: FWIW: sysrestrict
On 23.07.2016 10:36, Maxime Villard wrote: > Eight months ago, I shared with a few developers the code for a kernel > interface [1] that can disable syscalls in user processes. > > The idea is the following: a syscall bitmap is embedded into the ELF binary > itself (in a note section, like PaX), and each time the binary performs a > syscall, the kernel checks whether the syscall in question is allowed in > the bitmap. > > In details: > - the ELF section is a bitmap of 64 bytes, which means 512 bits, the >number of syscalls. 0 means allowed, 1 means restricted. > - in the proc structure, 64 bytes are present, just a copy of the >ELF section. > - when a syscall is performed, the kernel calls sysrestrict_enforce >with the proc structure and the syscall number, and gives a look >at the bitmap to make sure it is allowed. If it isn't, the process >is killed. > - a new syscall is added, sysrestrict, so that programs can restrict >a syscall at runtime. This might be useful, particularly if a >program calls a syscall once and wants to make sure it is not >allowed any longer. > - a userland tool (that I didn't write) can add and update such an ELF >section in the binary. > > This interface has the following advantages over most already-existing > implementations: > - it is system-independent, it could almost be copied as-is in FreeBSD. > - it is syscall-independent, we don't need to patch each syscall. > - it does not require binaries to be recompiled. > - the performance cost is low, if not non-existent. > > I've never tested this code. But in case it inspires or motivates someone. > > [1] http://m00nbsd.net/garbage/sysrestrict/ I like this approach of not shipping external toolchain for new ABI (CloudABI) and not patching and rebuilding software (pledge). About the restrictions with paths (like prohibiting/permitting $HOME or /etc access), how about making it a separate interface? 
It's currently built into the pledge() interface:

"int pledge(const char *promises, const char *paths[]);"

That way people can use one or the other mechanism, or both.

I think it could also make sense to have compatibility support with the
pledge() interface, with an external libpledge library. To achieve this,
an executable would need a way to drop access to syscalls it was
previously allowed to make.
Re: FWIW: sysrestrict
On Sat, Jul 23, 2016 at 03:52:03PM -0700, Alistair Crooks wrote: > > My main problem is that simply outlawing system calls is a very > coarse-grained hammer. I may want a binary to be able to open files > for writing in /tmp, but not open any files in /etc for writing. Or > reading files in my home directory, except for anything in ~/.ssh or > ~/.gnupg. How does sysrestrict cope with this? Having been down this path before, I agree. In particular, though you might think you could get somewhere by forbidding the system calls that return new file descriptors, this turns out to break most programs that weren't specially written to work with sysrestrict (which we called something else, but it was the same thing). When I did this before, we ended up with something roughly like sysrestrict plus a set of restrictions on each system call that could return a new fd. The ones to do with networking had to be a little more subtle. It is also probably necessary to restrict mmap() but that is even harder. However, with all that done, I think a system like this could even allow you to prove many useful properties about the impact of a particular program on system security under realistic assumptions. We did use kauth; I considered using a bitmap like yours to short-circuit the logic for calls with no restrictions, but for my application kauth was fast enough. I do like baking the restrictions into the binary. Thor
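The combination Thor describes, a bitmap that short-circuits the logic for calls with no restrictions plus extra rules on calls that can return a new fd, could look roughly like the C sketch below. Every name here (`struct policy`, `needs_check`, `open_write_ok`, `DEMO_SYS_OPEN`) is made up for illustration, and the path rule borrows Alistair's /tmp example; it is not the code from either implementation. A plain prefix comparison also ignores "..", symlinks and relative paths, which a real check would have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSYSCALLS     512  /* bitmap size taken from the original post */
#define DEMO_SYS_OPEN 5    /* illustrative syscall number only */

/* Hypothetical per-process policy. */
struct policy {
    uint8_t     checked[NSYSCALLS / 8]; /* bit set: call needs the slow path */
    const char *write_prefix;           /* only place writes may be opened */
};

/* Fast test: does this syscall have any extra rule attached? */
static bool needs_check(const struct policy *p, unsigned int code)
{
    return code < NSYSCALLS &&
        (p->checked[code / 8] & (1u << (code % 8))) != 0;
}

/*
 * Decide whether opening `path` for writing is permitted.
 * Calls without a rule take the short-circuit and are allowed;
 * calls with a rule are allowed only under write_prefix.
 */
static bool open_write_ok(const struct policy *p, unsigned int code,
    const char *path)
{
    assert(p != NULL && path != NULL);

    if (code >= NSYSCALLS)
        return false;            /* unknown syscall number: deny */
    if (!needs_check(p, code))
        return true;             /* short-circuit: no restriction */

    size_t n = strlen(p->write_prefix);
    return strncmp(path, p->write_prefix, n) == 0;
}
```

The point of the split is that the common case costs one bit test, and only the few fd-returning calls pay for argument inspection.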
Re: FWIW: sysrestrict
On 24/07/2016 at 22:57, Joerg Sonnenberger wrote:
> On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote:
>> The goal of sysrestrict (and pledge, and whatever else) is not to provide
>> the perfect feature that will control absolutely everything. The goal is
>> just to provide an additional, simple layer of restriction. It is a
>> combination of such features that can mostly reach the granularity you
>> want. Sysrestrict for syscalls, UNIX file permissions for VFS, kauth for
>> kernel permissions, Veriexec for binary permissions, etc.
>
> Frankly, I haven't seen many use cases for pledge so far that actually
> make sense. While I do see a certain sense in allowing a fully sandboxed
> process hierarchy, that can already be obtained to a degree with ptrace.
> If you want to actually get something like this into the tree, you should
> start at the beginning. What problem is it trying to solve, why is that
> problem relevant and how does it get solved?

It's just obvious: we don't want ftpd to call modctl, or execve (even if
it currently does), or mount, or reboot, or swapctl, etc. And it gets
solved by restricting those syscalls.

I didn't start this thread with the intention of getting anything into the
tree. As I said, it is just an idea.
Re: FWIW: sysrestrict
On Sun, Jul 24, 2016 at 01:09:46PM +0200, Maxime Villard wrote:
> The goal of sysrestrict (and pledge, and whatever else) is not to provide the
> perfect feature that will control absolutely everything. The goal is just to
> provide an additional, simple layer of restriction. It is a combination of
> such features that can mostly reach the granularity you want. Sysrestrict for
> syscalls, UNIX file permissions for VFS, kauth for kernel permissions,
> Veriexec for binary permissions, etc.

Frankly, I haven't seen many use cases for pledge so far that actually
make sense. While I do see a certain sense in allowing a fully sandboxed
process hierarchy, that can already be obtained to a degree with ptrace.
If you want to actually get something like this into the tree, you should
start at the beginning. What problem is it trying to solve, why is that
problem relevant and how does it get solved?

Joerg
Re: FWIW: sysrestrict
On 24/07/2016 at 00:52, Alistair Crooks wrote:
> ISTM that your sysrestrict suffers from one of the same drawbacks as
> pledge/tame/name-du-jour - the restrictions are being burned into the
> binary at compile/link time.

No. As I said, the userland tool could add or modify the bitmap in the ELF
section. Sysrestrict does not require any modification at compile or link
time. You could just take the default firefox binary provided on the
project servers, and sysrestrictctl would add the section.

> That might be fine for system binaries (but some people download
> distributions from the project servers) that are built locally - what
> about anything more than the basics, like an apache with loadable
> modules? How do you specify the modular restrictions? How do we make it
> so that an apache binary can successfully have its restriction set
> "expanded" to allow modules to do their job, when that is what
> sysrestrict is trying to prevent?
>
> I'd be much happier with a variant of seccomp-bpf, or even using lua to
> do the same job (if it was performant, JIT-enabled and safe to do such a
> thing, I expect not :().
>
> My main problem is that simply outlawing system calls is a very
> coarse-grained hammer. I may want a binary to be able to open files for
> writing in /tmp, but not open any files in /etc for writing. Or reading
> files in my home directory, except for anything in ~/.ssh or ~/.gnupg.
> How does sysrestrict cope with this?

It is just impossible to reach the perfect granularity. Even with a JIT
engine, we could still demonstrate that we cannot handle the pointer that
comes from a copyin, which comes from a copyin, which comes from another
copyin. And even if we were trying to implement such a feature, we would
end up virtualizing the whole userland->kernel path, and the performance,
security and stability impact would be high.

The goal of sysrestrict (and pledge, and whatever else) is not to provide
the perfect feature that will control absolutely everything. The goal is
just to provide an additional, simple layer of restriction. It is a
combination of such features that can mostly reach the granularity you
want. Sysrestrict for syscalls, UNIX file permissions for VFS, kauth for
kernel permissions, Veriexec for binary permissions, etc.
Re: FWIW: sysrestrict
On 23/07/2016 at 21:36, Matt Thomas wrote:
> On Jul 23, 2016, at 1:36 AM, Maxime Villard wrote:
>> Eight months ago, I shared with a few developers the code for a kernel
>> interface [1] that can disable syscalls in user processes.
>>
>> The idea is the following: a syscall bitmap is embedded into the ELF
>> binary itself (in a note section, like PaX), and each time the binary
>> performs a syscall, the kernel checks whether the syscall in question is
>> allowed in the bitmap.
>>
>> In details:
>>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>>    number of syscalls. 0 means allowed, 1 means restricted.
>
> Seems you only need the number of bytes needed to encode the highest
> restricted syscall.

I don't understand what you mean.

> However, I think I'd prefer a level of indirection. Have a name of a
> bitmap embedded which references to a bitmap already loaded. These would
> be visible via kern.restriction_sets. which would contain the bitmap.
> There would also be a sysctl controlling what happens if you try to run a
> program with an unknown bitmap set which only takes effect where
> securelevel is non-zero.

My idea was to do it rather in userland: for example, a conf file in /etc/
that associates aliases to several syscalls. Then, the userland tool reads
this file and creates the bitmap as expected if an alias is given in argv.

Like: /etc/sysrestrict.cfg has the following entry

    SYSCALL_VFS = SYS_read, SYS_write, SYS_seek

And then, you could just do:

    $ sysrestrictctl restrict SYSCALL_VFS [binary]

We would then just add rules for different types of syscalls.

>>  - in the proc structure, 64 bytes are present, just a copy of the
>>    ELF section.
>>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>>    with the proc structure and the syscall number, and gives a look
>>    at the bitmap to make sure it is allowed. If it isn't, the process
>>    is killed.
>
> What happens when we get more than 512 syscalls? Is this for NetBSD
> binaries only?
>
>>  - a new syscall is added, sysrestrict, so that programs can restrict
>>    a syscall at runtime. This might be useful, particularly if a
>>    program calls a syscall once and wants to make sure it is not
>>    allowed any longer.
>
> I assume it can't unrestrict. do you pass the size of the array(s)?

Yes, it can't unrestrict. I don't know which array you are talking about,
but in the syscall, struct sysrestrict_list contains the number of entries
and an int array, and they are copied in.

>>  - a userland tool (that I didn't write) can add and update such an ELF
>>    section in the binary.
>>
>> This interface has the following advantages over most already-existing
>> implementations:
>>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>>  - it is syscall-independent, we don't need to patch each syscall.
>>  - it does not require binaries to be recompiled.
>>  - the performance cost is low, if not non-existent.
>
> If a syscall is restricted, what error is returned? EPERM? ENOSYS?

I said the process is killed.
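The one-way nature of the runtime call ("it can't unrestrict") can be sketched in C as follows. The message only says that struct sysrestrict_list holds a number of entries and an int array, so the layout below, the `SYSR_MAXLIST` cap and the `sysr_apply` function are assumptions made for illustration; the real code at [1] may differ. The kernel-side copyin is left out and the list is treated as already copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSR_NSYSCALLS 512  /* bitmap size taken from the original post */
#define SYSR_MAXLIST   64   /* hypothetical cap on entries per call */

/* Hypothetical layout: a count followed by an array of syscall numbers. */
struct sysr_list {
    size_t count;
    int    codes[SYSR_MAXLIST];
};

/*
 * Apply a list of syscalls to restrict. Bits are only ever set, never
 * cleared, so a process can tighten its own bitmap but not loosen it.
 * The whole list is validated before the bitmap is touched.
 * Returns 0 on success, -1 if the list is malformed.
 */
static int sysr_apply(uint8_t bitmap[SYSR_NSYSCALLS / 8],
    const struct sysr_list *l)
{
    assert(bitmap != NULL && l != NULL);

    if (l->count > SYSR_MAXLIST)
        return -1;
    for (size_t i = 0; i < l->count; i++) {
        if (l->codes[i] < 0 || l->codes[i] >= SYSR_NSYSCALLS)
            return -1;
    }

    for (size_t i = 0; i < l->count; i++) {
        unsigned int c = (unsigned int)l->codes[i];
        bitmap[c / 8] |= (uint8_t)(1u << (c % 8));  /* 1 = restricted */
    }
    return 0;
}
```

Validating first means a bad entry leaves the bitmap unchanged, and restricting an already restricted syscall is a harmless no-op.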
Re: FWIW: sysrestrict
Le 23/07/2016 à 23:50, Paul Goyette a écrit : I would assume that the checking of syscall restrictions would be done within the kauth(9) framework? As I wrote it, it is not. It wouldn't be hard to switch to kauth, but I fear the performance cost would be higher.
Re: FWIW: sysrestrict
ISTM that your sysrestrict suffers from one of the same drawbacks as
pledge/tame/name-du-jour - the restrictions are being burned into the
binary at compile/link time.

That might be fine for system binaries (but some people download
distributions from the project servers) that are built locally - what
about anything more than the basics, like an apache with loadable modules?
How do you specify the modular restrictions? How do we make it so that an
apache binary can successfully have its restriction set "expanded" to
allow modules to do their job, when that is what sysrestrict is trying to
prevent?

I'd be much happier with a variant of seccomp-bpf, or even using lua to do
the same job (if it was performant, JIT-enabled and safe to do such a
thing, I expect not :().

My main problem is that simply outlawing system calls is a very
coarse-grained hammer. I may want a binary to be able to open files for
writing in /tmp, but not open any files in /etc for writing. Or reading
files in my home directory, except for anything in ~/.ssh or ~/.gnupg.
How does sysrestrict cope with this?

Thanks,
Alistair

On 23 July 2016 at 14:50, Paul Goyette wrote:
> I would assume that the checking of syscall restrictions would be done
> within the kauth(9) framework?
>
>
> On Sat, 23 Jul 2016, Maxime Villard wrote:
>
>> Eight months ago, I shared with a few developers the code for a kernel
>> interface [1] that can disable syscalls in user processes.
>>
>> The idea is the following: a syscall bitmap is embedded into the ELF
>> binary itself (in a note section, like PaX), and each time the binary
>> performs a syscall, the kernel checks whether the syscall in question is
>> allowed in the bitmap.
>>
>> In details:
>>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>>    number of syscalls. 0 means allowed, 1 means restricted.
>>  - in the proc structure, 64 bytes are present, just a copy of the
>>    ELF section.
>>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>>    with the proc structure and the syscall number, and gives a look
>>    at the bitmap to make sure it is allowed. If it isn't, the process
>>    is killed.
>>  - a new syscall is added, sysrestrict, so that programs can restrict
>>    a syscall at runtime. This might be useful, particularly if a
>>    program calls a syscall once and wants to make sure it is not
>>    allowed any longer.
>>  - a userland tool (that I didn't write) can add and update such an ELF
>>    section in the binary.
>>
>> This interface has the following advantages over most already-existing
>> implementations:
>>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>>  - it is syscall-independent, we don't need to patch each syscall.
>>  - it does not require binaries to be recompiled.
>>  - the performance cost is low, if not non-existent.
>>
>> I've never tested this code. But in case it inspires or motivates someone.
>>
>> [1] http://m00nbsd.net/garbage/sysrestrict/
Re: FWIW: sysrestrict
I would assume that the checking of syscall restrictions would be done
within the kauth(9) framework?

On Sat, 23 Jul 2016, Maxime Villard wrote:

> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
>
> The idea is the following: a syscall bitmap is embedded into the ELF
> binary itself (in a note section, like PaX), and each time the binary
> performs a syscall, the kernel checks whether the syscall in question is
> allowed in the bitmap.
>
> In details:
>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>    number of syscalls. 0 means allowed, 1 means restricted.
>  - in the proc structure, 64 bytes are present, just a copy of the
>    ELF section.
>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>    with the proc structure and the syscall number, and gives a look
>    at the bitmap to make sure it is allowed. If it isn't, the process
>    is killed.
>  - a new syscall is added, sysrestrict, so that programs can restrict
>    a syscall at runtime. This might be useful, particularly if a
>    program calls a syscall once and wants to make sure it is not
>    allowed any longer.
>  - a userland tool (that I didn't write) can add and update such an ELF
>    section in the binary.
>
> This interface has the following advantages over most already-existing
> implementations:
>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>  - it is syscall-independent, we don't need to patch each syscall.
>  - it does not require binaries to be recompiled.
>  - the performance cost is low, if not non-existent.
>
> I've never tested this code. But in case it inspires or motivates someone.
>
> [1] http://m00nbsd.net/garbage/sysrestrict/

+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
Re: FWIW: sysrestrict
> On Jul 23, 2016, at 1:36 AM, Maxime Villard wrote:
>
> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
>
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
>
> In details:
>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>    number of syscalls. 0 means allowed, 1 means restricted.

Seems you only need the number of bytes needed to encode the highest
restricted syscall.

However, I think I'd prefer a level of indirection. Have a name of a
bitmap embedded which references to a bitmap already loaded. These would
be visible via kern.restriction_sets. which would contain the bitmap.
There would also be a sysctl controlling what happens if you try to run a
program with an unknown bitmap set which only takes effect where
securelevel is non-zero.

>  - in the proc structure, 64 bytes are present, just a copy of the
>    ELF section.
>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>    with the proc structure and the syscall number, and gives a look
>    at the bitmap to make sure it is allowed. If it isn't, the process
>    is killed.

What happens when we get more than 512 syscalls? Is this for NetBSD
binaries only?

>  - a new syscall is added, sysrestrict, so that programs can restrict
>    a syscall at runtime. This might be useful, particularly if a
>    program calls a syscall once and wants to make sure it is not
>    allowed any longer.

I assume it can't unrestrict. Do you pass the size of the array(s)?

>  - a userland tool (that I didn't write) can add and update such an ELF
>    section in the binary.
>
> This interface has the following advantages over most already-existing
> implementations:
>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>  - it is syscall-independent, we don't need to patch each syscall.
>  - it does not require binaries to be recompiled.
>  - the performance cost is low, if not non-existent.

If a syscall is restricted, what error is returned? EPERM? ENOSYS?

> I've never tested this code. But in case it inspires or motivates someone.
>
> [1] http://m00nbsd.net/garbage/sysrestrict/
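To make the bitmap arithmetic in the quoted description concrete (64 bytes, 512 bits, 0 = allowed, 1 = restricted), here is a minimal userland sketch in C. The names `struct sysr_bitmap`, `sysr_init`, `sysr_restrict` and `sysr_allowed` are invented for this example and are not taken from the code at [1]. Denying syscall numbers at or above 512 is one possible answer to the "more than 512 syscalls" question, chosen here as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SYSR_NSYSCALLS 512                   /* 512 bits, per the post */
#define SYSR_BYTES     (SYSR_NSYSCALLS / 8)  /* 64 bytes */

/* Hypothetical per-process copy of the bitmap from the ELF note. */
struct sysr_bitmap {
    uint8_t bits[SYSR_BYTES];  /* bit 0: allowed, bit 1: restricted */
};

/* Start with every syscall allowed, as for a binary without the note. */
static void sysr_init(struct sysr_bitmap *bm)
{
    assert(bm != NULL);
    memset(bm->bits, 0, sizeof(bm->bits));
}

/* Mark one syscall as restricted. Returns false if it is out of range. */
static bool sysr_restrict(struct sysr_bitmap *bm, unsigned int code)
{
    assert(bm != NULL);
    if (code >= SYSR_NSYSCALLS)
        return false;
    bm->bits[code / 8] |= (uint8_t)(1u << (code % 8));
    return true;
}

/*
 * Check a syscall number against the bitmap. Numbers that do not fit
 * in the bitmap are treated as restricted (an assumption, see above).
 */
static bool sysr_allowed(const struct sysr_bitmap *bm, unsigned int code)
{
    assert(bm != NULL);
    if (code >= SYSR_NSYSCALLS)
        return false;
    return (bm->bits[code / 8] & (1u << (code % 8))) == 0;
}
```

In the design described in the post, a false result from the check would lead to the process being killed rather than to an error return.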