I have been working for a while on a subsystem to restrict programs
into a reduced feature operating model.
Other people have made such systems in the past, but I have never been
happy with them. I don't think I am alone.
Generally there are two models of operation. The first model requires
a major rewrite of application software for effective use
(e.g. capsicum). The other model in common use lacks granularity, and
allows or denies an operation throughout the entire lifetime of a
process. As a result, it cannot differentiate between program
initialization and the main servicing loop. systrace had the same
problem. My observation is that programs need a large variety of
calls during initialization, but few in their main loops.
Some BPF-style approaches have shown up. So you need to write a
program to observe your program, to keep things secure? That is
insane.
So I asked myself if I could invent a simple system call, which people
would place directly into programs, between initialization and
main-loop.
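The placement described above can be sketched roughly as follows. This is
only an illustration, not code from the diff: the flag values, the stub
tame() and the helper names (initialize, main_loop, count_lines_tamed) are
hypothetical stand-ins so the example builds on a system without the call.
It also assumes the usual -1-on-error return convention, and that closing a
descriptor is still allowed after the call.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-ins so the sketch builds on systems without tame(2).
 * The flag values are hypothetical; on a system that provides the call,
 * include <sys/tame.h> and delete this stub.
 */
#define TAME_MALLOC 0x1
#define TAME_RW     0x2

static int
tame(int flags)
{
	(void)flags;	/* a real kernel would restrict the process here */
	return 0;
}

/* Initialization: acquire every resource the main loop will need. */
static FILE *
initialize(const char *path)
{
	assert(path != NULL);
	return fopen(path, "r");
}

/* Main loop: only reads from the already-open stream. */
static long
main_loop(FILE *fp)
{
	long lines = 0;
	int c;

	while ((c = getc(fp)) != EOF)
		if (c == '\n')
			lines++;
	return lines;
}

/* Returns the number of newlines in path, or -1 on failure. */
long
count_lines_tamed(const char *path)
{
	FILE *fp = initialize(path);
	long lines;

	if (fp == NULL)
		return -1;
	/* Drop abilities between initialization and the main loop. */
	if (tame(TAME_MALLOC | TAME_RW) == -1) {
		fclose(fp);
		return -1;
	}
	lines = main_loop(fp);
	fclose(fp);
	return lines;
}
```

The point of the shape is that open-style calls all happen before tame(),
so the main loop only needs operations on descriptors it already holds.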
Secondly, I wondered what kind of semantics such programs would need.
Not just for the calls they make directly, but for DNS and other macro
operations.
Anyways, enough explanation. A manual page follows.
Then the kernel diff.
Finally, a sample of 29 userland programs protected to various
degrees by using it:
cat pax ps dmesg ping ping6 dc diff finger from id kdump
logger script sed signify uniq w wc whois arp authpf bgpd
httpd ntpd relayd syslogd tcpdump traceroute
Not all these are perfect, but it shows the trend. The changes
are fairly simple. In the simplest non-network programs, network
access is disabled. In simple network programs, file access goes
away. That is the trend.
Sometimes a program is easily modified, making it better, because
the integration of tame hints at an improvement which will make it
tighter under tame. sed is an example...
TAME(2) System Calls Manual TAME(2)
NAME
tame - restrict system operations
SYNOPSIS
#include <sys/tame.h>
int
tame(int flags);
DESCRIPTION
The current process is forced into a restricted-service operating mode.
A few subsets are available, roughly described as computation, memory
management, read-write operations on file descriptors, opening of files,
networking. In general, these modes were selected by studying the
operation of many programs using libc and other such interfaces.
Use of tame in an application will require at least some study and
understanding of the interfaces called.
Subsequent calls to tame() can reduce abilities further, but abilities
can never be regained.
A process which attempts a restricted operation is killed with SIGKILL.
If TAME_ABORT is set, then a non-blockable SIGABRT is delivered instead,
possibly resulting in a core(5) file.
A flags value of 0 restricts the process to the _exit(2) system call.
This can be used for pure computation operating on memory shared with
another process.
All TAME_* options below (with the exception of TAME_ABORT) permit the
following system calls:
clock_getres(2), clock_gettime(2), fchdir(2), getdtablecount(2),
getegid(2), geteuid(2), getgid(2), getgroups(2), getitimer(2),
getlogin(2), getpgid(2), getpgrp(2), getpid(2), getppid(2),
getresgid(2), getresuid(2), getrlimit(2), getsid(2), getthrid(2),
gettimeofday(2), getuid(2), issetugid(2), nanosleep(2),
sendsyslog(2), setitimer(2), sigaction(2), sigprocmask(2),
sigreturn(2), umask(2), wait4(2).
Calls allowed with restrictions include:
sysctl(3) A small set of read-only operations is allowed,
sufficient to support: getifaddrs(3),
getdomainname(3), gethostname(3), system sensor
readings.
access(2) May check for existence of /etc/localtime.
adjtime(2) Read-only, for ntpd(8).
open(2) May open /etc/localtime, any files below
/usr/share/zoneinfo, and files ending in libc.cat
below the directory /usr/share/nls/.
readlink(2) May operate on /etc/malloc.conf.
tame(2) Can only reduce permissions.
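The open(2) path restriction listed above might be checked along these
lines. This is a userland sketch, not the kernel's implementation: the
function names are hypothetical, and rejecting any path containing ".." is
an extra conservative assumption of this example, not something the text
above specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if s begins with prefix. */
static bool
has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* True if s ends with suffix. */
static bool
has_suffix(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Model of the paths a tamed process may still open. */
bool
open_path_allowed(const char *path)
{
	assert(path != NULL);
	/* Assumption of this sketch: refuse anything that could climb out. */
	if (strstr(path, "..") != NULL)
		return false;
	if (strcmp(path, "/etc/localtime") == 0)
		return true;
	if (has_prefix(path, "/usr/share/zoneinfo/"))
		return true;
	if (has_prefix(path, "/usr/share/nls/") && has_suffix(path, "libc.cat"))
		return true;
	return false;
}
```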
The flags are specified as a bitwise OR of the following values:
TAME_MALLOC To allow use of the malloc(3) family of functions,
the following system calls are permitted:
getentropy(2), madvise(2), minherit(2), mmap(2),
mprotect(2), mquery(2), munmap(2).
TAME_RW The following system calls are permitted to allow
most types of IO operations on previously allocated
file descriptors, including libevent or handwritten