Re: [RFC 05/10] selinux: support per-task/cred selinux namespace

2017-10-05 Thread James Morris
On Mon, 2 Oct 2017, Stephen Smalley wrote:

> An alternative would be to hang the selinux namespace off of the
> user namespace, which itself is associated with the cred.  This
> seems undesirable however since DAC and MAC are orthogonal, and
> there appear to be real use cases where one will want to use selinux
> namespaces without user namespaces and vice versa. 

Indeed, an Oracle use-case is for privileged containers and for this MAC 
must remain separate.



-- 
James Morris





Re: [RFC 04/10] netns, selinux: create the selinux netlink socket per network namespace

2017-10-05 Thread James Morris
On Mon, 2 Oct 2017, Stephen Smalley wrote:

> This change presumes that one will always unshare the network namespace
> when unsharing a new selinux namespace (the reverse is not required).
> Otherwise, the same inconsistencies could arise between the notifications
> and the relevant policy.  At present, nothing enforces this guarantee
> at the kernel level; it is left up to userspace (e.g. container runtimes).
> It is an open question as to whether this is a good idea or whether
> unsharing of the selinux namespace should automatically unshare the network
> namespace.  

What about logging a kernel warning if just SELinux is unshared?

I think we want to avoid surprising the user by unsharing things for them, 
and yes, it will be possible to mess your system up if you configure it 
badly.

> However, keeping them separate is consistent with the handling
> of the mount namespace currently, which also should be unshared so that
> a private selinuxfs mount can be created.

Right, and this will in practice always be automated and abstracted from 
an end user pov.


-- 
James Morris





Re: [PATCH net-next 4/4] selinux: bpf: Add addtional check for bpf object file receive

2017-10-05 Thread Stephen Smalley
On Thu, 2017-10-05 at 09:37 -0400, Stephen Smalley wrote:
> On Wed, 2017-10-04 at 11:29 -0700, Chenbo Feng wrote:
> > From: Chenbo Feng 
> > 
> > Introduce a bpf object related check when sending and receiving
> > files
> > through unix domain socket as well as binder. It checks if the
> > receiving
> > process have privilege to read/write the bpf map or use the bpf
> > program.
> > This check is necessary because the bpf maps and programs are using
> > a
> > anonymous inode as their shared inode so the normal way of checking
> > the
> > files and sockets when passing between processes cannot work
> > properly
> > on
> > eBPF object. This check only works when the BPF_SYSCALL is
> > configured.
> > 
> > Signed-off-by: Chenbo Feng 
> > ---
> >  include/linux/bpf.h  |  3 +++
> >  kernel/bpf/syscall.c |  4 ++--
> >  security/selinux/hooks.c | 57
> > +++-
> >  3 files changed, 61 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index d757ea3f2228..ac8428a36d56 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -250,6 +250,9 @@ int bpf_prog_test_run_skb(struct bpf_prog
> > *prog,
> > const union bpf_attr *kattr,
> >  #ifdef CONFIG_BPF_SYSCALL
> >  DECLARE_PER_CPU(int, bpf_prog_active);
> >  
> > +extern const struct file_operations bpf_map_fops;
> > +extern const struct file_operations bpf_prog_fops;
> > +
> >  #define BPF_PROG_TYPE(_id, _ops) \
> >     extern const struct bpf_verifier_ops _ops;
> >  #define BPF_MAP_TYPE(_id, _ops) \
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 58ff769d58ab..5789a5359f0a 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file
> > *filp,
> > const char __user *buf,
> >     return -EINVAL;
> >  }
> >  
> > -static const struct file_operations bpf_map_fops = {
> > +const struct file_operations bpf_map_fops = {
> >  #ifdef CONFIG_PROC_FS
> >     .show_fdinfo= bpf_map_show_fdinfo,
> >  #endif
> > @@ -965,7 +965,7 @@ static void bpf_prog_show_fdinfo(struct
> > seq_file
> > *m, struct file *filp)
> >  }
> >  #endif
> >  
> > -static const struct file_operations bpf_prog_fops = {
> > +const struct file_operations bpf_prog_fops = {
> >  #ifdef CONFIG_PROC_FS
> >     .show_fdinfo= bpf_prog_show_fdinfo,
> >  #endif
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 41aba4e3d57c..381474ce3216 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -1847,6 +1847,7 @@ static int file_has_perm(const struct cred
> > *cred,
> >  
> >     /* av is zero if only checking access to the descriptor.
> > */
> >     rc = 0;
> > +
> >     if (av)
> >     rc = inode_has_perm(cred, inode, av, );
> >  
> > @@ -2142,6 +2143,10 @@ static int
> > selinux_binder_transfer_binder(struct task_struct *from,
> >     NULL);
> >  }
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int bpf_fd_pass(struct file *file, u32 sid);
> > +#endif
> > +
> >  static int selinux_binder_transfer_file(struct task_struct *from,
> >     struct task_struct *to,
> >     struct file *file)
> > @@ -2165,6 +2170,12 @@ static int
> > selinux_binder_transfer_file(struct
> > task_struct *from,
> >     return rc;
> >     }
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +   rc = bpf_fd_pass(file, sid);
> > +   if (rc)
> > +   return rc;
> > +#endif
> > +
> >     if (unlikely(IS_PRIVATE(d_backing_inode(dentry
> >     return 0;
> >  
> > @@ -3735,8 +3746,18 @@ static int
> > selinux_file_send_sigiotask(struct
> > task_struct *tsk,
> >  static int selinux_file_receive(struct file *file)
> >  {
> >     const struct cred *cred = current_cred();
> > +   int rc;
> > +
> > +   rc = file_has_perm(cred, file, file_to_av(file));
> > +   if (rc)
> > +   goto out;
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +   rc = bpf_fd_pass(file, cred_sid(sid));
> > +#endif
> >  
> > -   return file_has_perm(cred, file, file_to_av(file));
> > +out:
> > +   return rc;
> >  }
> >  
> >  static int selinux_file_open(struct file *file, const struct cred
> > *cred)
> > @@ -6288,6 +6309,40 @@ static u32 bpf_map_fmode_to_av(fmode_t
> > fmode)
> >     return av;
> >  }
> >  
> > +/* This function will check the file pass through unix socket or
> > binder to see
> > + * if it is a bpf related object. And apply correspinding checks
> > on
> > the bpf
> > + * object based on the type. The bpf maps and programs, not like
> > other files and
> > + * socket, are using a shared anonymous inode inside the kernel as
> > their inode.
> > + * So checking that inode cannot identify if the process have
> > privilege to
> > + * access the bpf object and that's why we have to add this
> > additional check in
> > + * selinux_file_receive and 

Re: [RFC 09/10] selinux: add a selinuxfs interface to unshare selinux namespace

2017-10-05 Thread Stephen Smalley
On Thu, 2017-10-05 at 11:49 -0400, Stephen Smalley wrote:
> On Thu, 2017-10-05 at 11:27 -0400, Stephen Smalley wrote:
> > On Mon, 2017-10-02 at 11:58 -0400, Stephen Smalley wrote:
> > > Provide a userspace API to unshare the selinux namespace.
> > > Currently implemented via a selinuxfs node. This could be
> > > coupled with unsharing of other namespaces (e.g.  mount
> > > namespace,
> > > network namespace) that will always be needed or left
> > > independent.
> > > Don't get hung up on the interface itself, it is just to allow
> > > experimentation and testing.
> > > 
> > > Sample usage:
> > > echo 1 > /sys/fs/selinux/unshare
> > > unshare -m -n
> > > umount /sys/fs/selinux
> > > mount -t selinuxfs none /sys/fs/selinux
> > > load_policy
> > > getenforce
> > > id
> > > echo $$
> > 
> > For added fun, you can do the following after unsharing and loading
> > a
> > policy into your namespace above:
> > # Transition from kernel context to an unconfined context.
> > runcon unconfined_u:unconfined_u:unconfined_t:s0:c0.c1023 /bin/bash

That should be:
runcon unconfined_u:unconfined_r:unconfined_t:s0:c0.c1023 /bin/bash


> > # Allow use of file descriptors inherited from the parent
> > namespace,
> > e.g the pty.
> > cat < allowunlabeledfd.cil
> > (allow domain unlabeled_t (fd (use)))
> > EOF
> > semodule -i allowunlabeledfd.cil
> 
> Also:
> restorecon -R /dev
> to fix up the /dev node contexts in your namespace.
> 
> > # Switch namespace to enforcing mode
> > setenforce 1
> > # Run the selinux testsuite
> > cd /path/to/selinux-testsuite
> > make test
> > 
> > inet_socket test failures are expected due to running in a non-init
> > network namespace; they don't work even without unsharing the
> > selinux
> > namespace.
> > 
> > > 
> > > The above will show that the process now views itself as running
> > > in
> > > the
> > > kernel domain in permissive mode, as would be the case at boot.
> > > > From a different shell on the host system, running ps -eZ or
> > > 
> > > cat /proc//attr/current will show that the process that
> > > unshared its selinux namespace is still running in its original
> > > context in the initial namespace, and getenforce will show the
> > > the initial namespace remains enforcing.  Enforcing mode or
> > > policy
> > > changes in the child will not affect the parent.
> > > 
> > > This is not yet safe; do not use on production systems.
> > > Known issues include at least the following items:
> > > 
> > > * The policy loading code has not been thoroughly audited
> > > and hardened for use by unprivileged code, both with respect to
> > > ensuring that the policy is internally consistent and restricting
> > > the range of values used from the policy as loop bounds and
> > > memory
> > > allocation sizes to sane limits.
> > > 
> > > * The SELinux hook functions have not been modified to be
> > > namespace-aware, so the hooks only perform checking against the
> > > current namespace.  Thus, unsharing allows the process to escape
> > > confinement by the parent.  Fixing this requires updating each
> > > hook
> > > to
> > > perform its processing on the current namespace and all of its
> > > ancestors
> > > up to the init namespace.
> > > 
> > > * Some of the hook functions can be called outside of process
> > > context
> > > (e.g. task_kill, send_sigiotask, network input/forward) and
> > > should
> > > not use
> > > the current task's selinux namespace. These hooks need to be
> > > updated
> > > to
> > > obtain the proper selinux namespace to use instead from the
> > > caller
> > > or
> > > cached in a suitable data structure (e.g. the file or sock
> > > security
> > > structures).
> > > 
> > > * There are number of issues with the inode and superblock
> > > security
> > > blob
> > > handling for multiple namespaces, see those commits for more
> > > details.
> > > 
> > > * Only a subset of object security blobs have been updated to
> > > be namespace-aware and support multiple namespaces.  The ones
> > > that
> > > have not yet been updated could end up performing permission
> > > checks
> > > or
> > > other operations on SIDs created in a different selinux
> > > namespace.
> > > 
> > > * The network SID caches (netif, netnode, netport) have not yet
> > > been instantiated per selinux namespace, unlike the AVC and SS.
> > > 
> > > * There is no way currently to restrict or bound nesting of
> > > namespaces; if you allow it to a domain in the init namespace,
> > > then that domain can in turn unshare to arbitrary depths and can
> > > grant the same to any domain in its own policy.  Related to this
> > > is the fact that there is no way to control resource usage due to
> > > selinux namespaces and they can be substantial (per-namespace
> > > policydb, sidtab, AVC, etc).
> > > 
> > > * SIDs may be cached by audit and networking code and in external
> > > kernel data structures and used later, potentially in a different
> > > selinux namespace than the one in which the SID was originally
> > > 

Re: [PATCH net-next 3/4] selinux: bpf: Add selinux check for eBPF syscall operations

2017-10-05 Thread Daniel Borkmann

On 10/05/2017 03:28 PM, Stephen Smalley wrote:
[...]

+static int selinux_bpf_prog(struct bpf_prog *prog)
+{
+   u32 sid = current_sid();
+   struct bpf_security_struct *bpfsec;
+
+   bpfsec = prog->aux->security;


I haven't looked closely at the bpf code, but is it guaranteed that
prog->aux cannot be NULL here?  What's the difference in lifecycle for
bpf_prog vs bpf_prog_aux?  Could the aux field be shared across progs
created by different processes?


prog->aux cannot be NULL here, its lifetime is 1:1 with prog,
it holds slow-path auxiliary data unlike prog itself which is
additionally set to read-only after initial setup until teardown;
aux is also never shared across BPF progs, so always 1:1 relation
to prog itself.

Things that can be shared across multiple BPF programs and user
space processes are BPF maps themselves. From user space side
they can be passed via fd or pinned/retrieved from bpf fs, so
as currently implemented the void *security member you'd hold
in struct bpf_map would refer to the initial creator process of
the map here that doesn't strictly need to be alive anymore;
similarly for prog itself, when its shared between multiple
user space processes, *security ctx would point to the one that
installed it into the kernel initially.


Re: [RFC 09/10] selinux: add a selinuxfs interface to unshare selinux namespace

2017-10-05 Thread Stephen Smalley
On Thu, 2017-10-05 at 11:27 -0400, Stephen Smalley wrote:
> On Mon, 2017-10-02 at 11:58 -0400, Stephen Smalley wrote:
> > Provide a userspace API to unshare the selinux namespace.
> > Currently implemented via a selinuxfs node. This could be
> > coupled with unsharing of other namespaces (e.g.  mount namespace,
> > network namespace) that will always be needed or left independent.
> > Don't get hung up on the interface itself, it is just to allow
> > experimentation and testing.
> > 
> > Sample usage:
> > echo 1 > /sys/fs/selinux/unshare
> > unshare -m -n
> > umount /sys/fs/selinux
> > mount -t selinuxfs none /sys/fs/selinux
> > load_policy
> > getenforce
> > id
> > echo $$
> 
> For added fun, you can do the following after unsharing and loading a
> policy into your namespace above:
> # Transition from kernel context to an unconfined context.
> runcon unconfined_u:unconfined_u:unconfined_t:s0:c0.c1023 /bin/bash
> # Allow use of file descriptors inherited from the parent namespace,
> e.g the pty.
> cat < allowunlabeledfd.cil
> (allow domain unlabeled_t (fd (use)))
> EOF
> semodule -i allowunlabeledfd.cil

Also:
restorecon -R /dev
to fix up the /dev node contexts in your namespace.

> # Switch namespace to enforcing mode
> setenforce 1
> # Run the selinux testsuite
> cd /path/to/selinux-testsuite
> make test
> 
> inet_socket test failures are expected due to running in a non-init
> network namespace; they don't work even without unsharing the selinux
> namespace.
> 
> > 
> > The above will show that the process now views itself as running in
> > the
> > kernel domain in permissive mode, as would be the case at boot.
> > > From a different shell on the host system, running ps -eZ or
> > 
> > cat /proc//attr/current will show that the process that
> > unshared its selinux namespace is still running in its original
> > context in the initial namespace, and getenforce will show the
> > the initial namespace remains enforcing.  Enforcing mode or policy
> > changes in the child will not affect the parent.
> > 
> > This is not yet safe; do not use on production systems.
> > Known issues include at least the following items:
> > 
> > * The policy loading code has not been thoroughly audited
> > and hardened for use by unprivileged code, both with respect to
> > ensuring that the policy is internally consistent and restricting
> > the range of values used from the policy as loop bounds and memory
> > allocation sizes to sane limits.
> > 
> > * The SELinux hook functions have not been modified to be
> > namespace-aware, so the hooks only perform checking against the
> > current namespace.  Thus, unsharing allows the process to escape
> > confinement by the parent.  Fixing this requires updating each hook
> > to
> > perform its processing on the current namespace and all of its
> > ancestors
> > up to the init namespace.
> > 
> > * Some of the hook functions can be called outside of process
> > context
> > (e.g. task_kill, send_sigiotask, network input/forward) and should
> > not use
> > the current task's selinux namespace. These hooks need to be
> > updated
> > to
> > obtain the proper selinux namespace to use instead from the caller
> > or
> > cached in a suitable data structure (e.g. the file or sock security
> > structures).
> > 
> > * There are number of issues with the inode and superblock security
> > blob
> > handling for multiple namespaces, see those commits for more
> > details.
> > 
> > * Only a subset of object security blobs have been updated to
> > be namespace-aware and support multiple namespaces.  The ones that
> > have not yet been updated could end up performing permission checks
> > or
> > other operations on SIDs created in a different selinux namespace.
> > 
> > * The network SID caches (netif, netnode, netport) have not yet
> > been instantiated per selinux namespace, unlike the AVC and SS.
> > 
> > * There is no way currently to restrict or bound nesting of
> > namespaces; if you allow it to a domain in the init namespace,
> > then that domain can in turn unshare to arbitrary depths and can
> > grant the same to any domain in its own policy.  Related to this
> > is the fact that there is no way to control resource usage due to
> > selinux namespaces and they can be substantial (per-namespace
> > policydb, sidtab, AVC, etc).
> > 
> > * SIDs may be cached by audit and networking code and in external
> > kernel data structures and used later, potentially in a different
> > selinux namespace than the one in which the SID was originally
> > created.
> > 
> > * No doubt other things I'm forgetting or haven't thought of.
> > Use at your own risk.
> > 
> > Not-signed-off-by: Stephen Smalley 
> > ---
> >  security/selinux/include/classmap.h |  3 +-
> >  security/selinux/selinuxfs.c| 66
> > +
> >  2 files changed, 68 insertions(+), 1 deletion(-)
> > 
> > diff --git a/security/selinux/include/classmap.h
> > 

Re: [RFC 09/10] selinux: add a selinuxfs interface to unshare selinux namespace

2017-10-05 Thread Stephen Smalley
On Mon, 2017-10-02 at 11:58 -0400, Stephen Smalley wrote:
> Provide a userspace API to unshare the selinux namespace.
> Currently implemented via a selinuxfs node. This could be
> coupled with unsharing of other namespaces (e.g.  mount namespace,
> network namespace) that will always be needed or left independent.
> Don't get hung up on the interface itself, it is just to allow
> experimentation and testing.
> 
> Sample usage:
> echo 1 > /sys/fs/selinux/unshare
> unshare -m -n
> umount /sys/fs/selinux
> mount -t selinuxfs none /sys/fs/selinux
> load_policy
> getenforce
> id
> echo $$

For added fun, you can do the following after unsharing and loading a
policy into your namespace above:
# Transition from kernel context to an unconfined context.
runcon unconfined_u:unconfined_u:unconfined_t:s0:c0.c1023 /bin/bash
# Allow use of file descriptors inherited from the parent namespace, e.g the 
pty.
cat < allowunlabeledfd.cil
(allow domain unlabeled_t (fd (use)))
EOF
semodule -i allowunlabeledfd.cil
# Switch namespace to enforcing mode
setenforce 1
# Run the selinux testsuite
cd /path/to/selinux-testsuite
make test

inet_socket test failures are expected due to running in a non-init
network namespace; they don't work even without unsharing the selinux
namespace.

> 
> The above will show that the process now views itself as running in
> the
> kernel domain in permissive mode, as would be the case at boot.
> > From a different shell on the host system, running ps -eZ or
> 
> cat /proc//attr/current will show that the process that
> unshared its selinux namespace is still running in its original
> context in the initial namespace, and getenforce will show the
> the initial namespace remains enforcing.  Enforcing mode or policy
> changes in the child will not affect the parent.
> 
> This is not yet safe; do not use on production systems.
> Known issues include at least the following items:
> 
> * The policy loading code has not been thoroughly audited
> and hardened for use by unprivileged code, both with respect to
> ensuring that the policy is internally consistent and restricting
> the range of values used from the policy as loop bounds and memory
> allocation sizes to sane limits.
> 
> * The SELinux hook functions have not been modified to be
> namespace-aware, so the hooks only perform checking against the
> current namespace.  Thus, unsharing allows the process to escape
> confinement by the parent.  Fixing this requires updating each hook
> to
> perform its processing on the current namespace and all of its
> ancestors
> up to the init namespace.
> 
> * Some of the hook functions can be called outside of process context
> (e.g. task_kill, send_sigiotask, network input/forward) and should
> not use
> the current task's selinux namespace. These hooks need to be updated
> to
> obtain the proper selinux namespace to use instead from the caller or
> cached in a suitable data structure (e.g. the file or sock security
> structures).
> 
> * There are number of issues with the inode and superblock security
> blob
> handling for multiple namespaces, see those commits for more details.
> 
> * Only a subset of object security blobs have been updated to
> be namespace-aware and support multiple namespaces.  The ones that
> have not yet been updated could end up performing permission checks
> or
> other operations on SIDs created in a different selinux namespace.
> 
> * The network SID caches (netif, netnode, netport) have not yet
> been instantiated per selinux namespace, unlike the AVC and SS.
> 
> * There is no way currently to restrict or bound nesting of
> namespaces; if you allow it to a domain in the init namespace,
> then that domain can in turn unshare to arbitrary depths and can
> grant the same to any domain in its own policy.  Related to this
> is the fact that there is no way to control resource usage due to
> selinux namespaces and they can be substantial (per-namespace
> policydb, sidtab, AVC, etc).
> 
> * SIDs may be cached by audit and networking code and in external
> kernel data structures and used later, potentially in a different
> selinux namespace than the one in which the SID was originally
> created.
> 
> * No doubt other things I'm forgetting or haven't thought of.
> Use at your own risk.
> 
> Not-signed-off-by: Stephen Smalley 
> ---
>  security/selinux/include/classmap.h |  3 +-
>  security/selinux/selinuxfs.c| 66
> +
>  2 files changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/security/selinux/include/classmap.h
> b/security/selinux/include/classmap.h
> index 35ffb29..82c8f9c 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -39,7 +39,8 @@ struct security_class_mapping secclass_map[] = {
>     { "compute_av", "compute_create", "compute_member",
>   "check_context", "load_policy", "compute_relabel",
>   "compute_user", "setenforce", "setbool", 

Re: [RFC 04/10] netns, selinux: create the selinux netlink socket per network namespace

2017-10-05 Thread Stephen Smalley
On Thu, 2017-10-05 at 10:06 -0400, Stephen Smalley wrote:
> On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote:
> > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> > > The selinux netlink socket is used to notify userspace of changes
> > > to
> > > the enforcing mode and policy reloads.  At present, these
> > > notifications
> > > are always sent to the initial network namespace.  In order to
> > > support
> > > multiple selinux namespaces, each with its own enforcing mode and
> > > policy, we need to create and use a separate selinux netlink
> > > socket
> > > for each network namespace.
> > 
> > ...
> > 
> > > +static int __init selnl_init(void)
> > > +{
> > > + if (register_pernet_subsys(_net_ops))
> > > + panic("Could not register selinux netlink
> > > operations\n");
> > >   return 0;
> > >  }
> > 
> > This doesn't seem right to me.  If the socket is only used to send
> > notifications to userspace, then every net_ns doesn't need a
> > socket,
> > only the first netns that the selinux ns was associated, right?
> 
> What does "the first netns that the selinux ns was associated" mean?
> We could unshare them in any order; in the sample command sequence I
> gave in the patch description for "selinux: add a selinuxfs interface
> to unshare selinux namespace", I unshared the SELinux namespace
> first,
> then the network namespace, but we could just as easily do it in the
> reverse order (or at the same time if unshare(2) supported that).  So
> you can't assume that the network namespace in which you are running
> at
> the time you unshare selinux namespace is the right one, nor that the
> first unshare of the network namespace after unsharing the selinux
> namespace is the right one (not that we even have a way to catch that
> currently).
> 
> > So long as there is a way to find the netns to which an selinux ns
> > is tied, a userspace logger could even setns into that netns to
> > listen
> > for updates, if it wasn't certain to be in the right ns when it
> > ran.
> > 
> > Otherwise (I haven't peeked ahead) you'll have to keep the *list*
> > of
> > net_ns which live in a given selinuxfs and copy all messages to all
> > of
> > those namesapces?
> 
> No, we only deliver to the network namespace of the process that
> performed the setenforce or policy load (most commonly init, could
> also
> be an admin running a management command or installing a policy
> rpm). 
> We assume the container runtime properly handles unsharing of the
> mount, network, and selinux namespaces before launching the container
> init.  A container process that subsequently unshares its network
> namespace won't see notifications for any subsequent policy reloads
> or
> setenforce calls.  I don't know if that will prove to be a problem in
> practice.

It should be noted however that this wouldn't be a regression, since
today the netlink notifications are only delivered to the init network
namespace.




Re: [RFC 04/10] netns, selinux: create the selinux netlink socket per network namespace

2017-10-05 Thread Stephen Smalley
On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote:
> On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> > The selinux netlink socket is used to notify userspace of changes
> > to
> > the enforcing mode and policy reloads.  At present, these
> > notifications
> > are always sent to the initial network namespace.  In order to
> > support
> > multiple selinux namespaces, each with its own enforcing mode and
> > policy, we need to create and use a separate selinux netlink socket
> > for each network namespace.
> 
> ...
> 
> > +static int __init selnl_init(void)
> > +{
> > +   if (register_pernet_subsys(_net_ops))
> > +   panic("Could not register selinux netlink
> > operations\n");
> >     return 0;
> >  }
> 
> This doesn't seem right to me.  If the socket is only used to send
> notifications to userspace, then every net_ns doesn't need a socket,
> only the first netns that the selinux ns was associated, right?

What does "the first netns that the selinux ns was associated" mean?
We could unshare them in any order; in the sample command sequence I
gave in the patch description for "selinux: add a selinuxfs interface
to unshare selinux namespace", I unshared the SELinux namespace first,
then the network namespace, but we could just as easily do it in the
reverse order (or at the same time if unshare(2) supported that).  So
you can't assume that the network namespace in which you are running at
the time you unshare selinux namespace is the right one, nor that the
first unshare of the network namespace after unsharing the selinux
namespace is the right one (not that we even have a way to catch that
currently).

> So long as there is a way to find the netns to which an selinux ns
> is tied, a userspace logger could even setns into that netns to
> listen
> for updates, if it wasn't certain to be in the right ns when it ran.
> 
> Otherwise (I haven't peeked ahead) you'll have to keep the *list* of
> net_ns which live in a given selinuxfs and copy all messages to all
> of
> those namesapces?

No, we only deliver to the network namespace of the process that
performed the setenforce or policy load (most commonly init, could also
be an admin running a management command or installing a policy rpm). 
We assume the container runtime properly handles unsharing of the
mount, network, and selinux namespaces before launching the container
init.  A container process that subsequently unshares its network
namespace won't see notifications for any subsequent policy reloads or
setenforce calls.  I don't know if that will prove to be a problem in
practice.



Re: [PATCH net-next 3/4] selinux: bpf: Add selinux check for eBPF syscall operations

2017-10-05 Thread Stephen Smalley
On Wed, 2017-10-04 at 11:29 -0700, Chenbo Feng wrote:
> From: Chenbo Feng 
> 
> Implement the actual checks introduced to eBPF related syscalls. This
> implementation use the security field inside bpf object to store a
> sid that
> identify the bpf object. And when processes try to access the object,
> selinux will check if processes have the right privileges. The
> creation
> of eBPF object are also checked at the general bpf check hook and new
> cmd introduced to eBPF domain can also be checked there.
> 
> Signed-off-by: Chenbo Feng 
> Acked-by: Alexei Starovoitov 
> ---
>  security/selinux/hooks.c| 111
> 
>  security/selinux/include/classmap.h |   2 +
>  security/selinux/include/objsec.h   |   4 ++
>  3 files changed, 117 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index f5d304736852..41aba4e3d57c 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -85,6 +85,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "avc.h"
>  #include "objsec.h"
> @@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void
> *ib_sec)
>  }
>  #endif
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +static int selinux_bpf(int cmd, union bpf_attr *attr,
> +  unsigned int size)
> +{
> + u32 sid = current_sid();
> + int ret;
> +
> + switch (cmd) {
> + case BPF_MAP_CREATE:
> + ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP,
> BPF_MAP__CREATE,
> +    NULL);
> + break;
> + case BPF_PROG_LOAD:
> + ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG,
> BPF_PROG__LOAD,
> +    NULL);
> + break;
> + default:
> + ret = 0;
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static u32 bpf_map_fmode_to_av(fmode_t fmode)
> +{
> + u32 av = 0;
> +
> + if (f_mode & FMODE_READ)
> + av |= BPF_MAP__READ;
> + if (f_mode & FMODE_WRITE)
> + av |= BPF_MAP__WRITE;
> + return av;
> +}
> +
> +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> +{
> + u32 sid = current_sid();
> + struct bpf_security_struct *bpfsec;
> +
> + bpfsec = map->security;
> + return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> + bpf_map_fmode_to_av(fmode), NULL);
> +}
> +
> +static int selinux_bpf_prog(struct bpf_prog *prog)
> +{
> + u32 sid = current_sid();
> + struct bpf_security_struct *bpfsec;
> +
> + bpfsec = prog->aux->security;

I haven't looked closely at the bpf code, but is it guaranteed that
prog->aux cannot be NULL here?  What's the difference in lifecycle for
bpf_prog vs bpf_prog_aux?  Could the aux field be shared across progs
created by different processes?

> + return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> + BPF_PROG__USE, NULL);
> +}
> +
> +static int selinux_bpf_map_alloc(struct bpf_map *map)
> +{
> + struct bpf_security_struct *bpfsec;
> +
> + bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> + if (!bpfsec)
> + return -ENOMEM;
> +
> + bpfsec->sid = current_sid();
> + map->security = bpfsec;
> +
> + return 0;
> +}
> +
> +static void selinux_bpf_map_free(struct bpf_map *map)
> +{
> + struct bpf_security_struct *bpfsec = map->security;
> +
> + map->security = NULL;
> + kfree(bpfsec);
> +}
> +
> +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
> +{
> + struct bpf_security_struct *bpfsec;
> +
> + bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> + if (!bpfsec)
> + return -ENOMEM;
> +
> + bpfsec->sid = current_sid();
> + aux->security = bpfsec;
> +
> + return 0;
> +}
> +
> +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> +{
> + struct bpf_security_struct *bpfsec = aux->security;
> +
> + aux->security = NULL;
> + kfree(bpfsec);
> +}
> +#endif
> +
>  static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
> = {
>   LSM_HOOK_INIT(binder_set_context_mgr,
> selinux_binder_set_context_mgr),
>   LSM_HOOK_INIT(binder_transaction,
> selinux_binder_transaction),
> @@ -6471,6 +6572,16 @@ static struct security_hook_list
> selinux_hooks[] __lsm_ro_after_init = {
>   LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
>   LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
>  #endif
> +
> +#ifdef CONFIG_BPF_SYSCALL
> + LSM_HOOK_INIT(bpf, selinux_bpf),
> + LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
> + LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> + LSM_HOOK_INIT(bpf_map_alloc_security,
> selinux_bpf_map_alloc),
> + LSM_HOOK_INIT(bpf_prog_alloc_security,
> selinux_bpf_prog_alloc),
> + LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
> + LSM_HOOK_INIT(bpf_prog_free_security,
> 

Re: [PATCH net-next 1/4] bpf: Add file mode configuration into bpf maps

2017-10-05 Thread Daniel Borkmann

On 10/05/2017 01:58 AM, Chenbo Feng wrote:

On Wed, Oct 4, 2017 at 4:29 PM, Daniel Borkmann  wrote:

On 10/04/2017 08:29 PM, Chenbo Feng wrote:

From: Chenbo Feng 

Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.


[...]


+int bpf_get_file_flag(int flags)
+{
+   if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
+   return -EINVAL;
+   if (flags & BPF_F_RDONLY)
+   return O_RDONLY;
+   if (flags & BPF_F_WRONLY)
+   return O_WRONLY;
+   return O_RDWR;
   }

   /* helper macro to check that unused fields 'union bpf_attr' are zero */
@@ -345,12 +376,17 @@ static int map_create(union bpf_attr *attr)
   {
 int numa_node = bpf_map_attr_numa_node(attr);
 struct bpf_map *map;
+   int f_flags;
 int err;

 err = CHECK_ATTR(BPF_MAP_CREATE);
 if (err)
 return -EINVAL;

+   f_flags = bpf_get_file_flag(attr->map_flags);
+   if (f_flags < 0)
+   return f_flags;


Wait, I just noticed, given you add BPF_F_RDONLY/BPF_F_WRONLY
to attr->map_flags, and later go into find_and_alloc_map(),
for map alloc, which is e.g. array_map_alloc(). There, we
bail out with EINVAL on attr->map_flags & ~BPF_F_NUMA_NODE,
which is the case for both BPF_F_RDONLY/BPF_F_WRONLY ... I
would have expected that the entire code was tested properly;
what was tested exactly in the set?


Thanks for pointing out this, my test for the patch create the map
with RD/WR flag which is 0 that's why I didn't catch this. And
bpf_obj_get do not have similar checks for map_flags.


Ok, please make sure to extend tools/testing/selftests/bpf/test_maps.c
regarding the two added flags, should be really straight forward to
integrate there and it would also help tracking potential regressions
in future as it's run by various ci bots (like 0day bot).

Thanks,
Daniel


Re: [PATCH net-next 1/4] bpf: Add file mode configuration into bpf maps

2017-10-05 Thread Chenbo Feng via Selinux
On Wed, Oct 4, 2017 at 4:29 PM, Daniel Borkmann  wrote:
> On 10/04/2017 08:29 PM, Chenbo Feng wrote:
>>
>> From: Chenbo Feng 
>>
>> Introduce the map read/write flags to the eBPF syscalls that returns the
>> map fd. The flags is used to set up the file mode when construct a new
>> file descriptor for bpf maps. To not break the backward capability, the
>> f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
>> it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
>> read the map content, it will check the file mode to see if it is
>> allowed to make the change.
>
> [...]
>>
>> +int bpf_get_file_flag(int flags)
>> +{
>> +   if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
>> +   return -EINVAL;
>> +   if (flags & BPF_F_RDONLY)
>> +   return O_RDONLY;
>> +   if (flags & BPF_F_WRONLY)
>> +   return O_WRONLY;
>> +   return O_RDWR;
>>   }
>>
>>   /* helper macro to check that unused fields 'union bpf_attr' are zero */
>> @@ -345,12 +376,17 @@ static int map_create(union bpf_attr *attr)
>>   {
>> int numa_node = bpf_map_attr_numa_node(attr);
>> struct bpf_map *map;
>> +   int f_flags;
>> int err;
>>
>> err = CHECK_ATTR(BPF_MAP_CREATE);
>> if (err)
>> return -EINVAL;
>>
>> +   f_flags = bpf_get_file_flag(attr->map_flags);
>> +   if (f_flags < 0)
>> +   return f_flags;
>
>
> Wait, I just noticed, given you add BPF_F_RDONLY/BPF_F_WRONLY
> to attr->map_flags, and later go into find_and_alloc_map(),
> for map alloc, which is e.g. array_map_alloc(). There, we
> bail out with EINVAL on attr->map_flags & ~BPF_F_NUMA_NODE,
> which is the case for both BPF_F_RDONLY/BPF_F_WRONLY ... I
> would have expected that the entire code was tested properly;
> what was tested exactly in the set?
>

Thanks for pointing out this, my test for the patch create the map
with RD/WR flag which is 0 that's why I didn't catch this. And
bpf_obj_get do not have similar checks for map_flags.
>> if (numa_node != NUMA_NO_NODE &&
>> ((unsigned int)numa_node >= nr_node_ids ||
>>  !node_online(numa_node)))
>> @@ -376,7 +412,7 @@ static int map_create(union bpf_attr *attr)
>> if (err)
>> goto free_map;
>>
>> -   err = bpf_map_new_fd(map);
>> +   err = bpf_map_new_fd(map, f_flags);
>> if (err < 0) {
>> /* failed to allocate fd.
>>  * bpf_map_put() is needed because the above
>> @@ -491,6 +527,11 @@ static int map_lookup_elem(union bpf_attr *attr)
>
> [...]


Re: [PATCH net-next 4/4] selinux: bpf: Add addtional check for bpf object file receive

2017-10-05 Thread Daniel Borkmann

On 10/04/2017 08:29 PM, Chenbo Feng wrote:

From: Chenbo Feng 

Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.


[...]

+/* This function will check the file pass through unix socket or binder to see
+ * if it is a bpf related object. And apply correspinding checks on the bpf
+ * object based on the type. The bpf maps and programs, not like other files 
and
+ * socket, are using a shared anonymous inode inside the kernel as their inode.
+ * So checking that inode cannot identify if the process have privilege to
+ * access the bpf object and that's why we have to add this additional check in
+ * selinux_file_receive and selinux_binder_transfer_files.
+ */

[...]

If selinux/lsm folks have some input on this one in particular, would
be great. The issue is not really tied to bpf specifically, but to the
use of anon-inodes wrt fd passing. Maybe some generic resolution can
be found to tackle this ...


Re: [PATCH net-next 4/4] selinux: bpf: Add addtional check for bpf object file receive

2017-10-05 Thread Daniel Borkmann

On 10/05/2017 01:44 AM, Daniel Borkmann wrote:

On 10/04/2017 08:29 PM, Chenbo Feng wrote:

From: Chenbo Feng 

Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.


[...]

+/* This function will check the file pass through unix socket or binder to see
+ * if it is a bpf related object. And apply correspinding checks on the bpf
+ * object based on the type. The bpf maps and programs, not like other files 
and
+ * socket, are using a shared anonymous inode inside the kernel as their inode.
+ * So checking that inode cannot identify if the process have privilege to
+ * access the bpf object and that's why we have to add this additional check in
+ * selinux_file_receive and selinux_binder_transfer_files.
+ */

[...]

If selinux/lsm folks have some input on this one in particular, would
be great. The issue is not really tied to bpf specifically, but to the
use of anon-inodes wrt fd passing. Maybe some generic resolution can
be found to tackle this ...


Perhaps even just a generic callback in struct file_operations might be
better in order to just retrieve the secctx from the underlying object
in case of anon-inodes and then have a generic check in selinux_file_receive()
for the case when such callback is set, such that we don't need to put
specific bpf logic there.


Re: [PATCH net-next 1/4] bpf: Add file mode configuration into bpf maps

2017-10-05 Thread Daniel Borkmann

On 10/04/2017 08:29 PM, Chenbo Feng wrote:

From: Chenbo Feng 

Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

[...]

+int bpf_get_file_flag(int flags)
+{
+   if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
+   return -EINVAL;
+   if (flags & BPF_F_RDONLY)
+   return O_RDONLY;
+   if (flags & BPF_F_WRONLY)
+   return O_WRONLY;
+   return O_RDWR;
  }

  /* helper macro to check that unused fields 'union bpf_attr' are zero */
@@ -345,12 +376,17 @@ static int map_create(union bpf_attr *attr)
  {
int numa_node = bpf_map_attr_numa_node(attr);
struct bpf_map *map;
+   int f_flags;
int err;

err = CHECK_ATTR(BPF_MAP_CREATE);
if (err)
return -EINVAL;

+   f_flags = bpf_get_file_flag(attr->map_flags);
+   if (f_flags < 0)
+   return f_flags;


Wait, I just noticed, given you add BPF_F_RDONLY/BPF_F_WRONLY
to attr->map_flags, and later go into find_and_alloc_map(),
for map alloc, which is e.g. array_map_alloc(). There, we
bail out with EINVAL on attr->map_flags & ~BPF_F_NUMA_NODE,
which is the case for both BPF_F_RDONLY/BPF_F_WRONLY ... I
would have expected that the entire code was tested properly;
what was tested exactly in the set?


if (numa_node != NUMA_NO_NODE &&
((unsigned int)numa_node >= nr_node_ids ||
 !node_online(numa_node)))
@@ -376,7 +412,7 @@ static int map_create(union bpf_attr *attr)
if (err)
goto free_map;

-   err = bpf_map_new_fd(map);
+   err = bpf_map_new_fd(map, f_flags);
if (err < 0) {
/* failed to allocate fd.
 * bpf_map_put() is needed because the above
@@ -491,6 +527,11 @@ static int map_lookup_elem(union bpf_attr *attr)

[...]