Re: [systemd-devel] deny access to GPU devices

2016-11-14 Thread Lennart Poettering
On Mon, 14.11.16 13:13, Markus Koeberl (markus.koeb...@tugraz.at) wrote:

> > > I am using slurm to manage GPU resources. On a host with several
> > > GPUs installed a user gets only access to the GPUs he asks slurm
> > > for. This is implemented by using the devices cgroup controller. For
> > > each job slurm starts, all devices which are not allowed get denied
> > > using cgroup devices.deny.  But by default users get access to all
> > > GPUs at login. As my users have ssh access to the host they can
> > > bypass slurm and access all GPUs directly. Therefore I would like to
> > > deny access to GPU devices for all user logins.
> > 
> > I have no idea what "slurm" is, but do note that the "devices" cgroup
> > controller has no future, it is unlikely to ever become available in
> > cgroupsv2.
> 
> That is bad news. Is there a place where I can read about the
> future of cgroups?

There have been articles on LWN, and there were various discussions at the
Linux Plumbers Conference. Also, ping Tejun, he's the cgroups guy to go to.

> > Device access to local users is normally managed through ACLs on the
> > device node, via udev/logind's "uaccess" logic. Using the "devices"
> > cgroup controller for this appears pretty misguided...
> 
> Using the devices cgroup there is the possibility to extend this:
> It is possible to grant one process of a user access to a device
> and at the same time deny another process of the same user access
> to the same device.

Hmm, on UNIX the primary credentials used for access controls are
users and groups of course (with secondary concepts such as labels,
and caps and suchlike), but they usually are attached to the process,
instead of some external concept such as a cgroup. As such the devices
cgroup subsystem is kind of an outlier on this one already...

> > > I did not find anything in the documentation how to implement
> > > this. It seems to me that there is no way at the moment to configure
> > > systemd to alter the cgroup device config when creating the session
> > > for the user.  It would be nice if somebody could give me some hints
> > > how to implement this or a link to an implementation or the right
> > > documentation.
> > 
> > You can alter the DeviceAllow= property of the "user-1000.slice"
> > (where 1000 is the uid of your user) unit. But do note that the whole
> > "devices" cgroup controller is going away (as mentioned above), so
> > this is not future proof. And in general ACL-based device access
> > management is usually the better idea.
> 
> I had the impression that I need the opposite, because using this to deny
> access won't work, but I have to admit I did not test it.
> Would "DeviceAllow=/dev/nvidia? " (omit rwm) remove the r, w and m
> attributes from /dev/nvidia[0-9]?

Omitting the rwm thing does the right thing. But if you want to block
entire subsystems you need a syntax like "DeviceAllow=char-foobar",
where "foobar" is a subsystem as listed in /proc/devices.

> I also did not see a way to specify this for all users, so this would
> mean maintaining the configuration on all hosts for each individual
> user, which I do not like. Although I have a small number of users and
> hosts, this sounds complicated to maintain, especially since in my case
> the environment is highly inhomogeneous.

Yes, this is kinda nasty I have to admit, the way it is right
now. Ideally we could stuff this kind of information into the user
database, but UNIX is pretty limited there right now I fear and
I don't see this changing anytime soon.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] deny access to GPU devices

2016-11-14 Thread Markus Koeberl
On Friday 11 November 2016 21:09:14 Lennart Poettering wrote:
> On Mon, 07.11.16 16:15, Markus Koeberl (markus.koeb...@tugraz.at) wrote:
> 
> > hi!
> > 
> > I am using slurm to manage GPU resources. On a host with several
> > GPUs installed a user gets only access to the GPUs he asks slurm
> > for. This is implemented by using the devices cgroup controller. For
> > each job slurm starts, all devices which are not allowed get denied
> > using cgroup devices.deny.  But by default users get access to all
> > GPUs at login. As my users have ssh access to the host they can
> > bypass slurm and access all GPUs directly. Therefore I would like to
> > deny access to GPU devices for all user logins.
> 
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future, it is unlikely to ever become available in
> cgroupsv2.

That is bad news. Is there a place where I can read about the future of
cgroups?
Slurm is a workload manager running on about 60% of the TOP500 supercomputers. 

> Device access to local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

Using the devices cgroup there is the possibility to extend this:
It is possible to grant one process of a user access to a device
and at the same time deny another process of the same user access to the same
device.

In the case of a batch system which should manage all resources this is a very
welcome feature:
For example, I manage hosts with 6 Nvidia Tesla K40 GPUs, 2 Intel Xeon 14-core
CPUs and 256GB RAM.
To ensure that all resources are well utilized over time, several users are
allowed to use the same host at the same time, as it is unlikely that a single
user would be able to utilize it alone.
Therefore all users get permission to access the GPUs (or any resource), but a
system that manages the resources is used (for example slurm), which knows the
resource requirements of each individual process. Such a system traditionally
monitors the resources used by all processes, and if a process violates the
limits it asked for, it gets terminated to ensure a stable system.
Using cgroups makes the monitoring job much easier for the resource management
system, and at the same time makes things easier for users, because many more
of their mistakes can be handled in a safe manner without interfering with
other users.
For example (a rough sketch follows below):
cpuset cgroup controller: pin the process and all subprocesses to the same
CPUs.
memory cgroup controller: a very tight and secure limit, compared to monitoring
every 30 seconds and terminating processes.
devices cgroup controller: deny access to a device so that it cannot be used by
accident.
It also provides a very easy way to get accounting information.
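
As a rough, untested illustration, the kind of cgroup v1 writes such a resource
manager does per job step looks something like this (the uid, job id, core
list, memory size and the nvidia major 195 are just examples):

# create the per-step group in each controller hierarchy
step=slurm/uid_1044/job_555359/step_0
for c in cpuset memory devices; do
    mkdir -p /sys/fs/cgroup/$c/$step
done
# pin the job to the granted cores (cpuset.mems must be set before attaching tasks)
echo 0-13 > /sys/fs/cgroup/cpuset/$step/cpuset.cpus
echo 0    > /sys/fs/cgroup/cpuset/$step/cpuset.mems
# a hard memory limit instead of periodic monitoring
echo 64G  > /sys/fs/cgroup/memory/$step/memory.limit_in_bytes
# deny all GPUs, then re-allow only the granted one
echo "c 195:* rwm" > /sys/fs/cgroup/devices/$step/devices.deny
echo "c 195:0 rw"  > /sys/fs/cgroup/devices/$step/devices.allow
# move the job's processes in (likewise for the other controllers' tasks files)
echo "$JOB_PID" > /sys/fs/cgroup/devices/$step/tasks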

For me it sounded like a very promising and clean solution to use the devices
cgroup controller.
I am only a system administrator, who has no insight into the development
process of the Linux kernel.
Considering the information you provided I will stop wasting time on that and
send the information to the slurm mailing list in case the developers do not
know about it yet.


> > Basically what I want is, for all user logins: 
> > echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
> > Which should deny access to all Nvidia GPUs (this is what slurm does
> > in its own hierarchy, which looks like
> > /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).
> 
> Well, this is just broken. If you use systemd, then the cgroup tree in
> the hierarchies it manages is the property of systemd, and if you want to
> make cgroups, you can do so only in subhierarchies of systemd's own
> tree, by turning on the Delegate=yes setting. The path above however
> indicates that this is not done here. Hence you are really on your
> own, sorry.

I saw a posting about this earlier on this mailing list, therefore I hope the
developers already know about it.
But there seems to be no bug report in the slurm bug reporting system. I will
create one to be sure.


> Also, where does 195 come from? Is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

I do not know what is going on in the closed-source nvidia driver. In the slurm
source code I checked, it is not hardcoded.

> > I did not find anything in the documentation how to implement
> > this. It seems to me that there is no way at the moment to configure
> > systemd to alter the cgroup device config when creating the session
> > for the user.  It would be nice if somebody could give me some hints
> > how to implement this or a link to an implementation or the right
> > documentation.
> 
> You can alter the DeviceAllow= property of the "user-1000.slice"
> (where 1000 is the uid of your user) unit. But do note that the whole
> "devices" cgroup controller is going away (as mentioned above), so
> this is not future proof. And in general ACL-based device access
> management is usually the better idea.

Re: [systemd-devel] deny access to GPU devices

2016-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2016 at 12:35:17PM +0100, Lennart Poettering wrote:
> On Sat, 12.11.16 07:43, Topi Miettinen (toiwo...@gmail.com) wrote:
> 
> > On 11/11/16 20:09, Lennart Poettering wrote:
> > > I have no idea what "slurm" is, but do note that the "devices" cgroup
> > > controller has no future, it is unlikely to ever become available in
> > > cgroupsv2.
> > 
> > This is unwelcome news. I think it is a simple and well-contained MAC
> > that has been available on systems without a full-blown MAC like SELinux,
> > and with systemd support it has been very easy to set up. What will
> > happen to DevicePolicy, DeviceAllow etc. directives? Or will systemd
> > stick to cgroupsv1 forever?
> 
> No, our plan is to switch to cgroupsv2 as default as quickly as we
> can. Where "quickly as we can" means mostly: the "cpu" controllers is
> ported to cgroupsv2 in vanilla kernels.
> 
> The thing with the "devices" cgroup controller is that it is not about
> resource control, but about access control, and hence should not live
> in "cgroups" at all, but in some other framework.  "cgroups" is all
> about dynamic resource control and accounting, but "devices" doesn't
> fit that at all, hence it should move elsewhere.
> 
> We'll keep DeviceAllow/DevicePolicy around for now, and there's a TODO
> list item to implement at least the "m" part of it via seccomp, as a
> second level of protection that will still work even if cgroupsv2 is
> used. I think in the long run it might make sense to also do the "rw"
> part of it somehow in the kernel, via some new kernel subsystem, but
> we'll have to see if and how this will be implemented.

Since there is support for stackable LSMs now, I could see the cgroup
devices ACL feature being replaced with a new LSM. I imagine if stackable
LSMs had been supported back in cgroup v1 days, it probably would have
been done that way in the first place instead of adding MAC to cgroups.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|


Re: [systemd-devel] deny access to GPU devices

2016-11-14 Thread Lennart Poettering
On Sat, 12.11.16 07:43, Topi Miettinen (toiwo...@gmail.com) wrote:

> On 11/11/16 20:09, Lennart Poettering wrote:
> > I have no idea what "slurm" is, but do note that the "devices" cgroup
> > controller has no future, it is unlikely to ever become available in
> > cgroupsv2.
> 
> This is unwelcome news. I think it is a simple and well-contained MAC
> that has been available on systems without a full-blown MAC like SELinux,
> and with systemd support it has been very easy to set up. What will
> happen to DevicePolicy, DeviceAllow etc. directives? Or will systemd
> stick to cgroupsv1 forever?

No, our plan is to switch to cgroupsv2 as default as quickly as we
can. Where "quickly as we can" means mostly: the "cpu" controllers is
ported to cgroupsv2 in vanilla kernels.

The thing with the "devices" cgroup controller is that it is not about
resource control, but about access control, and hence should not live
in "cgroups" at all, but in some other framework.  "cgroups" is all
about dynamic resource control and accounting, but "devices" doesn't
fit that at all, hence it should move elsewhere.

We'll keep DeviceAllow/DevicePolicy around for now, and there's a TODO
list item to implement at least the "m" part of it via seccomp, as a
second level of protection that will still work even if cgroupsv2 is
used. I think in the long run it might make sense to also do the "rw"
part of it somehow in the kernel, via some new kernel subsystem, but
we'll have to see if and how this will be implemented.
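
Not the planned implementation, just a sketch of the direction: something like
the "m" part can already be approximated per service today with existing
directives, e.g.

[Service]
# block device node creation both via seccomp and via the capability set
SystemCallFilter=~mknod mknodat
CapabilityBoundingSet=~CAP_MKNOD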

I think primarily we are just spectators of all of this. The kernel
folks need to figure out how they want this to look in the long
run. Consider asking Tejun about all of this. If the kernel folks
agree on something we can adopt it quickly in systemd.

> > Device access to local users is normally managed through ACLs on the
> > device node, via udev/logind's "uaccess" logic. Using the "devices"
> > cgroup controller for this appears pretty misguided...
> 
> ACLs only limit access via the path that they are controlling; the device
> cgroup controls the whole system. And if you have a MAC system that
> can do that, it could perform the same task as ACLs, but in a much better
> way.
> 
> With the devices cgroup you could also deny system services access to nodes
> that need to be available for interactive users (like TTYs, audio, input
> devices, GPUs, USB devices) but are not useful for those services. Perhaps
> some sort of ACL could be constructed with the same effect.

Hmm? udev has been doing precisely this for ages now.
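
Roughly how to see that in action (paths are examples):

getfacl /dev/dri/card0        # shows a "user:<uid>:rw-" ACL entry for the user
                              # logind considers active on that seat
loginctl seat-status seat0    # lists the devices attached to the seat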

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] deny access to GPU devices

2016-11-11 Thread Topi Miettinen
On 11/11/16 20:09, Lennart Poettering wrote:
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future, it is unlikely to ever become available in
> cgroupsv2.

This is unwelcome news. I think it is a simple and well-contained MAC
that has been available on systems without a full-blown MAC like SELinux,
and with systemd support it has been very easy to set up. What will
happen to DevicePolicy, DeviceAllow etc. directives? Or will systemd
stick to cgroupsv1 forever?

> Device access to local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

ACLs only limit access via the path that they are controlling; the device
cgroup controls the whole system. And if you have a MAC system that
can do that, it could perform the same task as ACLs, but in a much better
way.

With the devices cgroup you could also deny system services access to nodes
that need to be available for interactive users (like TTYs, audio, input
devices, GPUs, USB devices) but are not useful for those services. Perhaps
some sort of ACL could be constructed with the same effect.

> Also, where does 195 come from? Is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

The reason seems to be that kernel devs chose not to expose the required
API to non-GPL modules, probably to pressure them into switching to GPL. As
that has not happened, the situation is not optimal from the end user's point
of view, but it's certainly within the devs' and module authors' rights
to continue using incompatible licences.

NVIDIA tackles this by shipping a SUID-root helper, nvidia-modprobe,
which is of course even worse from a security point of view, but it works.
This also highlights why having the devices cgroup is a good idea; for
example, the helper could be fooled into creating new device nodes without
the ACLs. In my setup I have disabled the helper and the device nodes
are created with tmpfiles, which means I'm able to remove the CAP_MKNOD
capability and any devices cgroup 'm' rights from the Xorg service running as
a non-root user.
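
Roughly, that setup looks like the following (untested as written here; the
majors/minors and the group are whatever the driver and the host need):

# /etc/tmpfiles.d/nvidia.conf -- static device nodes instead of the SUID helper
c /dev/nvidia0   0660 root video - 195:0
c /dev/nvidiactl 0660 root video - 195:255

# and in the Xorg service unit:
[Service]
CapabilityBoundingSet=~CAP_MKNOD
DevicePolicy=closed
DeviceAllow=/dev/nvidia0 rw
DeviceAllow=/dev/nvidiactl rw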

-Topi



Re: [systemd-devel] deny access to GPU devices

2016-11-11 Thread Lennart Poettering
On Mon, 07.11.16 16:15, Markus Koeberl (markus.koeb...@tugraz.at) wrote:

> hi!
> 
> I am using slurm to manage GPU resources. On a host with several
> GPUs installed a user gets only access to the GPUs he asks slurm
> for. This is implemented by using the devices cgroup controller. For
> each job slurm starts, all devices which are not allowed get denied
> using cgroup devices.deny.  But by default users get access to all
> GPUs at login. As my users have ssh access to the host they can
> bypass slurm and access all GPUs directly. Therefore I would like to
> deny access to GPU devices for all user logins.

I have no idea what "slurm" is, but do note that the "devices" cgroup
controller has no future, it is unlikely to ever become available in
cgroupsv2.

Device access to local users is normally managed through ACLs on the
device node, via udev/logind's "uaccess" logic. Using the "devices"
cgroup controller for this appears pretty misguided...
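
As an untested sketch of that ACL/udev-based direction (the group name is made
up, and whether the nvidia nodes are even created via udev depends on the
driver setup), a rule like this would keep the GPU nodes away from ordinary
logins and hand them to a group that only managed jobs join:

# /etc/udev/rules.d/99-restrict-gpu.rules  (untested sketch)
KERNEL=="nvidia[0-9]*", MODE="0660", GROUP="gpuaccess"
KERNEL=="nvidiactl",    MODE="0660", GROUP="gpuaccess"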

> Basically what I want is, for all user logins: 
> echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
> Which should deny access to all Nvidia GPUs (this is what slurm does
> in its own hierarchy, which looks like
> /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).

Well, this is just broken. If you use systemd, then the cgroup tree in
the hierarchies it manages is the property of systemd, and if you want to
make cgroups, you can do so only in subhierarchies of systemd's own
tree, by turning on the Delegate=yes setting. The path above however
indicates that this is not done here. Hence you are really on your
own, sorry.

Also, where does 195 come from? Is that a hardcoded major of the
closed-source nvidia driver? Yuck, code really shouldn't hardcode
major/minor numbers these days... And sec

> I did not find anything in the documentation how to implement
> this. It seems to me that there is no way at the moment to configure
> systemd to alter the cgroup device config when creating the session
> for the user.  It would be nice if somebody could give me some hints
> how to implement this or a link to an implementation or the right
> documentation.

You can alter the DeviceAllow= property of the "user-1000.slice"
(where 1000 is the uid of your user) unit. But do note that the whole
"devices" cgroup controller is going away (as mentioned above), so
this is not future proof. And in general ACL-based device access
management is usually the better idea.
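
For what it's worth, the property can also be poked at runtime, e.g.
(untested; 1000 is the uid, and the value is just an arbitrary example of the
DeviceAllow= syntax):

systemctl set-property user-1000.slice DeviceAllow="char-drm rw"
systemctl show -p DeviceAllow user-1000.slice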

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] deny access to GPU devices

2016-11-07 Thread Markus Koeberl
hi!

I am using slurm to manage GPU resources. On a host with several GPUs installed 
a user gets only access to the GPUs he asks slurm for. This is implemented by 
using the devices cgroup controller. For each job slurm starts, all devices 
which are not allowed get denied using cgroup devices.deny.
But by default users get access to all GPUs at login. As my users have ssh 
access to the host they can bypass slurm and access all GPUs directly. 
Therefore I would like to deny access to GPU devices for all user logins.
Basically what I want is, for all user logins: 
echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
Which should deny access to all Nvidia GPUs (this is what slurm does in its own
hierarchy, which looks like
/sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).

On my system for my user with UID=1044 this would be:
echo "c 195:* rwm" > 
/sys/fs/cgroup/devices/user.slice/user-1044.slice/devices.deny
based on:
$ awk -F':' '$2 ~ /devices/ {print $3}' /proc/self/cgroup 
/user.slice/user-1044.slice

I did not find anything in the documentation about how to implement this. It
seems to me that there is no way at the moment to configure systemd to alter
the cgroup device config when creating the session for the user.
It would be nice if somebody could give me some hints on how to implement this,
or a link to an implementation or the right documentation.

My idea of how to implement this (though I am not sure if it is the right way
or if it will work):
write a PAM session module which writes "c 195:* rwm" to the right cgroup
devices.deny file, based on the information from /proc/self/cgroup (a rough
sketch follows below).
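
Something like the following is what I have in mind (completely untested; the
paths are mine, 195 is the nvidia major on my hosts, it only makes sense if it
runs after pam_systemd in the session stack so that /proc/self/cgroup already
points at the user's slice, and writing into the systemd-managed part of the
cgroup tree like this may not be supported):

# /etc/pam.d/sshd (after the pam_systemd line):
#   session optional pam_exec.so /usr/local/sbin/deny-gpus.sh

# /usr/local/sbin/deny-gpus.sh:
#!/bin/sh
# find the devices cgroup of the session being set up
cg=$(awk -F':' '$2 ~ /devices/ {print $3}' /proc/self/cgroup)
# deny all nvidia GPU nodes (major 195) in it
echo "c 195:* rwm" > "/sys/fs/cgroup/devices${cg}/devices.deny"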

I am using Debian stable/unstable; at the moment I have installed systemd 230
from jessie-backports. I saw systemd 232 in unstable, which should be no
problem to install.

Thanks for any help or advice!


regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at