On Friday 11 November 2016 21:09:14 Lennart Poettering wrote:
> On Mon, 07.11.16 16:15, Markus Koeberl (markus.koeb...@tugraz.at) wrote:
> 
> > hi!
> > 
> > I am using slurm to manage GPU resources. On a host with several
> > GPUs installed a user gets only access to the GPUs he asks slurm
> > for. This is implemented by using the devices cgroup controller. For
> > each job slurm starts, all devices which are not allowed get denied
> > using cgroup devices.deny.  But by default users get access to all
> > GPUs at login. As my users have ssh access to the host they can
> > bypass slurm and access all GPUs directly. Therefore I would like to
> > deny access to GPU devices for all user logins.
> 
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future, it is unlikely to ever become available in
> cgroupsv2.

That is bad news. Is there a place where I can read about the future of 
cgroups?
Slurm is a workload manager running on about 60% of the TOP500 supercomputers. 

> Device access to local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

Using the devices cgroup controller it is possible to extend this: one process 
of a user can be granted access to a device while, at the same time, another 
process of the same user is denied access to the same device.
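
Roughly, and simplified, this is what slurm sets up per job with the cgroup-v1 
devices controller (the path and the major number 195 are taken from the 
example further down; $JOB_PID is just a placeholder for the pid of the job's 
process, slurm's actual setup differs in the details):

  # create a cgroup for this job step and remove the GPUs from it
  mkdir -p /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0
  echo "c 195:* rwm" > \
      /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0/devices.deny
  # move the job's process into the cgroup
  echo $JOB_PID > \
      /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0/cgroup.procs
  # other processes of the same user live in a different cgroup and keep
  # their access to /dev/nvidia*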

For a batch system that is supposed to manage all resources this is a very 
welcome feature:
For example, I manage hosts with 6 Nvidia Tesla K40 GPUs, 2 Intel Xeon 14-core 
CPUs and 256 GB of RAM.
To ensure that these resources are well utilized over time, several users are 
allowed to use the same host at the same time, as it is unlikely that a single 
user could utilize it alone.
Therefore all users get permission to access the GPUs (or any other resource), 
but a resource management system (for example slurm) is used which knows the 
resource requirements of each individual process. Traditionally such a system 
monitors the resources used by all processes, and a process that exceeds the 
limits it asked for gets terminated to keep the system stable.
Using cgroups makes the monitoring job much easier for the resource management 
system and at the same time makes life easier for the users, because many more 
of their mistakes can be handled in a safe manner without interfering with 
other users.
For example (a rough sketch follows below):
cpuset cgroup controller: pin the process and all its sub-processes to the 
CPUs assigned to the job.
memory cgroup controller: a much tighter and safer limit than polling every 
30 seconds and terminating processes.
devices cgroup controller: deny access to a device so that it cannot be used 
by accident.
Cgroups also provide a very easy way to collect accounting information.
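
To make the first two points more concrete, this is roughly what the kernel 
interface looks like with the cgroup-v1 controllers (a simplified sketch, the 
paths and values are made up for illustration):

  # pin a job to two cores and limit it to 32 GB of RAM
  mkdir -p /sys/fs/cgroup/cpuset/slurm/uid_1044/job_555359
  echo 0-1 > /sys/fs/cgroup/cpuset/slurm/uid_1044/job_555359/cpuset.cpus
  echo 0   > /sys/fs/cgroup/cpuset/slurm/uid_1044/job_555359/cpuset.mems
  mkdir -p /sys/fs/cgroup/memory/slurm/uid_1044/job_555359
  echo $((32*1024*1024*1024)) > \
      /sys/fs/cgroup/memory/slurm/uid_1044/job_555359/memory.limit_in_bytes
  # the kernel enforces the limits immediately instead of a daemon
  # polling every 30 seconds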

To me, using the devices cgroup controller sounded like a very promising and 
clean solution.
I am only a system administrator who has no insight into the development 
process of the Linux kernel.
Given the information you provided I will stop spending time on this approach 
and forward the information to the slurm mailing list in case the developers 
are not aware of it.


> > Basically what I want is for all users logins: 
> > echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
> > Which should deny access to all Nvidia GPUs (this is what slurm does
> > in his own hierarchy which looks like
> > /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).
> 
> Well, this is just broken. If you use systemd, then the cgroup tree in
> the hierarchies it manages are property of systemd, and if you want to
> make cgroups, you can do so only in subhierarchies of systemd's own
> tree, by setting on the Delegate=yes setting. The path above however
> indicates that this is not done here. hence you are really on your
> own, sorry.

I saw a posting about this earlier on this mailing list, so I hope the slurm 
developers already know about it.
But there seems to be no report in the slurm bug tracking system; I will 
create one to be sure.
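
If I understand the Delegate= hint correctly, the fix on the slurm side would 
look roughly like this (untested; I am assuming slurmd runs as a systemd 
service named slurmd.service and would then have to create its job cgroups 
below that unit's own cgroup):

  # /etc/systemd/system/slurmd.service.d/delegate.conf
  [Service]
  Delegate=yes

  systemctl daemon-reload
  systemctl restart slurmd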


> Also, were does 195 come from? is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

I do not know what is going on in the closed-source nvidia driver. In the 
slurm source code I checked, the major number is not hardcoded.
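
As far as I understand, the major number can also be looked up at runtime, 
for example (just an illustration, not what slurm actually does):

  # print major:minor of the device node (in hex)
  stat -c '%t:%T' /dev/nvidia0
  # or look up the major number the driver registered with the kernel
  grep -i nvidia /proc/devices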

> > I did not find anything in the documentation how to implement
> > this. It seams to me that there is no way at the moment to configure
> > sytemd to alter the cgroup device config when creating the session
> > for the user.  It would be nice if somebody could give me some hints
> > how to implement this or a link to an implementation or the right
> > documentation.
> 
> You can alter the DevicesAllow= property of the "user-1000.slice"
> (where 1000 is the uid of your user) unit. But do note that the whole
> "devices" cgroup controller is going away (as mentioned above), so
> this is not future proof. And in general ACL-based device access
> management is usually the better idea.

I had the impression that I need the opposite, because using this to deny 
access won't work, but I have to admit I did not test it.
Would "DeviceAllow=/dev/nvidia? " (omitting rwm) remove the r, w and m 
attributes from /dev/nvidia[0-9]?
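
From reading systemd.resource-control(5) I would rather expect something like 
the following to be needed, i.e. switching the slice to a device whitelist and 
simply not listing the GPUs (untested; the uid and the allowed devices are 
only examples and the list would certainly need more entries):

  # /etc/systemd/system/user-1000.slice.d/no-gpu.conf
  [Slice]
  # "closed" allows only the standard pseudo devices (/dev/null, /dev/zero,
  # /dev/random, ...) plus whatever is whitelisted explicitly below
  DevicePolicy=closed
  DeviceAllow=char-pts rwm
  # ... further DeviceAllow= lines for everything a login session needs,
  # but none for /dev/nvidia*

Or the same at runtime with something like
  systemctl set-property user-1000.slice DevicePolicy=closed
if I read the documentation correctly.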

I also did not see a way to specify this for all users, so it would mean 
maintaining the configuration on every host for each individual user, which I 
would rather avoid. Although I have a small number of users and hosts, this 
sounds complicated to maintain, especially since my environment is highly 
inhomogeneous.


Thank you very much for all the information!


regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at