Re: [systemd-devel] deny access to GPU devices
On Mon, 14.11.16 13:13, Markus Koeberl (markus.koeb...@tugraz.at) wrote:

> > > I am using slurm to manage GPU resources. On a host with several
> > > GPUs installed, a user only gets access to the GPUs he asks slurm
> > > for. This is implemented using the devices cgroup controller: for
> > > each job slurm starts, all devices which are not allowed get denied
> > > via cgroup devices.deny. But by default users get access to all
> > > GPUs at login. As my users have ssh access to the host, they can
> > > bypass slurm and access all GPUs directly. Therefore I would like to
> > > deny access to GPU devices for all user logins.
> >
> > I have no idea what "slurm" is, but do note that the "devices" cgroup
> > controller has no future; it is unlikely to ever become available in
> > cgroupsv2.
>
> That is bad news. Is there a place where I can read about the
> future of cgroups?

There have been articles on LWN, and there were various discussions at the
Linux Plumbers Conference. Also, ping Tejun, he's the cgroups guy to go to.

> > Device access for local users is normally managed through ACLs on the
> > device node, via udev/logind's "uaccess" logic. Using the "devices"
> > cgroup controller for this appears pretty misguided...
>
> Using the devices cgroup there is the possibility to extend this:
> it is possible to grant one process of a user access to a device
> and at the same time deny another process of the same user access
> to the same device.

Hmm, on UNIX the primary credentials used for access control are of course
users and groups (with secondary concepts such as labels, caps and
suchlike), but they are usually attached to the process rather than to some
external concept such as a cgroup. As such, the devices cgroup subsystem is
already kind of an outlier on this one...

> > > I did not find anything in the documentation about how to implement
> > > this. It seems to me that there is no way at the moment to configure
> > > systemd to alter the cgroup device config when creating the session
> > > for the user. It would be nice if somebody could give me some hints
> > > on how to implement this, or a link to an implementation or the
> > > right documentation.
> >
> > You can alter the DeviceAllow= property of the "user-1000.slice"
> > (where 1000 is the uid of your user) unit. But do note that the whole
> > "devices" cgroup controller is going away (as mentioned above), so
> > this is not future proof. And in general, ACL-based device access
> > management is usually the better idea.
>
> I had the impression that I need the opposite, because using this to
> deny access won't work, but I have to admit I did not test it.
> Would "DeviceAllow=/dev/nvidia? " (omitting rwm) remove the r, w and m
> attributes from /dev/nvidia[0-9]?

Omitting the rwm part does the right thing. But if you want to block entire
subsystems you need a syntax like "DeviceAllow=char-foobar", where "foobar"
is a subsystem as listed in /proc/devices.

> I also did not see a way to specify this for all users; it would
> therefore mean maintaining the configuration on all hosts for each
> individual user, which I do not like. Although I have a small number
> of users and hosts, this sounds complicated to maintain, especially as
> in my case the environment is highly inhomogeneous.

Yes, I have to admit this is kinda nasty the way it is right now. Ideally
we could stuff this kind of information into the user database, but UNIX is
pretty limited there right now I fear, and I don't see this changing
anytime soon.

Lennart

--
Lennart Poettering, Red Hat

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
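[Editor's note: the per-user slice drop-in discussed above could look
roughly like the following sketch. The path and file name are hypothetical;
UID 1044 is the one used elsewhere in this thread, and the subsystem name
after `char-` is an assumption that would have to be checked against
/proc/devices on the actual host.]

```ini
# /etc/systemd/system/user-1044.slice.d/50-gpu-deny.conf (hypothetical path)
[Slice]
# Listing a node with no "rwm" flags revokes read, write and mknod access.
DeviceAllow=/dev/nvidia0
# To block a whole character-device subsystem instead, name it as it
# appears in /proc/devices, e.g. (assumed name, verify locally):
# DeviceAllow=char-nvidia-frontend
```

After a `systemctl daemon-reload` this would apply to that user's slice;
note Lennart's caveat that the underlying "devices" controller is v1-only.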
Re: [systemd-devel] deny access to GPU devices
On Friday 11 November 2016 21:09:14 Lennart Poettering wrote:
> On Mon, 07.11.16 16:15, Markus Koeberl (markus.koeb...@tugraz.at) wrote:
> > hi!
> >
> > I am using slurm to manage GPU resources. On a host with several
> > GPUs installed, a user only gets access to the GPUs he asks slurm
> > for. This is implemented using the devices cgroup controller: for
> > each job slurm starts, all devices which are not allowed get denied
> > via cgroup devices.deny. But by default users get access to all
> > GPUs at login. As my users have ssh access to the host, they can
> > bypass slurm and access all GPUs directly. Therefore I would like to
> > deny access to GPU devices for all user logins.
>
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future; it is unlikely to ever become available in
> cgroupsv2.

That is bad news. Is there a place where I can read about the future of
cgroups?

Slurm is a workload manager running on about 60% of the TOP500
supercomputers.

> Device access for local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

Using the devices cgroup there is the possibility to extend this: it is
possible to grant one process of a user access to a device and at the same
time deny another process of the same user access to the same device. In
the case of a batch system which should manage all resources, this is a
very welcome feature. For example, I manage hosts with 6 NVIDIA Tesla K40
GPUs, 2 Intel Xeon 14-core CPUs and 256 GB RAM. To ensure that all
resources are best utilized over time, several users are allowed to use the
same host at the same time, as it is unlikely that only one user will be
able to utilize it alone.
Therefore all users get permission to access the GPUs (or any resource),
but a system to manage the resources is used (for example slurm) which
knows the resource requirements of each individual process. Such a system
traditionally monitors the resources used by all processes, and in case a
process violates the resource limits it asked for, it gets terminated to
ensure a stable system. Using cgroups makes the monitoring job much easier
for the resource management system, and at the same time makes things
easier for users, because many more of their mistakes can be handled in a
safe manner without interfering with other users. For example:

cpuset cgroup controller: pin the process and all sub-processes to the
same CPUs.
memory cgroup controller: very tight and secure limits, compared to
monitoring every 30 sec. and terminating processes.
devices cgroup controller: deny access to a device so that it cannot be
used by accident.

It also provides a very easy way to get accounting information. To me the
devices cgroup controller sounded like a very promising and clean solution.
I am only a system administrator, who has no insight into the Linux kernel
development process. Considering the information you provided, I will stop
spending time on that and send the information to the slurm mailing list in
case the developers do not know about it.

> > Basically what I want is for all user logins:
> > echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
> > Which should deny access to all Nvidia GPUs (this is what slurm does
> > in its own hierarchy, which looks like
> > /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).
>
> Well, this is just broken. If you use systemd, then the cgroup trees in
> the hierarchies it manages are property of systemd, and if you want to
> make cgroups, you can do so only in sub-hierarchies of systemd's own
> tree, by turning on the Delegate=yes setting. The path above however
> indicates that this is not done here, hence you are really on your
> own, sorry.

I saw a posting about this earlier on this mailing list, so I hope the
developers already know about it. But there seems to be no report in the
slurm bug tracking system; I will create one to be sure.

> Also, where does 195 come from? Is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

I do not know what is going on in the closed-source nvidia driver. In the
slurm source code I checked, it is not hardcoded.

> > I did not find anything in the documentation about how to implement
> > this. It seems to me that there is no way at the moment to configure
> > systemd to alter the cgroup device config when creating the session
> > for the user. It would be nice if somebody could give me some hints
> > on how to implement this, or a link to an implementation or the
> > right documentation.
>
> You can alter the DeviceAllow= property of the "user-1000.slice"
> (where 1000 is the uid of your user) unit. But do note that the whole
> "devices" cgroup controller is going away (as mentioned above), so this
> is not future proof. And in general, ACL-based device access management
> is usually the better idea.
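[Editor's note: the Delegate=yes setting Lennart refers to can be sketched
as a drop-in for the workload manager's daemon. The unit name
`slurmd.service` and the drop-in path are assumptions about a local setup,
not something stated in the thread.]

```ini
# /etc/systemd/system/slurmd.service.d/delegate.conf (hypothetical)
[Service]
# Hand this service its own cgroup subtree, so that creating
# uid_*/job_*/step_* groups below it no longer conflicts with
# systemd's ownership of the rest of the hierarchy.
Delegate=yes
```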
Re: [systemd-devel] deny access to GPU devices
On Mon, Nov 14, 2016 at 12:35:17PM +0100, Lennart Poettering wrote:
> On Sat, 12.11.16 07:43, Topi Miettinen (toiwo...@gmail.com) wrote:
> > On 11/11/16 20:09, Lennart Poettering wrote:
> > > I have no idea what "slurm" is, but do note that the "devices"
> > > cgroup controller has no future; it is unlikely to ever become
> > > available in cgroupsv2.
> >
> > This is unwelcome news. I think it is a simple and well-contained MAC
> > that has been available on systems without a full-blown MAC like
> > SELinux, and with systemd support it has been very easy to set up.
> > What will happen to the DevicePolicy, DeviceAllow etc. directives? Or
> > will systemd stick to cgroupsv1 forever?
>
> No, our plan is to switch to cgroupsv2 as the default as quickly as we
> can. Where "as quickly as we can" mostly means: as soon as the "cpu"
> controller is ported to cgroupsv2 in vanilla kernels.
>
> The thing with the "devices" cgroup controller is that it is not about
> resource control, but about access control, and hence should not live
> in "cgroups" at all, but in some other framework. "cgroups" is all
> about dynamic resource control and accounting, but "devices" doesn't
> fit that at all, hence it should move elsewhere.
>
> We'll keep DeviceAllow/DevicePolicy around for now, and there's a TODO
> list item to implement at least the "m" part of it via seccomp, as a
> second level of protection that will still work even if cgroupsv2 is
> used. I think in the long run it might make sense to also do the "rw"
> part of it somehow in the kernel, via some new kernel subsystem, but
> we'll have to see if and how this will be implemented.

Since there is support for stackable LSMs now, I could see the cgroup
devices ACL feature being replaced with a new LSM. I imagine that if
stackable LSMs had been supported back in the cgroup v1 days, it would
probably have been done that way in the first place, instead of adding MAC
to cgroups.
Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Re: [systemd-devel] deny access to GPU devices
On Sat, 12.11.16 07:43, Topi Miettinen (toiwo...@gmail.com) wrote:

> On 11/11/16 20:09, Lennart Poettering wrote:
> > I have no idea what "slurm" is, but do note that the "devices" cgroup
> > controller has no future; it is unlikely to ever become available in
> > cgroupsv2.
>
> This is unwelcome news. I think it is a simple and well-contained MAC
> that has been available on systems without a full-blown MAC like
> SELinux, and with systemd support it has been very easy to set up. What
> will happen to the DevicePolicy, DeviceAllow etc. directives? Or will
> systemd stick to cgroupsv1 forever?

No, our plan is to switch to cgroupsv2 as the default as quickly as we can.
Where "as quickly as we can" mostly means: as soon as the "cpu" controller
is ported to cgroupsv2 in vanilla kernels.

The thing with the "devices" cgroup controller is that it is not about
resource control, but about access control, and hence should not live in
"cgroups" at all, but in some other framework. "cgroups" is all about
dynamic resource control and accounting, but "devices" doesn't fit that at
all, hence it should move elsewhere.

We'll keep DeviceAllow/DevicePolicy around for now, and there's a TODO list
item to implement at least the "m" part of it via seccomp, as a second
level of protection that will still work even if cgroupsv2 is used. I think
in the long run it might make sense to also do the "rw" part of it somehow
in the kernel, via some new kernel subsystem, but we'll have to see if and
how this will be implemented.

I think primarily we are just spectators in all of this. The kernel folks
need to figure out how they want this to look in the long run. Consider
inquiring with Tejun about all of this. If the kernel folks agree on
something, we can adopt it quickly in systemd.

> > Device access for local users is normally managed through ACLs on the
> > device node, via udev/logind's "uaccess" logic. Using the "devices"
> > cgroup controller for this appears pretty misguided...
>
> ACLs only limit access via the path that they are controlling; the
> device cgroup controls the whole system. And if you have a MAC system
> that can do that, it could perform the same task as ACLs, but in a much
> better way.
>
> With the cgroup you could also deny access to nodes that need to be
> available for interactive users (like TTYs, audio, input devices, GPUs,
> USB devices), but which are not useful for system services. Perhaps
> some sort of ACL could be constructed with the same effect.

Hmm? udev has been doing precisely this for ages now.

Lennart

--
Lennart Poettering, Red Hat
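[Editor's note: the TODO item above, doing the "m" part via seccomp,
corresponds to filtering the mknod family of system calls per service. A
minimal sketch with systemd's existing syscall filtering, for some
arbitrary service unit; this covers only node creation, not read/write
access to existing nodes.]

```ini
# Drop-in for a service unit (sketch)
[Service]
# Deny-list the node-creation syscalls; the seccomp analogue of
# dropping the "m" right, and it keeps working on cgroupsv2.
SystemCallFilter=~mknod mknodat
```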
Re: [systemd-devel] deny access to GPU devices
On 11/11/16 20:09, Lennart Poettering wrote:
> I have no idea what "slurm" is, but do note that the "devices" cgroup
> controller has no future; it is unlikely to ever become available in
> cgroupsv2.

This is unwelcome news. I think it is a simple and well-contained MAC that
has been available on systems without a full-blown MAC like SELinux, and
with systemd support it has been very easy to set up. What will happen to
the DevicePolicy, DeviceAllow etc. directives? Or will systemd stick to
cgroupsv1 forever?

> Device access for local users is normally managed through ACLs on the
> device node, via udev/logind's "uaccess" logic. Using the "devices"
> cgroup controller for this appears pretty misguided...

ACLs only limit access via the path that they are controlling; the device
cgroup controls the whole system. And if you have a MAC system that can do
that, it could perform the same task as ACLs, but in a much better way.

With the cgroup you could also deny access to nodes that need to be
available for interactive users (like TTYs, audio, input devices, GPUs, USB
devices), but which are not useful for system services. Perhaps some sort
of ACL could be constructed with the same effect.

> Also, where does 195 come from? Is that a hardcoded major of the
> closed-source nvidia driver? Yuck, code really shouldn't hardcode
> major/minor numbers these days... And sec

The reason seems to be that the kernel devs chose not to expose the
required API to non-GPL modules, probably to put pressure on them to switch
to GPL. As that has not happened, the situation is not optimal from the end
user's point of view, but it is certainly within the devs' and module
authors' rights to continue using incompatible licences. NVIDIA tackles
this by shipping a SUID-root helper, nvidia-modprobe, which is of course
even worse from a security point of view, but it works. This also
highlights why having the device cgroup is a good idea; for example, the
helper could be fooled into creating new device nodes without the ACLs.
In my setup I have disabled the helper and the device nodes are created
with tmpfiles, which means I am able to remove the CAP_MKNOD capability and
any device cgroup 'm' rights from the Xorg service running as a non-root
user.

-Topi
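[Editor's note: the tmpfiles-based node creation Topi describes might look
like the sketch below. The file name is hypothetical; major 195 is the
number cited in this thread, while the minor numbers and the `video` group
are assumptions to be checked against a real installation.]

```ini
# /etc/tmpfiles.d/nvidia.conf (hypothetical)
# Type Path          Mode User Group Age Argument (major:minor)
c /dev/nvidia0   0660 root video - 195:0
c /dev/nvidiactl 0660 root video - 195:255
```

With the nodes created statically at boot, the SUID helper (and mknod
rights for the X server) become unnecessary.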
Re: [systemd-devel] deny access to GPU devices
On Mon, 07.11.16 16:15, Markus Koeberl (markus.koeb...@tugraz.at) wrote:

> hi!
>
> I am using slurm to manage GPU resources. On a host with several
> GPUs installed, a user only gets access to the GPUs he asks slurm
> for. This is implemented using the devices cgroup controller: for
> each job slurm starts, all devices which are not allowed get denied
> via cgroup devices.deny. But by default users get access to all
> GPUs at login. As my users have ssh access to the host, they can
> bypass slurm and access all GPUs directly. Therefore I would like to
> deny access to GPU devices for all user logins.

I have no idea what "slurm" is, but do note that the "devices" cgroup
controller has no future; it is unlikely to ever become available in
cgroupsv2.

Device access for local users is normally managed through ACLs on the
device node, via udev/logind's "uaccess" logic. Using the "devices" cgroup
controller for this appears pretty misguided...

> Basically what I want is for all user logins:
> echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny
> Which should deny access to all Nvidia GPUs (this is what slurm does
> in its own hierarchy, which looks like
> /sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).

Well, this is just broken. If you use systemd, then the cgroup trees in the
hierarchies it manages are property of systemd, and if you want to make
cgroups, you can do so only in sub-hierarchies of systemd's own tree, by
turning on the Delegate=yes setting. The path above however indicates that
this is not done here, hence you are really on your own, sorry.

Also, where does 195 come from? Is that a hardcoded major of the
closed-source nvidia driver? Yuck, code really shouldn't hardcode
major/minor numbers these days... And sec

> I did not find anything in the documentation about how to implement
> this. It seems to me that there is no way at the moment to configure
> systemd to alter the cgroup device config when creating the session
> for the user. It would be nice if somebody could give me some hints
> on how to implement this, or a link to an implementation or the right
> documentation.

You can alter the DeviceAllow= property of the "user-1000.slice" (where
1000 is the uid of your user) unit. But do note that the whole "devices"
cgroup controller is going away (as mentioned above), so this is not future
proof. And in general, ACL-based device access management is usually the
better idea.

Lennart

--
Lennart Poettering, Red Hat
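[Editor's note: the "uaccess" logic Lennart refers to is driven by udev
tags. The sketch below illustrates the mechanism; the file name and the
kernel-name matches are assumptions, and systemd's stock rules in
70-uaccess.rules should be consulted for the real patterns.]

```ini
# /etc/udev/rules.d/71-seat-gpu.rules (hypothetical)
# TAG+="uaccess" asks logind to grant the active seat's user an ACL
# on the node; this is how local device access is normally managed.
SUBSYSTEM=="drm", KERNEL=="card*", TAG+="uaccess"
# Conversely, a node with no uaccess tag and a tight mode stays
# inaccessible to ordinary login users:
KERNEL=="nvidia[0-9]*", MODE="0600", OWNER="root", GROUP="root"
```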
[systemd-devel] deny access to GPU devices
hi!

I am using slurm to manage GPU resources. On a host with several GPUs
installed, a user only gets access to the GPUs he asks slurm for. This is
implemented using the devices cgroup controller: for each job slurm starts,
all devices which are not allowed get denied via cgroup devices.deny. But
by default users get access to all GPUs at login. As my users have ssh
access to the host, they can bypass slurm and access all GPUs directly.
Therefore I would like to deny access to GPU devices for all user logins.

Basically what I want is for all user logins:

echo "c 195:* rwm" > /sys/fs/cgroup/devices/... /devices.deny

Which should deny access to all Nvidia GPUs (this is what slurm does in its
own hierarchy, which looks like
/sys/fs/cgroup/devices/slurm/uid_1044/job_555359/step_0).

On my system, for my user with UID=1044, this would be:

echo "c 195:* rwm" > /sys/fs/cgroup/devices/user.slice/user-1044.slice/devices.deny

based on:

$ awk -F':' '$2 ~ /devices/ {print $3}' /proc/self/cgroup
/user.slice/user-1044.slice

I did not find anything in the documentation about how to implement this.
It seems to me that there is no way at the moment to configure systemd to
alter the cgroup device config when creating the session for the user. It
would be nice if somebody could give me some hints on how to implement
this, or a link to an implementation or the right documentation.

My idea for how to implement it, though I am not sure whether it is the
right way or whether it will work: write a PAM session module which writes
the "c 195:* rwm" line to the right cgroup devices.deny file, based on the
information from /proc/self/cgroup.

I am using debian stable/unstable; at the moment I have installed systemd
230 from jessie-backports. I saw systemd 232 in unstable, which should be
no problem to install.

Thanks for any help or advice!
regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at
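[Editor's note: the PAM-session idea above can be sketched as a script run
via pam_exec. This is a hypothetical, untested sketch based solely on this
mail: major 195 and the path parsing come from the thread, and the script
deliberately does nothing unless the v1 devices hierarchy is present and
writable.]

```shell
#!/bin/sh
# Hypothetical pam_exec session hook: deny the NVIDIA character major
# (195, the number used in this thread) in the calling session's
# v1 devices cgroup.

# Print the devices-controller path from /proc/self/cgroup-style input.
devices_cgroup_path() {
    awk -F':' '$2 ~ /devices/ {print $3}'
}

cgpath=$(devices_cgroup_path < /proc/self/cgroup)
deny_file="/sys/fs/cgroup/devices${cgpath}/devices.deny"

# Only write when the v1 devices hierarchy is mounted and writable
# (i.e. when actually running as root on a cgroupsv1 host).
if [ -n "$cgpath" ] && [ -w "$deny_file" ]; then
    echo "c 195:* rwm" > "$deny_file"
fi
```

It could be wired up with a line like `session optional pam_exec.so
/usr/local/sbin/deny-gpus.sh` (path hypothetical), keeping in mind
Lennart's warning that poking systemd-managed cgroup trees directly is
unsupported.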