I'll lay a few more cards down on the table for the sake of clarity, and we can see if this thread goes somewhere helpful for all of us. :-)
On Tue, May 19, 2015 at 10:48 PM, Kilian Cavalotti <[email protected]> wrote:
>
> On Tue, May 19, 2015 at 1:40 PM, David Bigagli <[email protected]> wrote:
>> You can create a user inside a docker machine just like any other and then
>> just ssh to it.

Yes, and that's definitely one of many options we're looking at. Due
to current limitations in Docker and its security model, it may be
necessary to launch the container with an SSH daemon as its entrypoint
rather than the end-user application. This would put additional
restrictions/burdens on scientists when building and supplying their
repositories to us, which ideally I'd like to avoid, but I admit it
may be necessary (at least short-term).

> You can, but nothing forces you to. :)
> I guess it's a matter of how much you trust your users, then.

Yes, there's a *lot* of truth to that statement. The set of options
considered "viable" for any one site/system/use case depends entirely
on who the users are and the trust model involved between them and the
system operators.

> I'd looked into doing this some time ago- there's been some desire for this
> expressed within our community and I can see some value in supporting docker
> in our environment.
>
> Ultimately we had to say no because of the security issues you've indicated.
> We're hoping to provide some resources for prototyping with docker (both
> using 3rd party containers and developing our own), but ultimately we decided
> that if it was worth running on our cluster, it was worth porting the docker
> application into our environment (via modules or similar).
>
> That said, it might be possible to safely start a container for the end user
> if there was some mechanism for starting a container in a sanitized mode: run
> in the foreground, no privileges, no devices, no data volumes, a standard
> networking configuration, etc. Since slurmd runs as root, it would have the
> necessary privileges... whether slurmd would be able to manage the container
> properly is another question.

Something similar to this is another option we're considering:
creating a well-defined methodology for having SLURM launch containers
in such a way that user jobs could execute inside them without
creating opportunities for exploitation. Moe pointed out the ChosLoc
parameter in slurm.conf, which I was previously unaware of; this may
provide a mechanism to do exactly that at the system level. Or it may
be necessary to have the users request it manually and write a wrapper
script to ensure that the Docker launch is done with a known
configuration and no user-supplied parameters.

> On Tue, May 19, 2015 at 10:14 AM, Kilian Cavalotti
> <[email protected]> wrote:
>
> One major downside to running Docker containers in a shared HPC
> cluster (to me at least), is that the default user in a container is
> root. And that it can easily map and access the host filesystem from
> inside the container. Letting users run as root on a shared cluster is
> a major no-go from my perspective. So until Docker folks figure out a
> way to avoid this (and work on this seems to have just started very
> recently: https://github.com/docker/docker/issues/12949), I don't see
> much appeal from running Docker containers on a shared HPC cluster.
> There may be other use cases, of course.

Yep -- that's exactly the problem we're trying to solve. We need to be
able to have normal users launch app containers as themselves in a
secure manner, or we need to launch the containers on their behalf and
provide a mechanism for them to launch their application inside the
container without the ability to run it directly. If Docker had more
fine-grained access control or some way to provide sysadmins with ways
to grant limited access to users to run containers without giving them
free rein, this whole thing would be easy!
:-)

Unfortunately the demand for Docker is growing rapidly, largely due to
papers such as this one: http://arxiv.org/pdf/1410.0846.pdf which
tout Docker images as a prudent deliverable for research scientists
wanting to ensure reproducibility for their results...which means we
need to be able to facilitate their use on our systems or provide them
a better alternative to accomplish the same task. Maybe Rocket and
the AppC spec are that better alternative, or will be someday, but for
now Docker seems to be the best choice.

> But if users running as root is not an issue, what more is needed from
> Slurm to launch containers? I may very well be missing something, but
> If you have a docker daemon running on all of your compute nodes, and
> provided users can access the docker socket/port, they can submit jobs
> that call "docker run", can't they?

Yes, they can, and in some cases that may be the right answer (small
sets of relatively trusted users). In most cases, though, only root
should be allowed to execute "docker run" while fully controlling the
supplied parameters.

So, to summarize, there are several ways to do HPC with Docker (lax
security, or wrapper script, or root-launched contained sshd, or...),
and they all have different security and usability implications.
SLURM integration would certainly have the potential to make this
whole thing easier, but ultimately the right solution may be
orthogonal to SLURM since it's the same problem regardless of
scheduler. I'm not sure yet.

At the risk of further putting words in Chris' mouth (which I risk
doing only because I know he'll forgive me if I get it wrong, and it
will help him out if I get it right), I'll say what the two of us are
asking for is if anyone has a working implementation of running jobs
under SLURM which execute inside a Docker container (or similar
container technology), and if so, how you wound up choosing to do it!
:-)

That being said, I know NERSC has implemented several of these
approaches (including CHOS, a custom controlled Docker wrapper called
MyDock, and a Cray-based tool called Shifter), as described in
https://www.nersc.gov/assets/Uploads/cug2015udi.pdf, and I'll be
speaking to them next week to discuss what they did and what they
learned. But I'm/we're very curious to hear other ideas and
experiences!

Thanks!
Michael

--
Michael Jennings <[email protected]>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615
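
To make the wrapper-script option discussed above a little more
concrete, here is a rough Python sketch of what such a wrapper might
look like. It is only an illustration under stated assumptions, not
something anyone in this thread has deployed: the image-name pattern,
the choice of "--network none", and the use of sudo (and therefore of
the SUDO_UID/SUDO_GID variables) to invoke the wrapper are all
hypothetical. The idea is that the user supplies nothing but an image
name, and every other "docker run" option is fixed by the wrapper.

```python
import os
import re
import subprocess
import sys

# Hypothetical allowlist pattern for image references such as "lab/app:1.0".
# A match must start with an alphanumeric character, so a value like
# "--privileged" can never be passed through as a docker option.
IMAGE_RE = re.compile(r"[a-z0-9]+(?:[._/-][a-z0-9]+)*(?::[A-Za-z0-9_.-]+)?")


def build_docker_argv(image, uid, gid):
    """Return a fixed 'docker run' command line for one validated image."""
    if not IMAGE_RE.fullmatch(image):
        raise ValueError("rejected image reference: %r" % image)
    return [
        "docker", "run",
        "--rm",                                 # foreground, clean up on exit
        "--user", "%d:%d" % (uid, gid),         # run as the caller, not root
        "--cap-drop", "ALL",                    # no extra privileges
        "--security-opt", "no-new-privileges",  # block setuid escalation
        "--network", "none",                    # assumption: one fixed network mode
        # Deliberately absent: --privileged, --device, -v/--volume.
        image,
    ]


def main(args):
    if len(args) != 1:
        print("usage: run-container IMAGE", file=sys.stderr)
        return 2
    # Assumption: the wrapper is invoked through sudo, which sets these.
    uid = int(os.environ["SUDO_UID"])
    gid = int(os.environ["SUDO_GID"])
    if uid == 0:
        print("refusing to start a container as root", file=sys.stderr)
        return 1
    try:
        argv = build_docker_argv(args[0], uid, gid)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    # The list form avoids a shell, so no quoting tricks are possible.
    return subprocess.call(argv)


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

A site would then grant users sudo access to this one script only,
rather than access to the Docker socket. Whether the fixed option set
above is actually sufficient is a separate question; it mirrors the
"sanitized mode" list quoted earlier (no privileges, no devices, no
data volumes) and nothing more.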
