[slurm-dev] Re: Slurm and docker/containers

Michael Jennings Wed, 20 May 2015 07:37:56 -0700

I'll lay a few more cards down on the table for the sake of clarity,
and we can see if this thread goes somewhere helpful for all of us.
:-)

On Tue, May 19, 2015 at 10:48 PM, Kilian Cavalotti
<[email protected]> wrote:
>
> On Tue, May 19, 2015 at 1:40 PM, David Bigagli <[email protected]> wrote:
>> You can create a user inside a docker machine just like any other and then
>> just ssh to it.

Yes, and that's definitely one of many options we're looking at.  Due
to current limitations in Docker and its security model, it may be
necessary to launch the container with an SSH daemon as its entrypoint
rather than the end-user application.  This would put additional
restrictions/burdens on scientists when building and supplying their
repositories to us, which ideally I'd like to avoid, but I admit it
may be necessary (at least short-term).
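A minimal sketch of the root-launched sshd variant follows. It assumes a Python helper run by root (for example from a prolog) and only assembles the `docker run` command line. The image name, the per-job host port, and the `slurm-job-<id>` naming scheme are hypothetical. The image is also assumed to already contain OpenSSH at `/usr/sbin/sshd`, host keys, and the user's account, which is exactly the extra burden on scientists described above.

```python
import shlex


def sshd_container_argv(job_id: str, image: str, host_port: int) -> list[str]:
    """Build a `docker run` command whose entrypoint is an SSH daemon
    rather than the end-user application.

    Assumptions (hypothetical, not a site standard): the image ships OpenSSH
    at /usr/sbin/sshd with host keys and the user's account, and the caller
    picks a free host port for each job.
    """
    if not 1024 <= host_port <= 65535:
        raise ValueError("host_port must be an unprivileged TCP port")
    return [
        "docker", "run",
        "--detach",                        # return at once; the container keeps running detached
        "--name", f"slurm-job-{job_id}",   # hypothetical naming scheme tying container to job
        "--publish", f"{host_port}:22",    # map container port 22 to a per-job host port
        "--entrypoint", "/usr/sbin/sshd",  # override whatever ENTRYPOINT the image defines
        image,
        "-D",                              # passed to sshd: do not daemonize, stay as PID 1
    ]


if __name__ == "__main__":
    # Print the command instead of running it; root would hand this list to subprocess.run().
    print(shlex.join(sshd_container_argv("1234", "example/app:latest", 2222)))
```

The user would then connect with `ssh -p 2222` to the compute node. Tearing the container down at job end (for example `docker rm -f slurm-job-1234` from an epilog) is left out of the sketch.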

> You can, but nothing forces you to. :)
> I guess it's a matter of how much you trust your users, then.

Yes, there's a *lot* of truth to that statement.  The set of options
considered "viable" for any one site/system/use case depends entirely
on who the users are and the trust model involved between them and the
system operators.

> I'd looked into doing this some time ago; there's been some desire for this
> expressed within our community and I can see some value in supporting docker 
> in our environment.
>
> Ultimately we had to say no because of the security issues you've indicated. 
> We're hoping to provide some resources for prototyping with docker (both 
> using 3rd party containers and developing our own), but ultimately we decided 
> that if it was worth running on our cluster, it was worth porting the docker 
> application into our environment (via modules or similar).
>
> That said, it might be possible to safely start a container for the end user 
> if there was some mechanism for starting a container in a sanitized mode: run 
> in the foreground, no privileges, no devices, no data volumes, a standard 
> networking configuration, etc. Since slurmd runs as root, it would have the 
> necessary privileges... whether slurmd would be able to manage the container 
> properly is another question.

Something similar to this is another option we're considering:
creating a well-defined method for having SLURM launch containers
in such a way that user jobs can execute inside them without
creating opportunities for exploitation.  Moe pointed out the ChosLoc
parameter in slurm.conf, which I hadn't known about; it may provide a
mechanism to do exactly that at the system level.  Alternatively, it
may be necessary to have users request it manually and to write a
wrapper script that ensures Docker is always launched with a known
configuration and no user-supplied parameters.
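The wrapper-script option, combined with the sanitized mode quoted above (foreground, no privileges, no devices, no data volumes, a standard network), might look like this minimal sketch. It assumes root runs the wrapper through some root-owned launch path that is not shown (a restricted sudo rule, for instance) and that this path supplies the submitting user's UID/GID. It also assumes the site keeps an allowlist of vetted images; `ALLOWED_IMAGES` and the image names are hypothetical. The user chooses only the image (from the list) and the command. Every `docker run` flag is fixed.

```python
import subprocess

# Hypothetical site allowlist of pre-vetted images; users cannot name any other image.
ALLOWED_IMAGES = {"site/python-analysis:1.0", "site/r-stats:3.2"}


def sanitized_docker_argv(image: str, command: list[str], uid: int, gid: int) -> list[str]:
    """Build a `docker run` command with a fixed, site-controlled set of options."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image {image!r} is not on the site allowlist")
    if not command:
        raise ValueError("a command to run inside the container is required")
    if uid == 0 or gid == 0:
        raise ValueError("refusing to run a user job as root inside the container")
    return [
        "docker", "run",
        "--rm",                                 # remove on exit; no --detach, so it runs in the foreground
        "--user", f"{uid}:{gid}",               # run as the submitting user, never as root
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # setuid binaries cannot regain privileges
        "--network", "bridge",                  # assumed site-standard network; no host networking
        # Deliberately absent: --privileged, --device, --volume/-v, --pid=host.
        image,
        *command,                               # after the image, arguments go to the container, not docker
    ]


def run_job(image: str, command: list[str], uid: int, gid: int) -> int:
    """Launch the container (needs root or docker-socket access) and return its exit status."""
    return subprocess.run(sanitized_docker_argv(image, command, uid, gid)).returncode
```

Where `uid` and `gid` come from is site-specific (for example `SUDO_UID`/`SUDO_GID` when invoked through sudo). Restricting user input to the image name and the trailing command is the design choice here. Docker stops parsing its own options at the image name, so a user who writes `--privileged` as part of the command merely passes it as an argument to their own program.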

> On Tue, May 19, 2015 at 10:14 AM, Kilian Cavalotti 
> <[email protected]> wrote:
>
> One major downside to running Docker containers in a shared HPC
> cluster (to me at least) is that the default user in a container is
> root, and that it can easily map and access the host filesystem from
> inside the container. Letting users run as root on a shared cluster is
> a major no-go from my perspective. So until the Docker folks figure out a
> way to avoid this (and work on this seems to have started only very
> recently: https://github.com/docker/docker/issues/12949), I don't see
> much appeal in running Docker containers on a shared HPC cluster.
> There may be other use cases, of course.

Yep -- that's exactly the problem we're trying to solve.  We need to
be able to have normal users launch app containers as themselves in a
secure manner, or we need to launch the containers on their behalf and
provide a mechanism for them to launch their application inside the
container without the ability to run it directly.

If Docker had more fine-grained access control or some way to provide
sysadmins with ways to grant limited access to users to run containers
without giving them free rein, this whole thing would be easy!  :-)

Unfortunately the demand for Docker is growing rapidly, largely due to
papers such as this one:  http://arxiv.org/pdf/1410.0846.pdf which
tout Docker images as a prudent deliverable for research scientists
wanting to ensure reproducibility of their results, which means we
need to be able to facilitate their use on our systems or provide them
a better alternative to accomplish the same task.  Maybe Rocket and
the AppC spec are that better alternative, or will be someday, but for
now Docker seems to be the best choice.

> But if users running as root is not an issue, what more is needed from
> Slurm to launch containers? I may very well be missing something, but
> if you have a Docker daemon running on all of your compute nodes, and
> provided users can access the docker socket/port, they can submit jobs
> that call "docker run", can't they?

Yes, they can, and in some cases that may be the right answer (small
sets of relatively trusted users).  In most cases, though, only root
should be allowed to execute "docker run", with root fully controlling
the supplied parameters.
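Whether users "can access the docker socket" can be checked directly. This minimal audit sketch assumes the daemon listens on the common default `/var/run/docker.sock`; a daemon exposed on a TCP port needs a different check. On Linux, connecting to a Unix stream socket requires write permission on the socket file. Any group or world write bit therefore means those users can drive the daemon, which Docker's own documentation treats as root-level access.

```python
import grp
import os
import stat

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default location; adjust per site


def writable_classes(mode: int) -> list[str]:
    """Return the non-owner permission classes that may write to (connect to) the socket."""
    classes = []
    if mode & stat.S_IWGRP:
        classes.append("group")
    if mode & stat.S_IWOTH:
        classes.append("other")
    return classes


def audit(path: str = DOCKER_SOCKET) -> str:
    """Describe who, besides the socket's owner, can talk to the Docker daemon via `path`."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        return f"{path} is not a socket"
    exposed = writable_classes(st.st_mode)
    if not exposed:
        return f"{path}: only its owner (uid {st.st_uid}) can connect"
    report = []
    if "group" in exposed:
        try:
            group = grp.getgrgid(st.st_gid)
            # gr_mem lists supplementary members only; users with this as their primary group are not shown.
            members = ", ".join(group.gr_mem) or "none listed"
            report.append(f"group {group.gr_name} (members: {members})")
        except KeyError:
            report.append(f"gid {st.st_gid} (not in the group database)")
    if "other" in exposed:
        report.append("every local user")
    return f"{path}: root-equivalent access for " + "; ".join(report)
```

On a stock install the socket is typically mode 0660 and owned by group `docker`, so `audit()` would report that group's members as the users who effectively have root on the node.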

So, to summarize, there are several ways to do HPC with Docker (lax
security, or wrapper script, or root-launched containerized sshd, or...),
and they all have different security and usability implications.
SLURM integration would certainly have the potential to make this
whole thing easier, but ultimately the right solution may be
orthogonal to SLURM since it's the same problem regardless of
scheduler.  I'm not sure yet.

At the risk of further putting words in Chris' mouth (which I risk
doing only because I know he'll forgive me if I get it wrong, and it
will help him out if I get it right), I'll say that what the two of us are
asking is whether anyone has a working implementation of running jobs
under SLURM which execute inside a Docker container (or similar
container technology), and if so, how you wound up choosing to do it!
:-)

That being said, I know NERSC has tried several of these approaches
(including CHOS; MyDock, a custom controlled Docker wrapper; and
Shifter, a Cray-based tool), as described in
https://www.nersc.gov/assets/Uploads/cug2015udi.pdf, and I'll be
speaking to them next week to discuss what they did and what they
learned.  But I'm/we're very curious to hear other ideas and
experiences!

Thanks!
Michael

-- 
Michael Jennings <[email protected]>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615
