Re: Understanding the YARN Linux container manager

Eric Badger Fri, 19 Oct 2018 08:57:02 -0700

Hi Daniel,

As far as I know, the default runtime of LinuxContainerExecutor applies
cgroups but does not do any namespace isolation. *Someone please correct me
if I am wrong.*
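
One way to check this on a running NodeManager host is to compare the
namespace identifiers that Linux exposes under /proc/<pid>/ns for the
NodeManager and for a container process. The following is only a rough
sketch: the function names and the proc_root parameter are made up for
illustration, and reading another user's /proc/<pid>/ns entries usually
requires root.

```python
import os


def namespace_ids(pid, proc_root="/proc"):
    """Map each namespace name (e.g. 'mnt', 'pid', 'net') to its identifier."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        # Each entry is a symlink whose target looks like 'mnt:[4026531840]'.
        ids[name] = os.readlink(os.path.join(ns_dir, name))
    return ids


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the namespace names for which both processes have the same identifier."""
    a = namespace_ids(pid_a, proc_root)
    b = namespace_ids(pid_b, proc_root)
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Calling shared_namespaces(nm_pid, container_pid) with the two PIDs lists the
namespaces the processes have in common. If the container process shares
every namespace with the NodeManager, nothing has been unshared.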

There has been a significant amount of work over the past few years related
to DockerLinuxContainerRuntime, which is a specific runtime that
LinuxContainerExecutor can use. DockerContainerExecutor has been deprecated
and was removed in 3.0, I believe. There are certainly security
implications with running the DockerLinuxContainerRuntime, but we have done
a ton of work to close these security holes and harden the infrastructure
around running Docker. I very strongly suggest that you don't run Docker on
Hadoop unless you are running at least 3.0, preferably 3.1. There is some
code in 2.9, but it has not been maintained closely and is very
experimental. Much of the code was rewritten in 3.0 to provide better
security. See YARN-3611 and YARN-8472 for a list of Docker improvements
made.

If you do choose to run Docker on Hadoop, you need to make sure to read up
on the implicit security ramifications of running Docker (NM talks to
dockerd, which is a root daemon, privileged containers are dangerous,
bind-mounts directly affect the host and can allow breaking out of the
container, etc.). Personally, I believe that Docker can be run on Hadoop
securely enough to be an improvement over running YARN containers on bare
metal. I'd be happy to talk in more detail about the things
to consider and steps to take to harden your setup, if you'd like.
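
As a small illustration of the kind of check involved, the sketch below
parses a yarn-site.xml and flags privileged containers when the Docker
runtime is enabled. The two property names and their defaults are taken
from the Hadoop 3.x Docker documentation as best recalled and should be
verified against your version; the helper names are hypothetical, and this
covers only one of the concerns listed above.

```python
import xml.etree.ElementTree as ET

# Property names assumed from the Hadoop 3.x docs; verify for your release.
ALLOWED_RUNTIMES = "yarn.nodemanager.runtime.linux.allowed-runtimes"
PRIVILEGED_ALLOWED = (
    "yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed"
)


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def docker_warnings(props):
    """Return warnings for risky Docker settings (only one check so far)."""
    runtimes = [r.strip() for r in props.get(ALLOWED_RUNTIMES, "default").split(",")]
    if "docker" not in runtimes:
        return []  # Docker runtime not enabled; nothing to check here.
    warnings = []
    if props.get(PRIVILEGED_ALLOWED, "false").lower() == "true":
        warnings.append("privileged Docker containers are allowed")
    return warnings
```

Passing the contents of yarn-site.xml through load_properties and then
docker_warnings returns an empty list when nothing was flagged. Mount and
registry restrictions would need their own checks.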

Eric

On Fri, Oct 19, 2018 at 10:40 AM Daniel Peebles <pumpkin...@gmail.com>
wrote:

> Hi all,
>
> Please tell me if this is the wrong place to ask!
>
> I'm trying to understand the isolation properties of
> LinuxContainerExecutor in YARN. I've looked through the documentation and
> traced through the code down to the C helper tool and as far as I've been
> able to determine, it only applies cgroups to the subprocess. Is that
> right? I was trying to figure out if it's also unsharing any namespaces
> (filesystem, pid, network, etc.) from the parent process or otherwise
> isolating itself in other ways.
>
> If I'm correct and it doesn't do namespaces, does that mean I should use
> the DockerContainerExecutor instead to get namespace isolation? That one
> has a big scary security warning saying that using it might allow privilege
> escalation so I'm hesitant.
>
> I've also been trying to understand whether, during a normal Hadoop/YARN
> (or, e.g., Spark) execution, any parts of the application run outside of
> the container. Is there a good place to read up on the container
> architecture in general?
>
> Thanks,
> Dan Peebles
>
