Hi Daniel,

As far as I know, the default runtime of LinuxContainerExecutor provides no isolation beyond cgroups, so it does not unshare any namespaces. *Someone please correct me if I am wrong.*

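If you want to check this on a live node rather than take my word for it, here is a rough sketch (not something that ships with Hadoop). It compares the /proc/<pid>/ns links of two processes, for example the NodeManager and one of its container processes. The function names and the ns_diff.py script name are made up for illustration, and reading another process's ns links generally requires running as the same user or as root.

```python
import os
import sys

# Namespace types exposed under /proc/<pid>/ns on reasonably recent Linux kernels.
NAMESPACES = ("mnt", "pid", "net", "ipc", "uts", "user", "cgroup")


def namespace_ids(pid):
    """Return {namespace: identifier} for a process, e.g. {'pid': 'pid:[4026531836]'}."""
    ids = {}
    for ns in NAMESPACES:
        try:
            ids[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except OSError:
            # Link missing (older kernel, non-Linux host) or not readable by this user.
            continue
    return ids


def unshared_namespaces(parent_pid, child_pid):
    """List the namespaces whose identifiers differ between the two processes."""
    parent = namespace_ids(parent_pid)
    child = namespace_ids(child_pid)
    # Only compare namespaces that could be read for both processes.
    common = parent.keys() & child.keys()
    return sorted(ns for ns in common if parent[ns] != child[ns])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python ns_diff.py <nodemanager_pid> <container_pid>
    differing = unshared_namespaces(sys.argv[1], sys.argv[2])
    print(differing if differing else "no namespaces unshared")
```

An empty result means the two processes share every namespace that could be read, which is what I would expect to see with the default runtime.
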
There has been a significant amount of work over the past few years related to DockerLinuxContainerRuntime, which is a specific runtime that LinuxContainerExecutor can use. DockerContainerExecutor has been deprecated and was removed in 3.0, I believe.

There are certainly security implications with running the DockerLinuxContainerRuntime, but we have done a ton of work to close these security holes and harden the infrastructure around running Docker. I very strongly suggest that you don't run Docker on Hadoop unless you are running at least 3.0, preferably 3.1. There is some code in 2.9, but it has not been maintained closely and is very experimental. Much of the code was rewritten in 3.0 to provide better security. See YARN-3611 and YARN-8472 for a list of the Docker improvements made.

If you do choose to run Docker on Hadoop, make sure to read up on the security ramifications that come with running Docker: the NM talks to dockerd, which is a root daemon; privileged containers are dangerous; bind-mounts directly affect the host and can allow breaking out of the container; etc.

Personally, I believe that Docker can be run on Hadoop with adequate security, which makes it an improvement over running YARN containers bare-metal. I'd be happy to talk in more detail about the things to consider and the steps to take to harden your setup, if you'd like.

Eric

On Fri, Oct 19, 2018 at 10:40 AM Daniel Peebles <pumpkin...@gmail.com> wrote:

> Hi all,
>
> Please tell me if this is the wrong place to ask!
>
> I'm trying to understand the isolation properties of
> LinuxContainerExecutor in YARN. I've looked through the documentation and
> traced through the code down to the C helper tool and as far as I've been
> able to determine, it only applies cgroups to the subprocess. Is that
> right? I was trying to figure out if it's also unsharing any namespaces
> (filesystem, pid, network, etc.) from the parent process or otherwise
> isolating itself in other ways.
>
> If I'm correct and it doesn't do namespaces, does that mean I should use
> the DockerContainerExecutor instead to get namespace isolation? That one
> has a big scary security warning saying that using it might allow privilege
> escalation so I'm hesitant.
>
> I've also been trying to understand during a normal hadoop/YARN (or e.g.,
> Spark) execution, whether any parts of the application run outside of the
> container. Is there a good place to read up on the container architecture
> in general?
>
> Thanks,
> Dan Peebles