[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259740#comment-16259740
 ] 

Miklos Szegedi commented on YARN-7506:
--

The startup time might prevent us from using a root Java process. The question 
is the CLI. Why is it better than a long-running root Java process listening on 
a Unix socket accessible only by yarn? It does parameter checking, but doesn't 
the Docker daemon do that anyway? The CLI is slower to start up, and it carries 
all the risks that come with the environment, the shell, etc.
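
To make the "Unix socket accessible by yarn only" idea concrete, here is a minimal sketch in Java. It is an illustration only, not code from any patch: the class name and the socket path are made up, it needs Java 16 or later for Unix domain socket channels, and it assumes the socket's group is set to the yarn group by some other means.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical sketch, not part of Hadoop. Requires Java 16+.
final class RestrictedUnixSocket {
    private RestrictedUnixSocket() {}

    // Binds a listening Unix socket and restricts it to owner and group (mode 0660).
    // Assumptions: the socket's group is the yarn group, and the socket lives in a
    // directory other users cannot enter, because the chmod happens after bind().
    static ServerSocketChannel listen(Path socketPath) throws IOException {
        Files.deleteIfExists(socketPath); // remove a stale socket from an earlier run
        ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
        server.bind(UnixDomainSocketAddress.of(socketPath));
        Files.setPosixFilePermissions(socketPath,
                PosixFilePermissions.fromString("rw-rw----"));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // The default path is a placeholder chosen for this example.
        Path path = Path.of(args.length > 0 ? args[0] : "/tmp/yarn-docker-proxy.sock");
        try (ServerSocketChannel server = listen(path)) {
            System.out.println("listening on " + server.getLocalAddress());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

A real service would accept connections in a loop and check each request before forwarding it; the sketch only covers the socket setup.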

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker-related 
> code in container-executor be *refactored into a separate Java process run as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 
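
The "writable by root only" rule for container-executor.cfg mentioned in the description can be sketched as a small check. This is a hedged illustration in Java with a hypothetical class and method name; the real container-executor performs its own checks in C, and this is not that code.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: decides whether a config file's
// owner and mode satisfy a "writable by root only" rule.
final class ConfigTrustCheck {
    private ConfigTrustCheck() {}

    // owner: name of the file owner; perms: POSIX permission bits of the file.
    static boolean isWritableByRootOnly(String owner, Set<PosixFilePermission> perms) {
        boolean ownedByRoot = "root".equals(owner);
        boolean othersCanWrite = perms.contains(PosixFilePermission.GROUP_WRITE)
                || perms.contains(PosixFilePermission.OTHERS_WRITE);
        return ownedByRoot && !othersCanWrite;
    }

    public static void main(String[] args) {
        // Mode 0644: owner may write, group and others may only read.
        Set<PosixFilePermission> mode0644 = EnumSet.of(
                PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE,
                PosixFilePermission.GROUP_READ, PosixFilePermission.OTHERS_READ);
        System.out.println(isWritableByRootOnly("root", mode0644)); // prints "true"
        System.out.println(isWritableByRootOnly("yarn", mode0644)); // prints "false"
    }
}
```

On a real system the two inputs could come from `Files.getOwner(path).getName()` and `Files.getPosixFilePermissions(path)`.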



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257446#comment-16257446
 ] 

Eric Yang commented on YARN-7506:
-

[~miklos.szeg...@cloudera.com] We agree that trust-and-verify is the way to go.  
YARN-6623 is a good lesson, and there is a structured payload defined for YARN 
Java to talk to container-executor.  It would be best to pass the data models 
as JSON or protobuf to reduce serialization errors.  
If the workflow is identified correctly, the YARN Java process and the root 
program can use a protocol to compute the required validations.  This minimizes 
the interactions between root and YARN.  The Hadoop community can control the 
protocol at a fine-grained level.
There might be no need for a root Java process.  Thoughts?
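
As a rough illustration of what a structured payload buys over free-form command line strings, the sketch below validates a typed request field by field. Everything in it is an assumption made for the example: the class names, the fields, and the policy values are hypothetical and are not the actual YARN to container-executor protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical request model; a real one would be decoded from JSON or protobuf.
final class LaunchRequest {
    final String user;
    final String image;
    final List<String> capabilities;

    LaunchRequest(String user, String image, List<String> capabilities) {
        this.user = user;
        this.image = image;
        this.capabilities = List.copyOf(capabilities);
    }
}

final class LaunchRequestValidator {
    // Assumed policy values, for illustration only.
    private static final Set<String> BANNED_USERS = Set.of("root", "hdfs", "yarn");
    private static final Set<String> ALLOWED_CAPABILITIES = Set.of("CHOWN", "SETUID", "SETGID");
    private static final Pattern USER = Pattern.compile("[a-z_][a-z0-9_-]*");
    private static final Pattern IMAGE = Pattern.compile("[a-z0-9][a-z0-9._/:-]*");

    private LaunchRequestValidator() {}

    // Returns one message per violated rule; an empty list means the request passed.
    static List<String> validate(LaunchRequest r) {
        List<String> errors = new ArrayList<>();
        if (!USER.matcher(r.user).matches() || BANNED_USERS.contains(r.user)) {
            errors.add("user not allowed: " + r.user);
        }
        if (!IMAGE.matcher(r.image).matches()) {
            errors.add("malformed image name: " + r.image);
        }
        for (String cap : r.capabilities) {
            if (!ALLOWED_CAPABILITIES.contains(cap)) {
                errors.add("capability not allowed: " + cap);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        LaunchRequest bad = new LaunchRequest("root", "centos; rm -rf /", List.of("SYS_ADMIN"));
        validate(bad).forEach(System.out::println);
    }
}
```

Because each field has a known type and meaning, the same checks can run on both the yarn side and the root side without re-parsing an argument string.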





[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257375#comment-16257375
 ] 

Miklos Szegedi commented on YARN-7506:
--

Thank you for the comments.
[~ebadger], the main reason a Java root process is more secure than 
container-executor is that it protects against exploitable buffer overflows. 
This is why I raised the suggestion. I was not sure why this approach was not 
followed before, which is why I opened this jira. It is also easier to use for 
most Hadoop developers, as you mentioned.
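
A small illustration of the buffer overflow point, as a hedged sketch with a made-up class name: the JVM checks every array write against the array length, so an oversized input raises an exception instead of overwriting adjacent memory, which is what the same loop could do in unchecked C code.

```java
// Hypothetical demo class, for illustration only.
final class BoundsCheckDemo {
    private BoundsCheckDemo() {}

    // Copies input into a fixed 8-byte buffer and reports whether it fit.
    static boolean copyIntoFixedBuffer(byte[] input) {
        byte[] buffer = new byte[8];
        try {
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i]; // the JVM checks i against buffer.length here
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Oversized input ends up here; memory outside the buffer is never written.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(copyIntoFixedBuffer(new byte[8]));  // prints "true"
        System.out.println(copyIntoFixedBuffer(new byte[64])); // prints "false"
    }
}
```

This only covers memory safety; logic errors in validation are equally possible in either language.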
[~vinodkv], this jira already builds on the experience of YARN-6623. I would 
rather consider it a subtask of YARN-5673, if it is considered at all. Now that 
you mention it, a possible solution for YARN-5673 that takes this (YARN-7506) 
suggestion into account would be a root Java-based container executor framework 
that loads Java or native C modules. However, Docker has its own unique design, 
with the CLI and the socket and no native system call dependencies, so it could 
be handled separately.
bq. Side note: One more important consideration in the container-executor 
design was to not have long running root processes as it may increase the 
attack scope. Assuming that is still intact.
Suggestion 1 above does not require any long-running root user process. 
Suggestion 2 does; however, the only attack surface would be the proxied Docker 
socket and the config file, which is protected with file system permissions just 
like the container-executor executable.
[~eyang]
bq. Both docker and hadoop use "trusted" users...
I have to point to the rule of defense in depth. Under defense in depth, there 
is no trusted user. Every input is evil and each component (container-executor 
in this case) has to do its own proper error checking.
bq. YARN user tap directly into docker.sock goes against our original 
philosophy of having both "trusted" user and root to perform validation.
Indeed. I agree.
bq. Root power may be used for validation logic when trusted user can not 
validate, such as symlink to local file system access that YARN-6623 solved.
Indeed, and I would add volume whitelists and blacklists, which the yarn user 
cannot validate because of the defense in depth rule.
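
The symlink and volume list checks mentioned here could look roughly like the following on the root side. This is a sketch under stated assumptions, not the YARN-6623 implementation (which is C code in container-executor): the class and method names are hypothetical, and the allowed and denied roots are assumed to be real, already resolved paths.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical validator, for illustration only.
final class MountValidator {
    private MountValidator() {}

    // Resolves symlinks first, then checks the real path against the denied
    // and allowed roots. Denied roots win over allowed roots.
    static boolean isMountAllowed(Path requested, List<Path> allowedRoots,
                                  List<Path> deniedRoots) throws IOException {
        Path real = requested.toRealPath(); // follows symlinks; throws if the path is missing
        for (Path denied : deniedRoots) {
            if (real.startsWith(denied)) {
                return false;
            }
        }
        for (Path allowed : allowedRoots) {
            if (real.startsWith(allowed)) {
                return true;
            }
        }
        return false; // anything not explicitly allowed is rejected
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Path.of(System.getProperty("java.io.tmpdir")).toRealPath();
        System.out.println(isMountAllowed(tmp, List.of(tmp), List.of()));
    }
}
```

A symlink that sits inside an allowed directory but points elsewhere is judged by its target, which is the case the yarn user cannot be trusted to check. The sketch does not address the window between the check and the actual mount.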
bq. We can consider to keep most of logic in Java as long as root privileges is 
not required.
I disagree here. Most of the functionality that YARN-6623 implemented requires 
root to do the validation, so if it is done in Java, it should be in a Java root 
process.
bq. The performance gain from tapping into docker socket is saving the cost of 
one fork but we would lose a lot of validations done by docker CLI.
The validations are indeed important, but validating command line options is 
much more difficult than validating easily parseable JSON, as the recent issues 
showed.
bq. If it can be helped, calling root cli is preferred than calling root owned 
network socket.
There is a solution for that. The Java node manager, running as yarn, could 
still use the CLI against a Unix socket writable by yarn. A root Java process 
running in the background would proxy and security-filter that socket and work 
on the original socket. (See the attached diagram.)
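
The security filter in such a proxy could start from an allowlist of Docker REST API calls. The sketch below is only an illustration of that idea with a hypothetical class name; the listed endpoints are container calls from the Docker Engine API, but which ones YARN would actually need is an assumption of this example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter, for illustration only: allows a few container
// endpoints and rejects every other method and path combination.
final class DockerApiFilter {
    private static final String ID = "[A-Za-z0-9][A-Za-z0-9_.-]*";
    private static final String VERSION = "(/v[0-9]+\\.[0-9]+)?"; // optional API version prefix

    private static final List<Pattern> ALLOWED = List.of(
            Pattern.compile("POST " + VERSION + "/containers/create"),
            Pattern.compile("POST " + VERSION + "/containers/" + ID + "/(start|stop|kill)"),
            Pattern.compile("GET " + VERSION + "/containers/" + ID + "/json"),
            Pattern.compile("DELETE " + VERSION + "/containers/" + ID));

    private DockerApiFilter() {}

    // Checks method and path only; the query string is stripped before matching.
    static boolean isAllowed(String method, String path) {
        int query = path.indexOf('?');
        String bare = query >= 0 ? path.substring(0, query) : path;
        String requestLine = method + " " + bare;
        for (Pattern p : ALLOWED) {
            if (p.matcher(requestLine).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("POST", "/v1.32/containers/create?name=c1")); // prints "true"
        System.out.println(isAllowed("POST", "/containers/abc123/exec"));          // prints "false"
    }
}
```

Filtering on method and path is only the first layer. A real filter would also have to inspect the JSON body of the create call (privileged mode, bind mounts, capabilities), which is where parseable JSON helps compared to command line options.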
bq. I don't fully agree with YARN-5673 modules API design. The description is 
another plug-in architecture to enable more functionality with root power. I 
think this is a slippy slope to enable more risks in container-executor.
I agree, I also raised my concerns there.
bq. It is best to avoid running java as root. Java runtime includes a lot of 
third party code, which can be unpredictable with root power.
That is a risk. I would minimize the number of non-JDK dependencies if a Java 
root process is chosen. I still think it may be more favorable in this case.

I summarized the options in the attached diagram, which shows which one is the 
simplest.


[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-16 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256115#comment-16256115
 ] 

Eric Yang commented on YARN-7506:
-

IMO, container-executor is essentially a lightweight sudo with extra 
validations in place to make sure that we protect root power from mistakes.  
Both Docker and Hadoop use "trusted" users to define security, to ensure that 
trusted users have done their due diligence of validation before enabling root 
power.  Having the YARN user tap directly into docker.sock goes against our 
original philosophy of having both the "trusted" user and root perform 
validation. Root power may be used for validation logic when the trusted user 
cannot validate, such as the symlink to local file system access that YARN-6623 
solved.

We can consider keeping most of the logic in Java as long as root privileges are 
not required.  The performance gain from tapping into the Docker socket is 
saving the cost of one fork, but we would lose a lot of the validations done by 
the Docker CLI.  I am in favor of keeping the balance: not all code needs to go 
into container-executor if it can be done in Java.  If it can be helped, calling 
a root CLI is preferable to calling a root-owned network socket.
I don't fully agree with the YARN-5673 modules API design.  The description is 
another plug-in architecture to enable more functionality with root power.  I 
think this is a slippery slope toward more risk in container-executor.  

[~ebadger] It is best to avoid running Java as root.  The Java runtime includes 
a lot of third-party code, which can be unpredictable with root power.





[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-16 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256051#comment-16256051
 ] 

Vinod Kumar Vavilapalli commented on YARN-7506:
---

This is either a dup or a sub-task of YARN-5673, I think.

Originally, moving some / most of the container-executor code into dynamic 
modules and letting developers code these modules, potentially in different 
languages, was a concrete goal of YARN-5673. We have realized the bulk of our 
goals of cleaning up the executor via YARN-6623 (though the title there is 
non-descriptive).

I think it is still a goal. But for that, we need to modularize 
container-executor and let it load modules dynamically.

Side note: One more important consideration in the container-executor design 
was to not have long-running root processes, as they may increase the attack 
scope. I am assuming that still holds.




[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-16 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255814#comment-16255814
 ] 

Eric Badger commented on YARN-7506:
---

I suppose we could move the docker portion of container-executor into a 
standalone root java process, but I'm not convinced that gives us a whole lot 
in terms of security. Moving the docker portion out of container-executor 
doesn't get rid of the container-executor. So if someone compromises yarn, they 
will still be able to wield the container-executor. I guess what moving docker 
out of the container-executor would give us is that we would have a smaller 
surface area to introduce bugs that would misuse executing system calls. But we 
weren't making those system calls in the docker portion of the 
container-executor anyway. We would also have a smaller C code surface area, 
which may be more secure since most Hadoop programmers are more comfortable in 
Java as opposed to C. 

The idea of moving the docker portion of container-executor into a java process 
is attractive because of the development aspect. However, that would be a 
pretty giant effort and would require a complete rewrite of most of the docker 
implementation. So I'm not sure that the effort would be worth it. 
