[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703023#comment-16703023
 ] 

Zhankun Tang commented on YARN-9060:
------------------------------------

Thanks for the review, [~leftnoteasy]!
{quote}1) Could u add some examples about parameters of --module-devices
{quote}
{color:#d04437}Zhankun{color}=> Sure. It's like this:
{code:java}
c-e --excluded_devices b-8:16-rwm,c-244:0-rwm,c-244:1-rwm --container_id 
container_x_y
{code}
The value format for excluded_devices is 
"[type]-[majorNumber]:[minorNumber]-rwm".

The type is "c" or "b", representing a character device or a block device 
respectively. The Java code can determine a device's type from its major and 
minor numbers by checking new File("/sys/dev/char/[major]:[minor]").exists(). 
In c-e, each "-" is replaced with a space character so that the value can be 
written to the device cgroup.
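
As a rough illustration of the format above (not the code in the patch), the Java side could build an entry and c-e could convert it along these lines. The class and method names are hypothetical, the "-" replacement is shown in Java only for illustration, and falling back to "b" when the sysfs char entry is missing is an assumption:
{code:java}
import java.io.File;

// Hypothetical helper, not part of the patch.
class DeviceEntry {
  // Returns 'c' if sysfs lists major:minor as a character device.
  // Falling back to 'b' otherwise is an assumption of this sketch.
  static char detectType(int major, int minor) {
    return new File("/sys/dev/char/" + major + ":" + minor).exists() ? 'c' : 'b';
  }

  // Builds "[type]-[majorNumber]:[minorNumber]-rwm".
  static String format(char type, int major, int minor) {
    return type + "-" + major + ":" + minor + "-rwm";
  }

  // Mirrors what c-e does with the value: each '-' becomes a space,
  // giving the "type major:minor access" form used by the device cgroup.
  static String toCgroupEntry(String entry) {
    return entry.replace('-', ' ');
  }
}
{code}
For example, DeviceEntry.format('c', 244, 0) gives "c-244:0-rwm", and DeviceEntry.toCgroupEntry("b-8:16-rwm") gives "b 8:16 rwm".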
{quote}2) Inside c-e.cfg, what field needs to be set, and please share some 
example of configs. And what if user doesn't set the configs, will c-e allow 
all devices controlled by c-e or none devices. (the previous one sounds super 
dangerous).
{quote}
{color:#d04437}Zhankun{color}=> This version of the code follows the GPU/FPGA 
behavior. If the list is set, the devices requested to be denied must be a 
subset of the configured allowed list. If the list is not set, c-e accepts 
*any denied devices* requested by the Java layer.
{code:java}
[devices]
  module.enabled=true
  devices.allowed-numbers=243:0,243:1,243:2,8:16,8:32 #major:minor,..
{code}
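The check described above can be sketched as follows. This is only an illustration in Java: the real check lives in the native container-executor, and the class and method names here are hypothetical. A null allowed set stands for "devices.allowed-numbers is not set":
{code:java}
import java.util.Collection;
import java.util.Set;

// Hypothetical model of the allowed-list check, not the native c-e code.
class AllowedDeviceCheck {
  // allowed == null models an unset devices.allowed-numbers.
  static boolean accept(Set<String> allowed, Collection<String> requestedDenied) {
    if (allowed == null) {
      return true; // list not set: any requested denied device is accepted
    }
    // list set: every requested denied device must be in the allowed list
    return allowed.containsAll(requestedDenied);
  }
}
{code}
With the allowed list from the config above, a request to deny 8:16 and 243:0 is accepted, while a request to deny 244:0 is rejected.
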
The current GPU/FPGA behavior does have an issue when c-e.cfg doesn't align 
with yarn-site.xml (I can file a Jira later). Take GPU for instance:

One host has GPUs 1,2,3,4,5, and "GPU.allowed = 1,2,3" is configured in 
c-e.cfg, but yarn-site.xml is configured as auto, which means 1,2,3,4,5.

An application requests 4 GPUs and the scheduler allocates 1,2,4,5, so 
--excluded-gpus is "3". c-e checks that 3 is in the allowed list (1,2,3) and 
then denies only 3 in cgroups.

In this case, c-e's allowed list (1,2,3) has no effect, because the 
application can still access 4 and 5.
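
The gap in this example can be reproduced with a small set computation. This is only a model of the scenario described above, with hypothetical names, not code from c-e:
{code:java}
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the GPU example, not code from c-e.
class GpuAllowedListGap {
  // Returns the GPUs a container can still access even though they are
  // outside the c-e allowed list.
  static Set<Integer> leaked(Set<Integer> hostGpus, Set<Integer> allowed,
                             Set<Integer> excluded) {
    Set<Integer> accessible = new TreeSet<>(hostGpus);
    accessible.removeAll(excluded); // cgroups deny only the excluded GPUs
    accessible.removeAll(allowed);  // what remains was never in the allowed list
    return accessible;
  }
}
{code}
With host GPUs 1,2,3,4,5, allowed list 1,2,3 and excluded GPU 3, the result is 4 and 5, the two GPUs the allowed list was supposed to keep out of reach.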

Beyond this issue, I haven't thought of any other security issues. Could you 
explain a little bit more?
{quote}3) Could u add changes to c-e.cfg 
(hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg) for the new 
config and example options like other options.
{quote}
{color:#d04437}Zhankun{color}=> Sure. Will add it.

> [YARN-8851] Phase 1 - Support device isolation in native container-executor
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9060
>                 URL: https://issues.apache.org/jira/browse/YARN-9060
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-9060-trunk.001.patch
>
>
> Due to the cgroups v1 implementation policy in linux kernel, we cannot update 
> the value of the device cgroups controller unless we have the root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for Java layer to invoke.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
