[
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703023#comment-16703023
]
Zhankun Tang commented on YARN-9060:
------------------------------------
Thanks for the review, [~leftnoteasy]!
{quote}1) Could u add some examples about parameters of --module-devices
{quote}
{color:#d04437}Zhankun{color}=> Sure. It's like this:
{code:java}
c-e --excluded_devices b-8:16-rwm,c-244:0-rwm,c-244:1-rwm --container_id container_x_y
{code}
The value format for excluded_devices is
"[type]-[majorNumber]:[minorNumber]-rwm".
The type is "c" for a character device or "b" for a block device. The Java
layer can determine the type from the major and minor numbers by checking
whether new File("/sys/dev/char/[major]:[minor]").exists() returns true. c-e
replaces each "-" with a space so that the value can be used to update the
cgroup value.
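As a rough illustration of the format described above, here is a minimal Java sketch that builds the excluded_devices value and shows the "-" to space conversion. The class and method names are hypothetical and not taken from the patch, and falling back to "b" when the /sys/dev/char entry is missing is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the YARN-9060 patch.
final class ExcludedDevicesArg {
    // 'c' if /sys/dev/char/<major>:<minor> exists; otherwise assume a block device.
    static char deviceType(int major, int minor) {
        return new File("/sys/dev/char/" + major + ":" + minor).exists() ? 'c' : 'b';
    }

    // Builds one entry in the form [type]-[major]:[minor]-rwm.
    static String format(char type, int major, int minor) {
        return type + "-" + major + ":" + minor + "-rwm";
    }

    // Joins entries with commas into the value passed to --excluded_devices.
    static String join(List<String> entries) {
        return String.join(",", entries);
    }

    // Mirrors the conversion c-e is described as doing: each "-" becomes a space.
    static String toCgroupEntry(String entry) {
        return entry.replace('-', ' ');
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        entries.add(format('b', 8, 16));
        entries.add(format('c', 244, 0));
        System.out.println(join(entries));                  // b-8:16-rwm,c-244:0-rwm
        System.out.println(toCgroupEntry(entries.get(1)));  // c 244:0 rwm
    }
}
```

In real use the type would come from deviceType(major, minor) rather than being hard-coded as in main.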
{quote}2) Inside c-e.cfg, what field needs to be set, and please share some
example of configs. And what if user doesn't set the configs, will c-e allow
all devices controlled by c-e or none devices. (the previous one sounds super
dangerous).
{quote}
{color:#d04437}Zhankun{color}=> This version of the code follows the
GPU/FPGA behavior. If the allowed list is set, the requested denied devices must
be a subset of it. If the list is not set, c-e accepts
*any denied devices* requested by the Java layer.
{code:java}
[devices]
module.enabled=true
devices.allowed-numbers=243:0,243:1,243:2,8:16,8:32 #major:minor,..
{code}
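The rule above can be sketched as follows. This is only an illustration of the logic in Java; the real check lives in the native c-e code, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the allowed-list rule; not the native c-e code.
final class AllowedDeviceCheck {
    // allowedConfig is the devices.allowed-numbers value, or null/blank when unset.
    // deniedDevices holds the requested "major:minor" pairs to deny.
    static boolean isRequestAccepted(String allowedConfig, List<String> deniedDevices) {
        if (allowedConfig == null || allowedConfig.trim().isEmpty()) {
            // List not set: any denied devices requested by the Java layer are accepted.
            return true;
        }
        Set<String> allowed = new HashSet<>();
        for (String pair : allowedConfig.split(",")) {
            allowed.add(pair.trim());
        }
        // List set: the denied devices must be a subset of the allowed list.
        return allowed.containsAll(deniedDevices);
    }

    public static void main(String[] args) {
        String cfg = "243:0,243:1,243:2,8:16,8:32";
        System.out.println(isRequestAccepted(cfg, Arrays.asList("243:0", "8:16"))); // true
        System.out.println(isRequestAccepted(cfg, Arrays.asList("244:0")));         // false
        System.out.println(isRequestAccepted(null, Arrays.asList("244:0")));        // true
    }
}
```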
The current GPU/FPGA behavior does have an issue when c-e.cfg doesn't align
with yarn-site.xml (I can file a Jira later). Take GPU for instance:
One host has GPUs 1,2,3,4,5, and "GPU.allowed = 1,2,3" is configured in
c-e.cfg, but yarn-site.xml is configured with auto, which means 1,2,3,4,5.
An application requests 4 GPUs and the scheduler allocates 1,2,4,5, so
--excluded-gpus is "3". c-e checks that 3 is in the allowed list (1,2,3) and
then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect because the application
can access 4 and 5.
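To make the gap concrete, here is a small Java sketch of the scenario using the numbers from the example. It only models the set arithmetic; the names are hypothetical and it is not the actual c-e logic.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the GPU example; not the actual c-e logic.
final class GpuAllowListGap {
    // GPUs still reachable after the excluded ones are denied in cgroups.
    static Set<Integer> accessible(Set<Integer> hostGpus, Set<Integer> excluded) {
        Set<Integer> result = new TreeSet<>(hostGpus);
        result.removeAll(excluded);
        return result;
    }

    // Reachable GPUs that the c-e.cfg allowed list never permitted.
    static Set<Integer> outsideAllowed(Set<Integer> accessible, Set<Integer> allowed) {
        Set<Integer> result = new TreeSet<>(accessible);
        result.removeAll(allowed);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> host = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        Set<Integer> allowed = new TreeSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> excluded = new TreeSet<>(Arrays.asList(3));
        Set<Integer> reachable = accessible(host, excluded);
        System.out.println(reachable);                          // [1, 2, 4, 5]
        System.out.println(outsideAllowed(reachable, allowed)); // [4, 5]
    }
}
```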
Besides this issue, I haven't thought of any other security issues. Could you
explain a little bit more?
{quote}3) Could u add changes to c-e.cfg
(hadoop-yarn-project/hadoop-yarn/conf/container-executor.cfg) for the new
config and example options like other options.
{quote}
{color:#d04437}Zhankun{color}=> Sure. Will add it.
> [YARN-8851] Phase 1 - Support device isolation in native container-executor
> ---------------------------------------------------------------------------
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
> Attachments: YARN-9060-trunk.001.patch
>
>
> Due to the cgroups v1 implementation policy in the Linux kernel, we cannot update
> the value of the device cgroups controller without root permission
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
> So we need to support this in container-executor for the Java layer to invoke.