[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707359#comment-16707359
 ] 

Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:27 PM:
-------------------------------------------------------------

[~leftnoteasy] ,

Let's first look at the bug (YARN-9073) that exists in the current 
implementations, such as GPU/FPGA.
{code:java}
Scenario:
One host has GPUs 1,2,3,4,5,6, and "GPU.allowed = 1,2,3" is configured in 
c-e.cfg. But yarn-site.xml is configured with "auto", which means 1,2,3,4,5,6 
are allowed.
One application requests 4 GPUs and the scheduler allocates 1,2,4,5, so 
--excluded-gpus is "3". c-e checks that 3 is in its allowed list (1,2,3) and 
then denies only 3 in cgroups.
In this case, c-e's allowed list (1,2,3) has no effect because the application 
can now access 4,5,6.
{code}
 

It seems that passing the allowed devices (1,2,4,5) from the Java layer and 
checking them against "GPU.allowed" (1,2,3) should solve this issue. In this 
case it does: 4 and 5 are not in (1,2,3), so c-e will throw an error.

But another bug still exists. As an example, assume one host has 
(1,2,3,4,5,6), "GPU.allowed=1,2,3,4" is configured in c-e.cfg, and 
yarn-site.xml indicates that devices (1,2,3) can be scheduled. An application 
requests 2 devices, the Java layer's allowed devices are (1), and the denied 
devices are (2,3). Both (1) and (2,3) are in the configured allowed devices, so 
the check passes, but the application can actually consume (4,5,6).
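
The two scenarios above can be replayed with a small model. This is only a sketch of my reading of the current behaviour (c-e denies an excluded device only if it is in its own allowed list, and everything else on the host stays reachable); the function name and the set-based model are hypothetical and not taken from the c-e source.

```python
def accessible_devices(host, ce_allowed, excluded):
    """Devices a container can still reach under the current scheme (assumed model).

    c-e denies an excluded device only when it appears in its own allowed
    list; every other device on the host remains reachable.
    """
    denied = {d for d in excluded if d in ce_allowed}
    return set(host) - denied


host = {1, 2, 3, 4, 5, 6}

# Scenario 1: GPU.allowed = 1,2,3 and --excluded-gpus is "3".
reachable_1 = accessible_devices(host, ce_allowed={1, 2, 3}, excluded={3})
leak_1 = reachable_1 - {1, 2, 3}  # reachable devices outside c-e's allowed list

# Scenario 2: GPU.allowed = 1,2,3,4, Java layer allows (1) and denies (2,3).
ce_allowed_2 = {1, 2, 3, 4}
java_allowed = {1}
java_denied = {2, 3}
check_passes = java_allowed <= ce_allowed_2 and java_denied <= ce_allowed_2
reachable_2 = accessible_devices(host, ce_allowed_2, excluded=java_denied)
leak_2 = reachable_2 - java_allowed  # devices the app was never given

print(sorted(leak_1), check_passes, sorted(leak_2))
```

In both cases the leaked set is (4,5,6), and in the second case the allowed-list check passes anyway, which is why checking against "GPU.allowed" alone is not enough.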

*The root cause of these bugs* is that c-e cannot know the exact devices to 
deny based on "GPU.allowed" and the GPUs excluded by the Java layer. To avoid 
the above bugs, we can use the solution below.

The configuration in c-e.cfg is as follows. We use "denied-numbers" to let the 
administrator define exactly what is not permitted. The original 
"devices.allowed-numbers" can remain but is unnecessary once we use 
denied-numbers, so it is better to remove it.
{code:java}
[devices]
 module.enabled=true
 # devices.allowed-numbers will be unnecessary.
 devices.allowed-numbers=8:32
 # Comma-separated major:minor. Empty means allow the default devices
 # reported by the device plugin.
 devices.denied-numbers=8:48,8:16
{code}
The CLI options are as follows:
{code:java}
c-e --module-devices \
  --excluded_devices b-8:32-rwm \
  --allowed_devices 8:16,8:48 \
  --container_id container_x_y
{code}
The "devices.denied-numbers" in c-e.cfg is a blacklist that will be added 
(without duplicate updates) to the cgroup "devices.deny", just like the 
"--excluded_devices" values.
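
A minimal sketch of that merge, assuming "--excluded_devices" entries have the "type-major:minor-access" shape shown in the CLI example and that duplicates are detected by major:minor. The function name is hypothetical, and the real c-e code is C, so this only illustrates the intended de-duplication.

```python
import re

# Matches entries such as "b-8:32-rwm" and captures the major:minor part.
EXCLUDED_ENTRY = re.compile(r"[bc]-(\d+:\d+)-[rwm]+")


def merged_deny_numbers(excluded_devices, denied_numbers):
    """Union of excluded and configured-denied devices, in order, no duplicates."""
    merged = []
    for entry in excluded_devices:
        match = EXCLUDED_ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"unexpected --excluded_devices entry: {entry!r}")
        if match.group(1) not in merged:
            merged.append(match.group(1))
    for number in denied_numbers:
        if number not in merged:
            merged.append(number)
    return merged


# 8:32 appears in both inputs but should be written to devices.deny once.
deny = merged_deny_numbers(["b-8:32-rwm"], ["8:48", "8:16", "8:32"])
print(deny)
```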

In the above examples, the value of "--allowed_devices" passed from the Java 
layer is checked against "devices.denied-numbers" to see whether any device 
wanted by the Java layer is invalid. If one is found, c-e reports an error. 
Without this "--allowed_devices" check and error, a bug would exist: suppose 
all devices are (1,2,3), "devices.denied-numbers" is 3, an app requests 2 
devices, and the scheduler allocates (1,3). The value of "--excluded_devices" 
is 2, so (2,3) are written to cgroups and the app can use only 1 device, fewer 
than expected. With --allowed_devices, c-e sees that (1,3) contains the denied 
value 3 configured in c-e.cfg and reports an error, which avoids the bug.
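
The example in this paragraph can be sketched as follows. Both function names are hypothetical and the device sets are the illustrative numbers from the text, not real major:minor values; this is an assumption-laden model of the proposed check, not the c-e implementation.

```python
def devices_left_without_check(all_devices, excluded, denied_numbers):
    """Devices the app can use when c-e blindly denies excluded + configured-denied."""
    return set(all_devices) - (set(excluded) | set(denied_numbers))


def check_allowed_devices(allowed, denied_numbers):
    """Reject a request whose allowed devices overlap the configured blacklist."""
    conflicts = sorted(set(allowed) & set(denied_numbers))
    if conflicts:
        raise ValueError(f"allowed devices {conflicts} are denied in c-e.cfg")


# Without the check: the scheduler allocated (1,3) but only device 1 is usable.
usable = devices_left_without_check({1, 2, 3}, excluded={2}, denied_numbers={3})

# With the check: the same request is rejected up front.
try:
    check_allowed_devices({1, 3}, {3})
    rejected = False
except ValueError:
    rejected = True

print(sorted(usable), rejected)
```

Failing fast here means the mismatch between the scheduler's view and c-e.cfg surfaces as an error instead of a container that silently gets fewer devices than it requested.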


> [YARN-8851] Phase 1 - Support device isolation in native container-executor
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9060
>                 URL: https://issues.apache.org/jira/browse/YARN-9060
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch
>
>
> Due to the cgroups v1 implementation policy in the Linux kernel, we cannot 
> update the value of the device cgroups controller unless we have root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for the Java layer to invoke.


