[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.009.patch

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch, 
> YARN-6852.009.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.008.patch

Attached ver.008 patch, fixed rest cc warnings.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.007.patch

Attached 007 patch, fixed more cc warnings, will try to fix all cc warnings in 
the next patch.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.006.patch

Fixed compile issues and added additional tests (006)

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.005.patch

Thanks for comments from [~sunil.gov...@gmail.com], attached 005 patch, 
addressed all comments except #4, since it is a little bit out of the scope. I 
prefer to do it once we have more requirements of cgroup.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.004.patch

Again, thanks [~miklos.szeg...@cloudera.com] for your reviews.

bq. could not find a doc about this, so I ask. Is there a reason to have a 
disable list of devices, instead of an enable list? 
Actually whitelist doesn't work for me, I tried to debug however I cannot find 
the root cause. Since blacklist and whitelist are just two different approaches 
for the same purpose. I think we could continue this approach, do you have any 
concern here? 

And regarding to file name style. I was just about to rename to underscore, 
however I found the container-executor itself is "-", I suggest to convert 
existing files to "-" connected, otherwise we have to rename a lot of files. I 
don't quite care about which style to go, I just want to avoid change too many 
file.

Attached ver.004 patch, (TODO): unit test related cleanups.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.003.patch

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: (was: YARN-6033.009.patch)

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6033.009.patch

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-07-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.002.patch

Removed docker-related logics from the patch. (ver. 002). This patch depends on 
YARN-6033, so will submit patch once YARN-6033 get committed.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-07-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Attachment: YARN-6852.001.patch

Attached YARN-6852.001 patch for review.

Thanks [~chris.douglas]/[~vinodkv]/[~sunilg]/[~subru]/[~sidharta-s]/[~vvasudev] 
for their inputs. 

*For the ver.001 patch, it added:*
1) "Module" concept to container-executor binary. Which we can easier manage 
code and decouple their functionalities. In the future, as described in 
YARN-5673, we can use dlopen, etc. to dynamically load modules for even better 
code isolation.

2) "common-module": This is some common logics which will be shared by 
different modules (located in impl/modules/common), for example, check if 
module is enabled or not.

3) "cgroups-module": Modify devices subcomponent in cgroup needs root 
permission (see \[1\]). So we have to move some funtionalities from Java to C 
side. The cgroups-module is added for this purpose. (located in 
impl/modules/cgroups). Please note that to avoid security issues, cgroups 
module doesn't have any user-invokable interface. So "yarn" user cannot use 
"container-executor" to do any cgroups related operations directly.
Configs of cgroups module see below. Please note that it doesn't include a knob 
to enable or disable it since it is not directly invoked by user (using CLI).

4) "gpu-module": Do gpu isolation. User can call the gpu operation by using 
following commandlines to block container accessing GPUs.
{code}
container-executor gpu --excluded_gpus=0,1 --container_id=container_x_y_z
{code}
We will do strict checks for container_id to make sure malicious user won't use 
this tool to block GPU devices of cgroups not owned by a YARN launched 
container.

*For configs, this patch is based on YARN-6033 (not committed yet). So configs 
of different modules are placed under different sections, like following:*
{code}
## Original container-executor configs ##
yarn.nodemanager.linux-container-executor.group=#configured value of 
yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=##comma separated list of system users who CAN run 
applications
#

[cgroups]
# Root of system cgroups (Cannot be empty or "/")
root=/sys/fs/cgroups
# Parent folder of YARN's CGroups
yarn-hierarchy=yarn (Cannot be empty)

[gpu]
# Enable / disable the module
module.enabled=true/false 
# Major device number of GPU, by default is 195
gpu.major-device-number=195
# Allowed minor device numbers, empty means all GPU devices managed by YARN.
gpu.allowed-device-minor-numbers=0,1,2,3
{code}

\[1\]: 
https://lists.linuxfoundation.org/pipermail/containers/2012-November/030840.html
 

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-07-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6852:
-
Description: 
This JIRA plan to add support of:
1) Isolation in CGroups. (native side).

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org