[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.009.patch > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, > YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch, > YARN-6852.009.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.008.patch Attached ver.008 patch, fixed rest cc warnings. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, > YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.007.patch Attached 007 patch, fixed more cc warnings, will try to fix all cc warnings in the next patch. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, > YARN-6852.006.patch, YARN-6852.007.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.006.patch Fixed compile issues and added additional tests (006) > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, > YARN-6852.006.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.005.patch Thanks for comments from [~sunil.gov...@gmail.com], attached 005 patch, addressed all comments except #4, since it is a little bit out of the scope. I prefer to do it once we have more requirements of cgroup. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.004.patch Again, thanks [~miklos.szeg...@cloudera.com] for your reviews. bq. could not find a doc about this, so I ask. Is there a reason to have a disable list of devices, instead of an enable list? Actually whitelist doesn't work for me, I tried to debug however I cannot find the root cause. Since blacklist and whitelist are just two different approaches for the same purpose. I think we could continue this approach, do you have any concern here? And regarding to file name style. I was just about to rename to underscore, however I found the container-executor itself is "-", I suggest to convert existing files to "-" connected, otherwise we have to rename a lot of files. I don't quite care about which style to go, I just want to avoid change too many file. Attached ver.004 patch, (TODO): unit test related cleanups. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch, YARN-6852.004.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.003.patch > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: (was: YARN-6033.009.patch) > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6033.009.patch > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch, > YARN-6852.003.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.002.patch Removed docker-related logics from the patch. (ver. 002). This patch depends on YARN-6033, so will submit patch once YARN-6033 get committed. > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch, YARN-6852.002.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Attachment: YARN-6852.001.patch Attached YARN-6852.001 patch for review. Thanks [~chris.douglas]/[~vinodkv]/[~sunilg]/[~subru]/[~sidharta-s]/[~vvasudev] for their inputs. *For the ver.001 patch, it added:* 1) "Module" concept to container-executor binary. Which we can easier manage code and decouple their functionalities. In the future, as described in YARN-5673, we can use dlopen, etc. to dynamically load modules for even better code isolation. 2) "common-module": This is some common logics which will be shared by different modules (located in impl/modules/common), for example, check if module is enabled or not. 3) "cgroups-module": Modify devices subcomponent in cgroup needs root permission (see \[1\]). So we have to move some funtionalities from Java to C side. The cgroups-module is added for this purpose. (located in impl/modules/cgroups). Please note that to avoid security issues, cgroups module doesn't have any user-invokable interface. So "yarn" user cannot use "container-executor" to do any cgroups related operations directly. Configs of cgroups module see below. Please note that it doesn't include a knob to enable or disable it since it is not directly invoked by user (using CLI). 4) "gpu-module": Do gpu isolation. User can call the gpu operation by using following commandlines to block container accessing GPUs. {code} container-executor gpu --excluded_gpus=0,1 --container_id=container_x_y_z {code} We will do strict checks for container_id to make sure malicious user won't use this tool to block GPU devices of cgroups not owned by a YARN launched container. *For configs, this patch is based on YARN-6033 (not committed yet). So configs of different modules are placed under different sections, like following:* {code} ## Original container-executor configs ## yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group banned.users=#comma separated list of users who can not run applications min.user.id=1000#Prevent other super-users allowed.system.users=##comma separated list of system users who CAN run applications # [cgroups] # Root of system cgroups (Cannot be empty or "/") root=/sys/fs/cgroups # Parent folder of YARN's CGroups yarn-hierarchy=yarn (Cannot be empty) [gpu] # Enable / disable the module module.enabled=true/false # Major device number of GPU, by default is 195 gpu.major-device-number=195 # Allowed minor device numbers, empty means all GPU devices managed by YARN. gpu.allowed-device-minor-numbers=0,1,2,3 {code} \[1\]: https://lists.linuxfoundation.org/pipermail/containers/2012-November/030840.html > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6852.001.patch > > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6852: - Description: This JIRA plan to add support of: 1) Isolation in CGroups. (native side). > [YARN-6223] Native code changes to support isolate GPU devices by using > CGroups > --- > > Key: YARN-6852 > URL: https://issues.apache.org/jira/browse/YARN-6852 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > > This JIRA plan to add support of: > 1) Isolation in CGroups. (native side). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org