[ https://issues.apache.org/jira/browse/YARN-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258594#comment-15258594 ]
Daniel Templeton commented on YARN-4122:
----------------------------------------

Seems like a generally reasonable approach. From the SLURM lists, it looks like the environment variable was not working correctly prior to CUDA 7:

https://devtalk.nvidia.com/default/topic/512869/cuda-accessing-all-devices-even-those-which-are-blacklisted/?offset=2

This design will probably also have to adjust for the work being done in YARN-4726.

In the doc you say that YARN is currently providing you GPU isolation. How are you making that work?

> Add support for GPU as a resource
> ---------------------------------
>
>                 Key: YARN-4122
>                 URL: https://issues.apache.org/jira/browse/YARN-4122
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: GPUAsAResourceDesign.pdf
>
>
> Use [cgroups devices|https://www.kernel.org/doc/Documentation/cgroups/devices.txt] to isolate GPUs for containers. For docker containers, we could use 'docker run --device=...'.
> Reference: [SLURM Resources isolation through cgroups|http://slurm.schedmd.com/slurm_ug_2011/SLURM_UserGroup2011_cgroups.pdf].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
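
For illustration only, here is a rough sketch of what the two isolation mechanisms named in the issue description (the cgroups devices controller and 'docker run --device=...') might look like when expressed as rules. It is not taken from the attached design doc. The helper names are hypothetical, and the sketch assumes the NVIDIA character devices appear as /dev/nvidiaN with major number 195 and the GPU index as the minor number, which should be checked on the target host. It only builds the strings; it does not write to any cgroup file or launch a container.

```python
# Assumption: NVIDIA GPUs are char devices /dev/nvidiaN with major 195
# and minor N. Verify with `ls -l /dev/nvidia*` on the actual node.
NVIDIA_MAJOR = 195


def gpu_deny_rules(all_gpus, allowed):
    """Build cgroup v1 devices.deny entries for every GPU not in `allowed`.

    Each entry has the form "c MAJOR:MINOR rwm" (char device, deny
    read/write/mknod), as described in the kernel's devices.txt.
    """
    allowed = set(allowed)
    return [
        f"c {NVIDIA_MAJOR}:{minor} rwm"
        for minor in all_gpus
        if minor not in allowed
    ]


def docker_device_args(allowed):
    """Build 'docker run' --device flags exposing only the allowed GPUs."""
    return [f"--device=/dev/nvidia{minor}" for minor in sorted(allowed)]


if __name__ == "__main__":
    # Hypothetical node with 4 GPUs; the container was allocated GPU 1 only.
    for rule in gpu_deny_rules(range(4), [1]):
        print(rule)
    print(" ".join(docker_device_args([1])))
```

In a real NodeManager each deny rule would presumably be written to the container cgroup's devices.deny file, while the docker flags would be appended to the container launch command; both steps are left out here because they depend on the host setup.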