[ https://issues.apache.org/jira/browse/YARN-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhao yufei updated YARN-10248:
------------------------------
    Description:
I have a server with two GPUs, and I want to use only one of them within the YARN cluster.
According to the Hadoop documentation, I set these configs:
{code:xml}
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
  <value>0:1</value>
</property>
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
  <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
</property>
{code}
Then I ran the following command to test:
{code:bash}
yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
  -jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
  -shell_command ' nvidia-smi & sleep 3 ' \
  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
  -num_containers 1 -queue yufei -node_label_expression slaves
{code}
I expected that the GPU with minor number 0 would not be visible to the container, but in the launched container nvidia-smi printed information for both GPUs.
I checked the related source code and found that this is a bug. The problem is:
when allowed-gpu-devices is specified, GpuDiscoverer populates the usable GPUs from it. Then, when some of those GPUs are assigned to a container, the denied GPUs are set for that container, but the GPUs excluded on the host are never taken into account, so they remain visible to the container.

> when config allowed-gpu-devices , excluded GPUs still be visible to containers
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10248
>                 URL: https://issues.apache.org/jira/browse/YARN-10248
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.1
>            Reporter: zhao yufei
>            Priority: Minor
>              Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
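
The reported behaviour can be illustrated with a small, self-contained sketch. This is not the NodeManager code: the names GpuDevice, parse_allowed_gpu_devices, denied_minor_numbers_buggy and denied_minor_numbers_fixed are hypothetical, the host is assumed to have GPUs with minor numbers 0 and 1, and the allowed-gpu-devices value is assumed to use the index:minor_number format. It only shows why deriving the denied set from the usable GPUs alone leaves an excluded GPU undenied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuDevice:
    """Hypothetical stand-in for a GPU identified by index and minor number."""
    index: int
    minor_number: int


def parse_allowed_gpu_devices(value):
    """Parse a value such as "0:1" (assumed format: index:minor_number, comma separated)."""
    devices = set()
    for entry in value.split(","):
        index, minor = entry.strip().split(":")
        devices.add(GpuDevice(int(index), int(minor)))
    return devices


def denied_minor_numbers_buggy(usable, assigned):
    """Behaviour described in the report: deny only usable GPUs that were not assigned."""
    return sorted(d.minor_number for d in usable - assigned)


def denied_minor_numbers_fixed(host_minor_numbers, assigned):
    """One possible correction: deny every GPU on the host that was not assigned."""
    assigned_minors = {d.minor_number for d in assigned}
    return sorted(set(host_minor_numbers) - assigned_minors)


# Assumed host: two GPUs with minor numbers 0 and 1.
host_minor_numbers = {0, 1}
# Configuration from the report: only the GPU with minor number 1 is allowed.
usable = parse_allowed_gpu_devices("0:1")
# The container requests yarn.io/gpu=1, so it is assigned the single usable GPU.
assigned = set(usable)

buggy = denied_minor_numbers_buggy(usable, assigned)
fixed = denied_minor_numbers_fixed(host_minor_numbers, assigned)
print("denied (buggy):", buggy)
print("denied (fixed):", fixed)
```

In the buggy variant the denied list is empty, so nothing blocks the GPU with minor number 0. In the corrected variant that GPU is denied because the set is computed from all GPUs present on the host.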