[ 
https://issues.apache.org/jira/browse/YARN-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173568#comment-17173568
 ] 

zhao yufei edited comment on YARN-10248 at 8/8/20, 3:48 AM:
------------------------------------------------------------

[~tangzhankun] My thought on TestGpuResourceHandler is that
all tests within TestGpuResourceHandler should succeed even when the test server has no
GPUs.

We can mock GpuDiscoverer:
   
{code:java}
    // Assumes static imports of org.mockito.Mockito.mock/when and Guava's Lists.
    GpuDiscoverer mockDiscoverer = mock(GpuDiscoverer.class);

    // Three fake GPUs with minor numbers 0, 1 and 2.
    PerGpuDeviceInformation perGpuDeviceInformation1 = new PerGpuDeviceInformation();
    perGpuDeviceInformation1.setMinorNumber(0);
    PerGpuDeviceInformation perGpuDeviceInformation2 = new PerGpuDeviceInformation();
    perGpuDeviceInformation2.setMinorNumber(1);
    PerGpuDeviceInformation perGpuDeviceInformation3 = new PerGpuDeviceInformation();
    perGpuDeviceInformation3.setMinorNumber(2);

    List<PerGpuDeviceInformation> gpus = Lists.newArrayList(
        perGpuDeviceInformation1, perGpuDeviceInformation2, perGpuDeviceInformation3);
    GpuDeviceInformation gpuDeviceInformation = new GpuDeviceInformation();
    gpuDeviceInformation.setGpus(gpus);

    // Stub the mock declared above so that no physical GPU is needed.
    when(mockDiscoverer.getGpuDeviceInformation()).thenReturn(gpuDeviceInformation);
{code}

This way, even if there are no GPUs on the test server, we can still run the related
tests.
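
The same idea can be shown without any mocking library. The following is a minimal, self-contained sketch, not code from the patch: `GpuSource` and `usableGpuCount` are hypothetical stand-ins for GpuDiscoverer and for the code under test. It only illustrates that a test can be fed fake GPU data instead of querying real hardware.

```java
import java.util.Arrays;
import java.util.List;

public class StubDiscovererSketch {

    /** Hypothetical stand-in for the part of GpuDiscoverer that the code under test depends on. */
    interface GpuSource {
        List<Integer> minorNumbers();
    }

    /** Hypothetical code under test: counts the GPUs reported by the source. */
    static int usableGpuCount(GpuSource source) {
        return source.minorNumbers().size();
    }

    public static void main(String[] args) {
        // A stub that reports three fake GPUs (minor numbers 0, 1, 2); no hardware is involved.
        GpuSource threeFakeGpus = () -> Arrays.asList(0, 1, 2);
        System.out.println(usableGpuCount(threeFakeGpus)); // prints 3
    }
}
```

Because the code under test only sees the interface, the result does not depend on whether the machine running the test has any GPUs.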


    


was (Author: jasstionzyf):
[~tangzhankun] My thought on TestGpuResourceHandler is that
all tests within TestGpuResourceHandler should succeed even when the test server has no
GPUs.

We can mock GpuDiscoverer:
   
{code:java}
    GpuDiscoverer mockDiscoverer = mock(GpuDiscoverer.class);
    PerGpuDeviceInformation perGpuDeviceInformation1 = new PerGpuDeviceInformation();
    perGpuDeviceInformation1.setMinorNumber(0);
    PerGpuDeviceInformation perGpuDeviceInformation2 = new PerGpuDeviceInformation();
    perGpuDeviceInformation2.setMinorNumber(1);
    PerGpuDeviceInformation perGpuDeviceInformation3 = new PerGpuDeviceInformation();
    perGpuDeviceInformation3.setMinorNumber(2);

    List<PerGpuDeviceInformation> gpus = Lists.newArrayList(
        perGpuDeviceInformation1, perGpuDeviceInformation2, perGpuDeviceInformation3);
    GpuDeviceInformation gpuDeviceInformation = new GpuDeviceInformation();
    gpuDeviceInformation.setGpus(gpus);
{code}

when(gpuDiscoverer.getGpuDeviceInformation()).thenReturn(gpuDeviceInformation);

> when config allowed-gpu-devices , excluded GPUs still be visible to containers
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10248
>                 URL: https://issues.apache.org/jira/browse/YARN-10248
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.1
>            Reporter: zhao yufei
>            Assignee: zhao yufei
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-10248-branch-3.2.001.path, 
> YARN-10248-branch-3.2.001.path
>
>
> I have a server with two GPUs, and I want to use only one of them within the YARN
> cluster.
> According to the Hadoop documentation, I set the following configs:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
>   <value>0:1</value>
> </property>
> <property>
>   <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
>   <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
> </property>
> {code}
> Then I ran the following command to test:
> {code:bash}
> yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
>          -jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
>          -shell_command ' nvidia-smi & sleep 3  ' \
>          -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1  \
>          -num_containers 1 -queue yufei -node_label_expression slaves
> {code}
> I expected the GPU with minor number 0 not to be visible to the container, but in the
> launched container nvidia-smi prints information for both GPUs.
> I checked the related source code and found that this is a bug.
> The problem is:
> when you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs
> from it.
> Then, when some of those GPUs are assigned to a container, the denied GPUs are set for
> the container,
> but the GPUs excluded on the host are never considered.
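
The behaviour described above can be modelled in a few lines. This is a simplified sketch and not the NodeManager code: the class and method names are hypothetical, and GPUs are represented only by their minor numbers. It assumes a host with minor numbers 0 and 1, where only minor number 1 is allowed and is assigned to the container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DeniedGpuSketch {

    /** Reported behaviour: the deny list is derived from the allowed (usable) GPUs only. */
    static List<Integer> deniedFromAllowedOnly(Set<Integer> allowed, Set<Integer> assigned) {
        Set<Integer> denied = new TreeSet<>(allowed);
        denied.removeAll(assigned);
        return new ArrayList<>(denied);
    }

    /** Expected behaviour: every GPU on the host that is not assigned is denied. */
    static List<Integer> deniedFromAllOnHost(Set<Integer> allOnHost, Set<Integer> assigned) {
        Set<Integer> denied = new TreeSet<>(allOnHost);
        denied.removeAll(assigned);
        return new ArrayList<>(denied);
    }

    public static void main(String[] args) {
        Set<Integer> allOnHost = new TreeSet<>(Arrays.asList(0, 1)); // both GPUs on the server
        Set<Integer> allowed = new TreeSet<>(Arrays.asList(1));      // only minor number 1 is allowed
        Set<Integer> assigned = new TreeSet<>(Arrays.asList(1));     // the container gets minor number 1

        // Minor number 0 is never denied, so the container can still see it.
        System.out.println(deniedFromAllowedOnly(allowed, assigned)); // prints []
        // Starting from all host GPUs puts the excluded GPU on the deny list.
        System.out.println(deniedFromAllOnHost(allOnHost, assigned)); // prints [0]
    }
}
```

In this model the excluded GPU stays visible because it is in neither the allowed set nor the deny list; computing the deny list from all GPUs on the host closes that gap.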



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

