Szilard Nemeth created YARN-9099:
------------------------------------

             Summary: GpuResourceAllocator.getReleasingGpus calculates number 
of GPUs in a wrong way
                 Key: YARN-9099
                 URL: https://issues.apache.org/jira/browse/YARN-9099
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth


getReleasingGpus plays an important role in the calculation that happens when 
GpuResourceAllocator assigns GPUs to a container; see 
GpuResourceAllocator#internalAssignGpus.

If multiple GPUs are assigned to the same container, getReleasingGpus returns 
an incorrect number.
The method iterates over the (GPU device, container ID) mappings and looks up 
the container by its ID once for every device the container ID is mapped to.
For every container retrieved this way, the value of its GPU resource is added 
to a running sum.
As a result, if a container is mapped to 2 or more devices, its GPU resource 
value is added to the running sum once per device, i.e. as many times as the 
number of GPU devices the container has.

Example: 
Let's suppose {{usedDevices}} contains these mappings: 
- (GPU1, container1)
- (GPU2, container1)
- (GPU3, container2)

The GPU resource value is 2 for container1 and 1 for container2.
Then, if container1 is in a running state, getReleasingGpus will return 4 
(2 + 2, once per mapped device) instead of 2.
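
The example can be reproduced with a minimal, self-contained sketch. It does 
not use the real NodeManager classes: devices and container IDs are plain 
strings, and the class name, both method names, and the countedContainers set 
(standing in for whatever state check decides that a container contributes to 
the sum) are hypothetical. The second method shows one possible way to avoid 
the double counting, by visiting each distinct container ID once; it is not 
necessarily the fix that should be applied.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the logic described in this issue; not Hadoop code.
public class ReleasingGpusSketch {

  // Adds the container's GPU resource value once per (device, container)
  // mapping, which reproduces the double counting described above.
  public static long sumPerMapping(Map<String, String> usedDevices,
      Map<String, Long> gpuValueByContainer, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : usedDevices.values()) {
      if (countedContainers.contains(containerId)) {
        sum += gpuValueByContainer.get(containerId);
      }
    }
    return sum;
  }

  // Adds the container's GPU resource value once per distinct container ID.
  public static long sumPerContainer(Map<String, String> usedDevices,
      Map<String, Long> gpuValueByContainer, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : new HashSet<>(usedDevices.values())) {
      if (countedContainers.contains(containerId)) {
        sum += gpuValueByContainer.get(containerId);
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Mappings from the example: (GPU device, container ID).
    Map<String, String> usedDevices = new LinkedHashMap<>();
    usedDevices.put("GPU1", "container1");
    usedDevices.put("GPU2", "container1");
    usedDevices.put("GPU3", "container2");

    // GPU resource value per container, as in the example.
    Map<String, Long> gpuValues = new HashMap<>();
    gpuValues.put("container1", 2L);
    gpuValues.put("container2", 1L);

    // Assumption: only container1 passes the state check.
    Set<String> counted = Collections.singleton("container1");

    System.out.println(sumPerMapping(usedDevices, gpuValues, counted));   // prints 4
    System.out.println(sumPerContainer(usedDevices, gpuValues, counted)); // prints 2
  }
}
```

Deduplicating by container ID keeps the per-container resource value as the 
source of truth. An alternative would be to count one GPU per mapping whose 
container passes the check, which gives the same result as long as the 
resource value always equals the number of mapped devices.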


