[
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sunil Govindan updated YARN-9099:
---------------------------------
Summary: GpuResourceAllocator#getReleasingGpus calculates number of GPUs in
a wrong way (was: GpuResourceAllocator.getReleasingGpus calculates number of
GPUs in a wrong way)
> GpuResourceAllocator#getReleasingGpus calculates number of GPUs in a wrong way
> ------------------------------------------------------------------------------
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation that happens
> when GpuResourceAllocator assigns GPUs to a container, see:
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will
> return an invalid number.
> The method iterates over the mappings of (GPU device, container ID) and
> looks up each container by its ID, once for every device the container ID is
> mapped to.
> For every container retrieved this way, the value of its GPU resource is
> added to a running sum.
> Consequently, if a container is mapped to 2 or more devices, the
> container's GPU resource value is added to the running sum as many times as
> the number of GPU devices the container has.
> Example:
> Let's suppose {{usedDevices}} contains these mappings:
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4
> instead of 2.
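
The arithmetic in the example above can be reproduced with a small, self-contained sketch. This is a simplified model, not the actual NodeManager code: devices and container IDs are plain strings, the container state check is reduced to a hypothetical set of "counted" container IDs, and the class and method names (ReleasingGpusSketch, perMappingSum, perContainerSum) are made up for illustration. The de-duplicating variant only shows one possible way to count each container once; the attached patches may take a different approach.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the double counting described in the issue.
// All names here are hypothetical; this is not the Hadoop implementation.
class ReleasingGpusSketch {

  // Adds the container's GPU value once per (device, container) mapping,
  // so a container holding N devices contributes N times.
  static long perMappingSum(Map<String, String> usedDevices,
      Map<String, Long> gpusPerContainer, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : usedDevices.values()) {
      if (countedContainers.contains(containerId)) {
        sum += gpusPerContainer.get(containerId);
      }
    }
    return sum;
  }

  // Adds the container's GPU value once per distinct container ID.
  static long perContainerSum(Map<String, String> usedDevices,
      Map<String, Long> gpusPerContainer, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : new HashSet<>(usedDevices.values())) {
      if (countedContainers.contains(containerId)) {
        sum += gpusPerContainer.get(containerId);
      }
    }
    return sum;
  }

  // The usedDevices mappings from the example in the description.
  static Map<String, String> exampleDevices() {
    Map<String, String> usedDevices = new LinkedHashMap<>();
    usedDevices.put("GPU1", "container1");
    usedDevices.put("GPU2", "container1");
    usedDevices.put("GPU3", "container2");
    return usedDevices;
  }

  // GPU resource values from the example: 2 for container1, 1 for container2.
  static Map<String, Long> exampleResources() {
    Map<String, Long> gpusPerContainer = new LinkedHashMap<>();
    gpusPerContainer.put("container1", 2L);
    gpusPerContainer.put("container2", 1L);
    return gpusPerContainer;
  }

  // Only container1 is in the state that makes it count (assumption of the sketch).
  static long examplePerMappingCount() {
    return perMappingSum(exampleDevices(), exampleResources(),
        Collections.singleton("container1"));
  }

  static long examplePerContainerCount() {
    return perContainerSum(exampleDevices(), exampleResources(),
        Collections.singleton("container1"));
  }

  public static void main(String[] args) {
    System.out.println(examplePerMappingCount());   // prints 4
    System.out.println(examplePerContainerCount()); // prints 2
  }
}
```

With only container1 counted, the per-mapping sum adds its value of 2 for both GPU1 and GPU2, which gives the 4 reported above, while the per-container sum gives the expected 2.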