[
https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756845#comment-16756845
]
Hudson commented on YARN-9099:
------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15859 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/15859/])
YARN-9099. GpuResourceAllocator#getReleasingGpus calculates number of (sunilg:
rev 71c49fa60faad2504b0411979a6e46e595b97a85)
* (edit)
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java
> GpuResourceAllocator#getReleasingGpus calculates number of GPUs in a wrong way
> ------------------------------------------------------------------------------
>
> Key: YARN-9099
> URL: https://issues.apache.org/jira/browse/YARN-9099
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation that happens
> when GpuResourceAllocator assigns GPUs to a container (see
> GpuResourceAllocator#internalAssignGpus).
> If multiple GPUs are assigned to the same container, getReleasingGpus
> returns an incorrect number.
> The method iterates over the (GPU device, container ID) mappings, so it
> looks up a container by its ID once for every device that the container ID
> is mapped to.
> For every lookup, the container's GPU resource value is added to a running
> sum.
> Consequently, if a container is mapped to 2 or more devices, its GPU
> resource value is added to the running sum as many times as the number of
> GPU devices the container has.
> Example:
> Let's suppose {{usedDevices}} contains these mappings:
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> The GPU resource value is 2 for container1 and 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus returns 4
> (2 + 2, one addition per mapping of container1) instead of 2.
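The double counting described above can be reproduced with a small, self-contained Java sketch. It is not the actual GpuResourceAllocator code: the `Container` class, its `counted` flag (a stand-in for whatever container state getReleasingGpus checks), and the method names `releasingGpusBuggy` / `releasingGpusFixed` are all hypothetical. The "fixed" variant shows one possible approach, de-duplicating container IDs before summing; it is not necessarily what the committed patch does.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ReleasingGpusSketch {

    // Hypothetical, simplified container: the GPU resource value it holds and
    // whether it is in the state that the releasing-GPU calculation counts.
    static final class Container {
        final long gpuValue;
        final boolean counted;

        Container(long gpuValue, boolean counted) {
            this.gpuValue = gpuValue;
            this.counted = counted;
        }
    }

    // Buggy pattern: one iteration per (device -> container ID) mapping, so a
    // container holding N devices contributes its GPU value N times.
    static long releasingGpusBuggy(Map<String, String> usedDevices,
                                   Map<String, Container> containers) {
        long sum = 0;
        for (Map.Entry<String, String> entry : usedDevices.entrySet()) {
            Container c = containers.get(entry.getValue());
            if (c != null && c.counted) {
                sum += c.gpuValue;
            }
        }
        return sum;
    }

    // One possible fix: collect the distinct container IDs first, so each
    // container's GPU value is added exactly once.
    static long releasingGpusFixed(Map<String, String> usedDevices,
                                   Map<String, Container> containers) {
        Set<String> containerIds = new HashSet<>(usedDevices.values());
        long sum = 0;
        for (String id : containerIds) {
            Container c = containers.get(id);
            if (c != null && c.counted) {
                sum += c.gpuValue;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // usedDevices from the example: GPU device -> container ID.
        Map<String, String> usedDevices = new LinkedHashMap<>();
        usedDevices.put("GPU1", "container1");
        usedDevices.put("GPU2", "container1");
        usedDevices.put("GPU3", "container2");

        // Only container1 is in the counted state, as in the example.
        Map<String, Container> containers = new LinkedHashMap<>();
        containers.put("container1", new Container(2, true));
        containers.put("container2", new Container(1, false));

        System.out.println(releasingGpusBuggy(usedDevices, containers)); // 4
        System.out.println(releasingGpusFixed(usedDevices, containers)); // 2
    }
}
```

Iterating over distinct containers rather than over the device mappings makes each container's GPU value count once, regardless of how many devices it holds.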
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)