[
https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhankun Tang updated YARN-8821:
-------------------------------
Description:
GPU topology affects performance dramatically. There has been a discussion in
YARN-7481, but we'd like to move the related discussion here.
Please note that YARN-8851 will provide a pluggable device framework that lets
a plugin supply its own custom scheduler. Based on that framework, the GPU
plugin can have its own topology-aware scheduler. The proposed patch implements
a topology algorithm as follows:
# When the plugin initializes, it parses the output of "nvidia-smi topo -m" to
build a hash map whose keys are pairs of GPUs and whose values are the
communication cost between the two. The map looks like \{"0 - 1"=>2,
"0 - 2"=>4, ...}, which means the minimum cost between GPU 0 and GPU 1 is 2.
The cost is assigned based on the connection type (CPU affinity and NUMA nodes
are not considered yet). A parsing sketch is shown after this list.
# It then constructs a cost table that caches every combination of GPUs and
the corresponding total cost of that combination. The cost table is a map
whose structure looks like \{2=>\{[0,1]=>2,..}, 3=>\{[0,1,2]=>10,..},
4=>\{[0,1,2,3]=>18}}. The key of the outer map is the count of GPUs; its value
is a map whose key is a combination of GPUs and whose value is the calculated
communication cost of that combination. The cost of a combination is the sum
of the costs of all distinct GPU pairs in it. For instance, the total cost of
GPUs [0,1,2] is the sum of the costs "0 - 1", "0 - 2" and "1 - 2", each taken
from the map built in step 1. A sketch of building this table also follows
the list.
# After the cost table is built, GPUs are allocated based on topology. We
provide two policies, which a container can choose through an environment
variable "".
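
Below is a minimal, illustrative sketch of step 1. The class and method names
and the per-connection-type weights are assumptions for illustration only;
they are not taken from the attached patch. It also assumes the input lines
have already been reduced to the GPU-to-GPU columns of the "nvidia-smi topo -m"
matrix (header row, CPU-affinity column and legend stripped).

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of step 1: turn the matrix printed by "nvidia-smi topo -m" into a
 * pairwise cost map such as {"0 - 1" => 2, "0 - 2" => 4, ...}.
 */
public class GpuPairCostParser {

  /** Assumed weights per connection type; the real patch may use others. */
  private static int linkCost(String link) {
    if (link.startsWith("NV")) {
      return 2;               // NVLink (NV1, NV2, ...): cheapest
    }
    switch (link) {
      case "PIX":  return 4;  // same PCIe switch
      case "PXB":  return 6;  // multiple PCIe switches
      case "PHB":  return 8;  // through the host bridge
      case "NODE":
      case "SYS":  return 10; // across NUMA nodes / sockets
      default:     return Integer.MAX_VALUE;
    }
  }

  /**
   * @param topoMatrix one row per GPU; the first token is the row label
   *                   (e.g. "GPU0"), the rest are link types per column
   * @return map from "i - j" (i < j) to the cost between GPU i and GPU j
   */
  public static Map<String, Integer> buildPairCost(String[] topoMatrix) {
    Map<String, Integer> pairCost = new HashMap<>();
    for (int i = 0; i < topoMatrix.length; i++) {
      String[] cells = topoMatrix[i].trim().split("\\s+");
      for (int j = i + 1; j < topoMatrix.length; j++) {
        // cells[0] is the row label, so column j lives at cells[j + 1].
        pairCost.put(i + " - " + j, linkCost(cells[j + 1]));
      }
    }
    return pairCost;
  }
}
{code}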
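And a sketch of step 2, built on the pair-cost map above. Again, the names are
illustrative assumptions. Enumerating every combination is exponential in the
GPU count, which is acceptable for the typical 4-8 GPUs per node; for a 4-GPU
node it caches 6 + 4 + 1 = 11 combinations (sizes 2, 3 and 4).

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of step 2: cache the total cost of every GPU combination, keyed by
 * combination size, e.g. {2 => {[0,1] => 2, ...}, 3 => {[0,1,2] => 10, ...}}.
 */
public class GpuCostTableBuilder {

  public static Map<Integer, Map<List<Integer>, Integer>> buildCostTable(
      int gpuCount, Map<String, Integer> pairCost) {
    Map<Integer, Map<List<Integer>, Integer>> costTable = new HashMap<>();
    for (int size = 2; size <= gpuCount; size++) {
      costTable.put(size, new HashMap<>());
      enumerate(new ArrayList<>(), 0, size, gpuCount, pairCost, costTable);
    }
    return costTable;
  }

  // Enumerate all combinations of the given size and record their total cost.
  private static void enumerate(List<Integer> current, int next, int size,
      int gpuCount, Map<String, Integer> pairCost,
      Map<Integer, Map<List<Integer>, Integer>> costTable) {
    if (current.size() == size) {
      costTable.get(size).put(new ArrayList<>(current),
          combinationCost(current, pairCost));
      return;
    }
    for (int gpu = next; gpu < gpuCount; gpu++) {
      current.add(gpu);
      enumerate(current, gpu + 1, size, gpuCount, pairCost, costTable);
      current.remove(current.size() - 1);
    }
  }

  // Cost of a combination = sum of the costs of all distinct pairs in it.
  private static int combinationCost(List<Integer> gpus,
      Map<String, Integer> pairCost) {
    int total = 0;
    for (int a = 0; a < gpus.size(); a++) {
      for (int b = a + 1; b < gpus.size(); b++) {
        total += pairCost.get(gpus.get(a) + " - " + gpus.get(b));
      }
    }
    return total;
  }
}
{code}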
was:
GPU topology affects performance dramatically. There has been a discussion in
YARN-7481, but we'd like to move the related discussion here.
Please note that YARN-8851 will provide a pluggable device framework that lets
a plugin supply its own custom scheduler. Based on that framework, the GPU
plugin can have its own topology-aware scheduler.
> GPU hierarchy/topology scheduling support
> -----------------------------------------
>
> Key: YARN-8821
> URL: https://issues.apache.org/jira/browse/YARN-8821
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
> Attachments: YARN-8821-trunk.001.patch
>
>
> GPU topology affects performance dramatically. There has been a discussion
> in YARN-7481, but we'd like to move the related discussion here.
> Please note that YARN-8851 will provide a pluggable device framework that
> lets a plugin supply its own custom scheduler. Based on that framework, the
> GPU plugin can have its own topology-aware scheduler. The proposed patch
> implements a topology algorithm as follows:
> # When the plugin initializes, it parses the output of "nvidia-smi topo -m"
> to build a hash map whose keys are pairs of GPUs and whose values are the
> communication cost between the two. The map looks like \{"0 - 1"=>2,
> "0 - 2"=>4, ...}, which means the minimum cost between GPU 0 and GPU 1 is 2.
> The cost is assigned based on the connection type (CPU affinity and NUMA
> nodes are not considered yet).
> # It then constructs a cost table that caches every combination of GPUs and
> the corresponding total cost of that combination. The cost table is a map
> whose structure looks like \{2=>\{[0,1]=>2,..}, 3=>\{[0,1,2]=>10,..},
> 4=>\{[0,1,2,3]=>18}}. The key of the outer map is the count of GPUs; its
> value is a map whose key is a combination of GPUs and whose value is the
> calculated communication cost of that combination. The cost of a combination
> is the sum of the costs of all distinct GPU pairs in it. For instance, the
> total cost of GPUs [0,1,2] is the sum of the costs "0 - 1", "0 - 2" and
> "1 - 2", each taken from the map built in step 1.
> # After the cost table is built, GPUs are allocated based on topology. We
> provide two policies, which a container can choose through an environment
> variable "".
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]