[jira] [Updated] (YARN-10621) GPU management using OpenCL instead of vendor-specific solutions

Sotiris Niarchos (Jira) Tue, 09 Feb 2021 14:59:05 -0800


     [ https://issues.apache.org/jira/browse/YARN-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sotiris Niarchos updated YARN-10621:
------------------------------------
    Description: 
As part of the [E2Data research project|https://e2data.eu/], we at the 
[Institute of Communication and Computer Systems 
(ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical 
University of Athens, Greece, have been working on a modified version of Hadoop 
YARN where the GPU devices that are available in the underlying cluster are 
discovered via a Java wrapper of the OpenCL framework API (namely 
[JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.

In other words, we have shifted towards *a more uniform and high-level handling 
of GPUs as "OpenCL-enabled" devices*. This way, we manage to *decouple GPU 
discovery/management from vendor-specific technicalities*; every GPU, no matter 
the vendor, is the same for E2Data YARN (more specifically, for the 
{{NodeManager}} component), provided that the OpenCL runtime and drivers for 
the GPU(s) of interest are installed on the respective node(s) of the cluster.

As a result, we *managed to use GPUs from vendors other than NVIDIA* (NVIDIA 
GPUs being the only ones officially supported, via the {{nvidia-smi}} binary) 
with minimal additional effort, after our initial changes.

Ultimately, our goal is to *unify every processing unit* that YARN can possibly 
utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, high-level 
interface: that of the OpenCL-enabled device*.
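
Purely as an illustration of the "OpenCL-enabled device" idea (this is not the 
E2Data code), a vendor-neutral device view might be sketched as below. Every 
name in it ({{OpenClDeviceSketch}}, {{ComputeDevice}}, {{DeviceDiscoverer}}, 
{{ofType}}, {{stubDiscoverer}}) and every device entry is hypothetical. The 
three device types mirror OpenCL's CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU and 
CL_DEVICE_TYPE_ACCELERATOR (the category FPGAs are commonly exposed under). A 
real discoverer would presumably fill the list by querying OpenCL through 
JOCL; here it is a hard-coded stub so the sketch runs without any OpenCL 
runtime installed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from YARN, JOCL, or E2Data.
public class OpenClDeviceSketch {

    // Mirrors the OpenCL categories CL_DEVICE_TYPE_CPU / _GPU / _ACCELERATOR.
    enum DeviceType { CPU, GPU, ACCELERATOR }

    // Vendor-neutral description of one device, as a NodeManager might see it.
    static final class ComputeDevice {
        final String vendor;
        final String name;
        final DeviceType type;

        ComputeDevice(String vendor, String name, DeviceType type) {
            this.vendor = vendor;
            this.name = name;
            this.type = type;
        }
    }

    // Single discovery entry point; a real implementation would query OpenCL.
    interface DeviceDiscoverer {
        List<ComputeDevice> discover();
    }

    // Keeps only devices of the requested type, regardless of vendor.
    static List<ComputeDevice> ofType(DeviceDiscoverer discoverer, DeviceType wanted) {
        List<ComputeDevice> result = new ArrayList<>();
        for (ComputeDevice device : discoverer.discover()) {
            if (device.type == wanted) {
                result.add(device);
            }
        }
        return result;
    }

    // Hard-coded stand-in for a node with mixed hardware (made-up entries).
    static DeviceDiscoverer stubDiscoverer() {
        return () -> Arrays.asList(
            new ComputeDevice("VendorA", "gpu-0", DeviceType.GPU),
            new ComputeDevice("VendorB", "gpu-1", DeviceType.GPU),
            new ComputeDevice("VendorC", "cpu-0", DeviceType.CPU),
            new ComputeDevice("VendorD", "fpga-0", DeviceType.ACCELERATOR));
    }

    public static void main(String[] args) {
        // Lists the GPUs of the stub node without any vendor-specific branch.
        for (ComputeDevice device : ofType(stubDiscoverer(), DeviceType.GPU)) {
            System.out.println(device.vendor + " " + device.name);
        }
    }
}
```

The point of the sketch is that the filtering code never branches on the 
vendor; only the device type reported by the discovery layer matters.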

The only drawback of our approach is that vendor-specific info regarding the 
GPUs is lost (e.g. temperature). We believe, however, that the lost information 
is not necessary for YARN; everything that Hadoop needs in order to discover 
and handle GPU devices is provided by OpenCL.

For the time being, this is just a proposal and a prompt for discussion. This 
modified version is a work in progress. We consider community feedback 
regarding the core concept (and the fact that it may constitute a paradigm 
shift for YARN) crucial before attaching any patch file and diving into more 
(technical) details.

  was:
As part of the [E2Data research project|https://e2data.eu/], we at the 
[Institute of Communication and Computer Systems 
(ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical 
University of Athens, Greece, have been working on a modified version of Hadoop 
Yarn where the GPU devices that are available in the underlying cluster are 
discovered via a Java wrapper of the OpenCL framework API (namely 
[JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.

In other words, we have shifted towards *a more uniform and high-level handling 
of GPUs as "OpenCL-enabled" devices*. This way, we manage to *decouple GPU 
discovery/management from vendor-specific technicalities*; every GPU, no matter 
the vendor, is the same for E2Data YARN (more specifically, by the 
{{NodeManager}} component), provided that the OpenCL runtime and drivers for 
the GPU(s) of interest are installed on the respective node(s) of the cluster.

This way, we *managed to use GPUs other than NVIDIA* (which are the only ones 
officially supported via the {{nvidia-smi}} binary) with minimal additional 
effort, after our initial changes.

Ultimately, our goal is to *unify every processing unit* that YARN can possible 
utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, high-level 
interface; that of the OpenCL-enabled device*.

The only drawback of our approach is that vendor-specific info regarding the 
GPUs is lost (e.g. temperature). We believe, however, that the lost information 
is not necessary for YARN; everything that Hadoop needs in order to discover 
and handle GPU devices is provided by OpenCL.

This is just a proposition/a prompt for discussion for the time being. This 
modified version is a work in progress. We consider community feedback 
regarding the core concept (and the fact that it may constitute a paradigm 
shift for YARN) crucial before attaching any patch file and diving into more 
(technical) details.


> GPU management using OpenCL instead of vendor-specific solutions
> ----------------------------------------------------------------
>
>                 Key: YARN-10621
>                 URL: https://issues.apache.org/jira/browse/YARN-10621
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>            Reporter: Sotiris Niarchos
>            Priority: Minor


