[
https://issues.apache.org/jira/browse/YARN-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sotiris Niarchos updated YARN-10621:
------------------------------------
Description:
As part of the [E2Data research project|https://e2data.eu/], we at the
[Institute of Communication and Computer Systems
(ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical
University of Athens, Greece, have been working on a modified version of Hadoop
YARN where the GPU devices that are available in the underlying cluster are
discovered via a Java wrapper of the OpenCL framework API (namely
[JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.
In other words, we have shifted towards *a more uniform and high-level handling
of GPUs as "OpenCL-enabled" devices*. This way, we manage to *decouple GPU
discovery/management from vendor-specific technicalities*; every GPU, no matter
the vendor, is the same for E2Data YARN (more specifically, for the
{{NodeManager}} component), provided that the OpenCL runtime and drivers for
the GPU(s) of interest are installed on the respective node(s) of the cluster.
As a result, we *managed to use GPUs from vendors other than NVIDIA* (NVIDIA
GPUs are the only ones officially supported, via the {{nvidia-smi}} binary)
with minimal additional effort, after our initial changes.
Ultimately, our goal is to *unify every processing unit* that YARN can possibly
utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, high-level
interface: that of the OpenCL-enabled device*.
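To make the idea of a single "OpenCL-enabled device" interface more concrete, here is a minimal sketch in plain Java. It is not taken from the E2Data code base: the names {{OpenClDevice}}, {{DeviceDiscoverer}} and {{DeviceInventory}} are hypothetical, and the discoverer is a stub returning made-up devices. A real implementation would presumably fill the same structure from OpenCL queries issued through JOCL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical vendor-neutral description of a device, limited to what OpenCL can report. */
final class OpenClDevice {
    /** Coarse device classes; FPGAs are typically exposed by OpenCL as accelerators. */
    enum Type { CPU, GPU, ACCELERATOR }

    final Type type;
    final String vendor;
    final String name;
    final long globalMemBytes;

    OpenClDevice(Type type, String vendor, String name, long globalMemBytes) {
        this.type = type;
        this.vendor = vendor;
        this.name = name;
        this.globalMemBytes = globalMemBytes;
    }
}

/** Hypothetical discovery hook; a real one would query the OpenCL runtime, not a vendor binary. */
interface DeviceDiscoverer {
    List<OpenClDevice> discover();
}

final class DeviceInventory {
    /** Returns all discovered devices of the given type, regardless of vendor. */
    static List<OpenClDevice> ofType(DeviceDiscoverer discoverer, OpenClDevice.Type type) {
        List<OpenClDevice> result = new ArrayList<>();
        for (OpenClDevice device : discoverer.discover()) {
            if (device.type == type) {
                result.add(device);
            }
        }
        return result;
    }
}

final class InventoryDemo {
    public static void main(String[] args) {
        // Stub data for illustration only; vendors and names are invented.
        DeviceDiscoverer stub = () -> Arrays.asList(
            new OpenClDevice(OpenClDevice.Type.GPU, "VendorA", "gpu-0", 8L << 30),
            new OpenClDevice(OpenClDevice.Type.GPU, "VendorB", "gpu-1", 4L << 30),
            new OpenClDevice(OpenClDevice.Type.CPU, "VendorC", "cpu-0", 16L << 30));

        for (OpenClDevice gpu : DeviceInventory.ofType(stub, OpenClDevice.Type.GPU)) {
            System.out.println(gpu.vendor + " " + gpu.name);
        }
    }
}
```

In this sketch the caller never branches on the vendor: GPUs from two different vendors are handled by the same code path, and the only input is the device type.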
The only drawback of our approach is that vendor-specific info regarding the
GPUs is lost (e.g. temperature). We believe, however, that the lost information
is not necessary for YARN; everything that Hadoop needs in order to discover
and handle GPU devices is provided by OpenCL.
This is just a proposal and a prompt for discussion for the time being. This
modified version is a work in progress. We consider community feedback
regarding the core concept (and the fact that it may constitute a paradigm
shift for YARN) crucial before attaching any patch file and diving into more
(technical) details.
> GPU management using OpenCL instead of vendor-specific solutions
> ----------------------------------------------------------------
>
> Key: YARN-10621
> URL: https://issues.apache.org/jira/browse/YARN-10621
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, yarn
> Reporter: Sotiris Niarchos
> Priority: Minor